AI Mathematical Proofs Clear the Human Gold-Medal Threshold
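On competition mathematics, an AI system has cleared the human gold-medal threshold. The MiniMax-M3 model is trained to generate, verify, and repair proofs; the MaxProof framework then runs at test time, using "population search" and "tournament selection" to pick the best proof from a large pool of candidates. With MaxProof, the model scores 35/42 on IMO 2025 and 36/42 on USAMO 2026, above the human gold-medal threshold on both. This is not a tool you can pick up tomorrow, but it suggests that AI is approaching, and in some cases surpassing, human experts in fields that demand rigorous reasoning (such as mathematics, law, and code).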
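To make "tournament selection" concrete, here is a minimal sketch of one common form of it, a single-elimination bracket driven by a pairwise judge. This is an assumption for illustration, not MaxProof's actual procedure: the function `tournament_select`, the toy `quality` scores, and `mock_judge` are all hypothetical. In the real system the judge would be the model acting as a ranker, comparing two proofs.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def tournament_select(candidates: List[T], judge: Callable[[T, T], T]) -> T:
    """Single-elimination tournament (assumed form): pair candidates off, keep each
    pair's winner, and repeat until one remains. An odd one out gets a bye."""
    if not candidates:
        raise ValueError("need at least one candidate")
    current = list(candidates)
    while len(current) > 1:
        winners = [judge(current[i], current[i + 1]) for i in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:
            winners.append(current[-1])  # bye: advances without a comparison
        current = winners
    return current[0]

# Hypothetical stand-in for a model-based pairwise ranker: a hidden quality score.
quality = {"proof_a": 3, "proof_b": 7, "proof_c": 5, "proof_d": 6, "proof_e": 2}

def mock_judge(x: str, y: str) -> str:
    return x if quality[x] >= quality[y] else y

print(tournament_select(list(quality), mock_judge))  # proof_b
```

A bracket needs only n - 1 pairwise comparisons for n candidates, which matters when every comparison is an expensive model call; the trade-off is that a noisy judge can eliminate a strong proof early.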
📄 Original abstract
We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first trains three proof-oriented capabilities -- proof generation, proof verification, and critique-conditioned proof repair -- using a defense-in-depth generative verifier engineered for low false-positive rate. These capabilities are merged into a single released M3 model. At test time, MaxProof treats the model as a generator, verifier, refiner, and ranker, searches over a population of candidate proofs, and returns one final proof through tournament selection. With MaxProof test-time scaling, the M3 model reaches 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the human gold-medal threshold on both.
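The abstract describes a generate, verify, and repair cycle over a population of candidate proofs. Below is a minimal sketch of that loop under stated assumptions. The "defense-in-depth" verifier is modeled as a stack of independent checks where any single objection rejects the proof (one plausible way to keep the false-positive rate low). The paper's verifier is a generative model, not string checks, and every name here (`verify`, `population_search`, `toy_generate`, `has_base`, `has_step`, `toy_refine`) is hypothetical. The final tournament over survivors is left out.

```python
from typing import Callable, List, Optional, Tuple

Check = Callable[[str], Optional[str]]  # returns a critique string, or None if the check passes

def verify(proof: str, checks: List[Check]) -> Tuple[bool, Optional[str]]:
    # Defense-in-depth (assumed form): accept only if every check passes, so a single
    # objection rejects the proof, trading some recall for fewer false positives.
    for check in checks:
        critique = check(proof)
        if critique is not None:
            return False, critique
    return True, None

def population_search(
    generate: Callable[[int], str],
    refine: Callable[[str, str], str],
    checks: List[Check],
    population_size: int,
    rounds: int,
) -> List[str]:
    population = [generate(i) for i in range(population_size)]
    for _ in range(rounds):
        next_population = []
        for proof in population:
            ok, critique = verify(proof, checks)
            # Keep accepted proofs; repair rejected ones, conditioned on the critique.
            next_population.append(proof if ok else refine(proof, critique))
        population = next_population
    # Return only proofs that pass every check after the last repair round.
    return [p for p in population if verify(p, checks)[0]]

# Toy stand-ins for the model's generator, verifier checks, and refiner (hypothetical).
def toy_generate(i: int) -> str:
    return ["base case; inductive step", "base case", ""][i % 3]

def has_base(p: str) -> Optional[str]:
    return None if "base case" in p else "missing base case"

def has_step(p: str) -> Optional[str]:
    return None if "inductive step" in p else "missing inductive step"

def toy_refine(p: str, critique: str) -> str:
    return p + " | " + critique.replace("missing ", "", 1)

print(len(population_search(toy_generate, toy_refine, [has_base, has_step], 3, rounds=1)))  # 2
print(len(population_search(toy_generate, toy_refine, [has_base, has_step], 3, rounds=2)))  # 3
```

More repair rounds let weaker initial candidates be fixed one critique at a time: the empty third candidate needs two rounds before it passes both checks.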