📄 Paper Digest

3B Small Model Matches or Beats Hundred-Billion-Parameter Models at Reasoning

Tags: small models, reasoning, reinforcement learning, math competitions, coding

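A small model with just 3B parameters matches or beats hundred-billion-parameter-class flagships such as DeepSeek V3.2 and GLM-5 on math-competition and coding benchmarks. The researchers used a three-part post-training recipe: curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation. The model scores 94.3 on AIME26 (97.1 with test-time scaling), reaches 80.2 Pass@1 on LiveCodeBench v6, and earns a 96.1% acceptance rate on recent LeetCode contest problems it had not seen. Crucially, this does not come at the cost of instruction following: it scores 93.4 on IFEval. The result challenges the common assumption that bigger models are always smarter. It suggests that verifiable reasoning can be compressed into a compact core, while the extra parameters in large models mostly go toward covering facts and long-tail knowledge. This is not something you can use tomorrow, but if you care about AI efficiency and cost, it is a signal: top-tier reasoning AI may not require burning money on ever more parameters.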

📄 Original Abstract (English)

This technical report introduces VibeThinker-3B, a compact dense model with 3B parameters developed to investigate how far verifiable reasoning can be pushed within a strictly small-model regime. Building upon the Spectrum-to-Signal post-training paradigm, we systematically enhance the model through an optimized pipeline that includes curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation. Experimental evaluations demonstrate that VibeThinker-3B achieves frontier-level performance on highly demanding verifiable tasks. Specifically, it attains a score of 94.3 on AIME26 (improving to 97.1 with claim-level test-time scaling), an 80.2 Pass@1 on LiveCodeBench v6, and exhibits strong out-of-distribution generalization with a 96.1% acceptance rate on recent unseen LeetCode contests. This effectively places it in the performance band of first-tier reasoning systems, matching or exceeding flagship models that are orders of magnitude larger, such as DeepSeek V3.2, GLM-5, and Gemini 3 Pro. Furthermore, a score of 93.4 on IFEval confirms that this extreme reasoning enhancement does not compromise strict instruction controllability. Extending our previous 1.5B work, these findings motivate the Parametric Compression-Coverage Hypothesis, which views verifiable reasoning as compressible into compact reasoning cores, while open-domain knowledge and general-purpose competence require broad parameter coverage over facts, concepts, and long-tail scenarios. This perspective suggests that compact models are not merely deployment-efficient substitutes, but a complementary path toward frontier-level performance in parameter-dense capability regimes.

Original paper on arXiv
