A 35B Model Catches Up to Trillion-Parameter Models by "Scaling the Horizon"
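In the large-model race, parameter count is not the only path forward. This paper takes a 35B-parameter Mixture-of-Experts model and matches, and sometimes beats, trillion-parameter models on long-horizon tasks. The trick is not more parameters but a longer horizon. The model is trained on complete trajectories that average 45K tokens each and carry a task in one continuous run, from knowledge lookup through action execution to result verification. Training has three stages. Full-domain supervised fine-tuning lays the foundation. A specialist teacher is then trained for each domain. Finally, multi-teacher, domain-routed on-policy distillation merges the abilities of six different domains into a single model. The result leads trillion-parameter models on long-horizon benchmarks such as SEAL-0 (56.4) and IFBench (80.6), and comes close to them on tasks such as SciCode and HLE. It is not something you will be using tomorrow, but it points to a more economical direction: instead of piling on parameters, teach the model to go the distance.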
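To make the idea of a "complete trajectory" concrete, here is a minimal sketch of how one long-horizon trajectory might be represented and filtered. It assumes a simple record built from four step kinds (knowledge, action, observation, verification), taken from the abstract's description of the infrastructure. The class names, fields, the "PASS" convention, and the whitespace-based token count are all hypothetical illustrations, not the paper's actual data format or tokenizer.

```python
from dataclasses import dataclass, field
from typing import List, Literal

# Hypothetical step kinds, mirroring the abstract's "knowledge, actions,
# observations, and verifier outcomes"; not the paper's actual schema.
StepKind = Literal["knowledge", "action", "observation", "verification"]


@dataclass
class Step:
    kind: StepKind
    content: str


@dataclass
class Trajectory:
    domain: str
    steps: List[Step] = field(default_factory=list)

    def add(self, kind: StepKind, content: str) -> None:
        self.steps.append(Step(kind, content))

    def token_count(self) -> int:
        # Crude stand-in for a real tokenizer: count whitespace-separated words.
        return sum(len(step.content.split()) for step in self.steps)

    def is_verified(self) -> bool:
        # Assumed convention: a trajectory counts as complete only when its
        # final step is a verifier outcome that starts with "PASS".
        if not self.steps:
            return False
        last = self.steps[-1]
        return last.kind == "verification" and last.content.startswith("PASS")


def keep_verified(trajs: List[Trajectory]) -> List[Trajectory]:
    # Keep only trajectories whose final verifier outcome is a pass.
    return [t for t in trajs if t.is_verified()]


traj = Trajectory(domain="search")
traj.add("knowledge", "retrieved: boiling point of water at sea level is 100 C")
traj.add("action", "call calculator 100 * 9 / 5 + 32")
traj.add("observation", "212")
traj.add("verification", "PASS answer matches reference")
print(traj.token_count(), traj.is_verified())  # prints: 25 True
```

A real 45K-token trajectory would chain many such steps across repeated tool calls. Filtering on the verifier outcome is one simple way to keep only trajectories that end in a checked result; the paper's own filtering criteria are not described in this summary.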
📄 Original abstract (English)
We introduce Agents-A1, a 35B Mixture-of-Experts Agentic Model that reaches trillion-parameter-level performance by scaling the agent horizon. We investigate agent-horizon scaling from two perspectives: scaling long-horizon trajectories and scaling heterogeneous agent abilities. To support this goal, we build a long-horizon knowledge-action infrastructure that connects external knowledge, actions, observations, and verifier outcomes, producing agentic trajectories with an average length of 45K tokens. Based on this, we train Agents-A1 with a three-stage recipe. First, we perform full-domain supervised fine-tuning to align the base model with broad agentic behaviors. Second, we train domain-level teacher models to capture specialized expertise in each domain. Third, we propose a multi-teacher domain-routed on-policy distillation with salient vocabulary alignment to improve knowledge transfer efficiency across different domains, unifying six heterogeneous domains into one deployable student model. Agents-A1 achieves strong and broad performance on long-horizon agent benchmarks. Compared with 1T-parameter models such as Kimi-K2.6 and DeepSeek-V4-pro, Agents-A1 achieves leading results on SEAL-0 (56.4), IFBench (80.6), HiPhO (46.4), FrontierScience-Olympiad (79.0), and MolBench-Bind (56.8), and remains highly competitive on SciCode (44.3), HLE (47.6) and BrowseComp (75.5). We hope this work provides the community with a practical path for scaling the horizon using a 35B agent that can reach or match the performance of 1T models on long-horizon tasks.
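The abstract names the third stage but does not spell out its loss. The sketch below is one plausible reading under stated assumptions. Each sample carries a domain label that routes it to that domain's teacher. "On-policy" is taken to mean that the scored tokens were sampled by the student itself, so the teacher is queried on the student's own prefixes. "Salient vocabulary alignment" is approximated by comparing distributions only over the teacher's top-k tokens at each position. The function names, the top-k rule, and the choice of reverse KL are assumptions for illustration, not the paper's confirmed method.

```python
import math
from typing import Callable, Dict, List

Dist = List[float]  # a probability distribution over a toy vocabulary
Teacher = Callable[[List[int]], Dist]  # hypothetical: prefix -> next-token dist


def top_k_indices(p: Dist, k: int) -> List[int]:
    # Indices of the k most probable tokens (the assumed "salient" set).
    return sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]


def salient_reverse_kl(student: Dist, teacher: Dist, k: int) -> float:
    # Reverse KL(student || teacher), restricted to the teacher's top-k tokens
    # and renormalized on that subset. Assumes strictly positive
    # (softmax-style) probabilities, so no denominator is zero.
    idx = top_k_indices(teacher, k)
    s_mass = sum(student[i] for i in idx)
    t_mass = sum(teacher[i] for i in idx)
    kl = 0.0
    for i in idx:
        s = student[i] / s_mass
        t = teacher[i] / t_mass
        kl += s * math.log(s / t)
    return kl


def distill_loss(batch: List[dict], teachers: Dict[str, Teacher], k: int) -> float:
    # Each sample holds a student-sampled token sequence plus the student's
    # distribution at every position; its domain label picks the teacher.
    total, n = 0.0, 0
    for sample in batch:
        teacher = teachers[sample["domain"]]  # domain routing
        for pos, s_dist in enumerate(sample["student_dists"]):
            t_dist = teacher(sample["tokens"][:pos])  # teacher scores student prefix
            total += salient_reverse_kl(s_dist, t_dist, k)
            n += 1
    return total / max(n, 1)


# Toy teachers that ignore the prefix and return fixed distributions.
teachers: Dict[str, Teacher] = {
    "math": lambda prefix: [0.7, 0.2, 0.05, 0.05],
    "search": lambda prefix: [0.1, 0.1, 0.4, 0.4],
}
batch = [
    {"domain": "math", "tokens": [0, 1],
     "student_dists": [[0.7, 0.2, 0.05, 0.05], [0.7, 0.2, 0.05, 0.05]]},
    {"domain": "search", "tokens": [2],
     "student_dists": [[0.4, 0.3, 0.2, 0.1]]},
]
print(distill_loss(batch, teachers, k=2))
```

Only the loss value is computed here. In actual training these distributions would be model outputs and the loss would be backpropagated into the student. Routing each sample to a single domain teacher keeps every teacher's signal confined to the data it specializes in, which is presumably the intuition behind "domain-routed".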