AI coding: looping twice works best, three times makes it worse
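Running a model's computation more than once can improve the code it writes, but more is not always better. The researchers trained LoopCoder-v2, a family of 7B code models that "loop": instead of passing their internal representations through the network once, they pass them through the same shared Transformer blocks several times. Two loops worked best. On SWE-bench Verified (a software engineering benchmark), the two-loop model scored 64.4 points versus 43.0 for the non-looped baseline, and on Multi-SWE it rose from 14.0 to 31.0. Variants with three or more loops got worse. The explanation is a gain-cost trade-off. The second loop does most of the useful refinement, while later loops make ever smaller updates that oscillate back and forth instead of improving the representation. Meanwhile, every loop boundary introduces a roughly fixed positional mismatch, a side effect of the cross-loop position offsets that make parallel looping cheap. Once the gains shrink below that fixed cost, extra loops do more harm than good. In everyday terms, "thinking it over once more" helps, while "twice more" is wasted. This is not a trick you can use tomorrow, but it illustrates, at least for this architecture, a basic pattern in how models refine their internal reasoning: moderate repetition helps, and too much hurts.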
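To make the gain-cost argument concrete, here is a toy calculation. It is not from the paper: it assumes, purely for illustration, that the refinement gain of loop k shrinks geometrically (by a hypothetical factor `decay`) while every loop boundary pays the same fixed cost. The function names and all numbers are made up.

```python
def net_benefit(loop_count: int, first_gain: float, decay: float, boundary_cost: float) -> float:
    """Toy net benefit of `loop_count` loops relative to a single pass.

    Hypothetical model: loop k (k >= 2) adds a gain of first_gain * decay**(k - 2),
    and each loop boundary costs the same fixed boundary_cost.
    """
    total = 0.0
    for k in range(2, loop_count + 1):
        gain = first_gain * decay ** (k - 2)
        total += gain - boundary_cost
    return total


def best_loop_count(max_loops: int, first_gain: float, decay: float, boundary_cost: float) -> int:
    """Return the loop count in [1, max_loops] with the highest toy net benefit."""
    return max(
        range(1, max_loops + 1),
        key=lambda n: net_benefit(n, first_gain, decay, boundary_cost),
    )


if __name__ == "__main__":
    # Hypothetical values: a large second-loop gain that decays fast,
    # set against a fixed per-boundary cost.
    for n in range(1, 6):
        print(n, round(net_benefit(n, first_gain=10.0, decay=0.15, boundary_cost=2.0), 3))
    print("best:", best_loop_count(5, first_gain=10.0, decay=0.15, boundary_cost=2.0))
```

With a fast decay (0.15) the toy net benefit peaks at two loops; with a slower decay (0.5) the same model would favor four. The optimum depends on how quickly the gains shrink relative to the fixed cost.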
📄 Original abstract (English)
Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop count a practical design choice. We therefore study PLT loop-count selection through a gain-cost view: an extra loop may refine representations, but CLP also introduces a positional mismatch at each loop boundary. We instantiate this study by training LoopCoder-v2, a family of 7B PLT coders with different loop counts, from scratch on 18T tokens, followed by matched instruction tuning and evaluation. Empirically, the two-loop variant delivers broad gains over the non-looped baseline across code generation, code reasoning, agentic software engineering, and tool-use benchmarks, improving SWE-bench Verified from 43.0 to 64.4 points and Multi-SWE from 14.0 to 31.0 points. In contrast, variants with three or more loops regress, revealing a strongly non-monotonic loop-count effect. Our diagnostics show that loop 2 provides the main productive refinement, while later loops yield diminishing, oscillatory updates and reduced representational diversity. Because the CLP-induced mismatch remains roughly fixed as refinement gains shrink, the offset cost increasingly dominates. This gain-cost trade-off explains PLT's saturation at two loops and provides diagnostics for loop-count selection.
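The abstract's core idea, applying one shared block repeatedly, and its observation that later loops give "diminishing, oscillatory updates" can be illustrated with a toy. This sketch is only an analogy under stated assumptions. `shared_block` is a hypothetical contraction map, not a Transformer layer, and the loop runs sequentially. It does not model PLT's parallel scheme, cross-loop position offsets, or gated sliding-window attention.

```python
import math
from typing import Callable, List

Vector = List[float]


def shared_block(h: Vector) -> Vector:
    """Hypothetical stand-in for a looped block. The same function (i.e. the
    same parameters) is reused on every loop; only its input changes."""
    # Toy contraction with slope -0.5: repeated application converges to the
    # fixed point 2/3, overshooting it on alternate loops.
    return [-0.5 * x + 1.0 for x in h]


def run_loops(h0: Vector, block: Callable[[Vector], Vector], num_loops: int) -> List[Vector]:
    """Apply `block` sequentially num_loops times and return each loop's update."""
    h = list(h0)
    updates: List[Vector] = []
    for _ in range(num_loops):
        new_h = block(h)
        updates.append([b - a for a, b in zip(h, new_h)])
        h = new_h
    return updates


def norm(v: Vector) -> float:
    """Euclidean length of an update vector."""
    return math.sqrt(sum(x * x for x in v))


if __name__ == "__main__":
    for k, u in enumerate(run_loops([0.0, 4.0], shared_block, num_loops=4), start=1):
        print(f"loop {k}: update={u}, size={norm(u):.3f}")
```

Because the toy map's slope is negative, each update is half the size of the previous one and points the opposite way. Later loops keep nudging the state back and forth around a fixed point, each contributing less than the last.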