📄 Paper Digest

AI coding: two loops work best, a third makes things worse


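When a large model writes code, extra passes of internal computation can improve quality, but more is not always better. The researchers study looped Transformers, which apply the same shared blocks repeatedly inside the model, and find that two loops work best: SWE-bench Verified (a software engineering benchmark) rises from 43.0 to 64.4 points, while variants with three or more loops regress. The explanation: the second loop delivers most of the useful refinement, but from the third loop on the updates shrink and oscillate, while each loop boundary still introduces a roughly fixed positional mismatch, so the cost ends up exceeding the gain. That is why "one more pass" helps and "two more passes" do not. It is not a trick you can use tomorrow, but within this architecture it shows a clear pattern: moderate repetition helps, excessive repetition hurts.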
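The gain-versus-cost argument can be made concrete with a toy calculation. The sketch below is not the paper's model: it simply assumes that the refinement gain of each extra loop shrinks geometrically while the positional-mismatch cost per loop boundary stays constant. The function names, the `decay` parameter, and all numbers are hypothetical and chosen only to reproduce the reported shape.

```python
def net_gain_per_loop(first_gain, decay, boundary_cost, max_loops):
    """Return the net benefit contributed by each extra loop (loop 2, 3, ...).

    Toy assumptions (not from the paper): the gain of each extra loop shrinks
    geometrically by `decay`, while the cost paid at each loop boundary is fixed.
    """
    nets = {}
    gain = first_gain
    for loop in range(2, max_loops + 1):
        nets[loop] = gain - boundary_cost
        gain *= decay
    return nets


def best_loop_count(first_gain, decay, boundary_cost, max_loops):
    """Pick the loop count with the highest cumulative net benefit (1 = no looping)."""
    nets = net_gain_per_loop(first_gain, decay, boundary_cost, max_loops)
    best, best_total, total = 1, 0.0, 0.0
    for loop in sorted(nets):
        total += nets[loop]
        if total > best_total:
            best, best_total = loop, total
    return best


# Illustrative numbers only: loop 2 gains 10 and pays 4; loop 3 gains 2 and pays 4.
print(best_loop_count(first_gain=10.0, decay=0.2, boundary_cost=4.0, max_loops=5))  # prints 2
```

With these made-up values the second loop is clearly worth its cost, and every later loop subtracts more than it adds, which is the qualitative pattern the digest describes.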

📄 Original abstract (English)

Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop count a practical design choice. We therefore study PLT loop-count selection through a gain--cost view: an extra loop may refine representations, but CLP also introduces a positional mismatch at each loop boundary. We instantiate this study by training LoopCoder-v2, a family of 7B PLT coders with different loop counts, from scratch on 18T tokens, followed by matched instruction tuning and evaluation. Empirically, the two-loop variant delivers broad gains over the non-looped baseline across code generation, code reasoning, agentic software engineering, and tool-use benchmarks, improving SWE-bench Verified from 43.0 to 64.4 points and Multi-SWE from 14.0 to 31.0 points. In contrast, variants with three or more loops regress, revealing a strongly non-monotonic loop-count effect. Our diagnostics show that loop 2 provides the main productive refinement, while later loops yield diminishing, oscillatory updates and reduced representational diversity. Because the CLP-induced mismatch remains roughly fixed as refinement gains shrink, the offset cost increasingly dominates. This gain--cost trade-off explains PLT's saturation at two loops and provides diagnostics for loop-count selection.
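To illustrate what "repeatedly applying shared blocks" means, here is a minimal sketch in plain Python. It is a toy, not PLT or LoopCoder-v2: `shared_block` is a hypothetical stand-in for a Transformer block (a simple residual step toward a fixed target), and there is no attention, CLP offset, or KV cache. It only shows the same weights being reused on every loop, and one way to track how large each loop's update is.

```python
import math


def shared_block(x, weight=0.5, target=1.0):
    """Hypothetical stand-in for one shared block: a residual step toward `target`."""
    return [xi + weight * (target - xi) for xi in x]


def run_loops(x, num_loops):
    """Apply the same block `num_loops` times and record the size of each update."""
    update_norms = []
    for _ in range(num_loops):
        new_x = shared_block(x)  # same function, same parameters on every loop
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        update_norms.append(step)
        x = new_x
    return x, update_norms


final, norms = run_loops([0.0, 0.0], num_loops=3)
print(final)   # state after three loops
print(norms)   # each update is smaller than the one before it
```

In this toy the update size halves on every loop, so later loops change the representation less and less. The paper's diagnostics report a comparable diminishing pattern (with oscillation) in the real models, measured with its own methods rather than this one.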

Original paper on arXiv
