📄 Paper Digest

AI Learns to Write Entire Codebases from Scratch


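Until now, AI coding help mostly meant fixing a bug or filling in a function. It is now starting to build entire software repositories from scratch, generating a complete project directly from its documentation. The researchers built DeNovoSWE, a dataset of 4,818 instances, each of which requires the AI to write a whole repository from documentation. The data is generated automatically, without human annotation, by a sandboxed agentic workflow that follows a "divide and conquer" and critic-repair approach, and a difficulty-aware trajectory filtering step balances data quality and diversity. After fine-tuning Qwen3-30B-A3B on DeNovoSWE, the model's score on the BeyondSWE-Doc2Repo benchmark jumped from 5.8% to 47.2%, a gain of 41.4 percentage points. This is not a tool you can use tomorrow, but it brings AI a big step closer to "you state the requirements, it ships the product."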

📄 Original Abstract (English)

As the capabilities of LLM-based code agents continue to advance, their expected role is expanding beyond localized bug fixing in existing codebases toward architecting and implementing complete software repositories from high-level specifications. However, training agents for such long-horizon software engineering tasks remains difficult due to the scarcity of large-scale, verifiable whole-repository generation data. In this paper, we introduce DeNovoSWE, a large-scale dataset for whole-repository generation. DeNovoSWE comprises 4,818 high-quality instances, where each instance requires generating a complete repository from documentation. Our dataset is automatically constructed through a carefully designed sandboxed agentic workflow, enabling scalable curation without human annotation. DeNovoSWE is constructed with "divide and conquer" and critic-repair philosophy. To balance data quality and diversity, we further introduce a difficulty-aware trajectory filtering strategy. Fine-tuning Qwen3-30B-A3B on DeNovoSWE substantially improves long-horizon SWE performance, raising its score on the challenging BeyondSWE-Doc2Repo benchmark from 5.8% to 47.2%.

Original paper on arXiv
