AI Pulse
📄 Paper Digest

To Let AI Do Research on Its Own, the Key Is Not Teaching It How, but Building It a Good Lab

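We tend to assume that the hard part of getting AI to do research is teaching it how to think and reason. This paper makes a counterintuitive claim: the bottleneck is not the AI's brain but the environment it works in. Put a brilliant scientist in a messy lab with no tools and they will not produce results either. The researchers built a system called EurekAgent whose core idea is not to improve the AI's algorithms but to carefully engineer its working environment: restricting which resources it can access, managing every file it produces with Git, giving it a budget and letting it decide how to spend it, and letting humans step in at any time. The result? It set new state-of-the-art results on mathematics, kernel engineering, and machine learning tasks. On one classic problem, packing 26 circles, it found a new state-of-the-art solution for less than $11 in total API cost. This is not a tool you can start using tomorrow, but it points to a trend: the competition in autonomous AI research may shift from who has the biggest model to who builds the smartest lab for their AI.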
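To make the "give it a budget and let it decide how to spend it" idea concrete, here is a minimal sketch of what a budget ledger in an agent's environment might look like. This is not EurekAgent's actual implementation: the names `BudgetLedger`, `BudgetExceeded`, and `charge`, and the 10-dollar cap, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a requested spend would exceed the remaining budget."""


@dataclass
class BudgetLedger:
    """Hypothetical ledger that tracks spend against a fixed cap in US dollars."""

    limit_usd: float
    entries: list = field(default_factory=list)  # (label, cost_usd) pairs

    @property
    def spent_usd(self) -> float:
        return sum(cost for _, cost in self.entries)

    @property
    def remaining_usd(self) -> float:
        return self.limit_usd - self.spent_usd

    def charge(self, label: str, cost_usd: float) -> float:
        """Record a cost and return what is left; refuse to overspend."""
        if cost_usd < 0:
            raise ValueError("cost must be non-negative")
        if cost_usd > self.remaining_usd:
            raise BudgetExceeded(
                f"{label}: needs ${cost_usd:.2f}, only ${self.remaining_usd:.2f} left"
            )
        self.entries.append((label, cost_usd))
        return self.remaining_usd


# Assumed example values: a 10-dollar cap and two experiments.
ledger = BudgetLedger(limit_usd=10.0)
ledger.charge("explore idea A", 4.0)
ledger.charge("refine idea A", 5.0)
print(ledger.remaining_usd)
```

The point of the sketch is that the environment, not the agent's prompt, enforces the cap: the agent can read `remaining_usd` and plan around it, but any request that would overspend is rejected.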

📄 Original Abstract (English)

LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produced results that outperform human-designed approaches. As model capabilities continue to improve, we argue that the bottleneck for autonomous scientific discovery is shifting from prescribing agent workflows to designing agent environments: the resources, constraints, and interfaces that shape agent behavior. We frame this as environment engineering: building environments that amplify productive behaviors, such as open-ended exploration, systematic artifact management, and inter-agent collaboration, while suppressing harmful behaviors, such as reward hacking and high-friction human oversight. We present EurekAgent, an environment-engineered agent system for metric-driven autonomous scientific discovery. EurekAgent engineers the environment along four dimensions: permissions engineering for bounded agent execution and isolated evaluation; artifact engineering for filesystem and Git-based collaboration; budget engineering for budget-aware exploration; and human-in-the-loop engineering for easy human supervision and intervention. EurekAgent sets new state-of-the-art results on multiple mathematics, kernel engineering, and machine learning tasks, including new state-of-the-art 26-circle packing results discovered with less than $11 in total API cost. We open-source our code and results, and call for environment engineering as a core research direction for developing reliable autonomous research agents.

Original paper on arXiv
