📄 Paper Digest

Letting AI Be Its Own Teacher: One Model Plays Two Roles and Gets Stronger with Practice

Tags: large language models, agents, self-evolution, intrinsic reward, reinforcement learning

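Training AI agents usually depends on feedback from an external environment, which is inefficient and confined to fixed scenarios. This paper has a single large language model play two roles at once: the agent and the environment. As the agent, it predicts the outcome of each action and compares that prediction with what actually happens, which yields an internal reward and teaches it to reason about the environment. As the environment, it analyzes its own failed attempts and picks out tasks that show similar mistakes for focused practice. Across multiple benchmarks, the approach improves average performance by more than 4% over strong baselines. It is not something you can use tomorrow, but it shows a new route to self-evolving AI: no external referee is needed, and the model can iterate on its own.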
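To make the agent-side idea concrete, here is a minimal sketch of turning prediction accuracy into a per-step reward. It is an illustration under stated assumptions, not the paper's implementation: the digest does not say how predicted and actual outcomes are compared, so plain string similarity from Python's `difflib` stands in for that comparison, and the names `state_alignment`, `shaped_rewards`, and `weight` are hypothetical.

```python
from difflib import SequenceMatcher


def state_alignment(predicted: str, actual: str) -> float:
    """Similarity in [0, 1] between a predicted and an observed state.

    Assumption: text similarity is only a stand-in for whatever
    alignment measure the paper actually uses.
    """
    return SequenceMatcher(None, predicted, actual).ratio()


def shaped_rewards(predictions, observations, task_reward, weight=0.5):
    """Return one reward per step.

    Each step earns `weight * alignment`; the final step also receives
    the task-level reward. `weight` is a hypothetical mixing coefficient.
    """
    if len(predictions) != len(observations):
        raise ValueError("need one observation per prediction")
    rewards = [
        weight * state_alignment(p, o)
        for p, o in zip(predictions, observations)
    ]
    if rewards:
        rewards[-1] += task_reward
    return rewards


if __name__ == "__main__":
    preds = ["the drawer is open", "the key is in hand"]
    obs = ["the drawer is open", "the drawer is empty"]
    print(shaped_rewards(preds, obs, task_reward=0.0))
```

In this sketch, a step whose prediction matches the observation earns the full `weight`, so the model is rewarded for anticipating the environment even when the task as a whole fails.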

📄 Original Abstract (English)

Although Large Language Model (LLM) agents have demonstrated strong performance on complex tasks, their learning is often limited by inefficient interaction feedback and static training environments, which hinder broader generalization. To address these limitations, this paper introduces Role-Agent, a framework that harnesses a single LLM to function concurrently as both the agent and the environment, enabling a bootstrapped co-evolution. Role-Agent comprises two synergistic components: World-In-Agent (WIA) and Agent-In-World (AIW). In WIA, the LLM acts as the agent and predicts future states after each action; the alignment between predicted and actual states is then used as a process reward, encouraging environment-aware reasoning. In AIW, the LLM analyzes failure modes from failed trajectories and retrieves tasks with similar failure patterns, thereby reshaping the training data distribution for targeted practice. Experiments on multiple benchmarks show that Role-Agent consistently improves performance, yielding an average gain of over 4% over strong baselines.
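The AIW step, reshaping the training distribution toward tasks that share observed failure patterns, can be sketched roughly as follows. This is a hedged illustration only: the abstract does not describe the retrieval mechanism, so the sketch assumes failure modes are already available as short labels (in the paper they would come from the LLM's own analysis) and that each task in the pool carries comparable tags. The names `failure_mode_weights`, `task_tags`, and `boost` are hypothetical.

```python
from collections import Counter


def failure_mode_weights(failure_labels, task_tags, boost=1.0):
    """Return a sampling probability per task.

    failure_labels: failure-mode labels, one per failed trajectory.
    task_tags: mapping of task id -> failure-mode tags for that task.
    boost: hypothetical knob for how strongly to favor matching tasks.
    """
    counts = Counter(failure_labels)
    total = sum(counts.values())
    weights = {}
    for task_id, tags in task_tags.items():
        # Share of observed failures that this task's tags cover.
        overlap = sum(counts[t] for t in set(tags))
        share = overlap / total if total else 0.0
        weights[task_id] = 1.0 + boost * share
    norm = sum(weights.values())
    if norm == 0:
        return {}
    return {task_id: w / norm for task_id, w in weights.items()}


if __name__ == "__main__":
    failures = ["wrong_tool", "wrong_tool", "loop"]
    pool = {"t1": ["wrong_tool"], "t2": ["loop"], "t3": []}
    for task_id, prob in failure_mode_weights(failures, pool).items():
        print(task_id, round(prob, 3))
```

Every task keeps a baseline weight of 1.0, so tasks unrelated to recent failures are still sampled occasionally rather than dropped from training.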

Original paper on arXiv
