📄 Paper Digest

AI Learns to Simulate the World: Using a Language Model as an "Environment Simulator"

World models · Language models · Environment simulation · Agents · Reinforcement learning

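We usually have AI agents learn by trial and error in real environments, but real environments are expensive, slow, and dangerous. This paper takes the opposite approach and uses a language model to simulate the environment itself: given the current observation and an action, the model predicts what the environment will look like next instead of actually executing the action. The authors train two language world models, Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B (35B and 397B parameters), covering 7 domains and using more than 10 million real environment interaction trajectories. Training runs in three stages: continued pre-training (CPT) to inject general-purpose world modeling, supervised fine-tuning (SFT) to activate next-state-prediction reasoning, and reinforcement learning (RL) to sharpen simulation fidelity. The results: used as a simulator for agentic RL, the model yields gains that surpass training in real environments alone, and used as a warm-up stage, world-model training improves downstream performance across 7 agentic benchmarks. You won't be using it tomorrow, but it is a key step toward letting AI "run simulations in its head": future agents may not need to touch the real world on every attempt, and could instead rehearse many times in a language-based world before acting.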
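To make the "language model as environment" idea concrete, here is a minimal sketch of the interaction loop, not the paper's implementation. The class name `SimulatedEnv`, the prompt layout, and the `generate` callable are all hypothetical; `toy_generator` is a canned stand-in so the example runs without any model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SimulatedEnv:
    """Environment whose transitions are predicted by a text generator.

    `generate` is an assumed interface (prompt in, completion out); a real
    setup would wrap a call to a language world model here.
    """
    generate: Callable[[str], str]
    task: str
    history: List[Tuple[str, str]] = field(default_factory=list)

    def build_prompt(self, action: str) -> str:
        # Hypothetical prompt layout: task, past (action, observation)
        # pairs, then the new action with an open "Observation:" slot.
        lines = [f"Task: {self.task}"]
        for past_action, past_obs in self.history:
            lines.append(f"Action: {past_action}")
            lines.append(f"Observation: {past_obs}")
        lines.append(f"Action: {action}")
        lines.append("Observation:")
        return "\n".join(lines)

    def step(self, action: str) -> str:
        # The next observation is predicted, not produced by real execution.
        observation = self.generate(self.build_prompt(action)).strip()
        self.history.append((action, observation))
        return observation


def toy_generator(prompt: str) -> str:
    """Stand-in for a model call: canned replies keyed on the last action."""
    actions = [ln for ln in prompt.splitlines() if ln.startswith("Action: ")]
    if actions[-1] == "Action: ls":
        return " README.md  main.py"
    return " command not found"


env = SimulatedEnv(generate=toy_generator, task="Inspect the repository")
first = env.step("ls")
second = env.step("frobnicate")
```

The agent only ever sees `step`, so in this sketch a real sandbox and a simulated one could be swapped behind the same interface, which is the decoupling the paper relies on for agentic RL.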

📄 Original Abstract (English)

A world model predicts environment dynamics based on current observations and actions, serving as a core cognitive mechanism for reasoning and planning. In this work, we investigate how world modeling based on language models can further push the boundaries of general agents. (i) We first focus on building foundation models for agentic environment simulation. We introduce Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B, the first language world models capable of simulating agentic environments covering 7 domains via long chain-of-thought reasoning. Leveraging more than 10M environment interaction trajectories of 7 domains in real-world environments, we develop Qwen-AgentWorld through a three-stage training pipeline: CPT injects general-purpose world modeling capabilities from the state transition dynamics and augmented professional corpora, SFT activates next-state-prediction reasoning, and RL sharpens simulation fidelity through a tailored framework with hybrid rubric-and-rule rewards. To evaluate language world models, we present AgentWorldBench, a comprehensive benchmark constructed from real-world interactions of 5 frontier models on 9 established benchmarks. Empirical results demonstrate that Qwen-AgentWorld significantly outperforms existing frontier models. (ii) Beyond foundation models, we further investigate two complementary paradigms through which world modeling enhances general agents. First, as a decoupled environment simulator, Qwen-AgentWorld supports scalable and controllable simulation of thousands of real-world environments for agentic RL, yielding gains that surpass real-environment training alone. Second, as a unified agent foundation model, world-model training acts as a highly effective warm-up that improves downstream performance across 7 agentic benchmarks. Code: https://github.com/QwenLM/Qwen-AgentWorld

Original paper on arXiv
