AI Pulse
📄 Paper Walkthrough

Make the AI World Move with You: From First Person to a God's-Eye View

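In a VR game you can see only your own hands and have to guess where the rest of your body is. This paper, which introduces a framework called AnchorWorld, removes that blind spot from AI-simulated worlds: it uses 3D human motion as the interaction modality and adds auxiliary training supervision from a third-person "god's-eye" viewpoint, so the model can observe the agent's full-body position and movement and ground human-environment interactions more accurately. Better still, you can define "anchor views" in a unified world coordinate system and pair them with text descriptions, so that local scenes evolve dynamically the way you specify, for example "this wall cracks over time." The experiments reported in the paper indicate that this customization scheme maintains spatio-temporal geometric consistency and strictly follows the evolution rules you set. It is not a product you can use tomorrow, but it paves the way for more immersive and controllable virtual worlds such as VR games and digital twins.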

📄 Original Abstract (English)

Despite being a pivotal frontier, interactive world modeling remains underexplored in terms of the versatile controllability required by practical scenarios. To bridge this gap, we present AnchorWorld, a framework that advances egocentric simulation through enhanced interaction integrity and a flexible mechanism for world customization. First, we utilize 3D human motion as the primary interaction modality. To complement the out-of-view or truncated body parts in egocentric views, we introduce an auxiliary training supervision that incorporates exogenous viewpoints decoupled from the agent's first-person sensorium. It allows the model to observe the agent's full-body positioning relative to the environment, facilitating a more robust spatial grounding of human-world interactions. Furthermore, we propose a simple yet effective mechanism for customizing self-evolving worlds. This is achieved by defining anchor views within a unified world coordinate system, coupled with textual descriptions dictating the dynamic evolution of local scenes. Experimental results show that AnchorWorld significantly outperforms state-of-the-art baselines, while ablation studies validate the effectiveness of our key designs. Notably, our customization scheme exhibits promising spatio-temporal geometric consistency and adheres strictly to the prescribed evolutionary dynamics.

Original paper on arXiv
