AI Pulse
📄 论文解读

AI终于能像人一样理解物理世界了

现在的AI做物理任务(比如机器人抓杯子、自动驾驶)时,往往只看到眼前一帧画面,没有对世界持续的理解。这篇论文的团队做了一个叫Kairos的框架,让AI像人一样:看大量视频、人类行为、机器人操作来学习世界规律;用一套特殊注意力机制记住过去、预测未来,而且数学上保证不会越记越乱;最后还能在普通电脑上实时运行。它不是明天就能用的产品,但指明了物理AI从“看见”到“理解”的关键一步。

📄 原文摘要(英文)

World models are transitioning from passive visual generators to foundational, operational infrastructure for Physical AI: they must natively acquire world knowledge from heterogeneous experience, maintain persistent states over long horizons, and execute efficiently within real deployment constraints. We introduce Kairos, a native world model stack designed around these requirements. (1) Kairos learns the world by pioneering a Native Pre-training Paradigm governed by a Cross-Embodiment Data Curriculum, which organizes open-world videos, human behavioral data, and robot interactions into a progressive developmental pathway. (2) Kairos maintains the world by unified world understanding, generation, and prediction within a Native Unified Architecture equipped with Hybrid Linear Temporal Attention, where sliding-window attention captures local dynamics, dilated sliding windows capture mid-range dependencies, and gated linear attention maintains persistent global memory. We establish formal theoretical bounds demonstrating that this temporal factorization strictly limits error accumulation, mathematically guaranteeing state propagation across extended horizons. (3) Kairos runs the world by incorporating a Deployment-Aware System Co-Design to support low-latency rollout generation on server and consumer-grade hardware for real-world observation-action-feedback loops. Experiments on embodied world-model, long-horizon, and action-policy benchmarks show that Kairos achieves top level performance while offering a strong efficiency-capability trade-off. Together, these results position Kairos as a cohesive operational foundation for future self-evolving physical intelligence.

arXiv 原文

📬 订阅 AI Pulse

每天三次更新,不错过重要信号

▲ 回到顶部