📄 Paper Digest

AI Can Finally Understand the Physical World the Way Humans Do

Tags: world models, physical AI, robotics, long-term memory, multimodal learning

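When today's AI systems handle physical tasks, such as a robot picking up a cup, they often see only the current frame and have no notion that the world keeps running between frames. The team behind this paper built a framework called Kairos that learns more the way people do. It picks up the regularities of the world from large amounts of open-world video, human behavioral data, and robot interaction data; it uses a hybrid attention mechanism to track short-term motion and long-term state at the same time; and it is designed for low-latency operation on consumer-grade hardware as well as servers. The authors report top-level performance on embodied world-model, long-horizon, and action-policy benchmarks. It is not a product you can use tomorrow, but it is a key step toward robots that truly grasp common-sense facts such as "a cup does not vanish after you put it down."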

📄 Original Abstract (English)

World models are transitioning from passive visual generators to foundational, operational infrastructure for Physical AI: they must natively acquire world knowledge from heterogeneous experience, maintain persistent states over long horizons, and execute efficiently within real deployment constraints. We introduce Kairos, a native world model stack designed around these requirements. (1) Kairos learns the world by pioneering a Native Pre-training Paradigm governed by a Cross-Embodiment Data Curriculum, which organizes open-world videos, human behavioral data, and robot interactions into a progressive developmental pathway. (2) Kairos maintains the world by unified world understanding, generation, and prediction within a Native Unified Architecture equipped with Hybrid Linear Temporal Attention, where sliding-window attention captures local dynamics, dilated sliding windows capture mid-range dependencies, and gated linear attention maintains persistent global memory. We establish formal theoretical bounds demonstrating that this temporal factorization strictly limits error accumulation, mathematically guaranteeing state propagation across extended horizons. (3) Kairos runs the world by incorporating a Deployment-Aware System Co-Design to support low-latency rollout generation on server and consumer-grade hardware for real-world observation-action-feedback loops. Experiments on embodied world-model, long-horizon, and action-policy benchmarks show that Kairos achieves top level performance while offering a strong efficiency-capability trade-off. Together, these results position Kairos as a cohesive operational foundation for future self-evolving physical intelligence.
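The three temporal components named in the abstract (sliding-window attention, dilated sliding windows, and gated linear attention) can be illustrated with a small pure-Python sketch. This is not the Kairos implementation: the function names, the causal windowing, and the single scalar gate per step are all assumptions chosen for illustration, and the abstract does not specify how the three parts are combined.

```python
# Illustrative sketch only; names and details are hypothetical, not from the paper.

def sliding_window_indices(t, window):
    """Local dynamics: the last `window` positions up to and including t."""
    return list(range(max(0, t - window + 1), t + 1))


def dilated_window_indices(t, window, dilation):
    """Mid-range dependencies: `window` positions spaced `dilation` apart, ending at t."""
    candidates = [t - k * dilation for k in range(window)]
    return sorted(i for i in candidates if i >= 0)


def gated_linear_memory(keys, values, gates):
    """Persistent memory: recurrent form of gated linear attention.

    The state S is a d_k x d_v matrix updated as S = g * S + outer(k, v).
    Assumption: one scalar gate g per step (real variants may gate per channel).
    The state size does not grow with sequence length.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    states = []
    for k, v, g in zip(keys, values, gates):
        S = [[g * S[i][j] + k[i] * v[j] for j in range(d_v)] for i in range(d_k)]
        states.append(S)
    return states


def read_memory(S, query):
    """Read the memory with a query vector: o = q^T S."""
    return [sum(query[i] * S[i][j] for i in range(len(query)))
            for j in range(len(S[0]))]
```

At step 10, for example, `sliding_window_indices(10, 3)` attends to the three most recent frames, while `dilated_window_indices(10, 3, 4)` reaches back to frames 2, 6, and 10. The gated state summarizes everything earlier in a fixed-size matrix, which is what keeps long rollouts cheap in this kind of design.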

Original paper on arXiv
