Getting AI Video to Remember Objects That Leave and Come Back
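Today's AI video models share a serious flaw: once an object moves out of frame and comes back, the model has forgotten what it looked like, or the object simply vanishes. WorldDirector splits "how objects move" and "how the frame is rendered" into two separate jobs. A large language model first plans the objects' trajectories in 3D space together with the camera motion, and those trajectories are then used as control signals for video generation. According to the authors, an object that stays out of frame for a long stretch comes back with its shape, color, and texture intact. You can also steer the camera freely and have objects follow a path you specify. It is not a product you can use tomorrow, but it points toward "editable virtual worlds": in games, film previsualization, and autonomous-driving simulation, objects have to persist and behave consistently, and this paper is a step in that direction.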
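To make the "plan first, render second" split concrete, here is a minimal Python sketch. It is not the authors' implementation: the plan format, the names `TrackedObject`, `Camera`, `project_x`, and `build_control_signals`, and the simple pinhole camera that only pans left and right are all assumptions made for illustration. The trajectory and camera pan are hard-coded where a real system would take them from the LLM planner.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackedObject:
    """Hypothetical world-state entry: identity lives here, not in the pixels."""
    object_id: str
    appearance: str  # stand-in for a real appearance descriptor


@dataclass(frozen=True)
class Camera:
    """Hypothetical camera at the origin; yaw 0 looks along +z."""
    yaw_deg: float
    fov_deg: float = 60.0
    width: int = 640


def project_x(point, cam):
    """Return the horizontal pixel coordinate, or None if the point is out of view."""
    x, _, z = point
    yaw = math.radians(cam.yaw_deg)
    # Express the world point in the camera frame (right axis, forward axis).
    xc = math.cos(yaw) * x - math.sin(yaw) * z
    zc = math.sin(yaw) * x + math.cos(yaw) * z
    if zc <= 0:  # behind the camera
        return None
    focal = (cam.width / 2) / math.tan(math.radians(cam.fov_deg) / 2)
    u = cam.width / 2 + focal * xc / zc
    return u if 0 <= u < cam.width else None


def build_control_signals(obj, trajectory, cameras):
    """Pair each planned 3D position with its camera and emit one signal per frame."""
    signals = []
    for frame, (point, cam) in enumerate(zip(trajectory, cameras)):
        signals.append({
            "frame": frame,
            "object_id": obj.object_id,
            "appearance": obj.appearance,      # carried even while out of view
            "world_pos": point,
            "pixel_x": project_x(point, cam),  # None while out of view
        })
    return signals


# Hard-coded stand-in for a planner's output: the object drifts away from the
# camera while the camera pans to the side and then back again.
ball = TrackedObject("ball-1", "red, striped")
trajectory = [(0.0, 0.0, 5.0 + 0.5 * t) for t in range(7)]
cameras = [Camera(yaw) for yaw in (0, 20, 40, 60, 40, 20, 0)]
signals = build_control_signals(ball, trajectory, cameras)

for s in signals:
    status = "out of view" if s["pixel_x"] is None else f"x={s['pixel_x']:.0f}px"
    print(s["frame"], s["appearance"], status)
```

In this toy version the object is out of view for the three middle frames, yet its appearance record and 3D position keep being tracked in the plan, so nothing has to be "remembered" from pixels when it reappears. How WorldDirector actually encodes trajectories and conditions the video generator on them is described in the paper, not here.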
📄 Original abstract
We present WorldDirector, a highly controllable video world model framework designed for persistent dynamic object memory and unrestricted viewpoint exploration. Unlike existing world models that entangle physical dynamics with pixel rendering and rely on continuous visual observation to sustain motion, our framework explicitly decouples semantic motion orchestration from visual generation. By leveraging an LLM to coordinate 3D trajectories with camera movements and subsequently employing these orchestrated trajectories as control signals for video generation, our approach ensures strict physical logic and appearance stability, successfully preserving the exact visual identities of dynamic entities even when they re-enter the scene after prolonged periods out of view. Experimental results demonstrate that our method supports the synthesis of complex and extended events with unprecedented controllability and persistent dynamic object memory. Project Page: https://worlddirector.github.io/