Cloning cinematic camera moves: the AI director is here
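In the short videos you scroll past, the camera is usually either locked in place or shaking aimlessly. Now an AI system can work like a director: it precisely clones camera motion from a reference video (push in, pull out, pan, truck, follow) and cuts seamlessly between multiple shots. The researchers render camera motion as a kind of "grid video" so the model can read the trajectory, then control it together with characters and actions. It is trained on million-scale data, and the authors report that it clearly outperforms existing methods. You won't be using it tomorrow, but it marks a key step for video generation, from "making things move" to "shooting them well."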
📄 Original abstract (English)
Cloning camera motion from reference videos is an important task in video generation, as videos provide intuitive and precise control. Existing methods either directly use parametric representations that fail to handle multi-shot generation or synthesize cross-paired data, which suffer from data scarcity, resulting in poor performance in complicated camera motion cloning. To address these issues, we introduce a general camera motion representation that encodes cameras as grid motion videos. This camera grid represents the camera parameters visually and supports the integration of diverse trajectories for multi-shot video generation. Building upon this, we propose OmniDirector, a unified framework trained on a million-scale camera grid-video pairs that coordinates characters, actions, and cameras to provide director-level control for multimodal diffusion transformers. Furthermore, we design a novel hierarchical prompt expansion agent that harmoniously integrates different control signals by systematically describing camera motion and visual content through understanding signal relationships. Extensive experiments demonstrate the superior performance and outstanding controllability of our framework. Project page: https://ymlinfeng.github.io/OmniDirector.github.io/
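To make the "camera as grid video" idea a bit more concrete, here is a minimal sketch of one way a camera trajectory could be turned into moving grid points. It is not the paper's implementation: the planar grid, the pinhole projection, the yaw-only rotation, and the names `project_grid` and `grid_motion_frames` are all assumptions made for illustration. The abstract only says that the grid represents camera parameters visually and allows different trajectories to be combined for multi-shot generation.

```python
import math

def project_grid(cam_pos, yaw, focal=1.0, n=5, spacing=1.0, depth=5.0):
    """Project an n x n planar grid (world plane z = depth) into a pinhole camera.

    cam_pos: (x, y, z) camera position; yaw: rotation about the y axis, in radians.
    Returns (u, v) image coordinates for every grid point in front of the camera.
    Hypothetical helper, not taken from the paper.
    """
    half = (n - 1) / 2
    c, s = math.cos(yaw), math.sin(yaw)
    points = []
    for i in range(n):
        for j in range(n):
            # Grid point in world coordinates, relative to the camera position.
            dx = (j - half) * spacing - cam_pos[0]
            dy = (i - half) * spacing - cam_pos[1]
            dz = depth - cam_pos[2]
            # World-to-camera rotation (inverse of the camera's yaw).
            xc = c * dx - s * dz
            zc = s * dx + c * dz
            if zc > 1e-6:  # keep only points in front of the camera
                points.append((focal * xc / zc, focal * dy / zc))
    return points

def grid_motion_frames(poses):
    """One list of projected grid points per camera pose, i.e. per frame."""
    return [project_grid(pos, yaw) for pos, yaw in poses]

# Shot 1: dolly in along +z. Shot 2: pan to the right from a fixed position.
dolly = [((0.0, 0.0, 0.5 * t), 0.0) for t in range(5)]
pan = [((0.0, 0.0, 0.0), math.radians(3 * t)) for t in range(5)]

# Concatenating the pose lists gives a two-shot grid sequence.
frames = grid_motion_frames(dolly + pan)
```

In this toy version the grid spreads outward during the dolly-in and slides sideways during the pan, so the camera move can be read directly from how the points move between frames. Drawing each frame's points as an image would give an actual grid video.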