📄 Paper digest

Letting animated characters learn motion straight from video, with no skeleton as go-between

Trusted channel ▲ 23 · character animation · end-to-end · motion transfer · synthetic data · preference optimization

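Character animation used to mean extracting a skeleton first and then mapping it back onto the character, which loses information along the way. This paper instead concatenates the driving video and the reference character video and feeds both to the model, letting it work out the motion by itself. To gather enough training data, the authors recast different animation sub-tasks into one unified format and synthesized 60,000 video pairs (MotionPair-60K). They also add a preference-learning step aimed specifically at the fine-detail flaws that synthetic data introduces. The authors report that it substantially outperforms existing state-of-the-art methods across character animation tasks. This is not something you can use tomorrow, but people working on games, virtual humans, and film previsualization should keep an eye on it: adjusting motion may someday be as simple as dragging in a video.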

📄 Original abstract (English)

Controlled character animation requires transferring motion from a driving sequence to a reference character. Prior works heavily rely on intermediate representations, including pose skeletons to represent motion or masked background to represent environment, which inevitably leads to information loss. To address this, we present SCAIL-2, a framework that bypasses those intermediates and achieves end-to-end character animation. By directly concatenating driving videos to the sequence, the model can obtain all the required visual information from the input video. To address the lack of end-to-end data, we unify sub-tasks of character animation with decoupled conditions and then curate a pipeline to synthesize MotionPair-60K, an end-to-end motion transfer dataset containing heterogeneous tasks of character animation. To achieve the unification, we utilize in-context mask conditioning and mode-specific RoPE as soft guidance beyond textual instructions and raw visual information. To address synthetic discrepancy in detailed regions, we propose Bias-Aware DPO to construct preference items to mitigate the errors. Extensive experiments demonstrate that our method substantially outperforms existing state-of-the-art approaches in various character animation tasks. A large subset of synthetic data as well as model weights will be released at our project page: https://teal024.github.io/SCAIL-2/.
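
For readers unfamiliar with DPO, here is a minimal sketch of the standard Direct Preference Optimization loss for a single preference pair. It is not the paper's Bias-Aware DPO, whose details the abstract does not give; the function name, argument names, and the `beta` default are illustrative assumptions.

```python
import math


def dpo_loss(policy_logp_win, policy_logp_lose,
             ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    All names here are hypothetical; this is the generic objective,
    not the Bias-Aware variant proposed in the paper.
    """
    # How much more the policy favors each sample than the frozen reference does.
    win_gain = policy_logp_win - ref_logp_win
    lose_gain = policy_logp_lose - ref_logp_lose
    margin = beta * (win_gain - lose_gain)
    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree on both samples, the margin is zero and the loss equals log 2; it drops as the policy shifts probability toward the preferred sample and rises when it shifts toward the rejected one.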

Original paper on arXiv
