📄 Paper Digest

One model for text-to-image, local editing, and global editing, without the capabilities fighting each other

Trusted Channel · ▲ 36 · image generation · multi-capability composition · flow matching · model distillation

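Today's image generation models tend to specialize: some are strong at text-to-image (T2I), some at local editing, some at global editing. Packing all three into a single model causes interference: editing tends to degrade T2I quality, and local and global editing conflict with each other. This paper proposes DanceOPD, an on-policy distillation framework for flow-matching models. Each capability is treated as its own velocity field, a "dedicated channel" of sorts. During training, every sample is routed to exactly one of these fields, the field is queried at a low-noise state produced by the student's own rollout, and the student is trained to match the returned velocity with a plain MSE loss. Supervising each sample with a single field, and only on states the student actually reaches, is meant to keep the capabilities from conflicting. According to the authors' experiments, this strengthens local and global editing while preserving T2I quality, and the same formulation can also absorb classifier-free guidance (CFG). It is not a tool you can use tomorrow, but it offers a workable training route toward an all-in-one image model.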
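To make the training loop concrete, here is a minimal NumPy sketch of the three steps named in the paper's abstract: route a sample to one capability field, query that field at a low-noise state reached by the student's own rollout, and regress the student's velocity onto the answer with MSE. Everything specific is an assumption made for illustration and is not taken from the paper: the toy teachers `teacher_t2i` and `teacher_edit`, the `LinearStudent` class, the time convention (t = 0 is noise, t = 1 is data), the Euler rollout, and treating the rollout state as a constant when taking gradients.

```python
import numpy as np

# Hypothetical stand-in experts: each is a velocity field v(x, t) over the same state space.
def teacher_t2i(x, t):
    return 1.0 - x    # pulls states toward +1

def teacher_edit(x, t):
    return -1.0 - x   # pulls states toward -1

TEACHERS = [teacher_t2i, teacher_edit]

class LinearStudent:
    """Toy student: one linear velocity model per task, v(x) = W[task] @ x + b[task]."""
    def __init__(self, dim, n_tasks, rng):
        self.W = 0.01 * rng.standard_normal((n_tasks, dim, dim))
        self.b = np.zeros((n_tasks, dim))

    def velocity(self, x, t, task):
        return self.W[task] @ x + self.b[task]

def rollout(student, x0, task, t_query, n_steps):
    """Euler-integrate the student's own field from t = 0 (noise) up to t_query."""
    x, dt = x0.copy(), t_query / n_steps
    for i in range(n_steps):
        x = x + dt * student.velocity(x, i * dt, task)
    return x

def distill_step(student, rng, dim, t_query=0.8, n_steps=4, lr=0.05):
    # 1) Route this sample to exactly one capability field.
    task = int(rng.integers(len(TEACHERS)))
    # 2) Reach a low-noise state by following the student (on-policy).
    #    Assumption: the state is a constant, so no gradient flows through the rollout.
    x_t = rollout(student, rng.standard_normal(dim), task, t_query, n_steps)
    # 3) Query the routed teacher field and take one velocity-MSE gradient step.
    err = student.velocity(x_t, t_query, task) - TEACHERS[task](x_t, t_query)
    loss = float(np.mean(err ** 2))
    student.W[task] -= lr * (2.0 / dim) * np.outer(err, x_t)
    student.b[task] -= lr * (2.0 / dim) * err
    return task, loss
```

Calling `distill_step` in a loop with a shared `np.random.default_rng` trains the toy student. In a real setting the student would be a single conditioned network rather than one weight matrix per task, and the teachers would be trained expert models.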

📄 Original abstract (English)

Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I), local editing, and global editing. However, these capabilities are rarely naturally aligned and often conflict. For instance, editing tends to degrade T2I performance, while global and local editing interfere with each other. Consequently, effectively composing these capabilities has become a central challenge for image generation model training. To tackle this, we introduce DanceOPD, an on-policy generative field distillation framework for flow-matching models that routes each sample to one capability field, queries one low-noise student-induced state, and trains with a simple velocity MSE objective. With each capability source defined as a velocity field over the shared flow state space, the student learns from fields queried on its own rollout states to compose expert capabilities. This formulation also absorbs operator-defined fields such as classifier-free guidance. Comprehensive experiments on T2I, editing, realism-field absorption, and CFG absorption show that our approach improves multi-capability composition, strengthening target capabilities while preserving anchor generation quality. We believe this work establishes a practical route for generative field distillation in flow-matching models.
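The abstract says the formulation also absorbs operator-defined fields such as classifier-free guidance, but gives no formula. As a hedged illustration, the sketch below uses the standard CFG combination, `v_uncond + scale * (v_cond - v_uncond)`, to build a guided velocity field out of two existing fields. The helper name `cfg_field` and the toy fields are hypothetical, and whether DanceOPD uses exactly this form is an assumption.

```python
import numpy as np

def cfg_field(v_cond, v_uncond, scale):
    """Combine two velocity fields into one guided field (standard CFG form, assumed here)."""
    def field(x, t):
        vu = v_uncond(x, t)
        # Extrapolate from the unconditional velocity toward the conditional one.
        return vu + scale * (v_cond(x, t) - vu)
    return field

# Toy fields, for illustration only.
def v_cond(x, t):
    return 1.0 - x

def v_uncond(x, t):
    return -x

guided = cfg_field(v_cond, v_uncond, scale=3.0)
x = np.zeros(2)
print(guided(x, 0.5))  # a velocity field like any other teacher
```

Because the result has the same signature as any other velocity field, it could serve as a distillation target in the same way as an expert model. A student that matches it would need one forward pass per step at inference time instead of the two that CFG normally requires.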

Original paper: arXiv
