AI image generation finally learns to listen: checking its own work and fixing its own mistakes
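AI image generators today often fail to follow instructions. Give one a depth map and ask for a matching image, and the depth re-extracted from its output frequently disagrees with the map you supplied. Earlier methods either fed in the condition and never checked whether the result satisfied it, or relied on hand-tuned corrections, so they always had to trade "following the instruction" against "looking realistic." This work (FlowBender) teaches the model to check itself: at every generation step it first measures how far the current result is from the requirement, then uses that error to correct the step it is about to take. Across image-to-image translation, image restoration, and 3D mesh texturing, it is both more faithful to the condition and more realistic, instead of sacrificing one for the other. It is not a tool you can use tomorrow, but it points toward AI that actually does what it is told.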
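To make the check-then-correct loop concrete, here is a toy Python sketch of how such a closed-loop sampler could be wired up, using the abstract's vocabulary: at each step an unguided look-ahead estimates the clean sample, a forward operator measures how far that estimate is from the condition, and a second pass turns the error into a corrected velocity. Everything in it is an illustrative assumption, not the authors' code. It assumes the rectified-flow convention x_t = (1 - t) * noise + t * x_1 with Euler steps; `forward_op` (a hypothetical pair-averaging downsampler) stands in for the depth predictor; a per-coordinate Gaussian prior gives an exact unguided velocity so no trained network is needed; and the learned refinement pass is replaced by `refine_velocity`, a fixed rule that subtracts `eta` times the gradient of the constraint error. That fixed linear rule is the kind of hand-tuned update the paper argues against; in FlowBender the mapping from error to correction is learned.

```python
import math
import random
from typing import List

Vec = List[float]

# Hypothetical data prior (assumption): each clean coordinate ~ N(PRIOR_MEAN, PRIOR_STD^2).
PRIOR_MEAN = 0.0
PRIOR_STD = 0.5


def forward_op(x: Vec) -> Vec:
    """Hypothetical differentiable forward operator: average adjacent pairs (2x downsampling)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def forward_op_adjoint(r: Vec) -> Vec:
    """Transpose of forward_op; maps a residual back to signal space."""
    out: Vec = []
    for v in r:
        out.extend([v / 2, v / 2])
    return out


def lookahead_clean_estimate(x_t: Vec, t: float) -> Vec:
    """Unguided look-ahead: exact E[x_1 | x_t] for x_t = (1 - t) * noise + t * x_1 under the Gaussian prior."""
    s2 = PRIOR_STD ** 2
    gain = t * s2 / ((1 - t) ** 2 + t ** 2 * s2)
    return [PRIOR_MEAN + gain * (v - t * PRIOR_MEAN) for v in x_t]


def refine_velocity(v_base: Vec, deviation: Vec, t: float, eta: float) -> Vec:
    """Stand-in for the learned refinement pass (assumption): shift the implied
    clean sample by -eta * deviation. FlowBender learns this mapping instead."""
    return [v - eta * d / (1 - t) for v, d in zip(v_base, deviation)]


def sample(y: Vec, dim: int, steps: int = 20, closed_loop: bool = True,
           eta: float = 1.0, seed: int = 0) -> Vec:
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise at t = 0
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x_hat = lookahead_clean_estimate(x, t)                  # pass 1: unguided look-ahead
        v = [(xh - xv) / (1 - t) for xh, xv in zip(x_hat, x)]   # velocity implied by x_hat
        if closed_loop:
            residual = [a - b for a, b in zip(forward_op(x_hat), y)]  # A(x_hat) - y
            deviation = forward_op_adjoint(residual)   # gradient of 0.5 * ||A(x_hat) - y||^2
            v = refine_velocity(v, deviation, t, eta)  # pass 2: corrected velocity
        x = [xv + dt * vi for xv, vi in zip(x, v)]     # Euler step toward t = 1
    return x


def constraint_error(x: Vec, y: Vec) -> float:
    """Euclidean distance between the re-applied forward operator and the condition."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(forward_op(x), y)))


if __name__ == "__main__":
    y = [3.0] * 4  # condition: the desired pair averages
    closed = sample(y, dim=8, closed_loop=True, seed=1)
    open_loop = sample(y, dim=8, closed_loop=False, seed=1)
    print(constraint_error(closed, y), constraint_error(open_loop, y))
```

With the same noise seed, the closed-loop run in this toy ends far closer to `y` under `constraint_error` than the open-loop run, which never looks at the error. The design point is the ordering: the error is measured on the look-ahead estimate before the step is taken, so the correction shapes the current step rather than patching the result afterwards.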
📄 Original abstract
Conditional diffusion and flow models routinely fail to satisfy the very constraints that define their task. For instance, a depth-conditioned model often produces images whose re-extracted depth disagrees with the input, even though the forward operator--the depth predictor defining the constraint--is available during both training and inference. Existing approaches generally fall into two categories: supervised models that treat the conditioning signal as a static cue and ignore alignment information at inference, and guidance-based methods that consult it through hand-tuned linear updates, typically trading fidelity to the condition against the plausibility of the generated sample. We argue that the fundamental gap in both paradigms is that the model is never trained to utilize its own alignment error. We introduce FlowBender, a closed-loop framework that treats this error as a first-class input, training the network to learn a correction policy conditioned on inference-time feedback. At each step, an unguided look-ahead pass estimates the clean signal, a task-specific deviation is computed via the forward operator, and a refinement pass consumes this signal to produce a corrected velocity. We propose several variants of FlowBender, including a gradient-based formulation for differentiable operators and a zero-order variant for non-differentiable settings such as JPEG compression. For efficient sampling, we introduce a prior-step shortcut that enables closed-loop correction at a minimal additional computational cost. Across image-to-image translation, restoration, and 3D mesh texturing, FlowBender consistently outperforms standard supervised baselines, alignment-loss-augmented training, and state-of-the-art inference-time guidance, improving fidelity and plausibility simultaneously rather than trading them against each other. Project page: https://flow-bender.github.io/
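The abstract distinguishes a gradient-based variant (differentiable forward operator) from a zero-order variant (non-differentiable operator such as JPEG compression), but does not spell out the exact feedback signals. The Python sketch below shows one plausible pair of signals under stated assumptions: for a differentiable linear operator A (a hypothetical `pair_average`), the gradient A^T(A(x̂) - y) of the squared error; for a non-differentiable operator (uniform rounding in a hypothetical `quantize`, a crude stand-in for JPEG quantization), the raw residual Q(x̂) - y, which needs only forward evaluations. All function names and both operators are illustrative, and how FlowBender actually encodes these signals for its network is not specified in the abstract.

```python
from typing import List

Vec = List[float]


def pair_average(x: Vec) -> Vec:
    """Differentiable stand-in operator (assumption): average each adjacent pair of entries."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def pair_average_adjoint(r: Vec) -> Vec:
    """Transpose of pair_average: spread each residual entry, halved, over its pair."""
    out: Vec = []
    for v in r:
        out.extend([v / 2, v / 2])
    return out


def gradient_deviation(x_hat: Vec, y: Vec) -> Vec:
    """Gradient variant: derivative of 0.5 * ||A(x_hat) - y||^2, i.e. A^T (A(x_hat) - y)."""
    residual = [a - b for a, b in zip(pair_average(x_hat), y)]
    return pair_average_adjoint(residual)


def quantize(x: Vec, step: float = 0.5) -> Vec:
    """Non-differentiable stand-in operator (assumption): round each entry to a multiple of `step`."""
    return [step * round(v / step) for v in x]


def zero_order_deviation(x_hat: Vec, y: Vec, step: float = 0.5) -> Vec:
    """Zero-order variant: the residual Q(x_hat) - y, computed from forward evaluations only."""
    return [a - b for a, b in zip(quantize(x_hat, step), y)]


if __name__ == "__main__":
    print(gradient_deviation([1.0, 3.0, 0.0, 2.0], [1.0, 1.0]))  # [0.5, 0.5, 0.0, 0.0]
    x_hat = [0.1, 0.9, 1.6, -0.3]
    y = [0.0, 0.5, 1.5, 0.0]
    dev = zero_order_deviation(x_hat, y)                         # [0.0, 0.5, 0.0, -0.5]
    print(quantize([a - b for a, b in zip(x_hat, dev)]))         # [0.0, 0.5, 1.5, 0.0]
```

In the rounding example, subtracting the zero-order residual from x̂ once makes the quantized output match y exactly when y is itself a quantized signal, because rounding commutes with shifts by whole multiples of `step` (except at exact rounding ties). Real JPEG compression is far less well behaved than this toy.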