AI image generation up to 25x faster, with no training and no new hardware
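Today's AI image generators (such as FLUX and Midjourney) often take more than ten seconds, sometimes half a minute, to produce a single high-resolution image. The MrFlow method in this paper splits generation into a low-resolution stage and a high-resolution stage. It first quickly draws the main structure at low resolution. This is cheap for two reasons: the number of tokens shrinks quadratically with resolution (halving the width and height leaves a quarter of the tokens), and fewer sampling steps are needed. A lightweight pretrained GAN-based super-resolution model then upscales the image to high resolution in pixel space. Next, a small amount of noise is injected so that high-frequency details can be resampled, and the model finally refines those details at high resolution. The whole process requires no retraining, no runtime detection of which regions to modify, and no specific hardware. On FLUX.1-dev and Qwen-Image it achieves a 10x end-to-end speedup while keeping the OneIG score within 1% of the unaccelerated model. Combined with existing pretrained timestep-distillation methods, it reaches up to 25x. It is not something you will be running on your phone tomorrow, but the direction is clear: AI image generation is moving toward near-real-time speeds.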
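The four stages above (low-resolution sampling, upscaling, noise injection, high-resolution refinement) can be sketched in a few lines. This is a minimal, hypothetical sketch, not the authors' code, and it rests on several stated assumptions:

- It uses the rectified-flow convention x_t = (1 - t) * x0 + t * noise, common in FLUX-style models, where t = 1 is pure noise.
- A toy "oracle" velocity field that already knows the target image stands in for the pretrained transformer.
- Nearest-neighbour upsampling stands in for the GAN super-resolution model.
- The VAE is skipped entirely, so pixel space and latent space coincide.

All function names, step counts and the noise strength are illustrative choices.

```python
import numpy as np


def euler_flow_sample(velocity, x, t_start, num_steps):
    """Integrate dx/dt = velocity(x, t) from t_start down to t = 0 with Euler steps.

    Assumed rectified-flow convention: x_t = (1 - t) * x0 + t * noise, so t = 1 is
    pure noise, t = 0 is a clean image, and velocity = noise - x0.
    """
    ts = np.linspace(t_start, 0.0, num_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * velocity(x, t_cur)  # step toward t = 0
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for a pretrained flow-matching transformer.

    Exact rectified-flow velocity when the data set is the single image `target`:
    (x_t - x0) / t = noise - x0. The sampler above never evaluates it at t = 0.
    """
    return lambda x, t: (x - target) / t


def downsample_2x(img):
    """2x2 average pooling (used only to build the low-resolution toy target)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample_2x(img):
    """Hypothetical stand-in for the GAN super-resolution model: nearest neighbour."""
    return img.repeat(2, axis=0).repeat(2, axis=1)


def mrflow_like_pipeline(target_hr, rng, low_steps=20, noise_strength=0.3, high_steps=6):
    # Stage 1: full denoising from pure noise at half resolution (1/4 of the pixels).
    target_lr = downsample_2x(target_hr)
    x_lr = euler_flow_sample(make_toy_velocity(target_lr),
                             rng.standard_normal(target_lr.shape), 1.0, low_steps)
    # Stage 2: upscale the low-resolution result.
    x_up = upsample_2x(x_lr)
    # Stage 3: inject low-strength noise, i.e. move the image back to t = noise_strength.
    noise = rng.standard_normal(x_up.shape)
    x_t = (1.0 - noise_strength) * x_up + noise_strength * noise
    # Stage 4: refine at full resolution over only the short remaining trajectory.
    return euler_flow_sample(make_toy_velocity(target_hr), x_t, noise_strength, high_steps)


rng = np.random.default_rng(0)
target = rng.standard_normal((8, 8))
result = mrflow_like_pipeline(target, rng)
print(np.abs(result - target).max())
```

Because the toy velocity is exact, the output simply reproduces the target, so this shows the structure of the schedule rather than image quality. With a real model, the point of the design is that most steps run on a quarter of the tokens. The noise strength then controls how much of the upscaled image the high-resolution stage is allowed to redraw.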
📄 Original abstract
Hardware-agnostic strategies for accelerating text-to-image diffusion, such as timestep distillation and feature caching, can reduce inference time without custom kernels or system-level optimization. Among them, multi-resolution generation strategies have recently received broad attention, attaining more than 5x speedup without any training. However, the design of performing upsampling in the latent space, together with the selective modification of partial regions, causes these methods to exhibit noticeable blurring or artifacts. To this end, we propose MrFlow, a training-free multi-resolution acceleration strategy for pretrained flow-matching models built upon a staged low-to-high-resolution pipeline. MrFlow first rapidly generates the main structure at low resolution, then performs super-resolution in the pixel space using a lightweight pretrained GAN-based model, subsequently injects low-strength noise to enable high-frequency resampling, and finally refines the details at high resolution. Quantitative and qualitative results on FLUX.1-dev and Qwen-Image show that MrFlow exploits the quadratic token reduction and reduced step requirement of low-resolution sampling to achieve 10x end-to-end acceleration while keeping OneIG within a 1% gap relative to that before acceleration, significantly surpassing other training-free acceleration strategies, and requiring no training or runtime dynamic identification whatsoever. MrFlow can further be directly combined orthogonally with pre-trained timestep distillation strategies, achieving even higher generation acceleration of up to 25x.
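Back-of-the-envelope arithmetic shows why the abstract's "quadratic token reduction" and "reduced step requirement" add up to a large saving. The sketch below makes three assumptions:

- The model is a FLUX-style latent transformer in which one image token covers a 16x16 pixel block (8x VAE downsampling followed by 2x2 patching).
- Text tokens are ignored.
- Costs follow a toy model in which a chosen fraction of each full-resolution step is attention, which scales with N², and the rest is linear layers, which scale with N.

The function names and the step counts in the example are hypothetical and are not the paper's settings.

```python
def image_tokens(height: int, width: int, pixels_per_token: int = 16) -> int:
    """Number of image tokens for a FLUX-style latent transformer.

    Assumption: 8x VAE downsampling followed by 2x2 patchify, i.e. one token
    per 16x16 pixel block. Text tokens are ignored.
    """
    return (height // pixels_per_token) * (width // pixels_per_token)


def relative_step_cost(n_tokens: int, n_ref: int, attn_fraction: float) -> float:
    """Cost of one denoising step at n_tokens, relative to a step at n_ref tokens.

    Toy cost model: at the reference size, `attn_fraction` of the step is
    attention (scales with N**2) and the remainder is linear layers (scales with N).
    """
    r = n_tokens / n_ref
    return (1.0 - attn_fraction) * r + attn_fraction * r ** 2


def estimated_speedup(base_steps: int, low_steps: int, high_steps: int,
                      low_step_cost: float, overhead: float = 0.0) -> float:
    """End-to-end speedup of a low-then-high schedule over full-resolution sampling.

    All costs are in units of one full-resolution step; `overhead` stands for
    extra work such as VAE decode/encode and super-resolution.
    """
    baseline = base_steps * 1.0
    staged = low_steps * low_step_cost + high_steps * 1.0 + overhead
    return baseline / staged


if __name__ == "__main__":
    n_hr = image_tokens(1024, 1024)  # 64 * 64 tokens
    n_lr = image_tokens(512, 512)    # 32 * 32 tokens, a quarter of n_hr
    for frac in (0.0, 0.5, 1.0):
        print(frac, relative_step_cost(n_lr, n_hr, frac))
    # Hypothetical schedule: 50 full-res steps vs 20 low-res + 5 high-res steps.
    print(estimated_speedup(50, 20, 5, relative_step_cost(n_lr, n_hr, 0.5)))
```

Under these made-up settings, a half-resolution step costs between 1/4 and 1/16 of a full-resolution one, and the schedule comes out at roughly 6x. The paper's measured 10x reflects its own step schedules and overheads, which this toy estimate does not attempt to reproduce.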