📄 Paper Explained

A 0.2B-Parameter Image Inpainting Model That Rivals 10B-Class Models

Practical channel · Image inpainting · Lightweight models · Diffusion models · Knowledge distillation · Efficient inference

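Image inpainting has an awkward trade-off: larger models give better results, but at the 10B-parameter scale (the FLUX.1-Fill-Dev baseline has 11.9B, roughly 12 billion) they are impractical to run on ordinary hardware. This paper goes the other way. With just 0.22B parameters (220 million), under 2% of the baseline's size, it matches or even beats that model's inpainting quality while cutting total inference time by more than 15×.

Rather than simply compressing a large model, the authors redesign the core building block of the diffusion backbone. Their Local-λ Mix Interaction (LλMI) block pairs two small modules, Local-λ and Interactive-λ, which summarize local spatial context and global semantic priors, respectively, into fixed-size linear matrices. This keeps the key interactions while removing most of the parameters. The block is paired with an adaptive multi-granularity distillation strategy that runs entirely in latent space (it never decodes to pixel space, which would be expensive) and dynamically balances several gradient-based losses, so the small model can absorb what the large one has learned.

The practical upshot: near top-tier inpainting quality without the compute bill of a 10B-class model, which should put it within reach of ordinary GPUs rather than only expensive high-end ones. For anyone doing image editing, content creation, or old-photo restoration, this is something you could put to use right away; according to this write-up the code has been open-sourced (see the project page linked in the abstract).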
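To make the "fixed-size matrix" idea concrete, here is a minimal NumPy sketch, assuming the λ modules work like the content lambdas of LambdaNetworks (the λ naming suggests this, but neither this summary nor the abstract confirms it). The context, whether a local window of latent tokens (the Local-λ role) or a set of global prior tokens (the Interactive-λ role), is compressed into a k×v matrix `lam` whose size does not depend on how many context tokens there are; every query then reads from it with a single matrix product. The function `content_lambda` and all shapes are hypothetical illustrations, not the Moebius implementation.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def content_lambda(x, context, w_q, w_k, w_v):
    """Hypothetical LambdaNetworks-style content lambda (not Moebius code).

    x:       (n, d) query tokens, e.g. flattened latent patches
    context: (m, d) context tokens (assumed: a local window or global prior tokens)
    Returns the (n, v) outputs and the (k, v) lambda matrix.
    """
    q = x @ w_q                          # (n, k) queries
    k = softmax(context @ w_k, axis=0)   # (m, k) keys, normalised over the m context positions
    v = context @ w_v                    # (m, v) values
    lam = k.T @ v                        # (k, v): fixed size, independent of m
    return q @ lam, lam                  # each query reads the summary linearly

rng = np.random.default_rng(0)
d, k_dim, v_dim = 16, 8, 16
w_q = rng.standard_normal((d, k_dim))
w_k = rng.standard_normal((d, k_dim))
w_v = rng.standard_normal((d, v_dim))
x = rng.standard_normal((64, d))         # 64 query tokens

for m in (32, 1024):                     # small vs. large context
    out, lam = content_lambda(x, rng.standard_normal((m, d)), w_q, w_k, w_v)
    print(m, lam.shape, out.shape)       # prints "32 (8, 16) (64, 16)", then "1024 (8, 16) (64, 16)"
```

Unlike standard attention, no n×m attention map is ever formed: cost grows linearly with the number of context tokens m, and the summary stays (k, v) regardless of m, which is one way a compact block can keep global interactions cheaply.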

📄 Original Abstract (English)

While 10B-level industrial foundation models have pushed the boundaries of image inpainting, their prohibitive computational costs severely hinder practical deployment. Constructing a highly optimized task-specific specialist offers a promising solution; however, extreme structural compression inevitably triggers a severe representation bottleneck. To conquer this, we propose Moebius, a highly efficient lightweight inpainting framework. We systematically reconstruct the diffusion backbone by introducing the Local-λ Mix Interaction (LλMI) block. Comprising Local-λ and Interactive-λ modules, it elegantly summarizes spatial contexts and global semantic priors into fixed-size linear matrices, preserving complex latent interactions while drastically shedding parameters. Furthermore, to unlock the full representational capacity of this highly compact architecture, we synergistically pair it with an adaptive multi-granularity distillation strategy. Operating strictly within the latent space to avoid expensive pixel-space decoding, this strategy dynamically balances multiple gradient-based losses to achieve high-fidelity alignment. Extensive experiments across natural and portrait benchmarks demonstrate that this optimal synergy enables Moebius to rival or even surpass the generation quality of the 10B-level industrial generalist FLUX.1-Fill-Dev. Remarkably, Moebius achieves this using less than 2% of the parameters (0.22B vs. 11.9B) while delivering a >15× acceleration in total inference time, setting a new efficiency standard for high-fidelity inpainting. Project page at https://hustvl.github.io/Moebius.
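The distillation side can be sketched the same way, though the abstract gives only its outline (latent space only, multiple granularities, dynamically balanced gradient-based losses), so everything below is an assumption. We read "granularity" as spatial scale (latents average-pooled by factors 1, 2 and 4), read "gradient-based" as one possible meaning, losses on finite-difference spatial gradients of the latents, and balance the terms by inverse magnitude. The helpers `avg_pool2d`, `spatial_grad`, `multi_granularity_losses` and `adaptive_weights` are hypothetical. What the sketch does show faithfully is the key cost saving: student and teacher latents are compared directly, with no VAE decode to pixels.

```python
import numpy as np

def avg_pool2d(z, f):
    # Average-pool a (c, h, w) latent by integer factor f (h and w divisible by f).
    c, h, w = z.shape
    return z.reshape(c, h // f, f, w // f, f).mean(axis=(2, 4))

def spatial_grad(z):
    # Finite-difference gradients of a (c, h, w) latent along height and width.
    return z[:, 1:, :] - z[:, :-1, :], z[:, :, 1:] - z[:, :, :-1]

def multi_granularity_losses(student, teacher, scales=(1, 2, 4)):
    """Hypothetical latent-space distillation terms (no pixel decoding).
    Per scale: an L2 term on pooled latents and an L1 term on their spatial gradients."""
    losses = {}
    for f in scales:
        s, t = avg_pool2d(student, f), avg_pool2d(teacher, f)
        losses[f"mse@{f}"] = float(np.mean((s - t) ** 2))
        (sy, sx), (ty, tx) = spatial_grad(s), spatial_grad(t)
        losses[f"grad@{f}"] = float(np.mean(np.abs(sy - ty)) + np.mean(np.abs(sx - tx)))
    return losses

def adaptive_weights(losses, eps=1e-8):
    # Assumed balancing rule: weight each term by the inverse of its current value,
    # then normalise so the weights sum to 1. In an autograd framework these values
    # would be detached so the weights act as constants during backpropagation.
    inv = {name: 1.0 / (val + eps) for name, val in losses.items()}
    total = sum(inv.values())
    return {name: w / total for name, w in inv.items()}

rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 32, 32))                    # hypothetical 4-channel latent
student = teacher + 0.1 * rng.standard_normal(teacher.shape)  # imperfect student prediction
losses = multi_granularity_losses(student, teacher)
weights = adaptive_weights(losses)
total = sum(weights[n] * losses[n] for n in losses)
print({n: round(l, 4) for n, l in losses.items()})
print(round(total, 6))
```

With inverse-magnitude weights every term contributes roughly the same amount to `total`, so a term that is naturally small (such as a coarse-scale L2 loss) is not drowned out by larger ones. The real Moebius weighting scheme may differ; this is only one common way to "dynamically balance" several losses.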

Original paper on arXiv

Subscribe to AI Pulse