AI Pulse
📄 Paper Breakdown

0.2B-Parameter Image Inpainting Model Rivals 10B-Class Models

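Image inpainting models keep getting bigger, but this paper goes the other way. With 0.22B parameters, less than 2% of the 11.9B in FLUX.1-Fill-Dev, it matches or exceeds that model's inpainting quality and cuts total inference time by more than 15×. The core idea is a new local-global interaction module. It compresses spatial and semantic information into fixed-size matrices, which sharply reduces the parameter count while preserving the key information. The module is paired with an adaptive distillation strategy that aligns high-dimensional features in latent space and so avoids expensive pixel-level decoding. This is not a tool you can pick up tomorrow. It does show that on a narrow task, a small specialist model can beat a large general-purpose one, and that opens a new route to practical deployment.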
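The summary does not explain how context ends up in a "fixed-size matrix." The λ in the block's name suggests lambda layers from LambdaNetworks, but that link is our assumption. Below is a minimal NumPy sketch of a content-only lambda layer in that style. It is not Moebius's actual LλMI block, and the function names and sizes are hypothetical. The sketch shows one property: any number `m` of context positions is summarized into a `(k, v)` matrix whose size does not depend on `m`. Every query then reads from that matrix with a single matrix product.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def lambda_summary(context: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """Hypothetical: compress m context vectors (m, d) into one fixed-size (k, v) matrix."""
    keys = softmax(context @ wk, axis=0)  # (m, k): each key column is a distribution over positions
    values = context @ wv                 # (m, v)
    return keys.T @ values                # (k, v): size independent of m

def lambda_layer(x: np.ndarray, context: np.ndarray,
                 wq: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """Hypothetical: apply the fixed-size summary to every query position, (n, d) -> (n, v)."""
    queries = x @ wq                        # (n, k)
    lam = lambda_summary(context, wk, wv)   # (k, v), computed once and shared by all queries
    return queries @ lam                    # (n, v); cost is linear in both n and m

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, v = 16, 8, 16
    wq = rng.standard_normal((d, k))
    wk = rng.standard_normal((d, k))
    wv = rng.standard_normal((d, v))
    x = rng.standard_normal((64, d))
    for m in (64, 1024):
        ctx = rng.standard_normal((m, d))
        print(lambda_summary(ctx, wk, wv).shape)  # (8, 16) for both context sizes
```

Because the summary is shared, the layer never builds an `n × m` attention map. That is where savings of this kind come from. How Moebius splits the work between its Local-λ and Interactive-λ modules is not specified here.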

📄 Original Abstract

While 10B-level industrial foundation models have pushed the boundaries of image inpainting, their prohibitive computational costs severely hinder practical deployment. Constructing a highly optimized task-specific specialist offers a promising solution; however, extreme structural compression inevitably triggers a severe representation bottleneck. To conquer this, we propose Moebius, a highly efficient lightweight inpainting framework. We systematically reconstruct the diffusion backbone by introducing the Local-λ Mix Interaction (LλMI) block. Comprising Local-λ and Interactive-λ modules, it elegantly summarizes spatial contexts and global semantic priors into fixed-size linear matrices, preserving complex latent interactions while drastically shedding parameters. Furthermore, to unlock the full representational capacity of this highly compact architecture, we synergistically pair it with an adaptive multi-granularity distillation strategy. Operating strictly within the latent space to avoid expensive pixel-space decoding, this strategy dynamically balances multiple gradient-based losses to achieve high-fidelity alignment. Extensive experiments across natural and portrait benchmarks demonstrate that this optimal synergy enables Moebius to rival or even surpass the generation quality of the 10B-level industrial generalist FLUX.1-Fill-Dev. Remarkably, Moebius achieves this using less than 2% of the parameters (0.22B vs. 11.9B) while delivering a >15× acceleration in total inference time, setting a new efficiency standard for high-fidelity inpainting. Project page at https://hustvl.github.io/Moebius.
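The abstract does not say how the "multiple gradient-based losses" are balanced. One common heuristic for dynamic balancing is to rescale each loss by the inverse norm of its gradient, so that every term pushes the student with similar strength. The NumPy sketch below applies that heuristic to MSE terms computed at several pooling granularities of a `(c, h, w)` latent. All of the following are our assumptions for illustration, not Moebius's actual recipe: average pooling as the notion of "granularity," MSE as the loss, inverse-gradient-norm weights, and the names `avg_pool`, `scale_loss_and_grad`, and `adaptive_multiscale_loss`. Gradients are written out by hand to keep the block self-contained. A real training loop would use an autograd framework.

```python
import numpy as np

def avg_pool(x: np.ndarray, f: int) -> np.ndarray:
    """Average-pool a (c, h, w) latent over non-overlapping f x f blocks (h, w divisible by f)."""
    c, h, w = x.shape
    return x.reshape(c, h // f, f, w // f, f).mean(axis=(2, 4))

def scale_loss_and_grad(student: np.ndarray, teacher: np.ndarray, f: int):
    """Hypothetical: MSE between pooled latents at factor f, plus its gradient w.r.t. student."""
    diff = avg_pool(student, f) - avg_pool(teacher, f)
    loss = float(np.mean(diff ** 2))
    # d(loss)/d(diff) = 2 * diff / N; pooling spreads each pooled cell's gradient
    # evenly over its f*f inputs, hence the repeat and the 1 / f**2 factor.
    up = np.repeat(np.repeat(diff, f, axis=1), f, axis=2)
    grad = 2.0 * up / (diff.size * f * f)
    return loss, grad

def adaptive_multiscale_loss(student, teacher, factors=(1, 2, 4), eps=1e-12):
    """Hypothetical: weight each granularity's loss by the inverse norm of its gradient."""
    losses, grads = zip(*(scale_loss_and_grad(student, teacher, f) for f in factors))
    inv = np.array([1.0 / (np.linalg.norm(g) + eps) for g in grads])
    weights = inv / inv.sum()  # treated as constants, not differentiated
    total = float(np.dot(weights, losses))
    total_grad = sum(w * g for w, g in zip(weights, grads))
    return total, total_grad, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    student = rng.standard_normal((4, 8, 8))
    teacher = rng.standard_normal((4, 8, 8))
    total, grad, weights = adaptive_multiscale_loss(student, teacher)
    print(total, weights)
```

Because the weights are recomputed at every step from the current gradients, no single scale dominates as training progresses. That is one plausible reading of "dynamically balances." The paper may use a different rule.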

Original paper on arXiv
