📄 Paper Digest

Photoshop an object into a photo, and still turn it to any angle you like

Practical track ▲ 23 · Object insertion · 3D pose control · Image synthesis · Diffusion models

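Today's AI tools for "putting an object into a photo" really just paste in a 2D image, so you cannot control which way the object faces. This paper lets you pose it the way you would a real object. It splits the conditions into three streams that are fed to the model separately, so they do not interfere with one another: the object's appearance, the 3D orientation you set by dragging a proxy, and the lighting and surroundings of the background. The inserted object keeps its original look, follows the angle you set, and blends into the scene; the authors report better pose controllability and visual quality than previous methods. If you make e-commerce product shots or interior renderings, or just want to composite something into a photo without it looking fake, this is the kind of work you could put to use almost immediately.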
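To make the "three separate streams" idea concrete, here is a minimal toy sketch in NumPy. It is not the paper's architecture: the class name `ToyDecomposedBlock`, the tensor shapes, and the choice of injection mechanism for each stream (channel concatenation for context, an additive projection for geometry, cross-attention for appearance) are all assumptions made for illustration. The abstract only says that the three conditions enter through separate pathways.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class ToyDecomposedBlock:
    """Hypothetical block: each condition has its own weights and entry point."""

    def __init__(self, d, d_geo, d_ctx, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d)
        # Context pathway (assumed): concatenate background features, then project.
        self.w_ctx = rng.normal(0.0, s, (d + d_ctx, d))
        # Geometry pathway (assumed): project a pose map and add it per token.
        self.w_geo = rng.normal(0.0, s, (d_geo, d))
        # Appearance pathway (assumed): cross-attention over reference tokens.
        self.w_q = rng.normal(0.0, s, (d, d))
        self.w_k = rng.normal(0.0, s, (d, d))
        self.w_v = rng.normal(0.0, s, (d, d))

    def __call__(self, x, appearance, geometry, context):
        # x: (n_tokens, d) latent tokens of the image being generated.
        h = np.concatenate([x, context], axis=-1) @ self.w_ctx  # context
        h = h + geometry @ self.w_geo                           # geometry
        q = h @ self.w_q
        k = appearance @ self.w_k
        v = appearance @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))          # appearance
        return h + attn @ v

# Random stand-ins for real features (illustrative shapes only).
rng = np.random.default_rng(1)
n_tokens, n_ref, d, d_geo, d_ctx = 16, 8, 32, 4, 32
block = ToyDecomposedBlock(d, d_geo, d_ctx)
x = rng.normal(size=(n_tokens, d))
appearance = rng.normal(size=(n_ref, d))       # reference-object tokens
geometry = rng.normal(size=(n_tokens, d_geo))  # e.g. a rendered 3D-proxy map
context = rng.normal(size=(n_tokens, d_ctx))   # background features

out = block(x, appearance, geometry, context)
print(out.shape)  # (16, 32)
```

Because each condition has its own weights and entry point, changing only the `geometry` input (a new pose) changes the output while the reference appearance and background features stay exactly as they were, which is the intuition behind keeping the streams apart.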

📄 Original abstract

Object insertion aims to seamlessly composite a reference object into a specified region of a background image. Recent diffusion-based methods achieve high visual quality but formulate insertion as a simple 2D inpainting task, providing no explicit control over the object's 3D pose and limiting their practical applicability. We propose DIRECT (Decomposed Injection for Reference Composition and Target-integration), a novel framework that integrates interactive pose manipulation with high-fidelity 2D image synthesis to enable pose-controllable object insertion. Our method decomposes the insertion conditions into three complementary components: appearance guidance capturing visual details from the reference object, geometry guidance derived from the user-adjusted 3D proxy, and context guidance from the target background. By injecting them through separate pathways, DIRECT avoids feature entanglement and simultaneously preserves reference appearance, follows the user-specified pose, and adapts the object to the target scene. We also introduce an automated data construction pipeline to improve the diversity and quality of training data. Experiments show that DIRECT outperforms previous methods in both geometric controllability and visual quality.

Original paper on arXiv
