📄 Paper Digest

Real-Time Video Editing: Frame-by-Frame Edits Without Breaking the Background

Trending ▲ 23 | video editing, real-time, streaming, distillation, AR

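Today's AI video editing is either too slow for real-time interaction, or the background starts to flicker and objects start to deform after a few frames. This paper proposes a streaming editing framework built around a three-stage distillation pipeline: editing capability is transferred step by step from a powerful but slow bidirectional model to a unidirectional streaming model that edits causally, one frame at a time, as the video arrives. A mask cache then reuses region-related computation across frames to cut redundant work. The reported result is 12.66 FPS (about 79 ms per frame) with stable backgrounds and non-edited regions, which is fast enough for livestreaming or AR. It is not a tool you can use tomorrow, but it marks out a technical path for real-time video editing from "offline rendering" to "edit while you shoot".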
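To make the "edit frame by frame, reuse region work" idea concrete, here is a toy sketch in Python. It is not the paper's implementation: `MaskCache`, `stream_edit`, and `edit_pixel` are hypothetical names, frames are tiny grayscale grids, and the only thing cached is the list of coordinates inside the edit mask. The real method caches model computation, whose details are not given in the abstract.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Frame = List[List[int]]   # toy grayscale frame: rows of pixel values
Mask = List[List[bool]]   # True marks pixels inside the edit region


class MaskCache:
    """Hypothetical cache: keeps the masked coordinates of the last mask seen."""

    def __init__(self) -> None:
        self._mask: Optional[Mask] = None
        self._coords: List[Tuple[int, int]] = []
        self.rebuilds = 0  # number of times the coordinates were recomputed

    def coords(self, mask: Mask) -> List[Tuple[int, int]]:
        # Recompute only when the mask changes. The equality check is itself
        # linear in the mask size; a real system would use a cheaper key.
        if mask != self._mask:
            self._mask = [row[:] for row in mask]
            self._coords = [(r, c) for r, row in enumerate(mask)
                            for c, inside in enumerate(row) if inside]
            self.rebuilds += 1
        return self._coords


def stream_edit(frames: Iterable[Frame], mask: Mask,
                edit_pixel: Callable[[int], int],
                cache: MaskCache) -> Iterator[Frame]:
    """Causal loop: each frame is edited on arrival, using no future frames."""
    for frame in frames:
        out = [row[:] for row in frame]      # pixels outside the mask stay unchanged
        for r, c in cache.coords(mask):      # region lookup is reused across frames
            out[r][c] = edit_pixel(frame[r][c])
        yield out


if __name__ == "__main__":
    mask = [[False, True], [False, False]]
    frames = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    cache = MaskCache()
    for edited in stream_edit(frames, mask, lambda v: v + 100, cache):
        print(edited)
    print("mask coordinate rebuilds:", cache.rebuilds)
```

In this sketch the background is preserved by construction, because only masked pixels are ever written, and the region bookkeeping is computed once for an unchanging mask. Both properties are simplifications of what a learned editor has to achieve.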

📄 Original Abstract (English)

Streaming video editing has made rapid progress, yet practical deployment is still limited by two core issues: maintaining stable backgrounds and non-edited regions over time, and achieving the low latency required for real-time interactive scenarios. Meanwhile, recent streaming video generation methods are mostly developed for synthesis and cannot be directly applied to editing due to the strict preservation requirement and region-specific control. In this work, we present a novel streaming video editing framework that performs causal, frame-by-frame editing with strong content preservation and real-time responsiveness. Our key design is a three-stage distillation pipeline that progressively transfers editing capability from a powerful bidirectional foundation model to an efficient unidirectional streaming editor, enabling stable long-horizon edits without sacrificing visual fidelity. To further support real-time deployment, we introduce an AR-oriented mask cache that reuses region-related computation across frames, substantially reducing redundant processing and accelerating inference. Finally, we establish a dedicated benchmark for streaming video editing. Extensive evaluations demonstrate that our method achieves state-of-the-art visual quality among streaming baselines while drastically boosting inference speed to 12.66 FPS, making it suitable for interactive and augmented reality applications.

Original paper on arXiv
