AI Pulse
📄 Paper Digest

Kuaishou's Open-Source Model Watches Hour-Long Video While Activating Only 3B Parameters

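Understanding an hour of video usually means running attention over the tokens of every frame, and that compute bill is hard to pay. Kuaishou's open-source Keye-VL-2.0 takes a different route: it adapts DeepSeek Sparse Attention (DSA) to its GQA-based multimodal architecture, so the model concentrates on critical frames while still capturing long-range temporal dependencies, much as you skim a long article for the key paragraphs. The result is an MoE model with 30B total parameters, of which only 3B are activated, that the authors say processes a 256K-token context losslessly, enough for hour-level video. Being selective does not appear to cost it detail: on TimeLens, a fine-grained temporal localization benchmark, the paper reports state-of-the-art results among models of similar scale. This is not a tool you can pick up tomorrow, but it makes one point: long-video understanding does not have to come from piling on compute; it can come from being smart about what to skip.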
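To make the "skip most of the context" idea concrete, here is a minimal NumPy sketch of top-k sparse attention for a single query. It is an illustration under stated assumptions, not the DSA implementation from the paper: the function name `topk_sparse_attention`, the use of the attention scores themselves to pick tokens, and the toy sizes are all hypothetical.

```python
import numpy as np

def topk_sparse_attention(q, K, V, k):
    """Attend from one query to only the k highest-scoring keys.

    Illustrative only: tokens are selected with the full attention
    scores, which a real system would replace with a cheaper scorer.
    """
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)               # one score per context token
    k = min(k, scores.shape[0])
    keep = np.argpartition(scores, -k)[-k:]   # indices of the k largest scores
    sel = scores[keep]
    w = np.exp(sel - sel.max())               # numerically stable softmax
    w /= w.sum()
    return w @ V[keep], np.sort(keep)

# Toy usage: 16 context tokens, keep only 4 of them.
rng = np.random.default_rng(0)
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 4))
q = rng.normal(size=8)
out, kept = topk_sparse_attention(q, K, V, k=4)
```

With `k` equal to the context length the function reduces to ordinary dense attention, which is one way to check that the sparse path only drops tokens rather than changing the computation.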

📄 Original Abstract

We introduce Kwai Keye-VL-2.0-30B-A3B, an open-source Mixture-of-Experts (MoE) multimodal foundation model designed to advance long-video understanding and agentic intelligence. To address the challenges of ultra-long contexts, information redundancy, and prohibitive computational costs inherent in hour-level videos, Keye-VL-2.0 is the first to adapt DeepSeek Sparse Attention (DSA) to GQA-based multimodal architectures, enabling lossless 256K context processing while capturing critical frames and long-range temporal dependencies. This architecture is underpinned by a highly optimized training and inference infrastructure, including scalable video I/O, heterogeneous ViT-LM parallelism, and custom DSA kernels that significantly maximize throughput and minimize computational overhead. Furthermore, to overcome the algorithmic dilemma of catastrophic forgetting during multi-task alignment, we introduce Cross-Modal Multi-Teacher On-Policy Distillation (MOPD) paired with Context-RL and Video-RL. By distilling dense token-level teacher feedback from on-policy rollouts back into the MoE backbone, which activates only 3B parameters, Keye-VL-2.0 natively empowers advanced agent collaboration across Code, Tool, and Search scenarios with multimodal self-correction. Extensive evaluations across video understanding, temporal grounding, reasoning, STEM, and agent benchmarks demonstrate that Keye-VL-2.0-30B-A3B achieves state-of-the-art performance among models of similar scale, particularly excelling in fine-grained temporal localization on TimeLens and long-video comprehension on Video-MME-v2 and LongVideoBench. We release our model checkpoints to accelerate community progress toward scalable and robust multimodal agentic applications.
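The abstract gives only the headline numbers for the MoE backbone (30B parameters in total, 3B activated). As a rough sketch of how a Mixture-of-Experts layer can run just a fraction of its weights per token, the snippet below implements generic top-k expert routing in NumPy. The expert count, `top_k` value, gating scheme, and the name `moe_forward` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token through only the top_k highest-scoring experts.

    x:       (d,) token representation
    gate_w:  (E, d) router weights, one row per expert (hypothetical)
    experts: (E, d, d) one weight matrix per expert (hypothetical)
    """
    logits = gate_w @ x                          # one routing score per expert
    chosen = np.argsort(logits)[-top_k:]         # indices of the top_k experts
    w = np.exp(logits[chosen] - logits[chosen].max())
    w /= w.sum()                                 # softmax over chosen experts only
    # Only the chosen experts' matrices are ever multiplied.
    out = sum(wi * (experts[i] @ x) for wi, i in zip(w, chosen))
    return out, sorted(chosen.tolist())

# Toy usage: 8 experts, 2 active per token.
rng = np.random.default_rng(0)
d, E = 4, 8
x = rng.normal(size=d)
gate_w = rng.normal(size=(E, d))
experts = rng.normal(size=(E, d, d))
y, active = moe_forward(x, gate_w, experts, top_k=2)
```

In this toy setup a quarter of the expert weights are used per token; the gap between total and activated parameters in a real model comes from the same mechanism, with shared layers counted on top.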

Original paper on arXiv
