AI Pulse
📄 Paper Explainer

xLSTM Beats Mamba: A New Leader in Sequence Modeling

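Transformer attention is powerful, but its compute cost grows quadratically with sequence length, which makes it a bottleneck. The researchers compare three subquadratic alternatives, xLSTM, Mamba-2, and Gated DeltaNet, on code-model pre-training, distillation of code models from large language models, and pre-training of time-series foundation models. Across these settings, xLSTM delivers the strongest overall performance. The authors attribute this to its gating scheme, which allows more flexible and stable memory correction and therefore more robust state tracking. This is not a tool you can pick up and use tomorrow, but if you follow model efficiency, xLSTM is a direction worth watching.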
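To make "memory correction via gating" a little more concrete, here is a minimal sketch of a generic gated matrix memory with a fixed-size state. It is an illustration only, not the paper's unified formulation or any of the three architectures as implemented: the function names (`step`, `read`, `run`), the scalar gates, and the toy values are all assumptions made for this example.

```python
# Toy gated matrix memory (illustrative assumption, not the paper's exact model).
# The state has a fixed size (d_v x d_k), so cost grows linearly with sequence length.

def outer(value, key):
    """Outer product v k^T as a nested list."""
    return [[v * k for k in key] for v in value]

def step(state, key, value, forget_gate, input_gate):
    """One update: S <- forget_gate * S + input_gate * (v k^T)."""
    write = outer(value, key)
    return [
        [forget_gate * s + input_gate * w for s, w in zip(s_row, w_row)]
        for s_row, w_row in zip(state, write)
    ]

def read(state, query):
    """Read out S q."""
    return [sum(s * q for s, q in zip(row, query)) for row in state]

def run(sequence, d_v, d_k):
    """Process (key, value, forget_gate, input_gate) tuples one at a time."""
    state = [[0.0] * d_k for _ in range(d_v)]
    for key, value, forget_gate, input_gate in sequence:
        state = step(state, key, value, forget_gate, input_gate)
    return state

key = [1.0, 0.0]

# Forget gate closed on the second write: the old value is replaced.
corrected = [(key, [1.0, 2.0], 1.0, 1.0), (key, [5.0, 6.0], 0.0, 1.0)]
# Forget gate open on the second write: old and new values pile up.
accumulated = [(key, [1.0, 2.0], 1.0, 1.0), (key, [5.0, 6.0], 1.0, 1.0)]

print(read(run(corrected, 2, 2), key))    # [5.0, 6.0]
print(read(run(accumulated, 2, 2), key))  # [6.0, 8.0]
```

In this toy setup, the forget gate decides whether a new write replaces what was stored under a key or is added on top of it. How each architecture parameterizes and stabilizes such gates is what the paper analyzes; the sketch above does not capture those differences.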

📄 Original Abstract (English)

Transformers dominate modern sequence modeling, but their quadratic attention incurs substantial computational cost. Subquadratic architectures offer a scalable alternative. However, it remains unclear which designs yield the most effective sequence models. We compare three leading approaches: xLSTM, Mamba-2, and Gated DeltaNet. We evaluate these models on tasks with complex dependencies: (1) code-model pre-training, (2) distillation of code models from large language models, and (3) pre-training of time-series foundation models. Across these settings, xLSTM delivers the strongest overall performance. To explain xLSTM's advantage, we present a unified formulation and analyze the underlying architectural mechanisms, focusing on state tracking and memory dynamics. Our results show that xLSTM enables more flexible and stable memory correction via its gating scheme. We corroborate these findings on controlled synthetic length-generalization tasks. Overall, our findings indicate that xLSTM's gains on complex tasks stem from robust state tracking and accumulation.

Original paper on arXiv
