AI Pulse
📄 Paper Explainer

AI Concepts Live in the Angle, Not the Length

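We have long assumed that an AI's "thoughts" are encoded in how strongly its neurons activate, but a recent study suggests the key information lies in direction. The researchers split a model's hidden states into two parts, an "angle" (direction) and a "length" (norm), and found that when steering model behavior, the angle does most of the work. Think of a compass: the needle's direction tells you where north is, while its length only reflects signal strength. Across seven language models, they find that concepts are represented mainly in angular structure, while the norm still matters for how stable the steering is and how it affects the output downstream. This explains why control methods that look similar at the concept level can behave very differently, with some working and others failing. It is not something you can put to use tomorrow, but if you are curious about how AI actually "thinks", this paper offers a cleaner way to look at it.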

📄 Original Abstract

Linear activation steering has gained popularity as a simple and empirically effective way to control language model behavior. More recently, spherical steering paradigms have been proposed to address limitations of additive interventions, often motivated by the assumption that hidden-state norm does not carry concept-relevant information. In this work, we revisit this assumption through a controlled empirical study designed to disentangle the roles of angular and radial components. We show that steering methods differ mainly in how they couple two geometric effects: changing a token's angular alignment with a concept direction and changing its hidden-state norm. Across seven language models, we find that concepts are represented primarily in angular structure, supporting the motivation for spherical methods, but that norm remains important for the stability and downstream effects of steering. Our results explain why interventions with similar concept-level effects can behave differently, and suggest that activation steering should be parameterized by interpretable angular and radial components of the intervention, rather than by a single additive coefficient that entangles these two effects.
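
To make the angular versus radial distinction concrete, here is a minimal sketch in plain Python. It is not the paper's code: the function names (`additive_steer`, `angular_steer`, `radial_steer`), the toy 2-D vectors, and the choice to parameterize the angular step as a rotation by `theta` radians in the plane spanned by the hidden state and the concept direction are all assumptions made for illustration. It shows that an additive step changes both the norm and the alignment at once, while the other two interventions change only one of them.

```python
import math


def norm(v):
    """Euclidean length of a vector (the radial component)."""
    return math.sqrt(sum(x * x for x in v))


def unit(v):
    """Unit vector pointing in the same direction as v."""
    n = norm(v)
    return [x / n for x in v]


def cosine(u, v):
    """Cosine of the angle between u and v (the angular alignment)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (norm(u) * norm(v))


def additive_steer(h, concept, alpha):
    """Additive steering: h + alpha * c_hat. Moves angle AND norm together."""
    c_hat = unit(concept)
    return [a + alpha * b for a, b in zip(h, c_hat)]


def angular_steer(h, concept, theta):
    """Hypothetical angular-only step: rotate h toward the concept direction
    by theta radians, keeping the norm of h unchanged."""
    r = norm(h)
    h_hat = unit(h)
    c_hat = unit(concept)
    dot = sum(a * b for a, b in zip(h_hat, c_hat))
    # Part of the concept direction that is orthogonal to h.
    ortho = [b - dot * a for a, b in zip(h_hat, c_hat)]
    o_norm = norm(ortho)
    if o_norm < 1e-12:
        # h is already (anti-)parallel to the concept; no rotation plane exists.
        return list(h)
    e = [x / o_norm for x in ortho]
    return [r * (math.cos(theta) * a + math.sin(theta) * b)
            for a, b in zip(h_hat, e)]


def radial_steer(h, scale):
    """Hypothetical radial-only step: rescale the norm by a positive factor,
    leaving the direction unchanged."""
    return [scale * x for x in h]


if __name__ == "__main__":
    h = [3.0, 4.0]          # toy "hidden state"
    concept = [1.0, 0.0]    # toy "concept direction"
    variants = [
        ("original", h),
        ("additive", additive_steer(h, concept, 2.0)),
        ("angular", angular_steer(h, concept, 0.3)),
        ("radial", radial_steer(h, 2.0)),
    ]
    for name, vec in variants:
        print(name, "norm:", round(norm(vec), 3),
              "cosine:", round(cosine(vec, concept), 3))
```

Running the script prints the norm and the cosine alignment for each variant, which makes it easy to see which quantity each intervention touches. In a real model the same decomposition would be applied to per-token hidden states at a chosen layer; that wiring is left out here.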

Original paper on arXiv
