Packing skills into model weights instead of reciting them at every step
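Today's AI agents (assistants that operate web pages or answer questions, for example) usually store skills as text and inject that text into the prompt at every step. It is like rereading the whole manual before each task: it eats up context space and exposes the skill content as plaintext. This paper proposes LatentSkill, which uses a pretrained hypernetwork to convert textual skills directly into LoRA adapters (lightweight patches to the model), so that the skill lives in the model's weights. At run time the skill text no longer has to be carried in the prompt. On ALFWorld, success improves by 21.4 points on the seen split and 13.4 points on the unseen split over the in-context skill baseline, with 64.1% fewer prefill tokens; on Search-QA, exact match improves by 3.0 points with 72.2% lower skill-token overhead. Better still, these skill patches stay modular, like Lego bricks: they can be loaded on demand, dialed up or down through the LoRA scaling coefficient, and combined through arithmetic on their parameters when the skill components are aligned. It is not a tool you can pick up tomorrow, but it points in a direction: future AI may no longer need its instructions repeated every time and, like a person, may simply keep what it has learned in its head.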
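To make "scaling" and "composition" concrete, here is a minimal sketch of the standard LoRA idea: a skill is a low-rank update `B @ A` added to a frozen weight matrix `W`, multiplied by a scaling coefficient. This is not the paper's implementation. The hypernetwork that generates `B` and `A` from skill text is not modeled at all, the matrices are toy values, and the function names (`lora_delta`, `apply_skills`) are hypothetical.

```python
# Toy sketch only: tiny hand-written matrices stand in for real model weights,
# and all function names here are hypothetical (not taken from the paper).
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_delta(B: Matrix, A: Matrix, scale: float = 1.0) -> Matrix:
    """Low-rank weight update: scale * (B @ A)."""
    return [[scale * x for x in row] for row in matmul(B, A)]


def apply_skills(W: Matrix, skills: list[tuple[Matrix, Matrix, float]]) -> Matrix:
    """Return W plus the sum of all scaled skill updates; W itself is not modified."""
    out = [row[:] for row in W]
    for B, A, scale in skills:
        delta = lora_delta(B, A, scale)
        out = [[o + d for o, d in zip(out_row, delta_row)]
               for out_row, delta_row in zip(out, delta)]
    return out


# Frozen base weights (2 x 2) and two rank-1 "skills" (assumed values).
W = [[1.0, 0.0], [0.0, 1.0]]
B_a, A_a = [[1.0], [0.0]], [[0.0, 1.0]]
B_b, A_b = [[0.0], [1.0]], [[1.0, 0.0]]

# Compose both skills, with skill "a" dialed down to half strength.
composed = apply_skills(W, [(B_a, A_a, 0.5), (B_b, A_b, 1.0)])
print(composed)  # [[1.0, 0.5], [1.0, 1.0]]
```

Setting a skill's scale to 0 leaves the base weights unchanged, which is what makes this kind of patch plug-and-play. The abstract's caveat still applies: simple addition like this is only reported to work when the skill components are aligned.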
📄 Original abstract (English)
Agent systems increasingly use textual skills to encode reusable task procedures, but injecting these skills into the prompt at every step incurs substantial context overhead and exposes skill content as plaintext. We present LatentSkill, a framework that converts textual skills into plug-and-play LoRA adapters through a pretrained hypernetwork. LatentSkill stores skill knowledge in weight space rather than context space, removing per-step skill tokens while preserving modular loading, scaling, and composition. On ALFWorld and Search-QA, LatentSkill outperforms the corresponding in-context skill baseline while using substantially fewer prefill tokens: it improves ALFWorld success by 21.4 and 13.4 points on the seen and unseen splits with 64.1% fewer prefill tokens, and improves Search-QA exact match by 3.0 points with 72.2% lower skill-token overhead. Further analysis shows that generated skill LoRAs form a structured semantic geometry, can be precisely controlled via the LoRA scaling coefficient, and can be composed through parameter-space arithmetic when skill components are aligned. These findings suggest that weight-space skills provide an efficient, modular, and less exposed substrate for extending LLM agents.