AI Skill Learning Starts with Learning to Say "No"
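When today's AI assistants work on a computer, they pick up new skills by copying what worked in successful runs. Here is the problem: if the environment contains malicious pop-ups or injected instructions, the assistant can learn a dangerous operation as a "good skill" too. This paper adds a safety boundary to skill learning: it uses multiple supervision signals to judge which operations are safe, learns only those, and activates only the subset of skills the current context needs. In the experiments, the unsafe rate of learned skills fell by 57.1%. This is not something you can use tomorrow, but it points to a key shift: AI needs to learn not just fast, but also "timid".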
📄 Original abstract
Computer-Use Agents (CUAs) are increasingly deployed in dynamic interactive environments, creating a growing need for continual skill learning during interaction. Recent approaches address this challenge by learning reusable skills from successful trajectories. However, these skill learning methods largely assume static and safe environments, overlooking risks from adversarial interactions (e.g., prompt injections) and environmental dynamics (e.g., pop-ups). In dynamic settings, such assumptions can lead to risky skill learning and brittle execution, undermining the reliability of CUAs. This raises the question: how can CUAs learn and use skills safely in dynamic environments? To address this problem, we propose SkillHarness, a framework for safe skill harnessing in dynamic environments. SkillHarness moves beyond static skill abstractions by modeling skill learning and utilization as a safety-constrained interaction process. Specifically, we introduce the skill boundary that leverages multi-source supervision signals to identify safe skills from interaction trajectories, and construct self-improving safety constraints throughout the skill lifecycle. In addition, SkillHarness introduces selective skill reuse, where tasks are guided to decompose according to context and completed through the selective activation of skill subsets. Our experiments demonstrate that SkillHarness significantly reduces the unsafe rate of learned skills by 57.1% and consistently improves execution stability under dynamic environmental changes, outperforming existing baselines.
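To make the two ideas in the abstract concrete, here is a minimal Python sketch of a skill boundary followed by selective reuse. It is an illustration only, not SkillHarness's actual implementation: the `Candidate` class, the signal names (`task_success`, `injection_check`, `popup_check`), and the rule that every required signal must agree are all assumptions made for this example. The abstract does not say how the signals are combined.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A skill candidate extracted from one trajectory (hypothetical structure)."""
    name: str
    tags: frozenset[str]      # contexts in which the skill is relevant
    signals: dict[str, bool]  # verdict per supervision source, True = looks safe


def within_boundary(candidate: Candidate, required: tuple[str, ...]) -> bool:
    """Admit a candidate only if every required source reported it as safe.

    A missing verdict counts as unsafe, so the default answer is "no".
    This all-must-agree rule is an assumption, not the paper's method.
    """
    return all(candidate.signals.get(source, False) for source in required)


def learn(candidates: list[Candidate], required: tuple[str, ...]) -> list[Candidate]:
    """Keep only the candidates that fall inside the skill boundary."""
    return [c for c in candidates if within_boundary(c, required)]


def activate(library: list[Candidate], context: frozenset[str]) -> list[Candidate]:
    """Selective reuse: activate only skills whose tags overlap the context."""
    return [s for s in library if s.tags & context]


# Hypothetical signal names and candidates, invented for this example.
REQUIRED = ("task_success", "injection_check", "popup_check")

candidates = [
    Candidate("fill_login_form", frozenset({"browser", "login"}),
              {"task_success": True, "injection_check": True, "popup_check": True}),
    # Succeeded, but the injection check flagged it: rejected.
    Candidate("click_popup_offer", frozenset({"browser"}),
              {"task_success": True, "injection_check": False, "popup_check": True}),
    # No pop-up verdict was recorded: rejected by default.
    Candidate("export_spreadsheet", frozenset({"office"}),
              {"task_success": True, "injection_check": True}),
    Candidate("rename_file", frozenset({"files"}),
              {"task_success": True, "injection_check": True, "popup_check": True}),
]

library = learn(candidates, REQUIRED)
active = activate(library, frozenset({"login"}))

print([s.name for s in library])  # ['fill_login_form', 'rename_file']
print([s.name for s in active])   # ['fill_login_form']
```

The sketch refuses by default: a candidate that succeeded at its task still stays out of the library if any check flags it or is missing, which is the "learning to say no" behaviour described above. A real system would presumably weigh its signals in a more nuanced way and update the constraints over time, as the abstract's "self-improving safety constraints" suggests.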