📄 Paper Digest

AI Operating a Computer: Is Writing Commands Better Than Clicking a Mouse?


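We tend to assume that an AI working through a graphical interface (GUI), clicking a mouse the way a person does, is the more natural approach, but a new study suggests that issuing instructions through a command-line interface (CLI) can be more reliable. The researchers built a benchmark of 440 desktop tasks across 18 applications and 12 workflow categories, and had agents complete them through either the GUI or the CLI with identical goals, initial states, and final-state verifiers. The strongest GUI agent reached a 59.1% full pass rate, while the strongest CLI agent using its original skills reached only 48.2%. Once the CLI skills were augmented under verifier guidance, however, the CLI pass rate rose to 69.3%. This suggests that GUI agents are bottlenecked by reliable interaction over long workflows, whereas much of the CLI deficit comes from incomplete skill coverage rather than from model capability alone. It is not something you can use tomorrow, but it points to a direction for computer-use AI: rather than teaching an AI to move a mouse as clumsily as a human, give it a complete command interface.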
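To make the three headline numbers easier to compare, here is a minimal arithmetic sketch. The variable and function names are made up for illustration, and it assumes each pass rate is measured over all 440 tasks. The task counts it derives are rounded estimates, not figures reported by the paper.

```python
# Headline figures quoted in the abstract.
TOTAL_TASKS = 440

# Full pass rates in percent. The dictionary keys are illustrative labels,
# not names used by the paper.
pass_rates = {
    "gui_strongest": 59.1,
    "cli_original_skills": 48.2,
    "cli_augmented_skills": 69.3,
}


def approx_tasks_passed(rate_percent, total=TOTAL_TASKS):
    """Estimate how many tasks a pass rate corresponds to.

    Assumption: the rate is computed over all `total` tasks. The result is
    a rounded estimate, not a count reported by the paper.
    """
    return round(rate_percent / 100 * total)


def gap_points(a, b):
    """Difference between two pass rates, in percentage points."""
    return round(pass_rates[a] - pass_rates[b], 1)


for name, rate in pass_rates.items():
    print(f"{name}: {rate}% (about {approx_tasks_passed(rate)} of {TOTAL_TASKS} tasks)")

print("GUI lead over original-skill CLI:",
      gap_points("gui_strongest", "cli_original_skills"), "points")
print("CLI gain from skill augmentation:",
      gap_points("cli_augmented_skills", "cli_original_skills"), "points")
print("Augmented CLI lead over GUI:",
      gap_points("cli_augmented_skills", "gui_strongest"), "points")
```

Read this way, skill augmentation moves the CLI agent by roughly twice the margin that originally separated it from the GUI agent, which is why the digest treats skill coverage as the main CLI bottleneck.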

📄 Original Abstract (English)

Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks, initial states, verifiers, and permitted actions. We introduce a matched execution-layer benchmark of 440 desktop tasks across 18 applications and 12 workflow categories, where screen-only GUI agents and skill-mediated CLI agents receive identical goals, states, and final-state verifiers while being restricted to modality-native actions. In this controlled setting, the strongest GUI agent reaches a 59.1% full pass rate, outperforming the strongest original-skill CLI agent at 48.2%; however, verifier-guided skill augmentation raises CLI success to 69.3%, showing that much of the CLI deficit comes from incomplete skill coverage rather than model capability alone. These results suggest that GUI and CLI expose different execution bottlenecks: GUI agents are limited by reliable grounded interaction over long-horizon workflows, whereas CLI agents are limited by the coverage and scalability of their skill interfaces.

Original paper on arXiv
