📄 Paper digest

Letting robots learn skills through play, the way children do

Tags: robot learning, autonomous exploration, skill transfer, Code-as-Policy

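Today's agentic robots mostly learn on demand: they acquire reusable skills only after someone hands them an explicit task. This paper lets the robot play first. It sets its own exercises, tries them, learns from its failures, and saves successful runs as code in a skill library. When a new task arrives, it retrieves the relevant skills from that library instead of retraining the model. On held-out tasks, the robot that played beat the CaP-Agent0 baseline by 20.6 percentage points on LIBERO-PRO and 17.0 on MolmoSpaces, and its skills could also be handed to other Code-as-Policy agents. You won't be using this tomorrow, but the direction is clear: robots should pick up general abilities through play, the way children do.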
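To make the "look it up in the library" step concrete, here is a minimal sketch of retrieving stored skills into an agent's context. Everything in it is an assumption for illustration: the `Skill` record, the keyword-overlap scoring, and the example skills are hypothetical, and the paper's actual retrieval method is not described in the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical stopword list so that filler words do not count as matches.
STOPWORDS = {"the", "a", "an", "in", "on", "to", "of", "from", "by", "up"}


@dataclass(frozen=True)
class Skill:
    name: str          # hypothetical identifier for the skill
    description: str   # short summary used for retrieval
    code: str          # source of the reusable robot-code policy


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w.strip(".,") for w in text.lower().split()} - STOPWORDS


def retrieve(library: list[Skill], task: str, k: int = 2) -> list[Skill]:
    """Return up to k skills whose descriptions share words with the task."""
    task_words = tokenize(task)
    scored = [(len(task_words & tokenize(s.description)), s) for s in library]
    scored = [pair for pair in scored if pair[0] > 0]
    # Stable sort: ties keep their original library order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def build_context(task: str, skills: list[Skill]) -> str:
    """Assemble the text that would be placed in the coding agent's context."""
    lines = [f"# Task: {task}", "# Retrieved skills:"]
    lines += [s.code for s in skills]
    return "\n".join(lines)


# Toy library standing in for skills distilled during play.
LIBRARY = [
    Skill("open_drawer", "open the drawer by pulling the handle",
          "def open_drawer(robot): ..."),
    Skill("pick_mug", "pick up the mug from the table",
          "def pick_mug(robot): ..."),
    Skill("push_button", "push the stove button",
          "def push_button(robot): ..."),
]

if __name__ == "__main__":
    task = "put the mug in the drawer"
    print(build_context(task, retrieve(LIBRARY, task)))
```

The library is only read here, never written, which mirrors the frozen test-time library described in the abstract. A real system would more plausibly score skills with embeddings than with word overlap.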

📄 Original abstract

Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions. We study Playful Agentic Robot Learning, where an embodied coding agent uses self-directed play as a continual skill-learning stage before downstream tasks arrive. We introduce RATs, Robotics Agent Teams designed for play-time skill acquisition. During play, RATs proposes novel yet learnable exploratory tasks, plans and executes robot-code policies, verifies intermediate progress, diagnoses failures, retries with dense, step-level feedback, and distills successful executions into a persistent code skill library. At test time, the agent reuses relevant skills from this frozen library to help solve new tasks. Experiments in LIBERO-PRO and MolmoSpaces show that play-learned skills improve held-out downstream tasks over no-play and random-play baselines, with 20.6 and 17.0 percentage-point gains over CaP-Agent0 on LIBERO-PRO and MolmoSpaces, respectively. Moreover, the learned skills can be plugged into other inference-time Code-as-Policy agents by simply retrieving them into the context, improving RoboSuite and real-world transfer by 8.9 and 8.8 points, respectively, without finetuning the underlying model.
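The play stage in the abstract (propose a task, execute, verify, diagnose, retry with feedback, distill into a library) can be sketched as a simple loop. This is a toy illustration under stated assumptions, not the RATs implementation: `propose` and `attempt` are hypothetical callables standing in for the agent team, and the toy stubs below simulate a robot that fails its first try and succeeds once it receives feedback.

```python
from __future__ import annotations

from typing import Callable, Optional


def play(
    propose: Callable[[dict[str, str]], str],
    attempt: Callable[[str, Optional[str]], tuple[bool, str]],
    rounds: int,
    max_retries: int = 3,
) -> dict[str, str]:
    """Run self-directed play and return the skill library it produced.

    propose(library) returns the next exploratory task.
    attempt(task, feedback) returns (success, payload): skill code on
    success, or a failure diagnosis to feed into the next retry.
    """
    library: dict[str, str] = {}
    for _ in range(rounds):
        task = propose(library)
        feedback: Optional[str] = None
        for _ in range(max_retries):
            success, payload = attempt(task, feedback)
            if success:
                library[task] = payload  # distill the successful execution
                break
            feedback = payload  # retry with the diagnosis as feedback
    return library


# --- Hypothetical stubs so the loop can run without a robot ---

def toy_propose(library: dict[str, str]) -> str:
    """Propose the first task from a fixed curriculum not yet learned."""
    for task in ("reach", "grasp", "lift"):
        if task not in library:
            return task
    return "lift"


def toy_attempt(task: str, feedback: Optional[str]) -> tuple[bool, str]:
    """Fail on the first try, succeed once feedback is available."""
    if feedback is None:
        return False, f"step 1 of {task} failed"
    return True, f"def {task}(robot): ..."


if __name__ == "__main__":
    skills = play(toy_propose, toy_attempt, rounds=3)
    print(sorted(skills))
```

Passing the current library to `propose` is one way to model "novel yet learnable" task proposals: the proposer can see what is already known and pick something just beyond it.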

Original paper on arXiv
