Adapting a robot to a new environment with a single demonstration
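Once a robot has learned a task, changing the camera angle or swapping in a similar-looking robot often breaks it. Until now, re-teaching it meant recording many demonstrations. This paper shows that the "environment difference" can be isolated on its own and added to the model's weights, much like vector addition. The authors use subspace alignment to filter out noisy components, so a single demonstration is enough to adapt the robot to the new environment. In both simulated and real-world experiments, the method outperforms existing adaptation methods in the one-shot setting. It is not something you can use tomorrow, but the direction is clear: someday, moving a robot to a new scene may be as simple as switching a filter.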
📄 Original abstract (English)
Vision-Language-Action (VLA) models often fail to perform the same learned tasks under environmental shifts, such as changes in camera pose and shifts to a different but similar robot (e.g., from Panda to UR5e). Adapting these models to the shifted environment (i.e., target domain) often requires training on multiple demonstrations for each task, which are costly to collect. To reduce the burden of data curation and training, we propose an analogy-based method that adapts VLA models under environmental shifts through weight vector arithmetic with domain-specific information addition, named Domain ARiThmetic (DART). Unlike prior approaches, DART requires collecting only a single demonstration, enabling efficient adaptation. To accurately isolate domain-specific information for addition, DART performs subspace alignment between singular components in weight vectors to filter out noisy components. In both simulated and real-world experiments, DART outperforms existing VLA adaptation methods in one-shot scenarios across diverse visual and embodiment shifts. Code is available at https://github.com/snumprlab/dart.
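To make the "weight vector arithmetic" idea concrete, here is a minimal NumPy sketch on a single toy weight matrix. It is not DART's actual algorithm, which is in the linked repository. The function names, the `reference` matrix, the `rank` and `threshold` parameters, and the rule "keep the singular components of the difference that align with the top singular subspace of a reference" are all assumptions made for illustration.

```python
import numpy as np


def domain_delta(w_source, w_target_oneshot):
    """Weight difference attributed to the environment shift (hypothetical definition)."""
    return w_target_oneshot - w_source


def filter_by_subspace(delta, reference, rank=2, threshold=0.5):
    """Keep singular components of `delta` aligned with the top-`rank`
    left singular subspace of `reference`. The rule itself is an assumption."""
    u_d, s_d, vt_d = np.linalg.svd(delta, full_matrices=False)
    u_r, _, _ = np.linalg.svd(reference, full_matrices=False)
    basis = u_r[:, :rank]  # orthonormal basis of the reference subspace
    # Norm of each left singular vector's projection onto the subspace, in [0, 1].
    alignment = np.linalg.norm(basis.T @ u_d, axis=0)
    keep = alignment >= threshold
    # Rebuild the delta from the retained components only.
    return (u_d[:, keep] * s_d[keep]) @ vt_d[keep, :]


def apply_domain_delta(w_other_task, filtered_delta, scale=1.0):
    """Add the filtered domain information to another set of weights."""
    return w_other_task + scale * filtered_delta


# Toy data: the reference spans the first two coordinate axes.
reference = np.zeros((4, 3))
reference[0, 0], reference[1, 1] = 2.0, 1.0

w_source = np.ones((4, 3))
w_target_oneshot = w_source.copy()
w_target_oneshot[0, 0] += 3.0  # component inside the reference subspace
w_target_oneshot[3, 1] += 0.5  # component outside it, treated as noise here

filtered = filter_by_subspace(domain_delta(w_source, w_target_oneshot), reference)
w_adapted = apply_domain_delta(w_source, filtered)
```

In this toy setup the component along the first axis survives the filter and the off-subspace component is dropped. A real VLA model would need this applied per layer, and the choice of reference subspace and threshold would matter a great deal.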