📄 Paper Digest

AI Learns to Do Hands-On Experiments in the Lab

Lab automation · Robot manipulation · Vision-language-action models · Simulation training

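Today's AI can read papers, write code, and draft experimental protocols, but it stalls as soon as hands-on work is needed, such as pouring reagents or adjusting a microscope. This paper brings AI to the lab bench: the authors first build a simulated laboratory (RoboGenesis) to generate manipulation data at scale, then train a model called LabVLA. The key innovation is a two-stage training recipe: the model first learns a discrete "action vocabulary" (FAST action tokens), and only then learns fine-grained continuous control. On the simulated LabUtopia benchmark, it achieves the highest average success rate among all evaluated baselines, including in out-of-distribution settings it did not see during training. It is still far from replacing lab technicians, but it marks a key step for AI from armchair theorizing to hands-on work.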

📄 Original Abstract (English)

Scientific laboratories increasingly rely on AI systems to reason about experiments, but the physical act of doing science remains largely outside their reach. AI can help read literature, generate hypotheses, and plan protocols, yet the execution of those protocols at the bench still requires a human operator. Vision-Language-Action (VLA) models provide one possible interface between written protocols and robot execution, but existing policies are trained mostly on household and tabletop demonstrations and rarely encounter the instruments, transparent liquids, or fixed protocol workflows found in scientific laboratories. Closing this gap requires both laboratory-specific supervision and a unified learning framework that can accommodate the diverse robot embodiments used to execute experimental protocols. We therefore identify data and embodiment as central bottlenecks alongside model design. To address the data side, we build RoboGenesis, a simulation-based workflow and data engine that composes configured laboratory workflows from atomic skills, validates and filters rollouts, and exports structured demonstrations across supported robot profiles. On the policy side, we present LabVLA, trained with a two-stage recipe: FAST action token pretraining first makes the Qwen3-VL-4B-Instruct backbone action aware before any continuous control is learned, and flow matching posttraining then attaches a DiT action expert under knowledge insulation. On the LabUtopia benchmark, LabVLA achieves the highest average success rate among all evaluated baselines under both in-distribution and out-of-distribution settings.
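
For readers unfamiliar with the "flow matching" stage named in the abstract, here is a minimal sketch of one common formulation (a straight-line path from noise to the target action), not the paper's actual implementation. The 3-dimensional `action`, the `oracle` stand-in for a trained network, and all function names are hypothetical and chosen only for illustration.

```python
import random

def interpolate(noise, action, t):
    """Point on the straight path from noise (t=0) to the action (t=1)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]

def target_velocity(noise, action):
    """Velocity of that straight path; it is constant in t."""
    return [a - n for n, a in zip(noise, action)]

def flow_matching_loss(predicted, noise, action):
    """Mean squared error between a predicted velocity and the target."""
    target = target_velocity(noise, action)
    return sum((p - v) ** 2 for p, v in zip(predicted, target)) / len(target)

def euler_sample(velocity_fn, noise, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical 3-dimensional action, illustrative values only.
action = [0.2, -0.1, 0.5]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in action]

# Assumption: stands in for a trained network by returning the exact
# conditional velocity that points from the current state toward `action`.
def oracle(x, t):
    return [(a - xi) / (1.0 - t) for a, xi in zip(action, x)]

sample = euler_sample(oracle, noise)
```

During training, a network would be fed `interpolate(noise, action, t)` for a random `t` and penalized with `flow_matching_loss`; at inference, `euler_sample` turns pure noise into an action by following the learned velocity. With the oracle above, the sampled result lands on `action` up to floating-point error.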

Original paper on arXiv
