📄 Paper Breakdown

One model for every navigation task, with on-the-fly mode switching

Trend channel ▲ 21 | Robot navigation · Multi-task model · Parameterised interface · Zero-shot generalisation

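Today's robot navigation models usually do just one thing: follow instructions, search for objects, or drive autonomously. Real-world deployments, however, require a robot to switch tasks at any moment, for example first finding a target object, then tracking it, and finally driving autonomously to a destination. This paper lets a single model handle all of these tasks without swapping models or changing code.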

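The researchers designed a parameterised interface that splits navigation behaviour into two adjustable dimensions: the task mode, which decides what the robot should do right now, and observation parameters such as the visual-token budget and per-camera weights, which control how much of the visual history the model sees and how it is encoded. Because these parameters are randomly combined during training, the model learns to work under any configuration. Just as important, the model is trained on 15.6M samples with vision-language data mixed in, which keeps it from collapsing into a reactive "action-sequence mapper".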
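The interface and the training-time randomization can be pictured as a small configuration sampler. This is a minimal sketch under stated assumptions, not the authors' code: the mode names in `TASK_MODES`, the budget menu in `TOKEN_BUDGETS`, the proportional largest-remainder split in `allocate_tokens`, and the co-training ratio `vl_ratio` are all hypothetical, since the summary does not say how these parameters are encoded or sampled.

```python
import random
from dataclasses import dataclass

# Hypothetical mode names; the paper's exact set of task modes is not given here.
TASK_MODES = ("instruction_following", "object_search", "target_tracking", "autonomous_driving")
# Hypothetical menu of visual-token budgets to randomize over.
TOKEN_BUDGETS = (256, 512, 1024, 2048)


@dataclass
class NavConfig:
    task_mode: str                    # which navigation behaviour to run
    token_budget: int                 # total visual tokens for the encoded history
    camera_weights: dict[str, float]  # relative share of the budget per camera (sums to 1)


def sample_config(rng: random.Random, cameras: list[str]) -> NavConfig:
    """Draw a fresh, random setting of every interface parameter."""
    raw = {cam: rng.random() + 1e-3 for cam in cameras}  # strictly positive raw weights
    total = sum(raw.values())
    return NavConfig(
        task_mode=rng.choice(TASK_MODES),
        token_budget=rng.choice(TOKEN_BUDGETS),
        camera_weights={cam: w / total for cam, w in raw.items()},  # normalize to 1
    )


def allocate_tokens(cfg: NavConfig) -> dict[str, int]:
    """Split the budget across cameras in proportion to their weights.

    Largest-remainder rounding guarantees the shares sum to token_budget.
    """
    exact = {cam: cfg.token_budget * w for cam, w in cfg.camera_weights.items()}
    alloc = {cam: int(x) for cam, x in exact.items()}  # floor of each exact share
    leftover = cfg.token_budget - sum(alloc.values())
    # Hand the remaining tokens to the cameras with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - alloc[c], reverse=True)
    for cam in by_remainder[:leftover]:
        alloc[cam] += 1
    return alloc


def sample_training_item(rng: random.Random, cameras: list[str], vl_ratio: float = 0.3):
    """Co-training mix (hypothetical ratio): sometimes a vision-language sample,
    otherwise a trajectory sample paired with a randomized interface configuration."""
    if rng.random() < vl_ratio:
        return ("vision_language", None)
    return ("trajectory", sample_config(rng, cameras))
```

Because every trajectory sample in this sketch draws its own mode, budget, and camera split, no single inference-time setting is privileged during training, which is the intuition behind the claim that the model stays robust to whatever configuration it is given.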

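Results: new state-of-the-art results on several major navigation benchmarks, consistent gains when scaling from 2B to 8B parameters, and strong zero-shot transfer to real robots across diverse environments.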

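This is not a technology you can use tomorrow, but it points to an important trend: future robots may no longer need a separately trained model for each task, and could instead rely on one general model plus dynamic configuration to handle every scenario.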

📄 Original Abstract (English)

Agentic navigation systems require a base navigation model whose observation strategy can be externally reconfigured at inference time, because instruction following, object search, target tracking, and autonomous driving share the same perception-planning backbone yet demand fundamentally different strategies for consuming the visual stream. We present Qwen-RobotNav, a scalable navigation model built on Qwen-RobotNav that addresses this requirement through a parameterised interface with two complementary dimensions: multiple task modes that select the navigation behaviour, and controllable observation parameters (e.g., token budget, per-camera weights) that govern how visual history is encoded. With training-time randomization over all parameters, Qwen-RobotNav is robust to any inference-time configuration without requiring any architectural modification to the Qwen-RobotNav backbone. We train Qwen-RobotNav on 15.6M samples; co-training with vision-language data prevents the collapse into reactive action-sequence mappers observed in trajectory-only training. The parameterised interface also makes Qwen-RobotNav a natural building block for agentic systems: for long-horizon scenarios, an upper-level planner decomposes goals into sub-tasks and dynamically switches Qwen-RobotNav's task mode and context strategy mid-episode, composing complex behaviours from repeated calls to the same model. Extensive experiments show that Qwen-RobotNav sets new state-of-the-art results across major navigation benchmarks. The model exhibits favourable scaling from 2B to 8B parameters, with joint multi-task training developing a shared spatial-planning substrate that transfers across task families, and demonstrates strong zero-shot generalisation to real-world robots across diverse environments.
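The agentic use described near the end of the abstract, where an upper-level planner decomposes a goal and switches the task mode and context strategy mid-episode, can be sketched as a loop that keeps calling one model with different arguments. Everything here is hypothetical: `toy_planner` is a fixed decomposition standing in for a real planner, `SubTask` and its fields are illustrative, and `make_stub_model` returns fake actions where a real system would call Qwen-RobotNav.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubTask:
    task_mode: str     # hypothetical mode name
    goal: str          # natural-language sub-goal
    token_budget: int  # context strategy chosen for this phase


def toy_planner(goal: str) -> list[SubTask]:
    """Stand-in for the upper-level planner: a fixed decomposition, not a learned one."""
    return [
        SubTask("object_search", "find the red backpack", 512),
        SubTask("target_tracking", "follow the person carrying it", 256),
        SubTask("autonomous_driving", "drive to the loading dock", 1024),
    ]


# Assumed calling convention: model(mode, goal, budget) -> (action, done).
NavModel = Callable[[str, str, int], tuple[str, bool]]


def make_stub_model(steps_per_task: int = 3) -> NavModel:
    """Fake model that reports a sub-task as done after a fixed number of calls."""
    calls: dict[tuple[str, str], int] = {}

    def model(mode: str, goal: str, token_budget: int) -> tuple[str, bool]:
        key = (mode, goal)
        calls[key] = calls.get(key, 0) + 1
        return f"{mode}/step{calls[key]}", calls[key] >= steps_per_task

    return model


def run_episode(
    goal: str,
    planner: Callable[[str], list[SubTask]],
    nav_model: NavModel,
    max_steps_per_task: int = 50,
) -> list[str]:
    """Execute each sub-task by re-calling the same model with a new mode and budget."""
    trace: list[str] = []
    for sub in planner(goal):
        for _ in range(max_steps_per_task):
            action, done = nav_model(sub.task_mode, sub.goal, sub.token_budget)
            trace.append(action)
            if done:
                break  # hand control back to the planner for the next sub-task
    return trace
```

With the default stub, `run_episode("deliver the backpack", toy_planner, make_stub_model())` produces nine actions, three per sub-task, all from the same model object; only the mode, goal, and budget arguments change between phases. A real planner would presumably also replan from feedback rather than follow a fixed list.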

Original paper on arXiv
