📄 Paper Digest

Mobile AI Finally Learns to Take Notes: Proactive Context Management Keeps Long Tasks on Track

Tags: mobile AI assistants, long-horizon tasks, context management, GUI agents, memory mechanisms

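Today's mobile AI assistants (the kind you ask to book a hotel or fill out a form) tend to lose track once a task runs past a few steps: they pile every step's record into one growing prompt, key information gets buried, and the task falls apart. This paper teaches the agent to manage its own memory actively. Context management becomes an action the model itself emits, alongside its UI actions. Instead of appending everything, the agent keeps three compact fields: a folded action history, a folded UI state that preserves key facts (such as an address it already entered), and a record of the most recent step. The authors trained an 8B model on 2,956 annotated trajectories; it achieves the best result among 8B models trained on open data on the long-horizon MemGUI-Bench and generalizes to the out-of-distribution MobileWorld benchmark. This is not a feature you can use tomorrow, but it points the way: future mobile AI assistants should, like a dependable secretary, remember what you said five minutes ago.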

📄 Original Abstract

MLLM-based mobile GUI agents have made substantial progress on short-horizon tasks, yet remain unreliable on long-horizon tasks that require retaining intermediate facts across many steps and app transitions. We attribute this limitation to ReAct-style prompting, which passively accumulates per-step records, leading to prompt explosion and dilution of critical cross-app facts. To address this, we introduce MemGUI-Agent, an end-to-end long-horizon mobile GUI agent with proactive context management. MemGUI-Agent is built on Context-as-Action (ConAct), which casts context management as first-class actions emitted by the same policy that selects UI actions. Instead of passively appending history, ConAct maintains three structured context fields: folded action history, folded UI state, and recent step record, preserving critical UI facts while keeping context compact. To make proactive context management learnable across model scales, we construct MemGUI-3K, a 2,956-trajectory dataset with full ConAct annotations for supervised training and offline analysis. Training an 8B model on MemGUI-3K produces MemGUI-8B-SFT, an 8B MemGUI-Agent that achieves the best open-data 8B performance on MemGUI-Bench and generalizes to the out-of-distribution MobileWorld benchmark. Code, data, and trained models will be released at https://memgui-agent.github.io/.
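To make the three context fields concrete, here is a minimal sketch of how a ConAct-style context update might look. It is not the paper's implementation: the class names, the field types, and the `remember`, `forget`, and `history_summary` context actions are all illustrative assumptions, since the abstract names the three fields but not their schema.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Three compact fields, loosely mirroring those named in the abstract."""
    folded_action_history: list[str] = field(default_factory=list)
    folded_ui_state: dict[str, str] = field(default_factory=dict)
    recent_step_record: str = ""


@dataclass
class PolicyOutput:
    """One hypothetical policy emission: a UI action plus context actions."""
    ui_action: str
    step_record: str                                         # raw record of this step
    history_summary: str = ""                                # folded summary of earlier work
    remember: dict[str, str] = field(default_factory=dict)   # facts to keep
    forget: list[str] = field(default_factory=list)          # fact keys to drop


def apply_step(ctx: Context, out: PolicyOutput) -> Context:
    """Return the next context; only the latest raw step record is retained."""
    history = list(ctx.folded_action_history)
    if out.history_summary:
        history.append(out.history_summary)
    state = {k: v for k, v in ctx.folded_ui_state.items() if k not in out.forget}
    state.update(out.remember)
    return Context(history, state, out.step_record)


# Hypothetical two-step, cross-app example.
ctx = Context()
ctx = apply_step(ctx, PolicyOutput(
    ui_action="tap(search_box)",
    step_record="Hotel app shows Hotel A at $120/night.",
    remember={"hotel_price": "$120/night"},
))
ctx = apply_step(ctx, PolicyOutput(
    ui_action="open_app(notes)",
    step_record="Switched to Notes; an empty note is open.",
    history_summary="Searched hotels and found Hotel A.",
))
```

In this sketch the price survives the app switch in `folded_ui_state`, while the raw record of the first step is replaced rather than appended, which is the contrast with ReAct-style accumulation that the abstract draws.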

Original paper: arXiv
