AI Memory Agents: They Remember, but They Can't Keep Secrets
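You ask an AI assistant to remember your medical records, your coworkers' schedules, and your kids' homework. It does remember them, but it may mix them up. Existing AI memory benchmarks largely test single-user settings, yet problems appear as soon as several people in a hospital, company, or household share one AI memory pool: who should see what? When someone asks for something to be deleted, is it really forgotten? This paper builds a new benchmark, GateMem, that covers four domains (medical, office, education, and household) and makes the AI handle three tasks at once: long-horizon memory, access control, and active forgetting. The result: no method does all three well. Long-context prompting often achieves the best governance score but at a steep token cost, while retrieval-based and external-memory methods are cheaper yet still leak unauthorized or deleted information. The conclusion is candid: today's AI memory agents are still far from safe shared deployment.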
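A toy sketch can make the three checks concrete. The Python below is not GateMem's code or data format: the `SharedMemory` and `MemoryItem` classes, the per-item reader set, the owner-only deletion rule, and the substring-based `leaked` check are all hypothetical. `naive_recall` stands in for a retriever that returns whatever matches the query, while `governed_recall` filters by the reader's scope and by deletion status before answering; `leaked` compares an answer against a list of annotated leak targets, loosely echoing the paper's leak-target annotations.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    owner: str             # principal who wrote the item
    text: str
    readers: frozenset     # principals allowed to read it (hypothetical scope rule)
    deleted: bool = False  # set after an explicit deletion request


@dataclass
class SharedMemory:
    items: list = field(default_factory=list)

    def write(self, owner: str, text: str, readers=()) -> None:
        # The owner can always read their own item.
        self.items.append(MemoryItem(owner, text, frozenset(readers) | {owner}))

    def forget(self, owner: str, text: str) -> None:
        # Only the owner may delete; matching items are marked as deleted.
        for item in self.items:
            if item.owner == owner and item.text == text:
                item.deleted = True

    def naive_recall(self, keyword: str) -> list:
        # Ungoverned baseline: keyword match only, ignores scope and deletion.
        return [i.text for i in self.items if keyword in i.text]

    def governed_recall(self, reader: str, keyword: str) -> list:
        # Governed recall: drop deleted items and items outside the reader's scope.
        return [
            i.text
            for i in self.items
            if keyword in i.text and not i.deleted and reader in i.readers
        ]


def leaked(answer: list, leak_targets: list) -> list:
    """Return the annotated leak targets that appear in an answer (hypothetical check)."""
    joined = " ".join(answer)
    return [t for t in leak_targets if t in joined]


# Usage: three principals share one pool; alice later deletes her old address.
mem = SharedMemory()
mem.write("alice", "alice: allergy to penicillin", readers={"dr_bob"})
mem.write("carol", "carol: meeting moved to 3pm", readers={"alice", "dr_bob"})
mem.write("alice", "alice: old address 12 Elm St")
mem.forget("alice", "alice: old address 12 Elm St")

targets = ["penicillin", "12 Elm St"]
print(leaked(mem.naive_recall("alice"), targets))              # ['penicillin', '12 Elm St']
print(leaked(mem.governed_recall("carol", "alice"), targets))  # []
print(mem.governed_recall("dr_bob", "alice"))                  # ['alice: allergy to penicillin']
```

The sketch only shows that recall and governance are separate properties: the naive path finds both facts and leaks both, while the governed path answers the authorized doctor and returns nothing to an unauthorized coworker. A real benchmark has to judge free-form model answers, which a substring match cannot do.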
📄 Original abstract
Memory benchmarks for LLM agents largely assume single-user settings, leaving shared assistants for hospitals, workplaces, campuses, and households understudied. In these deployments, multiple principals write to a common memory pool and query it under different roles, scopes, and relationships, so memory quality requires governance as well as recall. We introduce GateMem, a benchmark for multi-principal shared-memory agents. GateMem jointly evaluates utility for legitimate long-horizon requests with state updates, access control across contextual authorization boundaries, and agent-facing active forgetting after explicit deletion requests. It spans medical, office, education, and household domains, with long-form multi-party episodes, incremental memory injection, hidden checkpoints, structured judging, and leak-target annotations. Across diverse baselines and backbone models, no method simultaneously achieves strong utility, robust access control, and reliable forgetting. Long-context prompting often yields the best governance score at high token cost, while retrieval-based and external-memory methods reduce cost yet still leak unauthorized or deleted information. These results show current memory agents remain far from reliable shared institutional deployment.