AI Pulse
📄 Paper Digest

AI Mediators: Closing Only About a Third of the Consensus Gap

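An AI mediator sounds ideal: calm, neutral, and never tired. This paper suggests the reality is more sobering. Among the eight frontier LLMs tested, even the strongest mediator closes only about a third of the consensus gap that remains when disputes go unmediated. The researchers built a new benchmark, SoCRATES, with scenarios constructed from real conflicts across eight domains and five socio-cognitive axes: strategic posture, party composition, history length, emotional reactivity, and cultural identity. Performance varies sharply from one axis to another, so a mediator that does well under one set of conditions can do much worse under another. This is not a tool you can use tomorrow, but it is a reminder that AI still has a long way to go before it adapts to the complexity of human social dynamics.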

📄 Original Abstract

Evaluating LLM mediators remains challenging, as mediation unfolds as a real-time trajectory shaped by disputants' shifting emotions, intentions, and context. Existing testbeds rely on a few expert-authored domains, vary mainly strategic posture, and score every turn against every topic, introducing off-topic noise. We introduce SoCRATES, a benchmark for evaluating proactive LLM mediators in realistic, multi-domain testbeds. It constructs scenarios from real conflicts through an agentic pipeline across eight domains, probes five socio-cognitive adaptation axes (strategic posture, party composition, history length, emotional reactivity, and cultural identity), and scores each topic only on the turns that advance it via a topic-localized evaluator. The evaluator reaches 0.82 alignment with human experts, more than doubling a per-turn baseline. Benchmarking eight frontier LLMs, we find that even the strongest mediator closes only about a third of the unmediated consensus gap under diverse and realistic testbeds, with performance varying sharply by socio-cognitive axis, highlighting that progress lies in social adaptation to diverse conditions.

Original paper on arXiv
