📄 Paper Digest

AI Mediators: Even the Best Closes Only About a Third of the Consensus Gap

Tags: AI mediation, conflict resolution, LLM evaluation, social cognition, benchmarks

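An AI mediator sounds ideal: calm, neutral, tireless. This paper shows that reality falls short. In realistic conflict scenarios, even the strongest LLM mediator closes only about a third of the consensus gap that remains when the parties receive no mediation. The researchers built a new benchmark, SoCRATES, which constructs scenarios from real conflicts across eight domains and varies five socio-cognitive axes: strategic posture, party composition, history length, emotional reactivity, and cultural identity. Mediator performance varies sharply from one axis to another, so a model that does well under one condition can do much worse when, say, the parties' cultural identity or emotional reactivity changes. This is not a tool you can put to work tomorrow, but it is a reminder that AI still has a long way to go before it can adapt to the complexity of human psychology.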
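To make the "about a third" figure concrete, here is a minimal sketch of one way a gap-closure fraction could be computed. The formula, the function name `gap_closed_fraction`, and the numbers are illustrative assumptions, not the paper's actual metric definition.

```python
def gap_closed_fraction(unmediated: float, mediated: float,
                        full_consensus: float = 1.0) -> float:
    """Share of the unmediated consensus gap that mediation recovers.

    Assumption: consensus is a score where `full_consensus` is the ceiling,
    and the "gap" is the distance from the unmediated score to that ceiling.
    """
    gap = full_consensus - unmediated
    if gap <= 0:
        raise ValueError("there is no consensus gap to close")
    return (mediated - unmediated) / gap


# Hypothetical scores: the parties reach 0.40 on their own and 0.60 with a mediator.
fraction = gap_closed_fraction(unmediated=0.40, mediated=0.60)
print(f"gap closed: {fraction:.0%}")
```

Under this reading, a mediator that lifts consensus from 0.40 to 0.60 has recovered a third of the remaining distance to full agreement, not resolved a third of all conflicts.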

📄 Original Abstract

Evaluating LLM mediators remains challenging, as mediation unfolds as a real-time trajectory shaped by disputants' shifting emotions, intentions, and context. Existing testbeds rely on a few expert-authored domains, vary mainly strategic posture, and score every turn against every topic, introducing off-topic noise. We introduce SoCRATES, a benchmark for evaluating proactive LLM mediators in realistic, multi-domain testbeds. It constructs scenarios from real conflicts through an agentic pipeline across eight domains, probes five socio-cognitive adaptation axes (strategic posture, party composition, history length, emotional reactivity, and cultural identity), and scores each topic only on the turns that advance it via a topic-localized evaluator. The evaluator reaches 0.82 alignment with human experts, more than doubling a per-turn baseline. Benchmarking eight frontier LLMs, we find that even the strongest mediator closes only about a third of the unmediated consensus gap under diverse and realistic testbeds, with performance varying sharply by socio-cognitive axis, highlighting that progress lies in social adaptation to diverse conditions.
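The difference between scoring every turn against every topic and scoring each topic only on the turns that advance it can be sketched in a few lines. The data layout, the field names `advances` and `scores`, and the use of a plain mean are assumptions for illustration; the abstract does not specify how the SoCRATES evaluator aggregates scores.

```python
from statistics import mean

# Hypothetical dialogue: each turn lists the topics it advances and a
# per-topic progress score in [0, 1]. Off-topic scores are zero.
turns = [
    {"advances": {"rent"},  "scores": {"rent": 0.75, "noise": 0.0}},
    {"advances": {"noise"}, "scores": {"rent": 0.0,  "noise": 0.5}},
    {"advances": {"rent"},  "scores": {"rent": 0.25, "noise": 0.0}},
]
topics = ["rent", "noise"]


def per_turn_baseline(turns, topics):
    """Average every turn's score for every topic, including off-topic turns."""
    return {t: mean(turn["scores"].get(t, 0.0) for turn in turns) for t in topics}


def topic_localized(turns, topics):
    """Average a topic's score only over the turns that advance that topic."""
    result = {}
    for t in topics:
        relevant = [turn["scores"][t] for turn in turns if t in turn["advances"]]
        result[t] = mean(relevant) if relevant else None  # None: topic never addressed
    return result


print("per-turn baseline:", per_turn_baseline(turns, topics))
print("topic-localized:  ", topic_localized(turns, topics))
```

In this toy dialogue the per-turn baseline drags both topics down because turns about the other topic count as zeros, which is the kind of off-topic noise the abstract describes.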

Original paper on arXiv
