📄 Paper Breakdown

Let AIs Pick Holes in Each Other: 124% Better Than One AI Grinding Alone

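Trusted Channel · Quality-Diversity Search · Large Language Models · Evolutionary Computation · Heterogeneous Ensembles · Adversarial Training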

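Getting AI to evolve itself usually means having one model try and fail over and over. This paper does the opposite. It wires four different LLMs (GPT and Claude variants: GPT-5.4-mini, Claude Sonnet 4.6, GPT-5.2, and Claude Haiku 4.5) into a network where they "pick holes" in each other. Each model runs on its own node and acts as the mutation operator that generates new candidate solutions. At the end of every round, the nodes share their local best solutions to seed the next round, which creates adversarial pressure across models. The test domain is Core War, a programming battle game in which Redcode programs fight inside a simulated machine. There, with the same total LLM-call budget as a single-node baseline, the heterogeneous ensemble scores 124% higher on the merged archive's QD-Score (45.90 vs. 20.46) and covers 28% more of the behavior grid (80.6% vs. 63.0% of cells). The gain does not come from extra compute. It comes from diversity between the models: the ensemble also beats a homogeneous ensemble given the same budget, so each model's distinct biases end up complementing the others. You won't be able to use this tomorrow, but it points in a direction: AI evolution may come to look more like an ecosystem than a lone hero.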
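Both headline percentages are relative gains computed from the figures reported in the abstract. In absolute terms, the coverage gain is 17.6 percentage points:

```latex
\text{QD-Score gain} = \frac{45.90 - 20.46}{20.46} = \frac{25.44}{20.46} \approx 1.243 \;\Rightarrow\; +124\%
\qquad
\text{Coverage gain} = \frac{80.6 - 63.0}{63.0} = \frac{17.6}{63.0} \approx 0.279 \;\Rightarrow\; +28\%
```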

📄 Original Abstract (English)

We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework that assigns heterogeneous large language models (LLMs) as mutation operators across peer nodes communicating with non-blocking collective operations. Unlike homogeneous parallel search, which replicates a single model's inductive biases across all workers, DEI treats each LLM's distinct creative prior as a complementary source of behavioral novelty. Extending the Digital Red Queen framework with DEI, nodes share local optimal solutions at the end of each round to seed the next round's population. This creates cross-model adversarial pressure that drives robustness beyond intra-model self-play. Evaluated on the Core War domain, a competitive programming benchmark in which Redcode warrior programs battle inside a simulated machine, a four-node heterogeneous ensemble (GPT-5.4-mini, Claude Sonnet 4.6, GPT-5.2, and Claude Haiku 4.5) achieves 124 percent higher merged-archive QD-Score (45.90 vs. 20.46) and 28 percent higher coverage (80.6 percent vs. 63.0 percent of cells) than a single-node baseline at equal total LLM-call budget. The heterogeneous ensemble also outperforms an equally-budgeted homogeneous ensemble on QD-Score, coverage, and held-out solution generality across all four model families. These results provide the first empirical evidence that model diversity, not merely parallelism, is the key driver of gain in distributed LLM-based QD search.
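The abstract's round structure can be sketched as a Python toy. In each round, every node mutates solutions with its own LLM and keeps a MAP-Elites-style archive. At the end of the round, the nodes share their local bests, and the final metrics are computed on the merged archive. This is not the authors' code, and several pieces are assumptions:

- Every name (`evaluate`, `insert`, `make_mutator`, `run`, `qd_score`, `coverage`, `GRID`, `DIM`) is hypothetical.
- Biased Gaussian mutators stand in for the four LLMs.
- A static toy fitness replaces Core War battles, so the real setup's cross-model adversarial pressure is not modeled.
- Nodes run sequentially instead of communicating through non-blocking collective operations.
- QD-Score is taken as the common sum of elite fitness over filled cells, which may differ from the paper's normalization.

```python
# Toy sketch of a DEI-style distributed QD loop. All names are hypothetical;
# Gaussian mutators stand in for LLMs and a static toy fitness for Core War.
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

GRID = 10  # hypothetical behavior grid: GRID x GRID cells
DIM = 4    # hypothetical genome length

Cell = Tuple[int, int]
Genome = List[float]
Mutator = Callable[[Genome], Genome]


@dataclass
class Elite:
    genome: Genome
    fitness: float


def evaluate(genome: Genome) -> Tuple[float, Cell]:
    """Toy stand-in for a Core War evaluation: returns (fitness, behavior cell)."""
    fitness = 1.0 - sum((g - 0.5) ** 2 for g in genome) / len(genome)
    # Behavior descriptor: the first two genes bucketed into the grid.
    cell = (min(int(genome[0] * GRID), GRID - 1),
            min(int(genome[1] * GRID), GRID - 1))
    return fitness, cell


def insert(archive: Dict[Cell, Elite], genome: Genome) -> None:
    """MAP-Elites rule: keep a genome if its cell is empty or it beats the incumbent."""
    fitness, cell = evaluate(genome)
    if cell not in archive or fitness > archive[cell].fitness:
        archive[cell] = Elite(list(genome), fitness)


def make_mutator(bias: float, scale: float, seed: int) -> Mutator:
    """Hypothetical stand-in for one LLM: a Gaussian mutation with its own bias."""
    rng = random.Random(seed)

    def mutate(parent: Genome) -> Genome:
        # Shift every gene by a biased Gaussian step, clamped to [0, 1].
        return [min(max(g + rng.gauss(bias, scale), 0.0), 1.0) for g in parent]

    return mutate


def run(mutators: List[Mutator], rounds: int, calls_per_node: int,
        seed: int = 0) -> Dict[Cell, Elite]:
    """Run all nodes for several rounds and return the merged archive."""
    rng = random.Random(seed)
    archives: List[Dict[Cell, Elite]] = [{} for _ in mutators]
    seeds: List[Genome] = [[rng.random() for _ in range(DIM)]]
    for _ in range(rounds):
        for archive, mutate in zip(archives, mutators):
            for s in seeds:                   # seed with last round's shared elites
                insert(archive, s)
            for _ in range(calls_per_node):   # one mutation = one "LLM call"
                parent = rng.choice(list(archive.values())).genome
                insert(archive, mutate(parent))
        # End of round: each node shares its local best to seed every node.
        seeds = [max(a.values(), key=lambda e: e.fitness).genome for a in archives]
    merged: Dict[Cell, Elite] = {}
    for archive in archives:
        for elite in archive.values():
            insert(merged, elite.genome)
    return merged


def qd_score(archive: Dict[Cell, Elite]) -> float:
    """Sum of elite fitness over filled cells (assumed definition)."""
    return sum(e.fitness for e in archive.values())


def coverage(archive: Dict[Cell, Elite]) -> float:
    """Fraction of grid cells holding an elite."""
    return len(archive) / (GRID * GRID)


if __name__ == "__main__":
    budget = 160  # total mutation calls per round, equal for both setups
    hetero = [make_mutator(b, s, i) for i, (b, s) in
              enumerate([(-0.05, 0.1), (0.05, 0.1), (0.0, 0.25), (0.0, 0.05)])]
    single = [make_mutator(0.0, 0.1, 0)]
    for name, ops in [("heterogeneous x4", hetero), ("single node", single)]:
        merged = run(ops, rounds=5, calls_per_node=budget // len(ops))
        print(f"{name}: QD-Score={qd_score(merged):.2f} coverage={coverage(merged):.1%}")
```

Both setups spend 160 mutation calls per round, which mirrors the paper's equal-budget comparison. Whether the toy heterogeneous run beats the single node depends on the arbitrary biases chosen. Treat the printed numbers as an illustration of the mechanics, not a reproduction of the 124% result.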

Original paper on arXiv

📬 Subscribe to AI Pulse