📄 Paper Digest

AI Doctors Move Beyond Black and White: Continuous Risk Scores from Debate-Style Reasoning

Trust channel · Tags: medical AI, risk early warning, interpretability, large language models, calibration

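Today's LLM-based clinical early warning systems share a serious flaw: they answer either "at risk" or "not at risk," like a diode. Real clinical deterioration is continuous, with a wide gray zone between a mildly abnormal reading and a critical value. This paper proposes TRIAGE, which trains an LLM to "debate with itself": for the two competing outcomes, "the patient will deteriorate" and "the patient will not deteriorate," it lays out the evidence for each and then combines them into a single continuous risk score. Across three irregularly sampled medical time series benchmarks, TRIAGE improves AUPRC by 3.3% on average and cuts calibration error by 81% relative to competitive baselines, meaning the model is far less prone to blind overconfidence or excessive caution. Its rationales (for example, "oxygen saturation is falling but heart rate is stable, so risk is moderate") also score 20% higher in clinical reasoning quality than the baseline's post-hoc explanations, as rated by an LLM-as-a-judge assessment. You won't be using this tomorrow, but it points in a clear direction: in high-stakes fields such as medicine, AI has to acknowledge uncertainty and express it through logic that can be verified.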

📄 Original Abstract (English)

Clinical early warning systems built on electronic health records, in which clinical observations are recorded as irregularly sampled medical time series (ISMTS), must deliver both calibrated risk scores for patient triage and interpretable rationales that clinicians can verify. Large Language Models (LLMs) have been explored for this task, yet they collapse graded clinical risk into overconfident binary predictions. This risk polarization undermines both calibration and cross-patient comparability. To address this, we propose TRIAGE, a framework that trains an LLM to generate dialectical reasoning over competing clinical outcomes by eliciting outcome-specific rationales. This dialectical formulation mitigates risk polarization, enabling a single LLM to yield continuous risk scores grounded in explicit clinical reasoning. Evaluated on three ISMTS benchmarks, TRIAGE achieves an average AUPRC improvement of 3.3% and reduces calibration error by 81% compared to the competitive baselines. An LLM-as-a-judge assessment further shows that our rationales surpass post-hoc explanations from the baseline by 20% in clinical reasoning quality. The source code is available at https://github.com/HyeongWon-Jang/TRIAGE .
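To make "calibration error" and "risk polarization" concrete, here is a minimal sketch of Expected Calibration Error (ECE), a commonly used calibration metric. The abstract does not say which calibration metric TRIAGE reports, so the choice of ECE, the function name `expected_calibration_error`, the bin count, and the toy patient data are all assumptions made for illustration. They are not taken from the paper or its repository.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binary ECE: size-weighted mean of |observed event rate - mean predicted risk| per bin.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")

    # Group predictions into equal-width risk bins over [0, 1].
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_risk = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(event_rate - mean_risk)
    return ece


# Toy cohort (invented): 4 lower-risk patients with 1 deterioration,
# 4 higher-risk patients with 3 deteriorations.
labels = [0, 0, 0, 1, 1, 1, 1, 0]

# A polarized model collapses graded risk into 0 or 1.
polarized = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
# A graded model reports risks that match the observed event rates.
graded = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]

ece_polarized = expected_calibration_error(polarized, labels)
ece_graded = expected_calibration_error(graded, labels)
print(f"polarized ECE = {ece_polarized:.2f}, graded ECE = {ece_graded:.2f}")
```

Both toy models rank the two patient groups identically, so a ranking metric would not separate them. Only the graded scores match the observed deterioration rates, which is the gap a calibration metric is designed to expose.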

Original paper on arXiv
