AI Diagnosis Beyond Black and White: Continuous Risk Scores from Debate-Style Reasoning
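When doctors review patient data, a binary "dangerous/safe" call is not enough; they need a continuous risk score from 0 to 100. Existing AI models, however, tend to be overconfident and push graded risk toward the extremes, for example flagging mild risk as high-risk, which leads to misjudgments. This study has the AI reason as if in a debate: it first gathers evidence separately for "the condition deteriorates" and "the condition stays stable," then weighs the two sides to produce a score. The resulting risk scores are more graded and more accurate: across three benchmarks, AUPRC improves by 3.3% on average and calibration error drops by 81%. In an LLM-as-a-judge assessment, the rationales the AI gives also score 20% higher in clinical reasoning quality than the baseline's post-hoc explanations. This is not something you can use tomorrow, but it points the way for AI clinical early warning to move from gut-feel calls to reasoned argument.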
📄 Original abstract (English)
Clinical early warning systems built on electronic health records, in which clinical observations are recorded as irregularly sampled medical time series (ISMTS), must deliver both calibrated risk scores for patient triage and interpretable rationales that clinicians can verify. Large Language Models (LLMs) have been explored for this task, yet they collapse graded clinical risk into overconfident binary predictions. This risk polarization undermines both calibration and cross-patient comparability. To address this, we propose TRIAGE, a framework that trains an LLM to generate dialectical reasoning over competing clinical outcomes by eliciting outcome-specific rationales. This dialectical formulation mitigates risk polarization, enabling a single LLM to yield continuous risk scores grounded in explicit clinical reasoning. Evaluated on three ISMTS benchmarks, TRIAGE achieves an average AUPRC improvement of 3.3% and reduces calibration error by 81% compared to the competitive baselines. An LLM-as-a-judge assessment further shows that our rationales surpass post-hoc explanations from the baseline by 20% in clinical reasoning quality. The source code is available at https://github.com/HyeongWon-Jang/TRIAGE .
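To make the link between "risk polarization" and calibration concrete, here is a minimal sketch of expected calibration error (ECE), one common calibration metric. The abstract does not say which calibration metric TRIAGE reports, so treating it as ECE is an assumption, and the function name `expected_calibration_error`, the bin count, and the synthetic patients below are all hypothetical illustrations rather than anything taken from the paper or its repository.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |avg predicted risk - observed event rate|.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges map each probability to a bin index in [0, n_bins - 1].
    bin_ids = np.clip(np.digitize(probs, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        gap = abs(probs[mask].mean() - labels[mask].mean())
        ece += mask.mean() * gap  # mask.mean() is the fraction of samples in the bin
    return float(ece)

# Synthetic cohort (assumption): each patient has a true graded risk in [0, 1].
rng = np.random.default_rng(0)
true_risk = rng.uniform(0.0, 1.0, size=20_000)
outcomes = (rng.uniform(0.0, 1.0, size=true_risk.size) < true_risk).astype(float)

# A graded predictor reports the risk itself; a polarized one collapses it
# to near-certain "safe" (0.02) or near-certain "dangerous" (0.98).
graded_scores = true_risk
polarized_scores = np.where(true_risk >= 0.5, 0.98, 0.02)

ece_graded = expected_calibration_error(graded_scores, outcomes)
ece_polarized = expected_calibration_error(polarized_scores, outcomes)
print(f"ECE graded:    {ece_graded:.3f}")
print(f"ECE polarized: {ece_polarized:.3f}")
```

In this toy setup the polarized predictor ranks patients into only two groups and claims near-certainty for both, so its stated risk sits far from the observed event rate in each group. The graded predictor keeps the stated risk close to the observed rate, which is the property a continuous risk score needs for triage across patients.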