AI Pulse
📄 Paper Explainer

AI Doctors Move Beyond Black-and-White: Debate-Style Reasoning Yields Continuous Risk Scores

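When clinicians review patient data, the last thing they want from an AI is a binary "at risk / not at risk" verdict: it is too blunt, and it makes patients impossible to compare with one another. This paper teaches the model to "debate itself": for a given patient, it first lays out the reasons supporting a high-risk outcome, then the reasons supporting a low-risk outcome, and finally combines them into a continuous risk score (for example, 0.7). On three medical benchmarks, this debate-style reasoning reduced calibration error by 81% relative to competitive baselines, meaning the model is less prone to blind overconfidence or excessive caution and its scores are more trustworthy. It also writes out its reasoning in natural language, so clinicians can check which indicators the model looked at and how it weighed them. You will not be using this tomorrow, but if you care about the reliability of medical AI, it is a key step in turning AI from a "black-box judge" into an "assistant you can talk to."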
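To make "calibration error" concrete, here is a minimal sketch in Python. It assumes the common expected calibration error (ECE) with equal-width bins; this digest does not say which calibration metric the paper uses, so treat that choice as an assumption. The function name and the eight-patient toy data are invented for illustration and are not taken from TRIAGE.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE (an assumed metric, not necessarily the paper's).

    For each bin, compare the mean predicted risk with the observed event
    rate, then average the gaps weighted by the share of samples in the bin.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")

    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A score of exactly 1.0 belongs in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_pred = sum(p for p, _ in bucket) / len(bucket)
        event_rate = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(event_rate - mean_pred)
    return ece


# Toy data (hypothetical): 1 = adverse event occurred, 0 = it did not.
labels = [1, 1, 1, 0, 1, 0, 0, 0]

# Graded scores: 0.75 for a group where 3 of 4 had the event,
# 0.25 for a group where 1 of 4 did.
graded = [0.75] * 4 + [0.25] * 4

# Polarized scores: same ranking of the two groups, collapsed to 0 or 1.
polarized = [1.0] * 4 + [0.0] * 4

ece_graded = expected_calibration_error(graded, labels)
ece_polarized = expected_calibration_error(polarized, labels)
print(f"graded ECE:    {ece_graded:.2f}")     # 0.00
print(f"polarized ECE: {ece_polarized:.2f}")  # 0.25
```

In this toy case both score sets separate the two groups equally well, yet only the graded scores match the observed event rates. That gap is the kind of overconfidence the explainer describes when it says binary verdicts are less trustworthy.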

📄 Original Abstract (English)

Clinical early warning systems built on electronic health records, in which clinical observations are recorded as irregularly sampled medical time series (ISMTS), must deliver both calibrated risk scores for patient triage and interpretable rationales that clinicians can verify. Large Language Models (LLMs) have been explored for this task, yet they collapse graded clinical risk into overconfident binary predictions. This risk polarization undermines both calibration and cross-patient comparability. To address this, we propose TRIAGE, a framework that trains an LLM to generate dialectical reasoning over competing clinical outcomes by eliciting outcome-specific rationales. This dialectical formulation mitigates risk polarization, enabling a single LLM to yield continuous risk scores grounded in explicit clinical reasoning. Evaluated on three ISMTS benchmarks, TRIAGE achieves an average AUPRC improvement of 3.3% and reduces calibration error by 81% compared to the competitive baselines. An LLM-as-a-judge assessment further shows that our rationales surpass post-hoc explanations from the baseline by 20% in clinical reasoning quality. The source code is available at https://github.com/HyeongWon-Jang/TRIAGE .

Original paper on arXiv
