📄 Paper Explainer

Making AI Shut Up: Detecting and Fixing Whisper Hallucinations

Tags: Whisper, hallucination detection, sparse autoencoders, internal representations, speech recognition

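Whisper, the speech recognition model, has a known flaw: given audio with no speech in it, it can still "fill in" coherent text. This is called hallucination. The researchers found that the model's internal activations reveal when it is making things up. They took activations from Whisper's audio encoder, used a sparse autoencoder (SAE) to decompose them into sparse features, and then nudged those features, a technique known as steering. The result: on non-speech audio, the hallucination rate drops from 72.63% to 14.11% for Whisper small and from 86.88% to 27.33% for Whisper large-v3, with only a small WER degradation on normal speech. The method requires no retraining and approaches the performance of fine-tuning-based methods. It is not something you can use tomorrow, but it shows how a model's internal signals can be used to correct its own errors, which is a more transparent approach to fixing AI.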
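To make the detection idea concrete, here is a minimal sketch of a linear probe, assuming synthetic vectors stand in for Whisper encoder activations. Nothing below comes from the paper's code: the function names, the toy data, and the choice of logistic regression as the linear classifier are all illustrative assumptions. It only shows what "linearly separable, with the signal concentrated in a few features" can look like.

```python
import numpy as np

def make_toy_activations(n=400, d=32, informative=(3, 11, 27), shift=3.0, seed=0):
    """Synthetic stand-in for encoder activations (hypothetical data).

    Label 1 plays the role of "hallucinated"; only a few dimensions carry signal.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d))
    X[:, list(informative)] += shift * y[:, None]
    return X, y

def train_linear_probe(X, y, lr=0.1, steps=1000):
    """Logistic regression fitted with plain gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of label 1
        w -= lr * (X.T @ (p - y)) / n
        b -= lr * float(np.mean(p - y))
    return w, b

def probe_accuracy(X, y, w, b):
    """Fraction of examples whose sign of the logit matches the label."""
    return float(np.mean(((X @ w + b) > 0) == (y == 1)))

X, y = make_toy_activations()
w, b = train_linear_probe(X[:300], y[:300])      # fit on the first 300 examples
acc = probe_accuracy(X[300:], y[300:], w, b)     # evaluate on the held-out 100
print(f"held-out probe accuracy: {acc:.2f}")
```

In a real experiment the rows of `X` would be activations extracted from a chosen encoder layer, and the labels would come from checking whether Whisper produced text for non-speech audio.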

📄 Original Abstract (English)

Whisper, a widely adopted ASR model, is known to suffer from hallucinations - coherent transcriptions generated for non-speech audio entirely disconnected from the input. We investigate whether hallucinations can be detected and mitigated through Whisper's internal representations. We extract audio encoder activations and evaluate two representation spaces: raw Whisper activations and Sparse AutoEncoder (SAE) latents. We show that both spaces encode linearly separable hallucination-related information, with discriminative power concentrated in a sparse feature subset and increasing toward deeper encoder layers. We propose two steering strategies: activation-space steering and SAE latent-space steering. SAE-based steering reduces hallucination rate from 72.63% to 14.11% for Whisper small and from 86.88% to 27.33% for Whisper large-v3 on the full non-speech test set, with small WER degradation on speech data, approaching the performance of fine-tuning-based methods.
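The abstract does not spell out how SAE latent-space steering is computed, so the following is only a sketch under stated assumptions: a randomly initialized toy SAE (not a trained one), a steering direction taken as the difference of mean latents between "clean" and "hallucinating" examples restricted to the top-k features, and the edit applied as a delta to the original activation. The class `ToySAE`, the functions, and the parameters `k` and `alpha` are hypothetical names chosen for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class ToySAE:
    """Randomly initialized sparse autoencoder (hypothetical, untrained)."""
    def __init__(self, d_model, d_latent, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(scale=d_model ** -0.5, size=(d_model, d_latent))
        self.b_enc = np.zeros(d_latent)
        self.W_dec = rng.normal(scale=d_latent ** -0.5, size=(d_latent, d_model))
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        return relu((x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, z):
        return z @ self.W_dec + self.b_dec

def steering_direction(z_clean, z_halluc, k):
    """Mean latent difference, keeping only the k largest-magnitude features."""
    diff = z_clean.mean(axis=0) - z_halluc.mean(axis=0)
    keep = np.argsort(-np.abs(diff))[:k]
    direction = np.zeros_like(diff)
    direction[keep] = diff[keep]
    return direction

def steer(sae, x, direction, alpha):
    """Shift latents along `direction` and apply the decoded change to x.

    Adding only the decoded delta leaves the SAE's reconstruction error
    untouched (an assumption of this sketch, not a claim about the paper).
    """
    z = sae.encode(x)
    z_steered = relu(z + alpha * direction)  # keep latents non-negative
    return x + sae.decode(z_steered) - sae.decode(z)

rng = np.random.default_rng(1)
sae = ToySAE(d_model=16, d_latent=64)
x_clean = rng.normal(size=(50, 16))          # stand-in: activations without hallucination
x_halluc = rng.normal(size=(50, 16)) + 1.0   # stand-in: activations with hallucination
direction = steering_direction(sae.encode(x_clean), sae.encode(x_halluc), k=4)
x_steered = steer(sae, x_halluc, direction, alpha=2.0)
x_unchanged = steer(sae, x_halluc, direction, alpha=0.0)
```

With `alpha=0.0` the activations come back unchanged, which is a quick sanity check. In practice the steered activations would be fed back into the rest of the model, and `alpha` and `k` would be tuned against both hallucination rate and WER.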

Original paper on arXiv
