AI Learns to Write Its Own Exam and Grade It
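Today's vision AI mostly answers questions passively. This paper teaches it to ask questions actively: it writes the questions, filters them, and iterates, all on its own. The researchers have a single vision-language model play both "question proposer" and "filter". It first generates a batch of questions about an image, then keeps the ones that are harder, more dependent on visual detail, and more informative, while preserving diversity so that training does not collapse. Those questions are then used to train the same model, both as a questioner and as an answerer. As a result, its questions get progressively trickier, and under the same budget this self-supervision works better than training on the static source data. More counterintuitive still, the self-questioning model does not get worse at answering other people's questions; it stays competitive or even improves. This is not something you can use tomorrow, but it points to a possibility: an AI can improve the way people do, by testing itself, without relying on external supervision.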
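The propose-then-filter step described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the `Candidate` fields, the equal-weight `score`, the Jaccard-overlap diversity check, and the `max_overlap` threshold are all hypothetical. In the actual framework the VLM itself would propose the questions and judge them; here the scores are hard-coded stand-ins.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A proposed question with hypothetical filter scores in [0, 1]."""
    question: str
    difficulty: float
    visual_dependence: float
    informativeness: float


def score(c: Candidate) -> float:
    # Assumption: the three criteria are weighted equally.
    return (c.difficulty + c.visual_dependence + c.informativeness) / 3


def token_overlap(a: str, b: str) -> float:
    # Jaccard similarity over lowercase whitespace tokens; a crude
    # stand-in for whatever diversity measure a real system would use.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_questions(
    candidates: List[Candidate], k: int, max_overlap: float = 0.6
) -> List[Candidate]:
    """Greedily keep the k best-scoring questions, skipping near-duplicates."""
    kept: List[Candidate] = []
    for c in sorted(candidates, key=score, reverse=True):
        if len(kept) == k:
            break
        # Diversity guard: drop a question too similar to one already kept.
        if all(token_overlap(c.question, q.question) <= max_overlap for q in kept):
            kept.append(c)
    return kept


# Hard-coded candidates standing in for questions the model would propose.
candidates = [
    Candidate("What color is the small sign behind the second car?", 0.9, 0.9, 0.8),
    Candidate("What color is the small sign behind the first car?", 0.85, 0.9, 0.8),
    Candidate("Is there a car?", 0.1, 0.3, 0.1),
    Candidate("How many people are holding umbrellas?", 0.7, 0.8, 0.6),
]

selected = select_questions(candidates, k=2)
for c in selected:
    print(c.question)
```

In a full loop, the selected questions would be fed back as training data for the same model, and the propose/filter round would repeat. The greedy selection is only one simple way to trade off question quality against diversity.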
📄 Original abstract
Vision-language models (VLMs) are typically trained as passive answerers, while their ability to actively ask diverse, non-trivial, visual-centric and grounded questions remains underexplored. Existing visual questioners' performance is bottlenecked by the availability of high-quality training data or the cost of curating them. We show that a VLM can continuously improve itself as a visual questioner without any external supervision. We propose a self-evolving framework that uses a VLM itself as both a proposer and a filter to produce harder, more informative, and visual-centric questions, while maintaining their exploration diversity to avoid training collapse. These questions are then used to train the VLM in both questioner and answerer modes. To evaluate the questioner, we introduce an agentic protocol that assesses questions along perception, reasoning, and diversity dimensions. Experiments across various backbone VLMs show that our method substantially enhances the quality and substantially expands the difficulty boundary of autonomous question generation. Under the same budget, our self-supervision is more effective than training on the static source data. Moreover, the self-evolving questioner remains a competitive or even better answerer.