Tabular AI shines only on simple tasks; harder settings still favor established methods
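AI models for tabular data (think spreadsheets such as Excel) have drawn a lot of attention lately, but the researchers find that these new tabular foundation models perform well only on tiny- to medium-sized IID datasets, where rows are independent draws from a single distribution. On data that is large, high-dimensional, or non-IID (for example, data that changes over time or comes in groups), traditional tree-based models and deep learning models still come out ahead. They evaluated 11 models on 142 datasets and conclude that the new models have not yet solved the genuinely hard problems, so don't be swayed by the hype. This is not a result you can put to work tomorrow, but if you are choosing a tabular AI tool, knowing what it is good at and what it is not matters more than chasing the newest release.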
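To make the difference between the three task types concrete, here is a minimal sketch of how an IID, a temporal, and a grouped train/test split differ. It uses only the Python standard library. The function names, the toy table, and the 25% test fraction are illustrative assumptions; this is not BeyondArena's or Data Foundry's actual API or evaluation protocol.

```python
import random

def iid_split(rows, test_fraction=0.25, seed=0):
    """Random split: train and test rows come from the same shuffled pool."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def temporal_split(rows, time_key, test_fraction=0.25):
    """Train on the earliest rows, test on the latest ones."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    n_test = int(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

def grouped_split(rows, group_key, test_fraction=0.25, seed=0):
    """Hold out whole groups so no group appears in both train and test."""
    groups = sorted({r[group_key] for r in rows})
    random.Random(seed).shuffle(groups)
    n_test = max(1, int(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if r[group_key] not in test_groups]
    test = [r for r in rows if r[group_key] in test_groups]
    return train, test

# Toy table (hypothetical): 12 rows, 4 customers, one row per time step.
rows = [{"t": t, "customer": f"c{t % 4}", "y": t % 2} for t in range(12)]

iid_train, iid_test = iid_split(rows)
tmp_train, tmp_test = temporal_split(rows, "t")
grp_train, grp_test = grouped_split(rows, "customer")
```

In the IID split, the test rows resemble the training rows by construction. The temporal split tests on rows that come strictly after the training rows, and the grouped split tests on customers the model has never seen, which is why both are the harder settings.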
📄 Original abstract (English)
Foundation models for predictive machine learning on tabular data have recently gained significant traction in academia and industry. Research communities across disciplines are increasingly evaluating tabular foundation models on diverse datasets and tasks. However, these task- and discipline-specific evaluations remain largely inaccessible to model researchers because benchmark software and evaluation protocols are fragmented. As a result, model researchers rely on standard benchmarks, which are mostly defined for tasks where tabular foundation models already excel. The most challenging scenarios are excluded, limiting meaningful progress in the field by focusing on marginal improvements on IID data rather than on broader, more demanding challenges. To overcome this, we introduce BeyondArena, the first unified holistic benchmark for tabular data that supports diverse task types (IID, temporal, grouped), across sample size and feature dimensionality scales, with diverse feature types (with text, with high cardinality) from a broad range of disciplines. To enable unified benchmarking beyond standard benchmarks, we introduce Data Foundry, a Python framework and metadata schema for curating tabular datasets for predictive machine learning. Our results across 11 models and 142 curated datasets show that existing tabular foundation models excel on tiny- to medium-sized IID data, while traditional tree-based and deep learning models still dominate on non-IID, large, and high-dimensional datasets. BeyondArena guides model research for the most demanding challenges in tabular data, enabling progress towards truly foundational tabular models.