Predicting Is Not Understanding: Recognizing and Addressing Underspecification in Machine Learning

Damien Teney, Maxime Peyrard, Ehsan Abbasnejad


"Machine learning models are typically designed for maximum accuracy on validation data. This predictive criterion rarely captures all desirable properties, in particular how a model matches a domain expert’s understanding of the task. In this situation, known as underspecification, two models with similar validation accuracy may rely on different features (e.g., shape or texture in image recognition) and make very different predictions on out-of-distribution (OOD) data. Identifying underspecification is important as a warning against unexpected behaviour of deployed models, and as an indication of the need for additional task-specific knowledge. In this paper, we formalize the notion of underspecification and propose a method to identify and address the issue. We train multiple models with an independence constraint that forces them to discover distinct predictive features, most of which are missed by standard training. The number of models trainable under this constraint characterizes the degree of underspecification of a task. Moreover, we show that an optimal set of these features can be combined to obtain a global predictor with superior OOD performance. We demonstrate the method on existing benchmarks and discuss important implications of underspecification. In particular, in-domain validation performance cannot serve for OOD model selection without additional assumptions."
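
The claim that two models can share the same validation accuracy yet diverge on OOD data can be made concrete with a toy example. The sketch below is only an illustration under stated assumptions, not the method of the paper: the two "models" (`predict_shape`, `predict_texture`) and the two-feature datasets are hypothetical, hand-written stand-ins for predictors that rely on different features.

```python
# Toy illustration of underspecification (hypothetical data and models,
# not taken from the paper). Each input is a pair (shape, texture) of
# binary features; the label is 0 or 1.

def predict_shape(x):
    """Hypothetical model A: relies only on the first feature (shape)."""
    return x[0]

def predict_texture(x):
    """Hypothetical model B: relies only on the second feature (texture)."""
    return x[1]

def accuracy(model, data):
    """Fraction of (features, label) pairs the model classifies correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

labels = (0, 1, 0, 1, 1, 0)

# In-distribution data: both features agree with the label.
in_dist = [((y, y), y) for y in labels]
# OOD data: texture no longer tracks the label (here it is inverted).
ood = [((y, 1 - y), y) for y in labels]

val_a, val_b = accuracy(predict_shape, in_dist), accuracy(predict_texture, in_dist)
ood_a, ood_b = accuracy(predict_shape, ood), accuracy(predict_texture, ood)

print("validation accuracy:", val_a, val_b)  # identical for both models
print("OOD accuracy:       ", ood_a, ood_b)  # the two models diverge
```

Both predictors are indistinguishable on the in-distribution data, so validation accuracy alone gives no basis for preferring one over the other. Only the OOD set reveals which feature each one relies on.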
