A Dataset Generation Framework for Evaluating Megapixel Image Classifiers & Their Explanations
"Deep learning-based megapixel image classifiers have exceptional prediction performance in a number of domains, including clinical pathology. However, extracting reliable, human-interpretable model explanations has remained challenging. Because real-world megapixel images often contain latent image features highly correlated with image labels, it is difficult to distinguish correct explanations from incorrect ones. Furthering this issue are the flawed assumptions and designs of today’s classifiers. To investigate classification and explanation performance, we introduce a framework to (a) generate synthetic control images that reflect common properties of megapixel images and (b) evaluate average test-set correctness. By benchmarking two commonplace Convolutional Neural Networks (CNNs), we demonstrate how this interpretability evaluation framework can inform architecture selection beyond classification performance -- in particular, we show that a simple Attention-based architecture identifies salient objects in all seven scenarios, while a standard CNN fails to do so in six scenarios. This work carries widespread applicability to any megapixel imaging domain."