Gregory Holste, Song Wang, Ziyu Jiang, Thomas C. Shen, George Shih, Ronald M. Summers, Yifan Peng, Zhangyang Wang
[Oral Presentation] MICCAI Workshop on Data Augmentation, Labelling, and Imperfections (DALI). 2022.
[Paper] | [arXiv] | [Oral Presentation]
Imaging exams, such as chest radiography, will yield a small set of common findings and a much larger set of uncommon findings. While a trained radiologist can learn the visual presentation of rare conditions by studying a few representative examples, teaching a machine to learn from such a “long-tailed” distribution is much more difficult, as standard methods would be easily biased toward the most frequent classes. In this paper, we present a comprehensive benchmark study of the long-tailed learning problem in the specific domain of thorax diseases on chest X-rays. We focus on learning from naturally distributed chest X-ray data, optimizing classification accuracy over not only the common “head” classes, but also the rare yet critical “tail” classes. To accomplish this, we introduce a challenging new long-tailed chest X-ray benchmark to facilitate research on developing long-tailed learning methods for medical image classification. The benchmark consists of two chest X-ray datasets for 19- and 20-way thorax disease classification, containing classes with as many as 53,000 and as few as 7 labeled training images. We evaluate both standard and state-of-the-art long-tailed learning methods on this new benchmark, analyzing which aspects of these methods are most beneficial for long-tailed medical image classification and summarizing insights for future algorithm design. The datasets, trained models, and code are available at https://github.com/VITA-Group/LongTailCXR.
All trained model weights are available below. In the following table, the best result in each column is bolded and the second-best is underlined; see the paper for full results (bAcc = balanced accuracy).
| Method | NIH-CXR-LT bAcc | MIMIC-CXR-LT bAcc | NIH-CXR-LT Weights | MIMIC-CXR-LT Weights |
|---|---|---|---|---|
| Softmax | 0.115 | 0.169 | link | link |
| CB Softmax | 0.269 | 0.227 | link | link |
| RW Softmax | 0.260 | 0.211 | link | link |
| Focal Loss | 0.122 | 0.172 | link | link |
| CB Focal Loss | 0.232 | 0.191 | link | link |
| RW Focal Loss | 0.197 | 0.239 | link | link |
| LDAM | 0.178 | 0.165 | link | link |
| CB LDAM | 0.235 | 0.225 | link | link |
| CB LDAM-DRW | 0.281 | 0.267 | link | link |
| RW LDAM | 0.279 | 0.243 | link | link |
| RW LDAM-DRW | <ins>0.289</ins> | <ins>0.275</ins> | link | link |
| MixUp | 0.118 | 0.176 | link | link |
| Balanced-MixUp | 0.155 | 0.168 | link | link |
| Decoupling (cRT) | **0.294** | **0.296** | link | link |
| Decoupling (tau-norm) | 0.214 | 0.230 | -- | -- |
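
The re-weighted ("RW") and class-balanced ("CB") baselines in the table differ in how per-class loss weights are derived from the training label counts. As a point of reference only (this is a minimal sketch of the effective-number-of-samples weighting of Cui et al., not the repository's exact code, and the class counts below are hypothetical):

```python
import numpy as np
import torch

# Hypothetical long-tailed per-class training counts (head classes first).
n_c = np.array([53000, 12000, 800, 45, 7])

# Class-balanced weighting via the "effective number of samples".
beta = 0.9999
effective_num = 1.0 - np.power(beta, n_c)
weights = (1.0 - beta) / effective_num
weights = weights / weights.sum() * len(n_c)  # normalize so weights average to 1

# Plug the weights into a standard weighted cross-entropy ("CB Softmax"-style loss).
criterion = torch.nn.CrossEntropyLoss(weight=torch.as_tensor(weights, dtype=torch.float32))
```

With this weighting, tail classes with only a handful of images receive much larger loss weights than head classes, which is what counteracts the bias toward frequent findings.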
Labels for the MIMIC-CXR-LT dataset presented in this paper can be found in the `labels/` directory. Labels for NIH-CXR-LT can be found at https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/174256157515. For both datasets, there is one CSV file for each data split ("train", "balanced-val", "test", and "balanced-test").
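
As a convenience, here is one way the split CSVs could be inspected to recover the per-class training counts that define the long tail. This is only a sketch: the path `labels/train.csv` and the assumption of one binary indicator column per disease label are illustrative, so adapt the column handling to the actual files.

```python
import pandas as pd

# Load one data split (path is illustrative; see the labels/ directory).
df = pd.read_csv("labels/train.csv")

# Heuristic: treat columns containing only 0/1 values as disease-label columns.
label_cols = [c for c in df.columns if df[c].dropna().isin([0, 1]).all()]

# Per-class training counts, sorted from head classes down to tail classes.
class_counts = df[label_cols].sum().sort_values(ascending=False)
print(class_counts)
```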
To reproduce the results presented in this paper:
- Register to download the MIMIC-CXR dataset from https://physionet.org/content/mimic-cxr/2.0.0/, and download the NIH ChestXRay14 dataset from https://nihcc.app.box.com/v/ChestXray-NIHCC/.
- Install prerequisite packages with Anaconda: `conda env create -f lt_cxr.yml`, then `conda activate lt_cxr`.
- Run all MIMIC-CXR-LT experiments: `bash run_mimic-cxr-lt_experiments.sh` (first changing the `--data_dir` argument to your MIMIC-CXR path).
- Run all NIH-CXR-LT experiments: `bash run_nih-cxr-lt_experiments.sh` (first changing the `--data_dir` argument to your NIH ChestXRay14 path).
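
Alternatively, instead of retraining, the released checkpoints linked in the table above can be loaded directly for inference. The sketch below assumes a torchvision ResNet-50 backbone (as used in the paper) and a plain PyTorch `state_dict` saved under a hypothetical filename; the actual checkpoint layout and naming may differ.

```python
import torch
from torchvision import models

# 19 classes for NIH-CXR-LT; use 20 for MIMIC-CXR-LT.
num_classes = 19

# ResNet-50 backbone with a replaced classification head (assumed architecture).
model = models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

# "nih_rw_ldam_drw.pt" is a placeholder name for a downloaded checkpoint.
state_dict = torch.load("nih_rw_ldam_drw.pt", map_location="cpu")
model.load_state_dict(state_dict, strict=False)  # strict=False tolerates extra/missing keys
model.eval()
```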
```bibtex
@inproceedings{holste2022long,
  title={Long-Tailed Classification of Thorax Diseases on Chest X-Ray: A New Benchmark Study},
  author={Holste, Gregory and Wang, Song and Jiang, Ziyu and Shen, Thomas C and Shih, George and Summers, Ronald M and Peng, Yifan and Wang, Zhangyang},
  booktitle={MICCAI Workshop on Data Augmentation, Labelling, and Imperfections},
  pages={22--32},
  year={2022},
  organization={Springer}
}
```
Feel free to contact me (Greg Holste) at [email protected] with any questions!