CheXpert chest radiograph dataset (subset used in Zhang et al., Radiology: AI 2024)
2025-11-30
https://doi.org/10.1148/atlas.1764531879152
135
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/dataset.json
Name
CheXpert chest radiograph dataset (subset used in Zhang et al., Radiology: AI 2024)
Link
https://stanfordmlgroup.github.io/competitions/chexpert
Indexing
Keywords: CheXpert, chest radiograph, frontal projection, saliency maps, multilabel classification
Content: CH
RadLex: RID10345, RID12711, RID28786, RID10626
SNOMED: 8186001, 60046008, 46621007, 19242006
Author(s)
Jeremy Irvin
Pranav Rajpurkar
Michael Ko
Matthew P. Lungren
O. Marques
Organization(s)
Stanford Machine Learning Group
Stanford University
Comments
The study used only frontal chest radiographs from CheXpert, for multilabel classification of five observations: atelectasis, cardiomegaly, consolidation, edema, and pleural effusion.
Date
Published: 2019
References
[1] Irvin J, Rajpurkar P, Ko M, et al. "CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison". Proc AAAI Conf Artif Intell. 2019.
[2] Garbin C, Rajpurkar P, Irvin J, Lungren MP, Marques O. "Structured dataset documentation: a datasheet for CheXpert". arXiv:2105.03020. 2021. Available from: https://arxiv.org/abs/2105.03020
Dataset
Motivation
Used to evaluate sensitivity and robustness of saliency methods via prediction-saliency correlation in chest radiograph classification.
Sampling
Only frontal chest radiographs were included from CheXpert.
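The frontal-only selection can be illustrated with a minimal sketch. It assumes the label file follows the public CheXpert CSV layout, with a `Frontal/Lateral` view column and one column per observation; the column names, the helper `select_frontal`, and the sample rows (including their paths) are illustrative assumptions, not taken from the study's code.

```python
import csv
import io

# The five observations named in this record; column spellings are assumed
# to match the public CheXpert label file.
TARGET_LABELS = [
    "Atelectasis",
    "Cardiomegaly",
    "Consolidation",
    "Edema",
    "Pleural Effusion",
]


def select_frontal(rows, view_column="Frontal/Lateral"):
    """Keep frontal rows only, reduced to the path and the five target labels."""
    return [
        {"Path": row["Path"], **{label: row[label] for label in TARGET_LABELS}}
        for row in rows
        if row[view_column] == "Frontal"
    ]


# Made-up rows for illustration only; these are not real CheXpert entries.
SAMPLE_CSV = """Path,Frontal/Lateral,Atelectasis,Cardiomegaly,Consolidation,Edema,Pleural Effusion
example/p1/view1_frontal.jpg,Frontal,1.0,0.0,,-1.0,0.0
example/p1/view2_lateral.jpg,Lateral,1.0,0.0,,-1.0,0.0
example/p2/view1_frontal.jpg,Frontal,0.0,1.0,0.0,0.0,1.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
frontal = select_frontal(rows)
print(len(frontal), "frontal rows kept out of", len(rows))
```

How blank and uncertain (`-1.0`) label values are handled is not described in this record, so the sketch passes them through unchanged.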
Partitioning scheme
The 191,027 frontal images from the original CheXpert training set were randomly split 6:1 into training and validation sets; the 202 frontal images from the original CheXpert validation set were used as the test set.
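A 6:1 random split of this kind can be sketched as follows. The record does not state the random seed, the exact subset sizes, or whether the split was made per image or per patient, so the seed, the rounding rule, and the image-level split used here are assumptions, and `split_six_to_one` is a hypothetical helper.

```python
import random

N_FRONTAL_TRAIN = 191_027  # frontal images in the original training set (from this record)


def split_six_to_one(n_items, seed=0):
    """Shuffle item indices and hold out one seventh of them for validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility (assumed seed)
    n_val = n_items // 7  # 1 part of 7 for validation, rounded down (assumed rule)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_six_to_one(N_FRONTAL_TRAIN)
print(len(train_idx), len(val_idx))
```

An image-level split like this one can place images from the same patient in both subsets; a patient-level split would need to shuffle patient identifiers instead of image indices.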
Confidentiality
Public, de-identified dataset (as stated in study)
Sensitive data
None reported; dataset is fully de-identified public data.