CheXpert chest radiograph dataset (subset used in Zhang et al., Radiology: AI 2024)
dataset | 2025-11-30 | https://doi.org/10.1148/atlas.1764531879152

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/dataset.json

Name

CheXpert chest radiograph dataset (subset used in Zhang et al., Radiology: AI 2024)

Link

https://stanfordmlgroup.github.io/competitions/chexpert

Indexing

Keywords: CheXpert, chest radiograph, frontal projection, saliency maps, multilabel classification
Content: CH
RadLex: RID10345, RID12711, RID28786, RID10626
SNOMED: 8186001, 60046008, 46621007, 19242006

Author(s)

Jeremy Irvin
Pranav Rajpurkar
Michael Ko
Matthew P. Lungren
O. Marques

Organization(s)

Stanford Machine Learning Group
Stanford University

Comments

The study used only frontal chest radiographs from CheXpert, for multilabel classification of five observations (atelectasis, cardiomegaly, consolidation, edema, pleural effusion).
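
As an illustration of the five-observation multilabel target, here is a minimal sketch that turns one row of label strings into a 0/1 vector. The column names and the value encoding ("1.0" positive, "0.0" negative, "-1.0" uncertain, blank not mentioned) are assumptions about the CheXpert label file, and `to_target` and `uncertain_as` are hypothetical names. This record does not say how the study handled uncertain labels, so that choice is left as a parameter rather than fixed.

```python
# The five observations named in this record, in a fixed order.
OBSERVATIONS = [
    "Atelectasis",
    "Cardiomegaly",
    "Consolidation",
    "Edema",
    "Pleural Effusion",
]


def to_target(row, uncertain_as=0):
    """Build a 5-element 0/1 target from one row of label strings.

    Assumed encoding: "1.0" positive, "0.0" negative, "-1.0" uncertain,
    "" not mentioned. The policy for uncertain labels is an assumption
    exposed as a parameter, not the study's documented choice.
    """
    target = []
    for name in OBSERVATIONS:
        raw = row.get(name, "").strip()
        value = float(raw) if raw else 0.0  # blank is treated as negative
        target.append(uncertain_as if value == -1.0 else int(value == 1.0))
    return target


# Hypothetical row, for illustration only.
example_row = {
    "Atelectasis": "1.0",
    "Cardiomegaly": "",
    "Consolidation": "-1.0",
    "Edema": "0.0",
    "Pleural Effusion": "1.0",
}
print(to_target(example_row))
```

Passing `uncertain_as=1` instead maps uncertain entries to positive, which makes the effect of the policy easy to compare.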

Date

Published: 2019

References

[1] Irvin J, Rajpurkar P, Ko M, et al. "CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison". Proc AAAI Conf Artif Intell. 2019.
[2] Garbin C, Rajpurkar P, Irvin J, Lungren MP, Marques O. "Structured dataset documentation: a datasheet for CheXpert". arXiv:2105.03020. 2021. Available from: https://arxiv.org/abs/2105.03020

Dataset

Motivation

Used to evaluate the sensitivity and robustness of saliency methods via prediction-saliency correlation in chest radiograph classification.

Sampling

Only frontal chest radiographs were included from CheXpert.
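
The frontal-only selection can be sketched as a simple row filter. This is a minimal example under assumptions: the column names `Path` and `Frontal/Lateral`, the value `Frontal`, and the sample rows are illustrative guesses at the layout of the CheXpert label CSV, and `frontal_rows` is a hypothetical helper, not the study's code.

```python
import csv
import io

# Hypothetical excerpt in the layout of a CheXpert label CSV; the column
# names "Path" and "Frontal/Lateral" are assumptions about the released file.
SAMPLE_CSV = """Path,Frontal/Lateral
patient00001/study1/view1_frontal.jpg,Frontal
patient00001/study1/view2_lateral.jpg,Lateral
patient00002/study1/view1_frontal.jpg,Frontal
"""


def frontal_rows(csv_file, view_column="Frontal/Lateral"):
    """Return only the rows whose view column is marked as frontal."""
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if row[view_column].strip().lower() == "frontal"
    ]


frontal = frontal_rows(io.StringIO(SAMPLE_CSV))
print([row["Path"] for row in frontal])
```

For a real file, an open file handle would replace the `io.StringIO` wrapper.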

Partitioning scheme

The 191,027 frontal images from the original CheXpert training set were randomly split 6:1 into training and validation sets; the 202 frontal images from the original CheXpert validation set were used as the test set.
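
To make the 6:1 ratio concrete, the sketch below shuffles 191,027 placeholder indices and assigns one part in seven to validation. It assumes an image-level random split; the seed, the rounding rule, and the name `split_six_to_one` are illustrative choices, since this record does not give the study's exact procedure (for example, whether images were grouped by patient).

```python
import random


def split_six_to_one(items, seed=0):
    """Shuffle items and split them 6:1 into (training, validation) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_val = round(len(shuffled) / 7)       # one part in seven goes to validation
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder indices standing in for the 191,027 frontal training images.
train_ids, val_ids = split_six_to_one(range(191_027))
print(len(train_ids), len(val_ids))
```

Under these assumptions the split yields 163,737 training and 27,290 validation images; a different rounding rule would shift these counts by at most one.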

Confidentiality

Public, de-identified dataset (as stated in the study)

Sensitive data

None reported; dataset is fully de-identified public data.