CheXpert chest radiograph dataset sample
dataset · 2025-12-03 · https://doi.org/10.1148/atlas.1764793313757

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/dataset.json

Name

CheXpert chest radiograph dataset sample

Link

https://pmc.ncbi.nlm.nih.gov/articles/PMC10698597

Indexing

Keywords: CheXpert, chest radiography, bias, sex, race, no finding, pleural effusion, cardiomegaly, pneumothorax
Content: CH
RadLex: RID10345, RID36008, RID5352, RID34539

Author(s)

Irvin J
Rajpurkar P
Ko M

Ethical review

Retrospective analysis of publicly available secondary data; HIPAA-compliant; exempt from ethical approval, as stated in the study.

Comments

The publicly available CheXpert dataset was used; the study sample and data splits are identical to those of Gichoya et al. Collection period: October 2002 to July 2017.

Date

Published: 2019

References

[1] Irvin J, Rajpurkar P, Ko M, et al. "CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison". Proc AAAI Conf Artif Intell. 2019.
[2] Glocker B, Jones C, Roschewitz M, Winzeck S. "Risk of Bias in Chest Radiography Deep Learning Foundation Models". Radiology: Artificial Intelligence. 2023-09-27. doi:10.1148/ryai.230060. PMID: 38074789. PMCID: PMC10698597.

Dataset

Motivation

Used to analyze bias and subgroup performance disparities in chest radiography foundation model features and downstream disease detection.

Sampling

A random sample of 3000 patients (1000 from each of the three racial groups) was used for certain analyses; the test set was resampled to balance subgroups and obtain unbiased performance estimates.
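The patient-level sampling described above can be sketched as a stratified draw without replacement. This is a minimal illustration, not the authors' code: the function name `sample_patients_per_group`, the record keys `patient_id` and `race`, and the fixed `seed` are all hypothetical. Records are first collapsed to one entry per patient so that patients with several radiographs are not over-represented.

```python
import random
from collections import defaultdict


def sample_patients_per_group(records, n_per_group=1000, seed=0):
    """Draw n_per_group distinct patients from each racial group, without replacement.

    Hypothetical sketch: records is an iterable of dicts with assumed keys
    "patient_id" and "race"; several records may share a patient_id because
    a patient can have multiple radiographs.
    """
    # Collapse to one entry per patient so each patient is counted once.
    patient_race = {}
    for r in records:
        patient_race.setdefault(r["patient_id"], r["race"])

    # Group patient IDs by race.
    by_group = defaultdict(list)
    for pid, race in patient_race.items():
        by_group[race].append(pid)

    rng = random.Random(seed)
    sample = {}
    for race in sorted(by_group):
        # Sort IDs so the draw depends only on the seed, not on input order.
        pids = sorted(by_group[race])
        if len(pids) < n_per_group:
            raise ValueError(f"group {race!r} has only {len(pids)} patients")
        sample[race] = rng.sample(pids, n_per_group)
    return sample
```

With three groups and `n_per_group=1000`, the result contains 3000 distinct patients, matching the counts stated above.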

Partitioning scheme

Training/validation/test splits are identical to those used by Gichoya et al.; resampling with replacement was used to build balanced subgroups for test-set evaluation.
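Balanced subgroup evaluation by resampling with replacement can be sketched as follows. This is an assumed illustration rather than the study's procedure: `balanced_resample` and its parameters are hypothetical, and the target size `n_per_group` is a free choice here. Because draws are with replacement, a small subgroup can be upsampled to the same size as a large one, and the returned indices may repeat.

```python
import random
from collections import defaultdict


def balanced_resample(groups, n_per_group, seed=0):
    """Resample test-set indices with replacement so that every subgroup
    contributes exactly n_per_group items.

    Hypothetical sketch: groups[i] is the subgroup label (e.g. sex or race)
    of test item i. Returns indices into the original test set.
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)

    rng = random.Random(seed)
    resampled = []
    for g in sorted(by_group):
        # random.Random.choices draws with replacement.
        resampled.extend(rng.choices(by_group[g], k=n_per_group))
    return resampled
```

Performance metrics would then be computed on the items at the returned indices, so each subgroup carries equal weight in the estimate.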

Missing information

The paper does not report detailed subgroup counts by split; see the referenced sources for the full breakdown.

Relationships between instances

Multiple radiographs per patient are present; the authors also repeated the analyses using one scan per patient to rule out within-patient clustering effects.
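Restricting the data to one scan per patient can be sketched as below. The source does not say which scan was kept, so keeping the earliest study is an assumption made only for illustration; the function name `one_scan_per_patient` and the keys `patient_id` and `study_date` are hypothetical.

```python
def one_scan_per_patient(records, key="study_date"):
    """Keep a single radiograph per patient.

    Hypothetical sketch: keeps the record with the smallest value of `key`
    (assumed here to be an ISO-format date string, so string order matches
    date order). records is an iterable of dicts with an assumed
    "patient_id" key.
    """
    kept = {}
    for r in records:
        pid = r["patient_id"]
        # Replace the stored record if this one is earlier.
        if pid not in kept or r[key] < kept[pid][key]:
            kept[pid] = r
    return list(kept.values())
```

A random choice per patient would serve the same purpose of removing within-patient clusters; a deterministic rule simply makes the subset reproducible.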

External data

The study compares several models, including a chest radiography foundation model; the dataset itself is CheXpert.

Confidentiality

Publicly available dataset

Sensitive data

Demographic attributes (biologic sex and race) used for subgroup analyses.