CheXpert chest radiograph dataset sample
2025-12-03
https://doi.org/10.1148/atlas.1764793313757
164
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/dataset.json
Name
CheXpert chest radiograph dataset sample
Link
https://pmc.ncbi.nlm.nih.gov/articles/PMC10698597
Indexing
Keywords: CheXpert, chest radiography, bias, sex, race, no finding, pleural effusion, cardiomegaly, pneumothorax
Content: CH
RadLex: RID10345, RID36008, RID5352, RID34539
Author(s)
Irvin J
Rajpurkar P
Ko M
Ethical review
Retrospective analysis based on publicly available secondary data; HIPAA-compliant; exempt from ethical approval per the study.
Comments
Uses the publicly available CheXpert dataset; the study sample and data splits are identical to those of Gichoya et al. Collection period: October 2002 to July 2017.
Date
Published: 2019
References
[1] Irvin J, Rajpurkar P, Ko M, et al. "CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison". Proc AAAI Conf Artif Intell. 2019.
[2] Glocker B, Jones C, Roschewitz M, Winzeck S. "Risk of Bias in Chest Radiography Deep Learning Foundation Models". Radiology: Artificial Intelligence. 2023-09-27. doi:10.1148/ryai.230060. PMID: 38074789. PMCID: PMC10698597.
Dataset
Motivation
Used to analyze bias and subgroup performance disparities in chest radiography foundation model features and downstream disease detection.
Sampling
Random sampling of 3000 patients (1000 from each racial group) used for certain analyses; test set resampling used to balance subgroups for unbiased performance estimates.
Partitioning scheme
Training/validation/testing splits identical to those used in Gichoya et al.; resampling with replacement used for balanced subgroup evaluation in testing.
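The balanced test-set resampling described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `balanced_resample`, the record fields, the group labels, and the per-group sample size are all hypothetical, and it only assumes that each subgroup is sampled with replacement to the same size.

```python
import random
from collections import defaultdict


def balanced_resample(records, group_key, n_per_group, seed=0):
    """Draw n_per_group records from every subgroup, sampling with replacement.

    Hypothetical helper for illustration; not taken from the study's code.
    """
    rng = random.Random(seed)

    # Bucket the records by their subgroup label.
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec[group_key]].append(rec)

    # Sample each subgroup to the same size; random.choices samples
    # with replacement, so small subgroups can be upsampled.
    sample = []
    for group in sorted(by_group):
        sample.extend(rng.choices(by_group[group], k=n_per_group))
    return sample


# Toy, imbalanced test set with placeholder group labels (assumed data).
test_set = (
    [{"patient_id": i, "group": "A"} for i in range(50)]
    + [{"patient_id": 100 + i, "group": "B"} for i in range(10)]
    + [{"patient_id": 200 + i, "group": "C"} for i in range(5)]
)

balanced = balanced_resample(test_set, "group", n_per_group=20)
counts = {g: sum(r["group"] == g for r in balanced) for g in ("A", "B", "C")}
```

After resampling, every subgroup contributes the same number of test records, so aggregate metrics are not dominated by the largest subgroup. Records from small subgroups may appear more than once.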
Missing information
Detailed subgroup counts by split are not provided in this record; see the referenced sources for the full breakdown.
Relationships between instances
Multiple radiographs per patient are present; authors also repeated analyses with one scan per patient to rule out within-patient cluster effects.
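A one-scan-per-patient subset of the kind mentioned above could be built as in the sketch below. The record does not state how the single scan per patient was chosen, so the random choice here is an assumption, and the function name `one_scan_per_patient`, the field names, and the file paths are hypothetical.

```python
import random


def one_scan_per_patient(scans, seed=0):
    """Keep exactly one scan per patient.

    Assumption: the retained scan is picked at random; the source does not
    specify the selection rule.
    """
    rng = random.Random(seed)

    # Group scans by patient identifier.
    by_patient = {}
    for scan in scans:
        by_patient.setdefault(scan["patient_id"], []).append(scan)

    # Pick one scan from each patient's group.
    return [rng.choice(group) for group in by_patient.values()]


# Toy scan list with placeholder identifiers and paths (assumed data).
scans = [
    {"patient_id": "p1", "path": "p1/study1/view1.jpg"},
    {"patient_id": "p1", "path": "p1/study2/view1.jpg"},
    {"patient_id": "p2", "path": "p2/study1/view1.jpg"},
    {"patient_id": "p3", "path": "p3/study1/view1.jpg"},
    {"patient_id": "p3", "path": "p3/study1/view2.jpg"},
]

subset = one_scan_per_patient(scans)
```

Repeating an analysis on such a subset removes within-patient correlation between scans, which is the cluster effect the authors set out to rule out.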
External data
Study compares models including a chest radiography foundation model; dataset itself is CheXpert.
Confidentiality
Publicly available dataset
Sensitive data
Demographic attributes (biologic sex and race) used for subgroup analyses.