Trillium Health Partners Adult Posteroanterior Chest Radiograph External Test Set (2016–2020)
2025-12-06
https://doi.org/10.1148/atlas.1765032217785
143
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/dataset.json
Name
Trillium Health Partners Adult Posteroanterior Chest Radiograph External Test Set (2016–2020)
Link
https://pubmed.ncbi.nlm.nih.gov/37795140/
Indexing
Keywords: chest x-ray, posteroanterior, external validation, natural language processing labels, subgroup analysis, generalization gap, bias, emergency department, ICU, support devices
Content: CH, IN, RS
RadLex: RID10345, RID5557, RID28625, RID45946
SNOMED: 233604007, 36118008, 8186001, 33737001, 60046008
Author(s)
Monish Ahluwalia
Mohamed Abdalla
James Sanayei
Laleh Seyyed-Kalantari
Mohannad Hussain
Amna Ali
Benjamin Fine
Organization(s)
Trillium Health Partners
University of Toronto
Vector Institute for Artificial Intelligence
Queen’s University (Kingston Health Sciences Centre)
York University (Department of Electrical Engineering and Computer Science)
Institute for Better Health
Funding
Supported by Digital Supercluster Canada (stated funder of the research project).
Ethical review
Approved by the Research Ethics Board at Trillium Health Partners, with waiver of informed consent.
Comments
Single-center external testing dataset of adult posteroanterior chest radiographs with report-derived labels used to evaluate chest radiograph classifiers.
Date
Published: 2023-07-12
References
[1] Ahluwalia M, Abdalla M, Sanayei J, Seyyed-Kalantari L, Hussain M, Ali A, Fine B. "The Subgroup Imperative: Chest Radiograph Classifier Generalization Gaps in Patient, Setting, and Pathology Subgroups". Radiology: Artificial Intelligence. 2023-09-01. doi:10.1148/ryai.220270. PMID: 37795140. PMCID: PMC10546359.
Dataset
Motivation
To perform large-scale external testing with robust subgroup analysis of chest radiograph classifiers in a real-world clinical dataset.
Sampling
Consecutive adult posteroanterior chest radiographs acquired from January 2016 to December 2020 at a three-site hospital system. Duplicates were removed, and images that one or more algorithms could not read were excluded.
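The inclusion criteria above can be sketched as a simple filter. This is only an illustration under stated assumptions, not the authors' pipeline: the record fields (`study_uid`, `study_date`, `view_position`, `patient_age`), the age cutoff of 18, and the use of a repeated study identifier to detect duplicates are all hypothetical, since the source does not describe them.

```python
from datetime import date

# Study window stated in the dataset description.
START, END = date(2016, 1, 1), date(2020, 12, 31)
# Assumption: the source says "adult" without giving a numeric cutoff.
ADULT_AGE = 18


def select_studies(records, unreadable_ids=frozenset()):
    """Keep records that meet the stated criteria.

    `records` is a list of dicts with hypothetical keys; `unreadable_ids`
    holds identifiers of images that at least one algorithm could not read.
    """
    seen = set()
    kept = []
    for rec in records:
        if not (START <= rec["study_date"] <= END):
            continue  # outside January 2016 to December 2020
        if rec["view_position"] != "PA":
            continue  # not a posteroanterior view
        if rec["patient_age"] < ADULT_AGE:
            continue  # not an adult under the assumed cutoff
        if rec["study_uid"] in seen:
            continue  # duplicate (assumed: same study identifier)
        if rec["study_uid"] in unreadable_ids:
            continue  # unreadable by one or more algorithms
        seen.add(rec["study_uid"])
        kept.append(rec)
    return kept
```

The filter preserves input order, so passing records sorted by acquisition time keeps the result consecutive.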
Partitioning scheme
Single external testing dataset; no training/validation partitions created.
Missing information
Patient count, per-site counts, image resolution, and exact file formats beyond DICOM are not reported.
Relationships between instances
Consecutive adult PA studies included; duplicate studies were removed. Multiple studies from the same patient may remain.
Noise
Ground truth was derived by applying NLP to radiology reports. In a validation sample of 502 reports, CheXpert labels reached 94% accuracy against manual labels. Known errors occur particularly in cardiomediastinum/cardiomegaly phrasing and nuanced negations.
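As a rough illustration of how agreement between NLP-derived and manual labels can be measured, the sketch below computes simple accuracy over paired labels. The function name and the flat one-label-per-item layout are assumptions; the source does not say whether accuracy was computed per report or per finding.

```python
def label_accuracy(nlp_labels, manual_labels):
    """Fraction of items where the NLP label equals the manual label.

    Both arguments are equal-length sequences of hypothetical label
    values (for example "positive" / "negative").
    """
    if len(nlp_labels) != len(manual_labels):
        raise ValueError("label sequences must have the same length")
    if not nlp_labels:
        raise ValueError("at least one labeled item is required")
    matches = sum(a == b for a, b in zip(nlp_labels, manual_labels))
    return matches / len(nlp_labels)


# Toy example, not data from the study: 3 of 4 labels agree.
print(label_accuracy(["pos", "neg", "pos", "neg"],
                     ["pos", "neg", "neg", "neg"]))  # prints 0.75
```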
External data
Ground truth labels were generated from the associated radiology reports using open-source NLP tools (CheXpert; CheXbert was evaluated but not used for the full dataset).
Confidentiality
Images and reports were de-identified; remaining DICOM metadata elements were removed except those needed for subgroup analysis. Radiology reports were de-identified using regular expressions for names and dates.
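A minimal sketch of regular-expression de-identification for report text follows. The patterns are illustrative assumptions only (ISO and slash-separated dates, and a title followed by one capitalized surname); the source does not list the expressions actually used, and patterns this narrow would miss many real names and date formats.

```python
import re

# Hypothetical patterns; the actual expressions are not given in the source.
DATE_PATTERN = re.compile(r"\b(?:\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4})\b")
NAME_PATTERN = re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.?\s+[A-Z][a-z]+")


def deidentify(text):
    """Replace matched dates and titled names with placeholder tokens."""
    text = DATE_PATTERN.sub("[DATE]", text)
    text = NAME_PATTERN.sub("[NAME]", text)
    return text


print(deidentify("Compared with 2019-03-14. Reviewed by Dr. Smith."))
# prints: Compared with [DATE]. Reviewed by [NAME].
```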
Sensitive data
EHR-reported sex and name-based ancestry (as a proxy for ethnicity) were derived for subgroup analyses.