Trillium Health Partners Adult Posteroanterior Chest Radiograph External Test Set (2016–2020)
dataset
2025-12-06
https://doi.org/10.1148/atlas.1765032217785

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/dataset.json

Name

Trillium Health Partners Adult Posteroanterior Chest Radiograph External Test Set (2016–2020)

Link

https://pubmed.ncbi.nlm.nih.gov/37795140/

Indexing

Keywords: chest x-ray, posteroanterior, external validation, natural language processing labels, subgroup analysis, generalization gap, bias, emergency department, ICU, support devices
Content: CH, IN, RS
RadLex: RID10345, RID5557, RID28625, RID45946
SNOMED: 233604007, 36118008, 8186001, 33737001, 60046008

Author(s)

Monish Ahluwalia
Mohamed Abdalla
James Sanayei
Laleh Seyyed-Kalantari
Mohannad Hussain
Amna Ali
Benjamin Fine

Organization(s)

Trillium Health Partners
University of Toronto
Vector Institute for Artificial Intelligence
Queen’s University (Kingston Health Sciences Centre)
York University (Department of Electrical Engineering and Computer Science)
Institute for Better Health

Funding

Supported by Digital Supercluster Canada, the stated funder of the research project.

Ethical review

Approved by the Research Ethics Board at Trillium Health Partners, with waiver of informed consent.

Comments

Single-center external testing dataset of adult posteroanterior chest radiographs with report-derived labels used to evaluate chest radiograph classifiers.

Date

Published: 2023-07-12

References

[1] Ahluwalia M, Abdalla M, Sanayei J, Seyyed-Kalantari L, Hussain M, Ali A, Fine B. "The Subgroup Imperative: Chest Radiograph Classifier Generalization Gaps in Patient, Setting, and Pathology Subgroups". Radiology: Artificial Intelligence. 2023-09-01. doi:10.1148/ryai.220270. PMID: 37795140. PMCID: PMC10546359.

Dataset

Motivation

To perform large-scale external testing with robust subgroup analysis of chest radiograph classifiers in a real-world clinical dataset.

Sampling

Consecutive adult posteroanterior chest radiographs acquired from January 2016 to December 2020 at a three-site hospital system; duplicate studies were removed, and radiographs that could not be processed by one or more of the evaluated algorithms were excluded.

Partitioning scheme

Single external testing dataset; no training/validation partitions created.

Missing information

Patient count, per-site counts, image resolution, and exact file formats beyond DICOM are not reported.

Relationships between instances

Consecutive adult PA studies included; duplicate studies were removed. Multiple studies from the same patient may remain.

Noise

Ground truth was derived by applying NLP to radiology reports; on a validation sample of 502 reports, the CheXpert labeler achieved 94% accuracy against manual labels. Known error modes include cardiomediastinum/cardiomegaly phrasing and nuanced negations.
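The labeler validation described above amounts to comparing NLP-derived labels with manual annotations, finding by finding. A minimal sketch of such a per-label agreement check (the finding names and sample data below are illustrative assumptions, not taken from this dataset):

```python
# Sketch: per-finding agreement between NLP-derived and manual labels.
# Finding names and sample data are illustrative, not from the dataset.

def per_label_accuracy(nlp_labels, manual_labels, findings):
    """Return {finding: fraction of studies where NLP agrees with manual label}."""
    acc = {}
    for f in findings:
        matches = sum(1 for n, m in zip(nlp_labels, manual_labels) if n[f] == m[f])
        acc[f] = matches / len(manual_labels)
    return acc

findings = ["Cardiomegaly", "Edema", "Pleural Effusion"]
nlp = [
    {"Cardiomegaly": 1, "Edema": 0, "Pleural Effusion": 1},
    {"Cardiomegaly": 0, "Edema": 0, "Pleural Effusion": 0},
    {"Cardiomegaly": 1, "Edema": 1, "Pleural Effusion": 0},
]
manual = [
    {"Cardiomegaly": 1, "Edema": 0, "Pleural Effusion": 1},
    {"Cardiomegaly": 1, "Edema": 0, "Pleural Effusion": 0},
    {"Cardiomegaly": 1, "Edema": 1, "Pleural Effusion": 0},
]

print(per_label_accuracy(nlp, manual, findings))
```

In the published validation the comparison was made on 502 reports; the toy data here simply shows the shape of the computation.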

External data

Ground truth labels were generated from the associated radiology reports using open-source NLP tools: the CheXpert labeler was applied to the full dataset, while CheXbert was evaluated but not used for full-dataset labeling.

Confidentiality

Images and reports were de-identified; DICOM metadata elements were removed except those needed for subgroup analysis. Radiology reports were de-identified using regular expressions targeting names and dates.
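Regex-based report de-identification of the kind described can be sketched as follows. The patterns here are illustrative assumptions for demonstration only; the actual patterns used for this dataset are not published in this card, and production pipelines typically combine vetted pattern sets with name dictionaries:

```python
import re

# Illustrative sketch of regex-based de-identification of report text.
# These patterns are assumptions, not the dataset's actual rules.

DATE_PATTERNS = [
    r"\b\d{4}-\d{2}-\d{2}\b",        # e.g. 2019-03-07
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",  # e.g. 3/7/2019
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.? \d{1,2},? \d{4}\b",
]
NAME_PATTERNS = [
    r"\bDr\.? [A-Z][a-z]+(?: [A-Z][a-z]+)?\b",  # titled names, e.g. "Dr. Jane Smith"
]

def deidentify(report: str) -> str:
    """Replace dates and titled names with placeholder tokens."""
    for pat in DATE_PATTERNS:
        report = re.sub(pat, "[DATE]", report)
    for pat in NAME_PATTERNS:
        report = re.sub(pat, "[NAME]", report)
    return report

text = "Comparison to study of March 7, 2019. Reported by Dr. Jane Smith."
print(deidentify(text))
# → Comparison to study of [DATE]. Reported by [NAME].
```

Pure regex approaches miss untitled or unusually formatted names, which is one reason de-identified free text is usually still reviewed before release.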

Sensitive data

EHR-reported sex and name-based ancestry (as a proxy for ethnicity) were derived for subgroup analyses.