Curation of the CANDID-PTX Dataset with Free-Text Reports
dataset 2025-11-29 https://doi.org/10.1148/atlas.1764458274807

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/dataset.json

Name

Curation of the CANDID-PTX Dataset with Free-Text Reports

Link

https://doi.org/10.1148/ryai.2021210136

Indexing

Keywords: Conventional Radiography, Thorax, Trauma, Ribs, Catheters, Segmentation, Diagnosis, Classification, Supervised Learning, Machine Learning
Content: CH
RadLex: RID5352, RID5573

Author(s)

Sijing Feng
Damian Azzollini
Ji Soo Kim
Cheng-Kai Jin
Simon P. Gordon
Jason Yeoh
Eve Kim
Mina Han
Andrew Lee
Aakash Patel
Joy Wu
Martin Urschler
Amy Fong
Cameron Simmers
Gregory P. Tarr
Stuart Barnard
Ben Wilson

Organization(s)

Department of Radiology, Dunedin Hospital
Eastern Health
Auckland District Health Board
Waitemata District Health Board
Waikato District Health Board
The University of Auckland Faculty of Medical and Health Sciences
University of Otago Medical School
IBM Almaden Research Center
School of Computer Science, University of Auckland
Department of Radiology, Auckland City Hospital
Department of Radiology, Middlemore Hospital

Contact

sijingfeng@gmail.com

Funding

Royal Australian and New Zealand College of Radiologists (RANZCR) research grant, 2020.

Ethical review

Approved by the University of Otago Human Ethics Committee, which waived individual patient informed consent because of the study's low-risk nature and its use of anonymized data.

Comments

This large chest radiograph dataset has segmented annotations for pneumothoraces, acute rib fractures, and intercostal chest tubes. The dataset, which can be used to train and test machine learning algorithms in the identification of these features on chest radiographs, includes corresponding anonymized, free-text radiology reports. The temporal relationship between images of the same patient has been preserved.

Dataset

Motivation

To curate the first large pneumothorax segmentation dataset from a New Zealand population, and thereby address the bottleneck in developing large, well-annotated datasets for training and testing AI algorithms on adult chest radiographs.

Sampling

A total of 295,613 chest radiographs and their radiology reports were retrospectively acquired from Dunedin Hospital PACS between January 2010 and April 2020. Inclusion criteria: frontal chest radiographs (including bedside images) from patients over 16 years of age.
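
The inclusion criteria above can be expressed as a simple filter. The following is a minimal sketch, not the authors' extraction code: the `Study` record and its field names are hypothetical, and treating the DICOM View Position values "PA" and "AP" as the frontal projections (with bedside images assumed to be AP) is an assumption.

```python
from dataclasses import dataclass

# Assumption: frontal projections are identified by these View Position values.
FRONTAL_VIEWS = {"PA", "AP"}


@dataclass
class Study:
    """Hypothetical per-image record; field names are illustrative only."""
    view_position: str
    patient_age_years: int


def meets_inclusion_criteria(study: Study) -> bool:
    """Frontal chest radiograph from a patient over 16 years of age."""
    is_frontal = study.view_position.strip().upper() in FRONTAL_VIEWS
    is_over_16 = study.patient_age_years > 16
    return is_frontal and is_over_16


def select_studies(studies):
    """Keep only the records that satisfy both criteria."""
    return [s for s in studies if meets_inclusion_criteria(s)]
```

A lateral view, or a frontal view from a 16-year-old patient, would be dropped by `select_studies` under these assumptions.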

Missing information

Patient sex could not be extracted from the original DICOM metadata for 30 images.

Relationships between instances

The temporal relationship between images of the same patient has been preserved.

Noise

80 images were excluded because they lacked a clear 'pneumothorax' or 'no pneumothorax' label, likely as a result of human error. Because the dataset comes from a single institution, it is less heterogeneous, which may limit the generalizability of AI algorithms trained on it.

External data

MD.ai, a Health Insurance Portability and Accountability Act–compliant, commercial, cloud-based image annotation platform, was used for image upload and annotation.

Confidentiality

Image metadata and reports were de-identified using algorithm-based tools to remove protected health information (PHI) and satisfy New Zealand Ministry of Health standards. A manual review was performed to ensure the quality of the anonymization.

Re-identification

File names containing acquisition dates were anonymized consistently and irreversibly by translating them into future dates, while preserving temporal relationships between images.
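
One common way to move dates into the future while keeping the intervals between images intact is to add a single hidden offset to every date. The sketch below illustrates that idea only. The record does not describe the actual algorithm, so the function name `make_date_shifter`, the offset range, and the choice of one dataset-wide offset are all assumptions.

```python
import secrets
from datetime import date, timedelta


def make_date_shifter(min_days: int = 18250, max_days: int = 36500):
    """Return a function that shifts every date by one hidden, fixed offset.

    Assumption: a single offset of roughly 50 to 100 years is used for the
    whole dataset. The offset is never stored or returned, so it cannot be
    looked up afterwards.
    """
    span = max_days - min_days + 1
    offset = timedelta(days=min_days + secrets.randbelow(span))

    def shift(original: date) -> date:
        # Adding the same offset to every date preserves ordering and intervals.
        return original + offset

    return shift


# Usage: two images of the same patient taken one week apart stay one week apart.
shift = make_date_shifter()
first, second = date(2015, 3, 1), date(2015, 3, 8)
assert shift(second) - shift(first) == second - first
```

Because every date receives the same offset, the order of, and spacing between, a patient's images is unchanged. A real pipeline would also need to guard against the offset being inferred from any image whose true date is known.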

Sensitive data

Protected health information (PHI) was removed from images and reports during de-identification.