UK Multi-hospital Longitudinal Chest Radiograph Dataset for Patient Reidentification (2006–2019)
dataset · 2025-12-03 · https://doi.org/10.1148/atlas.1764779953165

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/dataset.json

Name

UK Multi-hospital Longitudinal Chest Radiograph Dataset for Patient Reidentification (2006–2019)

Link

https://pmc.ncbi.nlm.nih.gov/articles/PMC10698609

Indexing

Keywords: chest radiograph, patient reidentification, deep metric learning, longitudinal imaging, identity confirmation, database retrieval, GAN explainability
Content: CH, RS

Author(s)

Matthew S. Macpherson
Charles E. Hutchinson
Carolyn Horst
Vicky Goh
Giovanni Montana

Organization(s)

University of Warwick
University Hospitals Coventry and Warwickshire NHS Trust
King’s College London
Guy’s and St Thomas’ NHS Foundation Trust
Alan Turing Institute

Contact

Giovanni Montana (corresponding author)

Funding

Supported by a Wellcome Trust research grant and by Engineering and Physical Sciences Research Council (EPSRC) student funding.

Ethical review

Data were gathered by six UK hospitals (2006–2019) in accordance with national governance (Governance Arrangements for Research Ethics Committees) and NHS data opt-out procedures. The data had been collected previously as part of care and support and were used as nonidentifiable information.

Comments

Retrospective multi-institutional dataset of adult frontal chest radiographs with free-text reports, assembled for training and evaluation of a patient reidentification deep metric learning model and longitudinal abnormality analysis.

Date

Published: 2023-09-20
Created: 2006-01-01

References

[1] Macpherson MS, Hutchinson CE, Horst C, Goh V, Montana G. "Patient Reidentification from Chest Radiographs: An Interpretable Deep Metric Learning Approach and Its Applications". Radiology: Artificial Intelligence. 2023-11-01. doi:10.1148/ryai.230019. PMID: 38074779. PMCID: PMC10698609.
[2] Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. "ChestX-Ray8/ChestX-ray14 dataset". CVPR 2017. 2017-01-01. Available from: https://nihcc.app.box.com/v/ChestXray-NIHCC
[3] Irvin J, Rajpurkar P, Ko M, et al. "CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison". AAAI 2019. 2019-01-01.
[4] Johnson AEW, Pollard TJ, Berkowitz SJ, et al. "MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports". Scientific Data. 2019-01-01. PMID: 31831740. PMCID: PMC6908718. Available from: https://physionet.org/

Dataset

Motivation

To train and interpret a deep learning model for patient reidentification from chest radiographs and assess longitudinal identity change as a biomarker for emerging abnormalities.

Sampling

All available chest radiograph DICOM files from PACS at six hospitals (2006–2019), excluding patients <16 years, nonfrontal images, corrupt/derived images, images without reports, and patients with only one scan.
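
The exclusion criteria above can be sketched as a two-stage filter. This is a minimal illustration, not the authors' pipeline: the record fields (`patient_id`, `age_years`, `view`, `corrupt_or_derived`, `has_report`) and the view labels are hypothetical, age is taken per image, and the single-scan rule is assumed to apply after the image-level exclusions.

```python
from collections import Counter

# Assumed labels for frontal projections; real DICOM metadata may differ.
FRONTAL_VIEWS = {"PA", "AP"}


def apply_exclusions(records):
    """Return the records that survive the exclusion criteria listed above.

    Each record is a dict with hypothetical keys: patient_id, age_years,
    view, corrupt_or_derived, has_report.
    """
    # Stage 1: image-level exclusions.
    kept = [
        r for r in records
        if r["age_years"] >= 16              # exclude patients under 16
        and r["view"] in FRONTAL_VIEWS       # exclude nonfrontal images
        and not r["corrupt_or_derived"]      # exclude corrupt/derived images
        and r["has_report"]                  # exclude images without reports
    ]
    # Stage 2: patient-level exclusion, counted on the surviving images.
    images_per_patient = Counter(r["patient_id"] for r in kept)
    return [r for r in kept if images_per_patient[r["patient_id"]] >= 2]
```

Applying the single-scan rule last means a patient with two radiographs, one of them lateral, is dropped entirely. Whether the original study counted scans before or after the other exclusions is not stated here.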

Partitioning scheme

Split by patient identity into training (85%), validation (5%), and test (10%).
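
Splitting by patient identity means every radiograph from a given patient lands in exactly one partition, so the test set never contains a person seen during training. A minimal sketch of such a split follows; the function name, the `patient_id` field, the fixed seed, and the shuffle-based assignment are illustrative assumptions, and only the 85/5/10 proportions come from this record.

```python
import random


def split_by_patient(records, fractions=(0.85, 0.05, 0.10), seed=0):
    """Assign whole patients to training/validation/test partitions.

    `records` is a list of dicts with a hypothetical "patient_id" key.
    Fractions apply to the number of patients, not the number of images.
    """
    patient_ids = sorted({r["patient_id"] for r in records})
    random.Random(seed).shuffle(patient_ids)  # deterministic for a given seed

    n_train = round(fractions[0] * len(patient_ids))
    n_val = round(fractions[1] * len(patient_ids))

    assignment = {}
    for i, pid in enumerate(patient_ids):
        if i < n_train:
            assignment[pid] = "training"
        elif i < n_train + n_val:
            assignment[pid] = "validation"
        else:
            assignment[pid] = "test"  # remainder goes to the test partition

    splits = {"training": [], "validation": [], "test": []}
    for r in records:
        splits[assignment[r["patient_id"]]].append(r)
    return splits
```

Because patients contribute different numbers of radiographs, the image counts per partition will only approximate 85/5/10 even when the patient counts match exactly.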

Missing information

Detailed patient metadata (such as height and weight) and clinical outcomes are not available.

Relationships between instances

Longitudinal series per patient with varying time intervals; abnormality status varies over time.
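
To illustrate what "varying time intervals" means in practice, the sketch below groups radiographs by patient and computes the gap in days between consecutive studies. The field names `patient_id` and `study_date` (an ISO 8601 date string) are hypothetical and do not describe the dataset's actual schema.

```python
from collections import defaultdict
from datetime import date


def intervals_by_patient(records):
    """Map each patient to the gaps (in days) between consecutive studies."""
    dates = defaultdict(list)
    for r in records:
        dates[r["patient_id"]].append(date.fromisoformat(r["study_date"]))

    intervals = {}
    for pid, ds in dates.items():
        ds.sort()  # chronological order within each patient
        intervals[pid] = [(later - earlier).days
                          for earlier, later in zip(ds, ds[1:])]
    return intervals
```

A patient with n radiographs yields n - 1 intervals, so every patient retained under the sampling criteria contributes at least one.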

Noise

Some images in external datasets were poorly aligned or cropped, contributing to false-positive matches.

External data

External validation performed on ChestX-ray14, CheXpert, and MIMIC-CXR public datasets.

Confidentiality

Pseudonymized patient identifiers; nonidentifiable information used; reports processed via NLP for labels.

Re-identification

Study demonstrates high reidentification performance using image content alone, underscoring potential privacy risks in de-identified chest radiograph datasets.

Sensitive data

Adult patient medical images and associated free-text radiology reports.