UK Multi-hospital Longitudinal Chest Radiograph Dataset for Patient Reidentification (2006–2019)
2025-12-03
https://doi.org/10.1148/atlas.1764779953165
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/dataset.json
Name
UK Multi-hospital Longitudinal Chest Radiograph Dataset for Patient Reidentification (2006–2019)
Link
https://pmc.ncbi.nlm.nih.gov/articles/PMC10698609
Indexing
Keywords: chest radiograph, patient reidentification, deep metric learning, longitudinal imaging, identity confirmation, database retrieval, GAN explainability
Content: CH, RS
Author(s)
Matthew S. Macpherson
Charles E. Hutchinson
Carolyn Horst
Vicky Goh
Giovanni Montana
Organization(s)
University of Warwick
University Hospitals Coventry and Warwickshire NHS Trust
King’s College London
Guy’s and St Thomas’ NHS Foundation Trust
Alan Turing Institute
Contact
Giovanni Montana (corresponding author)
Funding
Supported by a Wellcome Trust research grant and by Engineering and Physical Sciences Research Council (EPSRC) student funding.
Ethical review
Data were gathered at six UK hospitals (2006–2019) in accordance with national governance (Governance Arrangements for Research Ethics Committees) and NHS data opt-out procedures. The data had previously been collected as part of care and support and were used as nonidentifiable information.
Comments
Retrospective multi-institutional dataset of adult frontal chest radiographs with free-text reports, assembled for training and evaluation of a patient reidentification deep metric learning model and longitudinal abnormality analysis.
Date
Published: 2023-09-20
Created: 2006-01-01
References
[1] Macpherson MS, Hutchinson CE, Horst C, Goh V, Montana G. "Patient Reidentification from Chest Radiographs: An Interpretable Deep Metric Learning Approach and Its Applications". Radiology: Artificial Intelligence. 2023-11-01. doi:10.1148/ryai.230019. PMID: 38074779. PMCID: PMC10698609.
[2] Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. "ChestX-Ray8/ChestX-ray14 dataset". CVPR 2017. 2017-01-01. Available from: https://nihcc.app.box.com/v/ChestXray-NIHCC
[3] Irvin J, Rajpurkar P, Ko M, et al. "CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison". AAAI 2019. 2019-01-01.
[4] Johnson AEW, Pollard TJ, Berkowitz SJ, et al. "MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports". Scientific Data. 2019-01-01. PMID: 31831740. PMCID: PMC6908718. Available from: https://physionet.org/
Dataset
Motivation
To train and interpret a deep learning model for patient reidentification from chest radiographs and assess longitudinal identity change as a biomarker for emerging abnormalities.
Sampling
All available chest radiograph DICOM files from the PACS at six hospitals (2006–2019), excluding patients younger than 16 years, nonfrontal images, corrupt or derived images, images without reports, and patients with only one radiograph.
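The exclusion criteria above can be illustrated with a minimal sketch. It is not the authors' pipeline: the record fields (`patient_id`, `age_years`, `view`, `is_corrupt`, `is_derived`, `has_report`), the set of frontal view codes, and the order in which the filters are applied are all assumptions made for illustration.

```python
from collections import Counter

# Assumed frontal view codes; the source only says "nonfrontal images" were excluded.
FRONTAL_VIEWS = {"PA", "AP"}

def apply_exclusions(records):
    """Filter hypothetical per-image records according to the stated criteria.

    Each record is a dict with illustrative keys: patient_id, age_years,
    view, is_corrupt, is_derived, has_report.
    """
    # Image-level exclusions: age under 16, nonfrontal, corrupt/derived, no report.
    kept = [
        r for r in records
        if r["age_years"] >= 16
        and r["view"] in FRONTAL_VIEWS
        and not r["is_corrupt"]
        and not r["is_derived"]
        and r["has_report"]
    ]
    # Patient-level exclusion: drop patients left with only one image.
    # Applying this after the image-level filters is an assumption.
    counts = Counter(r["patient_id"] for r in kept)
    return [r for r in kept if counts[r["patient_id"]] >= 2]
```

Counting images per patient after the image-level filters means a patient whose second radiograph is lateral or lacks a report is also removed, which matches the need for at least two usable images per identity.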
Partitioning scheme
Split by patient identity into training (85%), validation (5%), and test (10%).
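Splitting by patient identity means that every image of a given patient lands in exactly one partition, so no identity leaks between training and test. The sketch below shows one way to do this under stated assumptions: the function name `split_by_patient`, the seeded shuffle, and the rounding of partition sizes are illustrative choices, not details taken from the source.

```python
import random

def split_by_patient(patient_ids, fractions=(0.85, 0.05, 0.10), seed=0):
    """Map each unique patient ID to 'train', 'val', or 'test'.

    patient_ids may contain repeats (one entry per image); the split is
    made over unique patients, so all images of a patient share a partition.
    """
    unique = sorted(set(patient_ids))   # sort first so the result is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)

    n = len(unique)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)     # the remainder goes to the test set

    assignment = {}
    for i, pid in enumerate(unique):
        if i < n_train:
            assignment[pid] = "train"
        elif i < n_train + n_val:
            assignment[pid] = "val"
        else:
            assignment[pid] = "test"
    return assignment
```

Each image is then assigned by looking up its patient ID, for example `assignment[record["patient_id"]]`, where `record` is a hypothetical per-image record.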
Missing information
Detailed patient metadata, such as height and weight, and clinical outcomes are not available.
Relationships between instances
Longitudinal series per patient with varying time intervals; abnormality status varies over time.
Noise
Some images in external datasets were poorly aligned or cropped, contributing to false-positive matches.
External data
External validation performed on ChestX-ray14, CheXpert, and MIMIC-CXR public datasets.
Confidentiality
Pseudonymized patient identifiers; nonidentifiable information used; free-text reports processed via NLP to derive labels.
Re-identification
Study demonstrates high reidentification performance using image content alone, underscoring potential privacy risks in de-identified chest radiograph datasets.
Sensitive data
Adult patient medical images and associated free-text radiology reports.