Anonymizing Radiographs Using an Object Detection Deep Learning Algorithm
2025-12-05
https://doi.org/10.1148/atlas.1764971581117
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/model.json
Name
Anonymizing Radiographs Using an Object Detection Deep Learning Algorithm
Link
https://doi.org/10.1148/ryai.230085
Indexing
Keywords: de-identification, anonymization, radiographic markers, laterality markers, object detection, YOLOv5-x, OCR, Tesseract, Otsu thresholding, connected components, pelvis, chest, CheXpert, transfer learning, two-pass algorithm
Content: CH, MK
RadLex: RID10345
Author(s)
Bardia Khosravi
John P. Mickley
Pouria Rouzrokh
Michael J. Taunton
A. Noelle Larson
Bradley J. Erickson
Cody C. Wyles
Organization(s)
Mayo Clinic, Orthopedic Surgery Artificial Intelligence Laboratory, Department of Orthopedic Surgery
Mayo Clinic, Radiology Informatics Laboratory, Department of Radiology
Mayo Clinic, Department of Clinical Anatomy
Version
1.0
License
Text: © 2023 by the Radiological Society of North America, Inc.
URL: https://pubs.rsna.org/doi/10.1148/ryai.230085
Contact
Cody C. Wyles (email: Wyles.Cody@mayo.edu)
Funding
Supported by the Mayo Foundation Presidential Fund.
Ethical review
Institutional review board approved with waiver of informed consent.
Date
Updated: 2023-11-01
Published: 2023-09-13
Created: 2023-03-19
References
[1] Khosravi B, Mickley JP, Rouzrokh P, Taunton MJ, Larson AN, Erickson BJ, Wyles CC. "Anonymizing Radiographs Using an Object Detection Deep Learning Algorithm". Radiology: Artificial Intelligence. 2023 Nov;5(6):e230085. doi:10.1148/ryai.230085. PMID: 38074777. PMCID: PMC10698585.
Model
Architecture
Two-pass pipeline: (1) YOLOv5-x (Ultralytics) object detection model to localize radiographic markers; (2) postprocessing to retain desired markers (e.g., laterality) using Otsu thresholding, connected component analysis, orientation correction, and OCR (Tesseract with LSTM) to detect isolated R/L; non-retained components are replaced by black pixels.
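The second pass above can be illustrated with a minimal NumPy-only sketch. This is not the authors' implementation (which uses the full pipeline with orientation correction and Tesseract OCR); it only shows the two core operations named in the description: Otsu thresholding of a marker region and replacement of non-retained pixels with black. The function names `otsu_threshold` and `redact_box` are illustrative, not from the released code.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the gray level that maximizes between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # class-0 probability up to each level
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean intensity
    mu_t = mu[-1]                            # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)         # undefined at the extremes
    return int(np.argmax(sigma_b))

def redact_box(image, box, keep=False):
    """Blacken the pixels of a detected marker box unless it is retained."""
    x0, y0, x1, y1 = box
    if not keep:
        image[y0:y1, x0:x1] = 0
    return image
```

In the real pipeline, connected components inside the thresholded crop are examined individually, so an isolated "R" can be kept while adjacent PHI text in the same detection is still blackened.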
Availability
Open-source code: https://github.com/OrthopedicSurgeryAILab/RadiographMarkerRemover
Clinical benefit
Enables de-identification of radiographs by removing burnt-in markers containing PHI while selectively retaining useful non-identifying markers (e.g., laterality), facilitating data sharing and robust AI model development.
Clinical workflow phase
Data de-identification and preprocessing for research data sharing and AI development.
Degree of automation
Fully automated detection and removal of radiographic markers with optional automated retention of specified markers (e.g., R/L).
Indications for use
Automated anonymization of radiographs by localizing and removing radiographic markers containing protected health information; intended for radiographic images (e.g., pelvis, chest) in research data pipelines and multi-institutional data sharing environments.
Input
Radiographic images (e.g., anteroposterior pelvis and chest radiographs).
Instructions
1. Run the YOLOv5-x localizer to detect all marker regions.
2. Apply the postprocessing/OCR module to retain desired markers (e.g., isolated R/L) and redact all others.
3. Optionally fine-tune the localizer on a small sample (e.g., 20 images) from the target domain to improve generalization.
4. When training downstream AI models, disable marker retention to avoid shortcut learning from markers.
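The redaction step of these instructions can be sketched as follows. This assumes YOLO-style detections in normalized (cx, cy, w, h) format, which is the convention for YOLOv5 label files; the `anonymize` helper and its `retained_indices` parameter are illustrative names, not the repository's API.

```python
import numpy as np

def yolo_to_pixel(box, width, height):
    """Convert a YOLO-style (cx, cy, w, h) box, normalized to [0, 1],
    into integer pixel corners (x0, y0, x1, y1), clipped to the image."""
    cx, cy, w, h = box
    x0 = int(round((cx - w / 2) * width))
    y0 = int(round((cy - h / 2) * height))
    x1 = int(round((cx + w / 2) * width))
    y1 = int(round((cy + h / 2) * height))
    return max(x0, 0), max(y0, 0), min(x1, width), min(y1, height)

def anonymize(image, detections, retained_indices=()):
    """Blacken every detected marker box except those flagged for retention."""
    h, w = image.shape[:2]
    out = image.copy()
    for i, box in enumerate(detections):
        if i in retained_indices:
            continue
        x0, y0, x1, y1 = yolo_to_pixel(box, w, h)
        out[y0:y1, x0:x1] = 0
    return out
```

Passing an empty `retained_indices` reproduces the training-time configuration in which every marker is redacted.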
Limitations
External public images lacked PHI, so external evaluation focused on study- and staff-related markers; algorithm may require fine-tuning to cover other marker types and other body parts/modalities; false positives occurred with non-marker items (e.g., ECG leads) not seen in training; OCR-based retention can be sensitive to image quality and marker variability.
Output
CDEs: RDE379, RDE2943
Description:
- Bounding boxes for localized radiographic markers
- De-identified image with non-retained marker regions redacted (blackened)
- Optional retention of laterality markers (R/L)
Recommendation
For deployment on new domains, fine-tune with a small set of local images; retain only strictly non-identifying markers as needed; disable retention during training of diagnostic models to prevent shortcut learning.
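A retention policy like the one recommended here can be expressed as a small filter over the OCR text of each detected marker. This is a hypothetical sketch, not the released code: it keeps only a strictly isolated laterality letter and treats everything else as potential PHI, and the `retain_laterality=False` path corresponds to disabling retention when training diagnostic models.

```python
import re

# Hypothetical policy: retain a marker only when its OCR text is an isolated
# laterality letter (R or L); anything longer may carry PHI and is redacted.
LATERALITY = re.compile(r"^[RL]$", re.IGNORECASE)

def should_retain(ocr_text, retain_laterality=True):
    """Return True only for strictly non-identifying laterality markers."""
    if not retain_laterality:
        return False  # training mode: redact everything to prevent shortcut learning
    return bool(LATERALITY.match(ocr_text.strip()))
```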
Reproducibility
Annotations created with LabelMe v5.0.1; model trained in PyTorch v1.11.0; OCR with Tesseract v5.0.0; code and pipeline available on GitHub with details in Appendix S1.
Sustainability
Average processing time per image: 1.43 s on CPU (Intel i7 2.5 GHz) and 0.51 s on GPU (NVIDIA V100).
Use
Intended: Image processing
Out-of-scope: Decision support, Detection and diagnosis
Excluded: Other
User
Intended: Other, Researcher
Out-of-scope: Patient