Anonymizing Radiographs Using an Object Detection Deep Learning Algorithm
Type: model
Date: 2025-12-05
DOI: https://doi.org/10.1148/atlas.1764971581117

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/model.json

Name

Anonymizing Radiographs Using an Object Detection Deep Learning Algorithm

Link

https://doi.org/10.1148/ryai.230085

Indexing

Keywords: de-identification, anonymization, radiographic markers, laterality markers, object detection, YOLOv5-x, OCR, Tesseract, Otsu thresholding, connected components, pelvis, chest, CheXpert, transfer learning, two-pass algorithm
Content: CH, MK
RadLex: RID10345

Author(s)

Bardia Khosravi
John P. Mickley
Pouria Rouzrokh
Michael J. Taunton
A. Noelle Larson
Bradley J. Erickson
Cody C. Wyles

Organization(s)

Mayo Clinic, Orthopedic Surgery Artificial Intelligence Laboratory, Department of Orthopedic Surgery
Mayo Clinic, Radiology Informatics Laboratory, Department of Radiology
Mayo Clinic, Department of Clinical Anatomy

Version

1.0

License

Text: © 2023 by the Radiological Society of North America, Inc.
URL: https://pubs.rsna.org/doi/10.1148/ryai.230085

Contact

Cody C. Wyles (email: Wyles.Cody@mayo.edu)

Funding

Supported by the Mayo Foundation Presidential Fund.

Ethical review

Institutional review board approved with waiver of informed consent.

Date

Updated: 2023-11-01
Published: 2023-09-13
Created: 2023-03-19

References

[1] Khosravi B, Mickley JP, Rouzrokh P, Taunton MJ, Larson AN, Erickson BJ, Wyles CC. "Anonymizing Radiographs Using an Object Detection Deep Learning Algorithm". Radiology: Artificial Intelligence. 2023;5(6):e230085. Published 2023-09-13. doi:10.1148/ryai.230085. PMID: 38074777. PMCID: PMC10698585.

Model

Architecture

Two-pass pipeline: (1) YOLOv5-x (Ultralytics) object detection model to localize radiographic markers; (2) postprocessing to retain desired markers (e.g., laterality) using Otsu thresholding, connected component analysis, orientation correction, and OCR (Tesseract with LSTM) to detect isolated R/L; non-retained components are replaced by black pixels.
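The second pass above can be sketched as follows. This is a minimal illustration of the postprocessing idea (Otsu binarization of a detected marker crop, then connected-component labeling to isolate candidate glyphs before an OCR step decides which to retain); function names are illustrative and not the repository's actual API, and the OCR call itself is omitted.

```python
import numpy as np
from scipy import ndimage


def otsu_threshold(gray):
    # Compute Otsu's threshold from the 256-bin intensity histogram by
    # maximizing between-class variance over all candidate cut points.
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = float(np.dot(np.arange(256), hist))
    sum_b, w_b = 0.0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def split_marker_components(crop):
    # Binarize a detected marker crop and label its connected components;
    # each component is a candidate glyph to hand to OCR, which then
    # decides whether it is an isolated R/L worth retaining.
    binary = crop > otsu_threshold(crop)
    labels, n_components = ndimage.label(binary)
    return labels, n_components
```

In the actual pipeline, each labeled component would be orientation-corrected and passed to Tesseract; anything not recognized as a lone R or L is redacted.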

Availability

Open-source code: https://github.com/OrthopedicSurgeryAILab/RadiographMarkerRemover

Clinical benefit

Enables de-identification of radiographs by removing burnt-in markers containing PHI while selectively retaining useful non-identifying markers (e.g., laterality), facilitating data sharing and robust AI model development.

Clinical workflow phase

Data de-identification and preprocessing for research data sharing and AI development.

Degree of automation

Fully automated detection and removal of radiographic markers with optional automated retention of specified markers (e.g., R/L).

Indications for use

Automated anonymization of radiographs by localizing and removing radiographic markers containing protected health information; intended for radiographic images (e.g., pelvis, chest) in research data pipelines and multi-institutional data sharing environments.

Input

Radiographic images (e.g., anteroposterior pelvis and chest radiographs).

Instructions

1. Run the YOLOv5-x localizer to detect all marker regions.
2. Apply the postprocessing/OCR module to retain desired markers (e.g., isolated R/L) and redact all others.
3. Optionally fine-tune the localizer on a small sample (e.g., 20 images) from the target domain to improve generalization.
4. When training downstream AI models, disable marker retention to avoid shortcut learning from markers.
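The redaction step can be sketched as below: every detected box that the retention logic did not keep is overwritten with black pixels. This is a hedged illustration, not the repository's actual API; `redact_markers` and its argument shapes are assumptions.

```python
import numpy as np


def redact_markers(image, boxes, keep_mask):
    # Blacken every detected marker box not flagged for retention.
    # `boxes` holds (x1, y1, x2, y2) pixel coordinates from the localizer;
    # `keep_mask` holds one boolean per box (True = retain, e.g. a lone R/L).
    out = image.copy()
    for (x1, y1, x2, y2), keep in zip(boxes, keep_mask):
        if not keep:
            out[y1:y2, x1:x2] = 0  # replace the region with black pixels
    return out
```

A retained laterality marker survives untouched while all other detections are removed, which is the behavior the paper describes for the two-pass pipeline.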

Limitations

External public images lacked PHI, so external evaluation focused on study- and staff-related markers. The algorithm may require fine-tuning to cover other marker types, body parts, and modalities. False positives occurred with non-marker items (e.g., ECG leads) not seen in training. OCR-based retention can be sensitive to image quality and marker variability.

Output

CDEs: RDE379, RDE2943
Description:
- Bounding boxes for localized radiographic markers
- De-identified image with non-retained marker regions redacted (blackened)
- Optional retention of laterality markers (R/L)

Recommendation

For deployment on new domains, fine-tune with a small set of local images; retain only strictly non-identifying markers as needed; disable retention during training of diagnostic models to prevent shortcut learning.
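A fine-tuning run on a small local sample might look like the following, assuming the standard Ultralytics YOLOv5 repository layout; the dataset YAML and weights filename are illustrative placeholders, not files shipped with this model.

```shell
# Hedged sketch: fine-tune the marker localizer on ~20 local images,
# starting from the released detector weights (filename is hypothetical).
python train.py --img 640 --batch 8 --epochs 50 \
  --data local_markers.yaml --weights marker_detector.pt
```

The `--data`, `--weights`, `--img`, `--batch`, and `--epochs` flags are standard YOLOv5 `train.py` options; the exact hyperparameters for domain adaptation would need local validation.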

Reproducibility

Annotations created with LabelMe v5.0.1; model trained in PyTorch v1.11.0; OCR with Tesseract v5.0.0; code and pipeline available on GitHub with details in Appendix S1.

Sustainability

Average processing time per image: 1.43 s on CPU (Intel i7 2.5 GHz) and 0.51 s on GPU (NVIDIA V100).

Use

Intended: Image processing
Out-of-scope: Decision support, Detection and diagnosis
Excluded: Other

User

Intended: Other, Researcher
Out-of-scope: Patient