Anonymizing Radiographs Using an Object Detection Deep Learning Algorithm
Type: model
Date: 2025-12-05
DOI: https://doi.org/10.1148/atlas.1764971581117

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/model.json

Name

Anonymizing Radiographs Using an Object Detection Deep Learning Algorithm

Link

https://doi.org/10.1148/ryai.230085

Indexing

Keywords: de-identification, anonymization, radiographic markers, laterality markers, object detection, YOLOv5-x, OCR, Tesseract, Otsu thresholding, connected components, pelvis, chest, CheXpert, transfer learning, two-pass algorithm
Content: CH, MK
RadLex: RID10345

Author(s)

Bardia Khosravi
John P. Mickley
Pouria Rouzrokh
Michael J. Taunton
A. Noelle Larson
Bradley J. Erickson
Cody C. Wyles

Organization(s)

Mayo Clinic, Orthopedic Surgery Artificial Intelligence Laboratory, Department of Orthopedic Surgery
Mayo Clinic, Radiology Informatics Laboratory, Department of Radiology
Mayo Clinic, Department of Clinical Anatomy

Version

1.0

License

Text: © 2023 by the Radiological Society of North America, Inc.
URL: https://pubs.rsna.org/doi/10.1148/ryai.230085

Contact

Cody C. Wyles (email: Wyles.Cody@mayo.edu)

Funding

Supported by the Mayo Foundation Presidential Fund.

Ethical review

Institutional review board approved with waiver of informed consent.

Date

Updated: 2023-11-01
Published: 2023-09-13
Created: 2023-03-19

References

[1] Khosravi B, Mickley JP, Rouzrokh P, Taunton MJ, Larson AN, Erickson BJ, Wyles CC. "Anonymizing Radiographs Using an Object Detection Deep Learning Algorithm". Radiology: Artificial Intelligence. 2023;5(6):e230085. Published 2023-09-13. doi:10.1148/ryai.230085. PMID: 38074777. PMCID: PMC10698585.

Model

Architecture

Two-pass pipeline: (1) YOLOv5-x (Ultralytics) object detection model to localize radiographic markers; (2) postprocessing to retain desired markers (e.g., laterality) using Otsu thresholding, connected component analysis, orientation correction, and OCR (Tesseract with LSTM) to detect isolated R/L; non-retained components are replaced by black pixels.
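The second pass above can be sketched as follows. This is a minimal illustration of the postprocessing idea (Otsu binarization of a detected marker crop, then connected-component labeling to isolate candidate glyphs before an OCR step decides which to retain); function names are illustrative and not the repository's actual API, and the OCR call itself is omitted.

```python
import numpy as np
from scipy import ndimage


def otsu_threshold(gray):
    # Compute Otsu's threshold from the 256-bin intensity histogram by
    # maximizing between-class variance over all candidate cut points.
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = float(np.dot(np.arange(256), hist))
    sum_b, w_b = 0.0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def split_marker_components(crop):
    # Binarize a detected marker crop and label its connected components;
    # each component is a candidate glyph to hand to OCR, which then
    # decides whether it is an isolated R/L worth retaining.
    binary = crop > otsu_threshold(crop)
    labels, n_components = ndimage.label(binary)
    return labels, n_components
```

In the actual pipeline, each labeled component would be orientation-corrected and passed to Tesseract; anything not recognized as a lone R or L is redacted.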

Availability

Open-source code: https://github.com/OrthopedicSurgeryAILab/RadiographMarkerRemover

Clinical benefit

Enables de-identification of radiographs by removing burnt-in markers containing PHI while selectively retaining useful non-identifying markers (e.g., laterality), facilitating data sharing and robust AI model development.

Clinical workflow phase

Data de-identification and preprocessing for research data sharing and AI development.

Degree of automation

Fully automated detection and removal of radiographic markers with optional automated retention of specified markers (e.g., R/L).

Indications for use

Automated anonymization of radiographs by localizing and removing radiographic markers containing protected health information; intended for radiographic images (e.g., pelvis, chest) in research data pipelines and multi-institutional data sharing environments.

Input

Radiographic images (e.g., anteroposterior pelvis and chest radiographs).

Instructions

1. Run the YOLOv5-x localizer to detect all marker regions.
2. Apply the postprocessing/OCR module to retain desired markers (e.g., isolated R/L) and redact all others.
3. Optionally fine-tune the localizer on a small sample (e.g., 20 images) from the target domain to improve generalization.
4. When training downstream AI models, disable marker retention to avoid shortcut learning from markers.
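The redaction step can be sketched as below: every detected box that the retention logic did not keep is overwritten with black pixels. This is a hedged illustration, not the repository's actual API; `redact_markers` and its argument shapes are assumptions.

```python
import numpy as np


def redact_markers(image, boxes, keep_mask):
    # Blacken every detected marker box not flagged for retention.
    # `boxes` holds (x1, y1, x2, y2) pixel coordinates from the localizer;
    # `keep_mask` holds one boolean per box (True = retain, e.g. a lone R/L).
    out = image.copy()
    for (x1, y1, x2, y2), keep in zip(boxes, keep_mask):
        if not keep:
            out[y1:y2, x1:x2] = 0  # replace the region with black pixels
    return out
```

A retained laterality marker survives untouched while all other detections are removed, which is the behavior the paper describes for the two-pass pipeline.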

Limitations

External public images lacked PHI, so external evaluation focused on study- and staff-related markers. The algorithm may require fine-tuning to cover other marker types, body parts, and modalities. False positives occurred with non-marker items (e.g., ECG leads) not seen in training. OCR-based retention can be sensitive to image quality and marker variability.

Output

CDEs: RDE379, RDE2943
Description:
- Bounding boxes for localized radiographic markers
- De-identified image with non-retained marker regions redacted (blackened)
- Optional retention of laterality markers (R/L)

Recommendation

For deployment on new domains, fine-tune with a small set of local images; retain only strictly non-identifying markers as needed; disable retention during training of diagnostic models to prevent shortcut learning.
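A fine-tuning run on a small local sample might look like the following, assuming the standard Ultralytics YOLOv5 repository layout; the dataset YAML and weights filename are illustrative placeholders, not files shipped with this model.

```shell
# Hedged sketch: fine-tune the marker localizer on ~20 local images,
# starting from the released detector weights (filename is hypothetical).
python train.py --img 640 --batch 8 --epochs 50 \
  --data local_markers.yaml --weights marker_detector.pt
```

The `--data`, `--weights`, `--img`, `--batch`, and `--epochs` flags are standard YOLOv5 `train.py` options; the exact hyperparameters for domain adaptation would need local validation.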

Reproducibility

Annotations created with LabelMe v5.0.1; model trained in PyTorch v1.11.0; OCR with Tesseract v5.0.0; code and pipeline available on GitHub with details in Appendix S1.

Sustainability

Average processing time per image: 1.43 s on CPU (Intel i7 2.5 GHz) and 0.51 s on GPU (NVIDIA V100).

Use

Intended: Image processing
Out-of-scope: Decision support, Detection and diagnosis
Excluded: Other

User

Intended: Other, Researcher
Out-of-scope: Patient