Radiology BERT
Model · 2026-01-24 · https://doi.org/10.1148/atlas.1769275373753

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/model.json

Name

Radiology BERT

Link

https://doi.org/10.1148/ryai.210185

Indexing

Keywords: radiology reports, speech recognition errors, BERT, natural language processing, error detection, token classification
Content: IN
RadLex: RID1034, RID5678, RID1245

Author(s)

Gunvant R. Chaudhari
Tengxiao Liu
Timothy L. Chen
Gabby B. Joseph
Maya Vella
Yoo Jin Lee
Thienkhai H. Vu
Youngho Seo
Andreas M. Rauschecker
Charles E. McCulloch
Jae Ho Sohn

Organization(s)

Department of Radiology and Biomedical Imaging, University of California San Francisco
Department of Epidemiology and Statistics, University of California San Francisco

Version

1.0

License

Text: © 2022 by the Radiological Society of North America, Inc.

Contact

Jae Ho Sohn

Funding

The authors declared no funding for this work.

Ethical review

Retrospective model development approved by the institutional human ethics board and conducted in accordance with the Helsinki Declaration (consent waived).

Date

Updated: 2022-05-10
Published: 2022-05-25
Created: 2021-07-05

References

[1] Chaudhari GR, Liu T, Chen TL, Joseph GB, Vella M, Lee YJ, Vu TH, Seo Y, Rauschecker AM, McCulloch CE, Sohn JH. "Application of a Domain-specific BERT for Detection of Speech Recognition Errors in Radiology Reports." Radiology: Artificial Intelligence. 2022 Jul;4(4):e210185. doi:10.1148/ryai.210185. PMID: 35923373. PMCID: PMC9344210.

Model

Architecture

Bidirectional Encoder Representations from Transformers (BERT), initialized from Clinical BioBERT, further pretrained on a radiology report corpus (masked language modeling and next-sentence prediction), and then fine-tuned for token-level classification with a fully connected linear layer and softmax that labels each token as normal, insertion, deletion, substitution, or padding.
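The final classification step described above can be sketched as follows. The five-way label set comes from this card; the helper functions and the toy logits are illustrative assumptions, not the released implementation (in the real model, the per-token logits come from the fine-tuned BERT encoder followed by the linear layer).

```python
import numpy as np

# Label set taken from the model card: one class per token.
LABELS = ["normal", "insertion", "deletion", "substitution", "padding"]

def softmax(logits, axis=-1):
    """Numerically stable softmax over the label dimension."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def classify_tokens(token_logits):
    """Map per-token logits (seq_len x num_labels) to label strings."""
    probs = softmax(np.asarray(token_logits, dtype=float))
    return [LABELS[i] for i in probs.argmax(axis=-1)]

# Toy logits for a 3-token sentence (hypothetical values).
logits = [[4.0, 0.1, 0.2, 0.1, 0.0],
          [0.1, 0.2, 0.1, 3.5, 0.0],
          [0.0, 0.1, 0.0, 0.1, 5.0]]
print(classify_tokens(logits))  # ['normal', 'substitution', 'padding']
```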

Availability

Not provided.

Clinical benefit

Flags potential speech recognition errors and suggests corrections in radiology report impression sentences to reduce proofreading burden and improve report quality.

Clinical workflow phase

Clinical decision support systems; workflow optimization at report proofreading/signing.

Decision threshold

For sentence-level analyses, the decision threshold was the ROC operating point closest to (0, 1) (i.e., FPR = 0, TPR = 1) on the signed-reports test set; the same threshold was applied to the prospective dataset.
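The "closest to (0, 1)" rule above can be sketched with a small helper. The function name and the toy FPR/TPR/threshold arrays are hypothetical, not values from the paper.

```python
import numpy as np

def closest_to_corner_threshold(fpr, tpr, thresholds):
    """Pick the ROC operating point nearest to the ideal corner
    (FPR=0, TPR=1) by Euclidean distance."""
    d = np.hypot(np.asarray(fpr), 1.0 - np.asarray(tpr))
    return thresholds[int(np.argmin(d))]

# Toy ROC curve (hypothetical, not from the paper).
fpr = [0.0, 0.1, 0.3, 1.0]
tpr = [0.0, 0.8, 0.9, 1.0]
thr = [1.0, 0.7, 0.4, 0.0]
print(closest_to_corner_threshold(fpr, tpr, thr))  # 0.7
```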

Degree of automation

Decision support—assists radiologists by automatically flagging suspected errors; final decisions remain with the user.

Indications for use

Detection of insertion, deletion, and substitution speech recognition errors in impression sentences of radiology reports across multiple imaging modalities in hospital radiology departments using dictation-based workflows.

Input

Impression section sentences from dictated radiology reports (PowerScribe).

Instructions

Use on impression sentences prior to report signing to flag unusual or out-of-context tokens for radiologist review; corrections can be suggested by the companion correction model.

Limitations

Developed using reports from two institutions and a single speech recognition (SR) software package (PowerScribe); syntax variability across sites and radiologists can cause false positives; the model was trained on imperfect signed reports, so some true errors may be present in the training data, leading to false negatives; the sentence-level model cannot leverage full-report, imaging, or EMR context, limiting detection of certain errors (e.g., negation or laterality changes).

Output

CDEs: RDE2267, RDE397, RDE341
Description: Token-level classification indicating normal token or suspected insertion, deletion, or substitution error; sentence-level likelihood of containing an error. A separate model provides top candidate word suggestions for detected deletion/substitution errors.
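One way to derive the sentence-level likelihood from the token-level outputs is to aggregate per-token error probabilities; taking the maximum, as in this sketch, is an illustrative assumption and not necessarily the paper's scoring rule.

```python
def sentence_error_score(token_error_probs):
    """Aggregate per-token error probabilities into one sentence score.
    Max-aggregation is an assumed rule for illustration only."""
    return max(token_error_probs)

# Toy probabilities for a 4-token impression sentence (hypothetical).
probs = [0.02, 0.91, 0.05, 0.10]
print(sentence_error_score(probs))  # 0.91
```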

Recommendation

Use as an assistive tool to highlight potential SR errors for radiologist verification prior to signing reports.

Reproducibility

Implemented with PyTorch (v1.6.0) and Hugging Face Transformers (v3.4.0); fivefold cross-validation was used during model selection; thresholds and bootstrapped CIs are described in the paper.
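The bootstrapped confidence intervals mentioned above follow the standard percentile-bootstrap recipe; this sketch assumes that recipe and uses made-up data, and the paper's exact resampling setup may differ.

```python
import numpy as np

def bootstrap_ci(values, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for a statistic of a 1-D sample."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    stats = np.array([stat(rng.choice(values, size=values.size, replace=True))
                      for _ in range(n_boot)])
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Toy per-sentence correctness indicators (hypothetical data).
correct = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
lo, hi = bootstrap_ci(correct)
print(round(lo, 3), round(hi, 3))
```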

Use

Intended: Detection, Mitigation
Out-of-scope: Artifact reduction, Report processing
Excluded: Other

User

Intended: Radiologist, Other
Out-of-scope: Patient
Excluded: Layperson