RadBERT
model · 2026-01-24 · https://doi.org/10.1148/atlas.1769274513631

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/model.json

Name

RadBERT

Link

https://dx.doi.org/10.1148/ryai.210258

Indexing

Keywords: Radiology NLP, Transformer, BERT, RoBERTa, Domain adaptation, Report coding, Abnormal sentence classification, Report summarization, ROUGE
Content: IN, OT
RadLex: RID45975, RID50134, RID34374, RID905

Author(s)

An Yan
Julian McAuley
Xing Lu
Jiang Du
Eric Y. Chang
Amilcare Gentili
Chun-Nan Hsu

Organization(s)

University of California, San Diego
Veterans Affairs San Diego Healthcare System
U.S. Department of Veterans Affairs

Version

1.0

Contact

chunnan@ucsd.edu

Funding

Supported by award no. W81XWH-20-1-0693 (log no. DM190543), Accelerating Innovation in Military Medicine Program, Office of the Assistant Secretary of Defense for Health Affairs, Department of Defense; and NSF CAREER Award no. 1750063.

Ethical review

Institutional review board approval was obtained with exemption from informed consent for the use of radiology reports; de-identified, de-duplicated VA radiology reports were used.

Date

Updated: 2022-07-01
Published: 2022-06-15
Created: 2021-10-12

References

[1] Yan A, McAuley J, Lu X, Du J, Chang EY, Gentili A, Hsu C-N. "RadBERT: Adapting Transformer-based Language Models to Radiology". Radiology: Artificial Intelligence. 2022;4(4):e210258. 2022-06-15. doi:10.1148/ryai.210258. PMID: 35923376. PMCID: PMC9344353.

Model

Architecture

Transformer-based language models (BERT-base and RoBERTa) adapted to radiology via masked language model pretraining on radiology reports.

Availability

RadBERT models can be released upon request with a data usage agreement.

Clinical benefit

Improves performance on radiology NLP tasks (abnormal sentence classification, report coding, report summarization), enabling better follow-up identification, standardized documentation, and reduced radiologist workload.

Clinical workflow phase

Workflow optimization and clinical decision support through automated analysis of radiology reports; facilitates follow-up tracking and documentation.

Indications for use

Analysis of de-identified radiology reports to support tasks including sentence-level abnormality detection, assignment of diagnostic/reporting codes (e.g., BI-RADS, Lung-RADS, AAA), and extractive summarization of findings into impressions; intended for use in radiology departments and healthcare systems with similar report corpora.

Input

De-identified radiology report text (sentences or full reports) from the VA nationwide corpus (2.16–4.42 million reports used for pretraining).

Instructions

Pretrain RadBERT with masked language modeling on large radiology report corpora (WordPiece tokenization; no next sentence prediction), then fine-tune for downstream tasks. For extractive summarization, sentence embeddings from the pretrained model can be used directly (per cited method) without fine-tuning.
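The masking step of masked language modeling can be sketched as follows. This is a minimal pure-Python illustration of BERT-style dynamic masking (select ~15% of token positions; of those, 80% become [MASK], 10% a random vocabulary token, 10% are left unchanged), not the authors' training code; `mask_tokens` and the toy vocabulary are illustrative names.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """BERT-style dynamic masking: choose ~mask_prob of positions as MLM targets.

    Of the chosen positions, 80% are replaced with [MASK], 10% with a random
    vocabulary token, and 10% kept unchanged. Returns the corrupted sequence
    and per-position labels (the original token, or None if not scored).
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the MLM loss asks the model to recover this token
            r = rng.random()
            if r < 0.8:
                corrupted.append("[MASK]")
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))
            else:
                corrupted.append(tok)
        else:
            labels.append(None)  # position excluded from the MLM loss
            corrupted.append(tok)
    return corrupted, labels

tokens = "no acute cardiopulmonary abnormality identified".split()
corrupted, labels = mask_tokens(tokens, vocab=tokens, mask_prob=0.3, seed=0)
```

In practice this corruption is applied to WordPiece subtokens of report sentences and redrawn each epoch, so the model sees different masks for the same report.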

Limitations

Only BERT-base-sized models (≈110 million parameters) were trained; BERT-large was not evaluated. Pretraining on ~2 million vs. ~4 million reports yielded similar performance, so the optimal corpus size within radiology remains undetermined. Pretraining focused on VA reports; further specialization by modality or body part was not explored.

Output

CDEs: RDE1707, RDE2030, RDE1702.25
Description:
- Binary sentence classification (abnormal vs. normal)
- Multiclass report-level diagnostic/reporting codes across five coding systems
- Extractive summaries of report findings approximating the impression section
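The extractive-summary output can be illustrated with a simple embedding-similarity selection: score each findings sentence by its cosine similarity to the mean report embedding and keep the top-k sentences in original order. This is a sketch under assumed toy embeddings, not the exact scoring of the cited summarization method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def extractive_summary(sentences, embeddings, k=2):
    """Pick the k sentences closest to the mean report embedding,
    preserving their original order in the report."""
    dim = len(embeddings[0])
    centroid = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    top = sorted(range(len(sentences)),
                 key=lambda i: cosine(embeddings[i], centroid),
                 reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

# Toy 2-d "sentence embeddings"; in practice these would come from RadBERT.
summary = extractive_summary(
    ["Heart size is normal.", "Lungs are clear.", "Degenerative changes noted."],
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    k=2,
)
```

Because the sentences are selected rather than generated, the summary is guaranteed to be verbatim from the report, which is why ROUGE overlap with the impression section is a natural evaluation metric.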

Recommendation

Use a radiology-specialized pretrained transformer (RadBERT) for radiology NLP tasks, especially when annotated data for fine-tuning are scarce.

Reproducibility

For classification tasks, mean and SD over five runs with different random seeds were reported; significance assessed via bootstrap resampling (10,000 trials).
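The significance test above can be sketched as a paired bootstrap: resample per-example score differences with replacement many times and report the fraction of resamples in which one model fails to beat the other. A minimal illustration follows; the function name and the exact scoring convention are assumptions, not the authors' published code.

```python
import random

def bootstrap_p_value(scores_a, scores_b, trials=10_000, seed=0):
    """Paired bootstrap significance estimate.

    Resamples per-example score differences (model A minus model B) with
    replacement `trials` times and returns the fraction of resamples in
    which A's mean score does NOT exceed B's (a one-sided p-value estimate).
    """
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    wins = 0
    for _ in range(trials):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n > 0:
            wins += 1
    return 1.0 - wins / trials
```

With per-example scores in hand (e.g., per-sentence classification correctness for two models), a small p-value indicates the observed advantage is unlikely to be a resampling artifact.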

Use

Intended: Report summarization, Report processing, Report data extraction