RadBERT
2026-01-24
https://doi.org/10.1148/atlas.1769274513631
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/model.json
Name
RadBERT
Link
https://dx.doi.org/10.1148/ryai.210258
Indexing
Keywords: Radiology NLP, Transformer, BERT, RoBERTa, Domain adaptation, Report coding, Abnormal sentence classification, Report summarization, ROUGE
Content: IN, OT
RadLex: RID45975, RID50134, RID34374, RID905
Author(s)
An Yan
Julian McAuley
Xing Lu
Jiang Du
Eric Y. Chang
Amilcare Gentili
Chun-Nan Hsu
Organization(s)
University of California, San Diego
Veterans Affairs San Diego Healthcare System
U.S. Department of Veterans Affairs
Version
1.0
Contact
chunnan@ucsd.edu
Funding
Supported by award no. W81XWH-20-1-0693 (log no. DM190543), Accelerating Innovation in Military Medicine Program, Office of the Assistant Secretary of Defense for Health Affairs, Department of Defense; and NSF CAREER Award no. 1750063.
Ethical review
The institutional review board approved the study with exemption from informed consent for the use of radiology reports; only de-identified, de-duplicated VA radiology reports were used.
Date
Updated: 2022-07-01
Published: 2022-06-15
Created: 2021-10-12
References
[1] Yan A, McAuley J, Lu X, Du J, Chang EY, Gentili A, Hsu C-N. "RadBERT: Adapting Transformer-based Language Models to Radiology". Radiology: Artificial Intelligence. 2022;4(4):e210258. 2022-06-15. doi:10.1148/ryai.210258. PMID: 35923376. PMCID: PMC9344353.
Model
Architecture
Transformer-based language models (BERT-base and RoBERTa) adapted to radiology via masked language model pretraining on radiology reports.
Availability
RadBERT models can be released upon request with a data usage agreement.
Clinical benefit
Improves performance on radiology NLP tasks (abnormal sentence classification, report coding, report summarization), enabling better follow-up identification, standardized documentation, and reduced radiologist workload.
Clinical workflow phase
Workflow optimization and clinical decision support through automated analysis of radiology reports; facilitates follow-up tracking and documentation.
Indications for use
Analysis of de-identified radiology reports to support tasks including sentence-level abnormality detection, assignment of diagnostic/reporting codes (e.g., BI-RADS, Lung-RADS, AAA, etc.), and extractive summarization of findings into impressions; intended for use in radiology departments and healthcare systems with similar report corpora.
Input
De-identified radiology report text (sentences or full reports) from the VA nationwide corpus (2.16–4.42 million reports used for pretraining).
Instructions
Pretrain RadBERT with masked language modeling on large radiology report corpora (WordPiece tokenization; no next-sentence prediction objective), then fine-tune for downstream tasks. For extractive summarization, sentence embeddings from the pretrained model can be used directly (per the cited method), without task-specific fine-tuning.
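To illustrate the masked-language-modeling objective named above, the following is a minimal sketch of standard BERT-style token corruption (select ~15% of positions as prediction targets; of those, replace 80% with [MASK], 10% with a random vocabulary token, and keep 10% unchanged). This is a generic illustration, not the authors' training code; the tokens, vocabulary, and helper names are invented for the example.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """BERT-style MLM corruption.

    Picks ~mask_prob of the positions as prediction targets; of those,
    80% become [MASK], 10% become a random vocabulary token, and 10%
    stay unchanged. Returns (corrupted, labels), where labels[i] holds
    the original token at target positions and None elsewhere.
    """
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict this token
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK_TOKEN)
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))
            else:
                corrupted.append(tok)
        else:
            labels.append(None)  # not a prediction target
            corrupted.append(tok)
    return corrupted, labels

# Example report sentence and a toy vocabulary (illustrative only).
sentence = "no acute cardiopulmonary abnormality is seen".split()
vocab = ["normal", "opacity", "effusion", "fracture"]
corrupted, labels = mask_tokens(sentence, vocab, rng=random.Random(0))
```

In an actual pretraining run, the corrupted sequence is fed to the transformer and cross-entropy loss is computed only at the target positions.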
Limitations
Only BERT-base sized models (≈110M parameters) were trained; BERT-large was not evaluated. Pretraining sizes of ~2M vs ~4M reports showed similar performance; the optimal corpus size within radiology is undetermined. Pretraining focused on VA reports; further specialization by modality/body part was not explored.
Output
CDEs: RDE1707, RDE2030, RDE1702.25
Description:
- Binary sentence classification (abnormal vs. normal)
- Multiclass report-level diagnostic/reporting codes across five coding systems
- Extractive summaries of report findings approximating the impression section
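The extractive summarization output above can be sketched with a common centroid-similarity scheme: embed each findings sentence, rank sentences by cosine similarity to the centroid of all sentence vectors, and select the top-k as the summary. This is one standard embedding-based formulation, not necessarily the exact cited method, and the bag-of-words `embed` function below is a stand-in for RadBERT sentence embeddings.

```python
import math
from collections import Counter

def embed(sentence):
    # Stand-in for RadBERT sentence embeddings: a bag-of-words count
    # vector. In practice each sentence would be encoded by the
    # pretrained transformer.
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def extract_summary(findings, k=1):
    """Rank findings sentences by cosine similarity to the centroid of
    all sentence vectors; return the top-k as the extractive summary."""
    centroid = Counter()
    for s in findings:
        centroid.update(embed(s))
    ranked = sorted(findings, key=lambda s: cosine(embed(s), centroid),
                    reverse=True)
    return ranked[:k]

findings = [
    "The heart size is normal.",
    "There is a small right pleural effusion.",
    "No pneumothorax is seen.",
]
summary = extract_summary(findings, k=1)
```

The selected sentences approximate the impression section; quality is typically scored against the radiologist-written impression with ROUGE, as listed under Keywords.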
Recommendation
Use a radiology-specialized pretrained transformer (RadBERT) for radiology NLP tasks, especially when annotated data for fine-tuning are scarce.
Reproducibility
For classification tasks, mean and SD over five runs with different random seeds were reported; significance assessed via bootstrap resampling (10,000 trials).
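The bootstrap significance test mentioned above can be sketched as a paired bootstrap over per-example scores: resample example indices with replacement many times and count how often the stronger model's advantage disappears. This is one common formulation under stated assumptions; the paper's exact resampling procedure and test statistic may differ, and the function name is invented for the example.

```python
import random

def bootstrap_pvalue(scores_a, scores_b, trials=10_000, rng=None):
    """Paired bootstrap test for 'model A beats model B'.

    Resamples example indices with replacement `trials` times and
    returns the fraction of resamples in which A's mean score does not
    exceed B's. A small value suggests A's observed advantage is
    unlikely to be a resampling artifact.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    n = len(scores_a)
    losses = 0
    for _ in range(trials):
        idx = [rng.randrange(n) for _ in range(n)]
        diff = sum(scores_a[i] - scores_b[i] for i in idx) / n
        if diff <= 0:
            losses += 1
    return losses / trials

# Usage: per-example F1 scores for two models over the same test set.
p = bootstrap_pvalue([0.9] * 20, [0.5] * 20, trials=1000)
```

With 10,000 trials, as reported, the estimated p-value is stable to roughly two decimal places.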
Use
Intended: Report summarization, Report processing, Report data extraction