GPT-4 for Detection of Speech Recognition Errors in Radiology Reports
2025-11-30
https://doi.org/10.1148/atlas.1764460867940
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/model.json
Name
GPT-4 for Detection of Speech Recognition Errors in Radiology Reports
Link
https://pubmed.ncbi.nlm.nih.gov/38265301/
Indexing
Keywords: CT, MRI, Radiology Reports, Speech Recognition, Large Language Model, Natural Language Processing, Error detection
Content: IN
RadLex: RID10312, RID45910, RID10321
Author(s)
Reuben A. Schmidt
Jarrel C. Y. Seah
Ke Cao
Lincoln Lim
Wei Lim
Justin Yeung
Organization(s)
Department of Medical Imaging, Western Health, Footscray, Australia
Alfred Health, Harrison.ai, Monash University, Clayton, Australia
Department of Surgery, Western Precinct, University of Melbourne, Melbourne, Australia
Department of Surgery, Western Health, Melbourne, Australia
Version
1.0
License
Text: © 2024 Radiological Society of North America, Inc.
Contact
reuben.schmidt@icloud.com
Funding
Funding for project development was provided by the Western Health Department of Medical Imaging.
Ethical review
Approved by the Western Health Ethics Panel (HREC/23/WH/94984). Informed consent waived for retrospective analysis of de-identified reports.
Date
Updated: 2024-03-01
Published: 2024-01-24
Created: 2023-06-13
References
[1] Schmidt RA, Seah JCY, Cao K, Lim L, Lim W, Yeung J. "Generative Large Language Models for Detection of Speech Recognition Errors in Radiology Reports". Radiology: Artificial Intelligence. 2024;6(2):e230205. 2024-01-24. doi:10.1148/ryai.230205. PMID: 38265301. PMCID: PMC10982816.
Model
Architecture
Generative large language model (transformer-based; GPT-4 evaluated via API).
Availability
Accessed via OpenAI API (https://openai.com/gpt-4).
Clinical benefit
Automated detection of speech recognition errors in radiology reports to improve report accuracy and quality assurance.
Clinical workflow phase
Clinical decision support systems; workflow optimization; quality assurance/quality improvement.
Degree of automation
Decision support; flags potential errors for radiologist review. Manual inspection prior to sign-off still required.
Indications for use
Automatic detection of speech recognition errors in de-identified radiology reports (CT and MRI) within a tertiary hospital setting.
Input
De-identified radiology report text (Findings and Conclusion sections).
Instructions
Prompts were optimized using chain-of-thought reasoning, few-shot examples, grounding context, and temperature adjustment; three model outputs were averaged for the final classification.
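The three-output averaging step can be sketched as follows (a minimal sketch only: `query_model` and the 0.5 majority threshold are assumptions, since the study's exact prompts and aggregation code are not public):

```python
from statistics import mean

def classify_report(report_text: str, query_model) -> bool:
    """Return True if the averaged model outputs flag a speech recognition error.

    `query_model` is a hypothetical callable that sends one prompt to the
    LLM and returns 1.0 if the model flags an error in the report, else 0.0.
    The study averaged three outputs per report; the 0.5 threshold here is
    an assumption, not a published parameter.
    """
    scores = [query_model(report_text) for _ in range(3)]
    return mean(scores) >= 0.5
```

In practice each call to `query_model` would hit the OpenAI API with the engineered prompt; the averaging smooths out run-to-run variation introduced by nonzero temperature.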
Limitations
Single-institution, retrospective study; small dataset for prompt validation; potential prompt overfitting; comparisons may become outdated as models evolve; compliance and privacy risks when using third-party systems; suggested corrections were not evaluated, only detection at the correct location was assessed.
Output
CDEs: RDE456, RDE123
Description: Flags locations of speech recognition errors within reports and proposes corrected text; study evaluation used detection at correct locations as the endpoint.
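A downstream parser for such flagged-error output might look like this (a sketch under stated assumptions: the JSON schema below is illustrative, not the study's actual response format):

```python
import json

def parse_error_flags(model_response: str) -> list[dict]:
    """Extract flagged error spans from a hypothetical JSON model response.

    Assumed schema (illustrative, not from the paper): a JSON array of
    objects carrying the erroneous text, its character offset within the
    report, and the model's proposed correction.
    """
    flags = json.loads(model_response)
    return [
        {"text": f["text"], "offset": f["offset"], "correction": f["correction"]}
        for f in flags
    ]
```

Per the study design, only the flagged locations would feed the evaluation endpoint; proposed corrections were surfaced to the reader but not formally assessed.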
Recommendation
Advanced generative LLMs, particularly GPT-4, show potential for integration into radiology workflows to assist with error detection, especially for longer reports, trainee dictations, and overnight shifts. Human review remains necessary.
Reproducibility
Prompts were iteratively engineered (100 iterations per model) and validated on a 100-report subset; three model outputs averaged per report for classifications. No public code or model weights provided.
Use
Intended: Decision support, Detection
Out-of-scope: Report generation
Excluded: Other
User
Intended: Radiologist
Out-of-scope: Patient
Excluded: Layperson