GPT-4 for Detection of Speech Recognition Errors in Radiology Reports
model
2025-11-30
https://doi.org/10.1148/atlas.1764460867940

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/model.json

Name

GPT-4 for Detection of Speech Recognition Errors in Radiology Reports

Link

https://pubmed.ncbi.nlm.nih.gov/38265301/

Indexing

Keywords: CT, MRI, Radiology Reports, Speech Recognition, Large Language Model, Natural Language Processing, Error Detection
Content: IN
RadLex: RID10312, RID45910, RID10321

Author(s)

Reuben A. Schmidt
Jarrel C. Y. Seah
Ke Cao
Lincoln Lim
Wei Lim
Justin Yeung

Organization(s)

Department of Medical Imaging, Western Health, Footscray, Australia
Alfred Health, Harrison.ai, Monash University, Clayton, Australia
Department of Surgery, Western Precinct, University of Melbourne, Melbourne, Australia
Department of Surgery, Western Health, Melbourne, Australia

Version

1.0

License

Text: © 2024 Radiological Society of North America, Inc.

Contact

reuben.schmidt@icloud.com

Funding

Funding for project development was provided by the Western Health Department of Medical Imaging.

Ethical review

Approved by the Western Health Ethics Panel (HREC/23/WH/94984). Informed consent waived for retrospective analysis of de-identified reports.

Date

Updated: 2024-03-01
Published: 2024-01-24
Created: 2023-06-13

References

[1] Schmidt RA, Seah JCY, Cao K, Lim L, Lim W, Yeung J. Generative Large Language Models for Detection of Speech Recognition Errors in Radiology Reports. Radiology: Artificial Intelligence. 2024;6(2):e230205. doi:10.1148/ryai.230205. PMID: 38265301; PMCID: PMC10982816.

Model

Architecture

Generative large language model (transformer-based; GPT-4 evaluated via API).

Availability

Accessed via OpenAI API (https://openai.com/gpt-4).

Clinical benefit

Automated detection of speech recognition errors in radiology reports to improve report accuracy and quality assurance.

Clinical workflow phase

Clinical decision support systems; workflow optimization; quality assurance/quality improvement.

Degree of automation

Decision support; flags potential errors for radiologist review. Manual inspection prior to sign-off still required.

Indications for use

Automatic detection of speech recognition errors in de-identified radiology reports (CT and MRI) within a tertiary hospital setting.

Input

De-identified radiology report text (Findings and Conclusion sections).

Instructions

Prompts were optimized using chain-of-thought prompting, few-shot examples, grounding context, and temperature adjustment; three model outputs were averaged per report to produce the final classification.
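The averaging of three model outputs per report can be sketched as a simple majority vote over repeated classifications. The function name, labels, and three-run setup below are illustrative assumptions, not the authors' published code:

```python
from collections import Counter

def classify_by_majority(labels):
    """Combine repeated model classifications (e.g. "error" / "no_error")
    for one report into a single final label by majority vote.
    An odd number of runs (here, three) avoids ties."""
    if len(labels) % 2 == 0:
        raise ValueError("use an odd number of runs to avoid ties")
    label, _count = Counter(labels).most_common(1)[0]
    return label

# Three hypothetical outputs for one report:
runs = ["error", "no_error", "error"]
final = classify_by_majority(runs)  # → "error"
```

Repeating the query and aggregating is a common way to stabilize stochastic LLM outputs when a nonzero temperature is used.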

Limitations

Single-institution, retrospective study; small dataset for prompt validation; potential prompt overfitting; comparisons may become outdated as models evolve; compliance and privacy risks when using third-party systems; suggested corrections were not evaluated, only detection at the correct location was assessed.

Output

CDEs: RDE456, RDE123
Description: Flags locations of speech recognition errors within reports and proposes corrected text; study evaluation used detection at correct locations as the endpoint.
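The study does not publish a machine-readable output schema. Purely as an illustration of the described output (flagged error locations plus proposed corrections), one flagged report might be represented as follows; every field name here is a hypothetical assumption:

```python
# Hypothetical structure for one flagged report (illustrative only;
# field names are assumptions, not the study's actual output format).
flagged_report = {
    "report_id": "example-001",     # de-identified report reference
    "section": "Findings",          # Findings or Conclusion
    "flags": [
        {
            "location": (112, 121),           # character span of suspected error
            "original_text": "no fraction",
            "suggested_text": "no fracture",  # proposed but not evaluated in the study
        }
    ],
}

# Detection at the correct location (not the quality of the suggested
# correction) was the study's evaluation endpoint.
locations = [f["location"] for f in flagged_report["flags"]]
```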

Recommendation

Advanced generative LLMs, particularly GPT-4, show potential for integration into radiology workflows to assist with error detection, especially for longer reports, trainee dictations, and overnight shifts. Human review remains necessary.

Reproducibility

Prompts were iteratively engineered (100 iterations per model) and validated on a 100-report subset; three model outputs averaged per report for classifications. No public code or model weights provided.

Use

Intended: Decision support, Detection
Out-of-scope: Report generation
Excluded: Other

User

Intended: Radiologist
Out-of-scope: Patient
Excluded: Layperson