ChatGPT (GPT-3.5 and GPT-4) for Brazilian Radiology Board Exam Question Answering
2025-11-30
https://doi.org/10.1148/atlas.1764531770889
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/model.json
Name
ChatGPT (GPT-3.5 and GPT-4) for Brazilian Radiology Board Exam Question Answering
Link
https://dx.doi.org/10.1148/ryai.230103
Indexing
Keywords: ChatGPT, Artificial Intelligence, Board Examinations, Radiology and Diagnostic Imaging, Mammography, Neuroradiology
Content: ED, IN, OT
RadLex: RID13060, RID10357
Author(s)
Leonardo C. Almeida
Eduardo M. J. M. Farina
Paulo E. A. Kuriki
Nitamar Abdala
Felipe C. Kitamura
Organization(s)
Universidade Federal de São Paulo (UNIFESP) – Department of Artificial Intelligence and Management; Graduate Program in Medicine (Clinical Radiology)
AI Lab, Dasa
Version
1.0
License
Text: © 2023 by the Radiological Society of North America, Inc.
Contact
leonardo.canela@unifesp.br
Funding
Authors declared no funding for this work.
Ethical review
This prospective exploratory study did not involve human subjects or patient data and therefore did not require institutional review board approval.
Date
Updated: 2024-01-01
Published: 2023-11-08
Created: 2023-04-01
References
[1] Almeida LC, Farina EMJM, Kuriki PEA, Abdala N, Kitamura FC. "Performance of ChatGPT on the Brazilian Radiology and Diagnostic Imaging and Mammography Board Examinations". Radiology: Artificial Intelligence. 2024;6(1):e230103. Published online November 8, 2023. doi:10.1148/ryai.230103. PMID: 38294325. PMCID: PMC10831524.
Model
Architecture
Large language model based on transformer architecture (OpenAI GPT-3.5 and GPT-4).
Availability
Accessed through OpenAI's official chat completion API, with maximum tokens set to 2048 and temperature set to 0.5 (as used in the study); a minimal call is sketched below.
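A minimal sketch of such a call, assuming the current openai Python client (>= 1.0). Only max_tokens and temperature come from the study; the model identifier, function name, and prompt content are illustrative placeholders.

```python
# Minimal sketch of one question sent through OpenAI's chat completion API.
# Only max_tokens=2048 and temperature=0.5 come from the study; the model
# name and prompt wording are illustrative, not the study's exact strings.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_question(question_text: str, model: str = "gpt-4") -> str:
    """Send one multiple-choice question and return the raw model reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question_text}],
        max_tokens=2048,   # parameter reported in the study
        temperature=0.5,   # parameter reported in the study
    )
    return response.choices[0].message.content
```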
Clinical benefit
Educational/assessment use: evaluates ability to answer radiology-related multiple-choice questions; not a clinical diagnostic tool.
Clinical workflow phase
Education; knowledge assessment/benchmarking.
Decision threshold
Passing threshold defined by the examinations: score ≥ 60%.
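A one-line reading of that rule; the 60% threshold is the only value taken from the examinations, and the names are illustrative.

```python
# The pass rule stated above: an exam is passed at a score of 60% or higher.
PASS_THRESHOLD = 60.0

def passed(score_pct: float) -> bool:
    """True when an exam score (percentage correct) meets the passing mark."""
    return score_pct >= PASS_THRESHOLD
```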
Degree of automation
Fully automated question answering given text prompts; no human-in-the-loop for answer generation.
Indications for use
Answer multiple-choice questions from Brazilian College of Radiology (CBR) theoretical board examinations (radiology and diagnostic imaging, mammography, and neuroradiology) in a research/benchmarking setting.
Input
Text-only multiple-choice questions (Portuguese) from 2022 CBR theoretical board examinations; five options per question; no image inputs.
Instructions
Zero-shot prompting; five prompt styles tested: raw question, brief instruction, long instruction, chain of thought ("Let us think about this step by step"), and question-specific automatic prompt generation (QAPG). Each exam was run five times per prompt style and model, and the median score was reported; see the sketch below.
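A sketch of that repetition protocol, assuming a hypothetical `score_exam` helper that runs one full pass over an exam and returns the percentage correct; the style labels and model identifiers are illustrative, not the study's exact strings.

```python
# Repetition protocol described above: each model/prompt-style pair answers
# the full exam five times and the median score is kept.
from statistics import median

PROMPT_STYLES = ["raw", "brief", "long", "chain_of_thought", "qapg"]
MODELS = ["gpt-3.5-turbo", "gpt-4"]  # illustrative identifiers

def evaluate(score_exam, n_runs: int = 5) -> dict:
    """Median exam score for every (model, prompt style) combination.

    `score_exam(model, style)` is a hypothetical callable that runs one
    full pass over the exam and returns the percentage of correct answers.
    """
    return {
        (model, style): median(score_exam(model, style) for _ in range(n_runs))
        for model in MODELS
        for style in PROMPT_STYLES
    }
```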
Limitations
Evaluation excluded image-based questions; zero-shot prompting only (no few-shot examples or added context); limited to the 2022 exams available online; results depend on prompt style; randomness in LLM outputs was mitigated by five repetitions; not a validation for clinical use; Portuguese-language questions only.
Output
CDEs: RDE448, RDE442
Description: For each question, the model outputs a selected option (A–E); the study aggregates these answers into exam scores (percentage correct) and related statistics, as sketched below.
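A minimal sketch of turning raw replies into an exam score, under the assumption that the selected option can be recovered as a standalone letter A–E; the regex heuristic and function names are illustrative, not the study's extraction code.

```python
# Illustrative scoring: pull an option letter out of each reply, compare it
# with the answer key, and report the percentage correct.
import re

def extract_choice(reply: str) -> str | None:
    """Return the first standalone option letter A-E found in a reply."""
    match = re.search(r"\b([A-E])\b", reply.upper())
    return match.group(1) if match else None

def exam_score(replies: list[str], answer_key: list[str]) -> float:
    """Percentage of questions whose extracted choice matches the key."""
    correct = sum(extract_choice(r) == a for r, a in zip(replies, answer_key))
    return 100 * correct / len(answer_key)
```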
Recommendation
Use for research/educational benchmarking and prompt engineering exploration; not recommended for clinical decision-making or certification purposes.
Regulatory information
Comment: Study assesses performance; no regulatory submission reported.
Authorization status: Not a regulated medical device; research use in a question-answering benchmark.
Reproducibility
Each model/prompt-style combination was executed five times per exam, and the median score was used. Statistical tests included the Wilcoxon signed rank, Friedman, and Nemenyi tests (see the sketch below); observed agreement was assessed; temperature was fixed at 0.5.
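A sketch of those comparisons using SciPy and the scikit-posthocs package (one common Nemenyi implementation); the scores below are placeholders, not results from the study.

```python
# Placeholder scores for three prompt styles over five repeated runs; the
# numbers are illustrative only, not data from the study.
import numpy as np
import scikit_posthocs as sp
from scipy.stats import friedmanchisquare, wilcoxon

raw   = [62.0, 60.5, 61.0, 63.0, 61.5]
brief = [64.0, 65.5, 63.5, 64.5, 65.0]
cot   = [66.0, 65.0, 67.5, 66.5, 66.0]

print(wilcoxon(raw, brief))                # paired signed-rank, two styles
print(friedmanchisquare(raw, brief, cot))  # omnibus test across styles
# Nemenyi post hoc: rows are repeated runs (blocks), columns are styles.
print(sp.posthoc_nemenyi_friedman(np.array([raw, brief, cot]).T))
```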
Sustainability
Runtime/energy consumption not reported; API parameters included max tokens 2048 and temperature 0.5.
Use
Intended: Other
Out-of-scope: Decision support, Detection and diagnosis
Excluded: Detection and diagnosis, Other
User
Intended: Other, Researcher
Out-of-scope: Patient
Excluded: Patient