Lille University Hospital Emergency Brain MRI Reports (2022) — Vicuna LLM Evaluation Cohort
dataset | 2025-11-26 | https://doi.org/10.1148/atlas.1764132213655

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/dataset.json

Name

Lille University Hospital Emergency Brain MRI Reports (2022) — Vicuna LLM Evaluation Cohort

Link

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11294959/

Indexing

Keywords: Large Language Model, Vicuna, Information extraction, Radiology reports, Brain MRI, Emergency department, Headache, French reports, Contrast medium, Free text
Content: ER, IN, MR, NR
RadLex: RID10312, RID45946, RID11587, RID39094, RID13060, RID10319, RID49531, RID11595
SNOMED: 25064002

Author(s)

Bastien Le Guellec
Alexandre Lefèvre
Charlotte Geay
Lucas Shorten
Cyril Bruge
Lotfi Hacein-Bey
Philippe Amouyel
Jean-Pierre Pruvo
Gregory Kuchcinski
Aghiles Hamroun

Organization(s)

CHU Lille–Université Lille, Department of Neuroradiology
CHU Lille–Université Lille, Department of Public Health
INclude Health Data Warehouse, CHU Lille
UC Davis Health, Department of Radiology
Université Lille, INSERM, Institut Pasteur de Lille, U1167-RID-AGE
INSERM U1172–LilNCog-Lille Neuroscience & Cognition, Université Lille
UAR 2014-US 41-PLBS–Plateformes Lilloises en Biologie & Santé, Université Lille

License

Text: CC BY 4.0
URL: https://creativecommons.org/licenses/by/4.0/

Funding

Authors declared no funding for this work.

Ethical review

Data warehouse approved by French data protection authority (ref. 2019–103). Study use approved by Lille University Hospital IRB in June 2023 (EDS2307251350).

Comments

Retrospective cohort of pseudonymized free-text emergency brain MRI reports (French) from CHU Lille (France) in 2022, used to evaluate an on-premise open-source LLM (Vicuna 13B) for information extraction tasks.

Date

Published: 2024-05-08
Created: 2022-01-01

References

[1] Le Guellec B, Lefèvre A, Geay C, Shorten L, Bruge C, Hacein-Bey L, Amouyel P, Pruvo JP, Kuchcinski G, Hamroun A. "Performance of an Open-Source Large Language Model in Extracting Information from Free-Text Radiology Reports". Radiology: Artificial Intelligence. 2024-05-08. doi:10.1148/ryai.230364. PMID: 38717292. PMCID: PMC11294959.

Dataset

Motivation

Assess feasibility and performance of an on-premise open-source LLM for extracting clinically relevant information from real-life radiology reports.

Sampling

All consecutive emergency department brain MRI reports from Jan–Dec 2022 at a single French quaternary center; subset with headache identified by radiologist review.

Missing information

Raw report texts are not shared in the article; only translated or modified examples appear in its figures and tables. Imaging files are not included.

Relationships between instances

Each instance is a radiology report; reports segmented into sections (clinical context, protocol, results, conclusion).
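
As an illustration of the instance structure described above, one report could be modeled as a record with the four named sections. This is only a sketch: the class name, field names, and placeholder strings are hypothetical and do not come from the dataset or the article, and the raw reports themselves are not distributed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrainMriReport:
    """One report instance, split into the four sections named in the card.

    Class and field names are hypothetical; only the section names
    (clinical context, protocol, results, conclusion) come from the card.
    """

    clinical_context: str
    protocol: str
    results: str
    conclusion: str

    def sections(self) -> dict[str, str]:
        # Return the sections in the order the card lists them.
        return {
            "clinical context": self.clinical_context,
            "protocol": self.protocol,
            "results": self.results,
            "conclusion": self.conclusion,
        }


# Placeholder strings only; no real report text is reproduced here.
example = BrainMriReport(
    clinical_context="<clinical context text>",
    protocol="<protocol text>",
    results="<results text>",
    conclusion="<conclusion text>",
)

for name, text in example.sections().items():
    print(f"{name}: {text}")
```

Keeping the sections as separate fields would let an extraction task address a single section, such as the protocol, without parsing the whole report again.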

Noise

Reports authored by 43 radiologists (22 trainees, 21 board-certified) with variable phrasing; some reports lacked explicit mention of contrast use.

External data

No external datasets reported; all data from CHU Lille health data warehouse.

Confidentiality

Pseudonymized free-text reports extracted from institutional health data warehouse; no raw images shared.

Re-identification

Reports were pseudonymized using eHOP software by removing patient residence, name, and prescribing physician.

Sensitive data

Clinical free-text reports containing medical information; identifiers removed prior to analysis.