BERT-based Transfer Learning in Sentence-level Anatomic Classification of Free-Text Radiology Reports
2026-01-24
https://doi.org/10.1148/atlas.1769272103853
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/dataset.json
Name
BERT-based Transfer Learning in Sentence-level Anatomic Classification of Free-Text Radiology Reports
Link
https://dx.doi.org/10.1148/ryai.220097
Indexing
Keywords: radiology reports, Japanese, sentence-level classification, anatomic classification, PET/CT, transfer learning, BERT, natural language processing
Content: IN, CT, NM
RadLex: RID5945, RID49439, RID10341
Author(s)
Daiki Nishigaki
Yuki Suzuki
Tomohiro Wataya
Kosuke Kita
Kazuki Yamagata
Junya Sato
Shoji Kido
Noriyuki Tomiyama
Organization(s)
Osaka University Graduate School of Medicine, Departments of Artificial Intelligence Diagnostic Radiology and Radiology
Medical Imaging Clinic
License
Text: © 2023 Radiological Society of North America, Inc.
Contact
Corresponding author: Shoji Kido (email: dev@null)
Funding
Japan Society for the Promotion of Science (JSPS) KAKENHI grant no. JP21H03840.
Ethical review
Retrospective study approved by the institutional review boards of Osaka University Hospital and Medical Imaging Clinic; informed consent waived.
Date
Published: 2023-02-15
References
[1] Nishigaki D, Suzuki Y, Wataya T, Kita K, Yamagata K, Sato J, Kido S, Tomiyama N. "BERT-based Transfer Learning in Sentence-level Anatomic Classification of Free-Text Radiology Reports". Radiology: Artificial Intelligence. 2023-02-01. doi:10.1148/ryai.220097. PMID: 37035437. PMCID: PMC10077075.
Dataset
Motivation
Develop an automated system to perform anatomic classification of free-text radiology reports at the sentence level, including minority classes with few examples.
Sampling
From 86,247 PET/CT reports (Dec 2005–Dec 2020) at a single clinic, 900 reports were randomly selected to ensure at least 50 test sentences in the least frequent class.
Partitioning scheme
The 900 randomly selected PET/CT reports were split 1:4 into training (180 reports) and test (720 reports) sets. Sentences were then annotated into seven anatomic classes.
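The 1:4 report-level split described above can be sketched as follows; the function name, seed, and use of a fractional threshold are illustrative assumptions, not details from the original study.

```python
import random

def split_reports(report_ids, train_frac=0.2, seed=0):
    """Randomly split report IDs into training and test sets.

    train_frac=0.2 reproduces the 1:4 training:test ratio described
    above. The seed and helper name are illustrative only.
    """
    rng = random.Random(seed)
    ids = list(report_ids)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_frac)
    return ids[:n_train], ids[n_train:]

# With 900 reports this yields 180 training and 720 test reports.
train_ids, test_ids = split_reports(range(900))
```

Splitting at the report level (rather than the sentence level) keeps all sentences from one report in the same partition, which avoids leakage between training and test sets.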
Missing information
Not provided: public availability of the dataset, exact per-class sentence counts in each split beyond the summarized tables, and de-identification protocol details beyond the normalization described.
Relationships between instances
Multiple sentences per report; each sentence labeled with a single anatomic class. Some patients had multiple reports (test set: 715 patients for 720 reports).
Noise
Original reports contained boilerplate text; 3594 duplicate sentences (primarily boilerplate) were removed.
External data
Model pretraining used UTH-BERT, which was pretrained on a large Japanese clinical corpus (external resource).
Confidentiality
Free-text clinical radiology reports; personally identifiable information was normalized during preprocessing; the article does not report a public release of the dataset.
Sensitive data
Clinical text includes PHI prior to preprocessing; numbers and personally identifiable information were converted to underscores during preprocessing.
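The number-to-underscore normalization described above can be sketched with a regular expression; this handles only digit runs, and a real pipeline would additionally need to detect names and other identifiers, which is not shown here.

```python
import re

def normalize_text(sentence):
    """Replace each run of digits with an underscore, as in the
    preprocessing described above. Detecting non-numeric personally
    identifiable information (names, IDs) would require additional
    steps, e.g. named-entity recognition, omitted in this sketch."""
    return re.sub(r"\d+", "_", sentence)
```

For example, `normalize_text("SUVmax 5.2 on 2020-12-01")` yields `"SUVmax _._ on _-_-_"`.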