BERT-based Transfer Learning in Sentence-level Anatomic Classification of Free-Text Radiology Reports
Type: dataset
Date: 2026-01-24
DOI: https://doi.org/10.1148/atlas.1769272103853

Overview

Schema Version

https://atlas.rsna.org/schemas/2025-11/dataset.json

Name

BERT-based Transfer Learning in Sentence-level Anatomic Classification of Free-Text Radiology Reports

Link

https://dx.doi.org/10.1148/ryai.220097

Indexing

Keywords: radiology reports, Japanese, sentence-level classification, anatomic classification, PET/CT, transfer learning, BERT, natural language processing
Content: IN, CT, NM
RadLex: RID5945, RID49439, RID10341

Author(s)

Daiki Nishigaki
Yuki Suzuki
Tomohiro Wataya
Kosuke Kita
Kazuki Yamagata
Junya Sato
Shoji Kido
Noriyuki Tomiyama

Organization(s)

Osaka University Graduate School of Medicine, Departments of Artificial Intelligence Diagnostic Radiology and Radiology
Medical Imaging Clinic

License

Text: © 2023 Radiological Society of North America, Inc.

Contact

Corresponding author: Shoji Kido (email: dev@null)

Funding

Japan Society for the Promotion of Science (JSPS) KAKENHI grant no. JP21H03840.

Ethical review

Retrospective study approved by the institutional review boards of Osaka University Hospital and Medical Imaging Clinic; informed consent waived.

Date

Published: 2023-02-15

References

[1] Nishigaki D, Suzuki Y, Wataya T, Kita K, Yamagata K, Sato J, Kido S, Tomiyama N. "BERT-based Transfer Learning in Sentence-level Anatomic Classification of Free-Text Radiology Reports". Radiology: Artificial Intelligence. 2023-02-01. doi:10.1148/ryai.220097. PMID: 37035437. PMCID: PMC10077075.

Dataset

Motivation

Develop an automated system to perform anatomic classification of free-text radiology reports at the sentence level, including minority classes with few examples.

Sampling

From 86,247 PET/CT reports (Dec 2005–Dec 2020) at a single clinic, 900 reports were randomly selected to ensure at least 50 test sentences in the least frequent class.

Partitioning scheme

The 900 randomly selected PET/CT reports were split 1:4 into training (180 reports) and test (720 reports) sets. Sentences were then annotated into seven anatomic classes.
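The report-level 1:4 split described above can be sketched as a uniform random partition. This is a minimal illustration, not the authors' code; the function name and seed are assumptions.

```python
import random

def split_reports(report_ids, train_fraction=0.2, seed=0):
    """Randomly partition report IDs into training and test sets.

    With train_fraction=0.2, the result is a 1:4 training:test split,
    matching the 180/720 partition of the 900 selected reports.
    """
    rng = random.Random(seed)
    ids = list(report_ids)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 900 reports -> 180 training, 720 test
train_ids, test_ids = split_reports(range(900))
```

Splitting at the report level (rather than the sentence level) keeps all sentences from one report in the same partition, which avoids leakage of boilerplate or patient-specific phrasing between splits.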

Missing information

Not provided: public availability of the dataset, exact per-class sentence counts in each split beyond the summarized tables, and de-identification protocol details beyond the normalization described.

Relationships between instances

Multiple sentences per report; each sentence labeled with a single anatomic class. Some patients had multiple reports (test set: 715 patients for 720 reports).

Noise

Original reports contained boilerplate text; 3594 duplicate sentences (primarily boilerplate) were removed.
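The duplicate removal above can be sketched as a first-occurrence filter over the sentence corpus. The article does not specify the exact procedure; exact string matching is an assumption here.

```python
def remove_duplicate_sentences(sentences):
    """Keep the first occurrence of each sentence, dropping exact duplicates
    (in this dataset, duplicates were primarily boilerplate text)."""
    seen = set()
    unique = []
    for sentence in sentences:
        if sentence not in seen:
            seen.add(sentence)
            unique.append(sentence)
    return unique
```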

External data

Model pretraining used UTH-BERT, which was pretrained on a large Japanese clinical corpus (external resource).

Confidentiality

Free-text clinical radiology reports; personally identifiable information was normalized in preprocessing; dataset not publicly released in the article.

Sensitive data

Clinical text includes PHI prior to preprocessing; numbers and personally identifiable information were converted to underscores during preprocessing.
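The numeric part of the normalization described above can be sketched as a per-digit substitution. This is an illustrative assumption, not the authors' preprocessing code; masking personally identifiable information (names, IDs) would additionally require entity detection, which is not shown.

```python
import re

def normalize_numbers(text):
    """Replace every digit with an underscore, as a rough stand-in for the
    described conversion of numbers to underscores during preprocessing."""
    return re.sub(r"\d", "_", text)

# e.g. normalize_numbers("SUVmax 5.2") -> "SUVmax _._"
```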