fastMRI: A Publicly Available Raw k-Space and DICOM Dataset of Knee Images for Accelerated MR Image Reconstruction Using Machine Learning
2025-11-29
https://doi.org/10.1148/atlas.1764457735058
Overview
Schema Version
https://atlas.rsna.org/schemas/2025-11/dataset.json
Name
fastMRI: A Publicly Available Raw k-Space and DICOM Dataset of Knee Images for Accelerated MR Image Reconstruction Using Machine Learning
Link
https://doi.org/10.1148/ryai.2020190007
Indexing
Keywords: fastMRI dataset, MR image reconstruction, k-space data, DICOM image data, knee MRI examinations, accelerated MRI, machine learning, reproducibility, scan time reduction, image quality
Content: MR, MK
Author(s)
Florian Knoll
Jure Zbontar
Anuroop Sriram
Matthew J. Muckley
Mary Bruno
Aaron Defazio
Marc Parente
Krzysztof J. Geras
Joe Katsnelson
Hersh Chandarana
Zizhao Zhang
Michal Drozdzal
Adriana Romero
Michael Rabbat
Pascal Vincent
James Pinkerton
Duo Wang
Nafissa Yakubova
Erich Owens
C. Lawrence Zitnick
Michael P. Recht
Daniel K. Sodickson
Yvonne W. Lui
Organization(s)
NYU School of Medicine
Facebook Artificial Intelligence Research
New York University Center for Data Science
University of Florida
License
Text: Non-commercial use for research and educational purposes; a data sharing agreement is required.
Contact
Yvonne.Lui@nyulangone.org
Funding
National Institutes of Health grants R01EB024532 and P41EB017183.
Ethical review
Curation of the dataset was part of a study approved by our local institutional review board.
Comments
The fastMRI dataset is the first large-scale public dataset that includes raw k-space data and DICOM data from a clinical population, tailored for MR image reconstruction research using machine learning. It aims to accelerate research, enable large-scale validation, and enhance reproducibility in the field.
Date
Published: 2020-01-01
Dataset
Motivation
The purpose of the fastMRI dataset is to provide the first step toward addressing the lack of a large-scale public dataset that includes raw k-space data for MR image reconstruction. It aims to provide a resource to improve image acquisition and reconstruction itself using machine learning techniques.
Sampling
The k-space dataset consists of fully sampled data from 1594 consecutive clinical MRI proton density–weighted acquisitions of the knee. The DICOM dataset includes 10012 consecutive clinical knee MRI examinations from 9290 patients, representing a full complement of clinical acquisitions.
Partitioning scheme
The 1594 k-space examples are partitioned into training, validation, multicoil testing, and single-coil testing sets, plus a hold-back set for a challenge. The 10012 DICOM examinations are not partitioned; they are provided as a single dataset for auxiliary training or generalizability testing.
Missing information
The dataset does not provide diagnostic labels, segmentations, text reports, statistics on the prevalence of pathology, information on metal implants, or demographic information.
Relationships between instances
The examples in the training and validation set are identical for the single-coil and multicoil k-space datasets. For the challenge and test set, unique examples are provided for the single-coil and multicoil datasets to prevent information sharing.
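The relationship described above can be expressed as a simple consistency check. The following is a minimal sketch, assuming each track is summarized as a mapping from split name to a set of example identifiers; the function name, the split names, and the identifiers such as "ex001" are hypothetical and do not reflect the dataset's actual file naming.

```python
def check_split_relationships(single_coil, multicoil):
    """Return True if two tracks follow the relationship described above.

    Each argument maps a split name to a set of example identifiers.
    The split names used here are assumptions made for illustration.
    """
    shared_splits = ("train", "val")
    unique_splits = ("test", "challenge")

    # Training and validation examples should be identical across tracks.
    same = all(single_coil[s] == multicoil[s] for s in shared_splits)

    # Test and challenge examples should not appear anywhere in the other track.
    held_single = set().union(*(single_coil[s] for s in unique_splits))
    held_multi = set().union(*(multicoil[s] for s in unique_splits))
    all_single = set().union(*single_coil.values())
    all_multi = set().union(*multicoil.values())
    disjoint = held_single.isdisjoint(all_multi) and held_multi.isdisjoint(all_single)

    return same and disjoint


# Hypothetical identifiers, for illustration only.
single_coil = {
    "train": {"ex001", "ex002"},
    "val": {"ex003"},
    "test": {"ex004"},
    "challenge": {"ex005"},
}
multicoil = {
    "train": {"ex001", "ex002"},
    "val": {"ex003"},
    "test": {"ex006"},
    "challenge": {"ex007"},
}
# A deliberately broken variant: the multicoil test set reuses a single-coil test example.
leaky_multicoil = dict(multicoil, test={"ex004"})

consistent = check_split_relationships(single_coil, multicoil)
leaky = check_split_relationships(single_coil, leaky_multicoil)
```

Keeping the held-out examples disjoint across tracks means a multicoil acquisition cannot be used to infer the answer for a single-coil test case, which is the information sharing the card refers to.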
Noise
For both the k-space and DICOM data, no examinations were excluded owing to the presence of imaging artifacts from motion, pulsatile flow, or similar sources.
Confidentiality
k-space data were deidentified via conversion to the vendor-neutral International Society for Magnetic Resonance in Medicine (ISMRM) raw data format. DICOM data were deidentified by using the Radiological Society of North America’s clinical trial processor tool. All metadata, as well as the DICOM images themselves, were manually inspected to ensure that no protected health information remained in the dataset.
Re-identification
Considered not possible, owing to deidentification of both the k-space and DICOM data, manual inspection for protected health information (PHI), and generation of random integer patient identifiers for the DICOM data.
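As an illustration of the last point, random integer identifiers can be assigned so that each patient receives one unique value unrelated to the original identifier. This is a minimal sketch under stated assumptions, not the procedure the dataset authors used: the function name, the identifier range, and the sample keys are hypothetical, and the optional seed exists only to make the example repeatable.

```python
import random


def assign_random_ids(patient_keys, seed=None, upper=10**9):
    """Map each distinct patient key to a unique random integer.

    Hypothetical helper for illustration. A production deidentification
    workflow would use an unseeded, cryptographically strong source of
    randomness and would not retain the mapping alongside the released data.
    """
    rng = random.Random(seed)
    # Collapse repeated keys so that one patient always gets one identifier.
    unique_keys = sorted(set(patient_keys))
    # Sampling without replacement guarantees no two patients share an identifier.
    ids = rng.sample(range(upper), len(unique_keys))
    return dict(zip(unique_keys, ids))


# Hypothetical keys; "patient-a" appears twice to mimic a repeat examination.
mapping = assign_random_ids(
    ["patient-a", "patient-b", "patient-a", "patient-c"], seed=0
)
```

Because the identifiers are drawn independently of the original keys, the released value carries no information about the source record, while repeat examinations from the same patient still share one identifier.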
Sensitive data
The dataset includes pathologic findings at a rate representative of a clinical patient population, as it consists of consecutive examinations.