Imaging data sets (artificial intelligence)
The aggregation of an imaging data set is a critical step in building artificial intelligence (AI) for radiology. Imaging data sets are used in several ways, including training and testing algorithms. Many data sets for building convolutional neural networks for image identification contain at least thousands of images, but smaller data sets are useful for texture analysis, transfer learning, and other applications.
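As a minimal sketch of how one aggregated data set can serve both training and testing, the Python example below shuffles a list of image identifiers and holds out a fraction for testing. The function name `split_data_set`, the 20% test fraction, and the `image_0000`-style identifiers are illustrative assumptions, not part of any data set listed in this article.

```python
import random


def split_data_set(image_ids, test_fraction=0.2, seed=0):
    """Shuffle image identifiers and split them into training and testing subsets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    ids = sorted(image_ids)    # sort first so the result does not depend on input order
    rng = random.Random(seed)  # local generator keeps the split reproducible
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    # The first n_test shuffled identifiers become the test set; the rest are for training.
    return ids[n_test:], ids[:n_test]


# Hypothetical identifiers standing in for real image files.
image_ids = [f"image_{i:04d}" for i in range(1000)]
train_ids, test_ids = split_data_set(image_ids)
print(len(train_ids), len(test_ids))
```

In practice, a data set that contains several images per patient is usually split by patient rather than by image, so that images from one patient do not appear in both subsets.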
Many commercial AI products are built on proprietary or hospital-specific data sets that are not publicly available because of concerns over patient privacy. However, several data sets of radiological images and/or reports are publicly available at the following websites:
- ACR Data Science: list of ~20 data sets
- CheXpert: 224,316 chest radiographs
- Computed Tomography Emphysema Database: small images specifically for texture analysis
- The Medical Image Bank of Valencia
- MD.ai: a collection of public projects
- OpenI - The Open Access Biomedical Image Search Engine: data set search engine; an API (application programming interface) for creating customized data sets is available at MedPix
- OpenNeuro: list of over 200 neuro data sets
- OASIS: open access neuro data sets
- Spineweb: 16 spinal imaging data sets
- UCLH Stroke EIT Dataset
- MRNet: 1,370 annotated knee MRI examinations
- MURA: a large data set of musculoskeletal radiographs
- MIMIC-CXR Database: 377,110 chest radiographs with free-text radiology reports
- York Cardiac MRI Dataset: cardiac MRIs
- Zenodo: searchable projects
Additionally, The Cancer Imaging Archive contains links to many open radiology data sets including the following:
- 4D-Lung
- ACRIN-FLT-Breast
- ACRIN-FMISO-Brain
- ACRIN-NSCLC-FDG-PET
- Anti-PD-1 Immunotherapy Lung (Anti-PD-1_Lung)
- Anti-PD-1 Immunotherapy Melanoma (Anti-PD-1_MELANOMA)
- APOLLO-1-VA
- APOLLO2
- Brain-Tumor-Progression
- BREAST-DIAGNOSIS
- Breast-MRI-NACT-Pilot
- CBIS-DDSM
- CPTAC-AML
- CPTAC-CCRCC
- CPTAC-CM
- CPTAC-GBM
- CPTAC-HNSCC
- CPTAC-LSCC
- CPTAC-LUAD
- CPTAC-PDA
- CPTAC-SAR
- CPTAC-UCEC
- Credence Cartridge Radiomics Phantom CT Scans
- Credence Cartridge Radiomics Phantom CT Scans with Controlled Scanning Approach (CC-Radiomics-Phantom-2)
- CT COLONOGRAPHY
- CT Lymph Nodes
- Head-and-neck squamous cell carcinoma patients with CT taken during pre-treatment, mid-treatment, and post-treatment (HNSCC-3DCT-RT)
- Head-Neck Cetuximab
- Head-Neck-PET-CT
- ISPY1
- Ivy GAP
- LGG-1p19qDeletion
- LIDC-IDRI
- LungCT-Diagnosis
- Lung CT Segmentation Challenge 2017
- Lung Phantom
- Mouse-Astrocytoma
- Mouse-Mammary
- NaF Prostate
- NRG-1308
- NSCLC-Cetuximab
- NSCLC Radiogenomics
- NSCLC-Radiomics
- NSCLC-Radiomics-Genomics
- Osteosarcoma data from UT Southwestern/UT Dallas for Viable and Necrotic Tumor Assessment
- Pancreas-CT
- Phantom FDA
- Prostate-3T
- PROSTATE-DIAGNOSIS
- Prostate Fused-MRI-Pathology
- PROSTATE-MRI
- QIBA CT-1C
- QIN-BRAIN-DSC-MRI
- QIN-Breast
- QIN Breast DCE-MRI
- QIN GBM Treatment Response
- QIN-HEADNECK
- QIN LUNG CT
- QIN PET Phantom
- QIN PROSTATE
- QIN-PROSTATE-Repeatability
- QIN-SARCOMA
- Quantitative Imaging Network Collections
- REMBRANDT
- RIDER Breast MRI
- RIDER Collections
- RIDER Lung CT
- RIDER Lung PET-CT
- RIDER NEURO MRI
- RIDER PHANTOM MRI
- RIDER Phantom PET-CT
- Soft-tissue-Sarcoma
- SPIE-AAPM Lung CT Challenge
- SPIE-AAPM-NCI PROSTATEx Challenges
- Synthetic and Phantom MR Images for Determining Deformable Image Registration Accuracy (MRI-DIR)
- TCGA-BLCA
- TCGA-BRCA
- TCGA-CESC
- TCGA-COAD
- TCGA-ESCA
- TCGA-GBM
- TCGA-HNSC
- TCGA-KICH
- TCGA-KIRC
- TCGA-KIRP
- TCGA-LGG
- TCGA-LIHC
- TCGA-LUAD
- TCGA-LUSC
- TCGA-OV
- TCGA-PRAD
- TCGA-READ
- TCGA-SARC
- TCGA-STAD
- TCGA-THCA
- TCGA-UCEC