

Internetting for research: How do I find genomic data to download?
Had you asked me this before the internet era, my answer would have been simple: don’t download, it is too slow; get a disk instead. Back then, though, I would hardly have had any suggestions on where to find data.
Today, internet speeds are fast, and data is everywhere — you just have to find it.
And this is where you will find that signposting of data is extremely helpful. Depending on what category of data you are after, a general internet search may be helpful to an extent, but you are much more likely to reach your desired result if you search on a repository specific to the type of data that you are after.
Below you will find the content of Table 1, a list of repositories where researchers can download or upload genomic data, from our publication in PLoS Biology: "DNAdigest and Repositive: Connecting the World of Genomic Data", Kovalevskaya et al.
To make your search even easier, we are indexing all the data sources of raw sequencing data on the free Repositive platform: http://repositive.io
What other data sources do you find useful for your research in cancer or rare diseases? Let me know in the comments and together we can expand this list.
Cheers,
-Fiona-
dbGaP
Raw sequence data & phenotypic data
Database of Genotypes and Phenotypes, developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype.
http://www.ncbi.nlm.nih.gov/gap
dbVar
Variant data
Database of genomic structural variation — it contains insertions, deletions, duplications, inversions, multinucleotide substitutions, mobile element insertions, translocations, and complex chromosomal rearrangements
http://www.ncbi.nlm.nih.gov/dbvar
dbSNP
Variant data
Database of single nucleotide polymorphisms (SNPs) and multiple small-scale variations that include insertions/deletions, microsatellites, and non-polymorphic variants
http://www.ncbi.nlm.nih.gov/snp
GEO
Raw sequencing data
Public functional genomics data repository supporting MIAME-compliant data submissions. Tools are provided to help users query and download experiments and curated gene expression profiles.
http://www.ncbi.nlm.nih.gov/geo/
Sequence Read Archive (SRA)
Raw sequencing data
Stores raw sequencing data and alignment information from high-throughput sequencing platforms.
http://www.ncbi.nlm.nih.gov/sra
ClinVar
Variant data
Aggregates information about genomic variation and its relationship to human health.
http://www.ncbi.nlm.nih.gov/clinvar/
The European Genome-phenome Archive (EGA)
Raw sequence data & phenotypic data
Allows you to explore datasets from genomic studies, provided by a range of data providers
The European Nucleotide Archive (ENA)
Raw sequencing data
A comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation.
The European Variation Archive (EVA)
Variant data
An open-access database of all types of genetic variation data from all species.
ArrayExpress
Raw sequencing data
Archive of Functional Genomics Data stores data from high-throughput functional genomics experiments, and provides these data for reuse to the research community.
https://www.ebi.ac.uk/arrayexpress/
DNA Data Bank of Japan (DDBJ)
Raw sequencing data
Collects nucleotide sequence data as a member of INSDC and provides freely available nucleotide sequence data and a supercomputer system to support research activities in life science.
Japanese Genotype-phenotype Archive (JGA)
Raw sequencing data
A service for permanent archiving and sharing of all types of individual-level genetic and de-identified phenotypic data resulting from biomedical research projects. The JGA contains exclusive data collected from individuals whose consent agreements authorize data release only for specific research use or to bona fide researchers.
https://trace.ddbj.nig.ac.jp/jga/index_e.html
Catalogue of Somatic Mutations in Cancer (COSMIC)
Variant data
Stores and displays somatic mutation information and related details and contains information relating to human cancers. There are two types of data in COSMIC: Expert manual curation data and systematic screen data.
http://cancer.sanger.ac.uk/cosmic
DECIPHER
Variant data & phenotypic data
Database contains data from >17800 patients who have given consent for broad data-sharing. Used by the clinical community to share and compare phenotypic and genotypic data.
Figshare
Raw sequencing data
A repository where users can make all of their research outputs available in a citable, shareable and discoverable manner
Dryad
Raw sequencing data
A curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of datatypes.
LOVD
Variant data
A free, flexible, web-based open source database designed to collect and display variants in the DNA sequence.
GigaDB
Raw sequencing data
Associated with the journal GigaScience, contains discoverable, trackable, and citable datasets that are available for public download and use.
The Autism Genetic Resource Exchange (AGRE)
Variant data & phenotypic data
A repository of biomaterials and phenotypic and genotypic data to aid research on autism spectrum disorders.
Genomes Unzipped (GNZ)
Raw sequencing data
A collaborative project aiming to provide genetic testing customers with the knowledge and tools they need to make the most of their own genetic data. As part of the project, members are taking commercial genetic tests and making the raw data publicly available for others to download, analyse and reuse.
OpenSNP
Raw sequencing data
Allows individuals to publish their genetic test results, find others with similar genetic variations, learn more about their results, get the latest primary literature on their variations, and help scientists find new associations.
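
Several of the NCBI-hosted repositories in the list above (dbGaP, dbVar, dbSNP, GEO, SRA and ClinVar) can also be searched programmatically through NCBI's public E-utilities service, which can be handy once you know which repository holds your type of data. The snippet below is only a minimal sketch: it builds a search URL and does not fetch it. The helper name `build_search_url` and the `ENTREZ_DB` mapping are my own illustrative choices, not part of the table or the paper, and the search term is just an example.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint. The URL is only constructed here, not requested.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Illustrative mapping from repository names in the list to Entrez database codes.
ENTREZ_DB = {
    "dbGaP": "gap",
    "dbVar": "dbvar",
    "dbSNP": "snp",
    "GEO": "gds",
    "SRA": "sra",
    "ClinVar": "clinvar",
}


def build_search_url(repository: str, term: str, max_results: int = 20) -> str:
    """Return an esearch URL for `term` in one of the NCBI repositories above."""
    params = {
        "db": ENTREZ_DB[repository],  # raises KeyError for repositories not in the mapping
        "term": term,
        "retmax": max_results,        # maximum number of record IDs to return
        "retmode": "json",            # ask for a JSON response instead of XML
    }
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Example: a search URL for raw sequencing runs matching a disease name.
    print(build_search_url("SRA", "neuroblastoma"))
```

Opening the printed URL in a browser, or fetching it with your HTTP client of choice, should return a list of matching record IDs that you can then look up on the repository's own website. The non-NCBI repositories in the list have their own search pages and interfaces, which this sketch does not cover.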
—
In my day job I run a charity (DNAdigest) and a company (Repositive) where everything we do is about making efficient use of research data to have the most positive impact for research in health and disease.