Global Coronavirus Datasets

Anushka Sandesara
Published in Analytics Vidhya
Jul 15, 2020

Numerous datasets have been made available through public and paid platforms for those looking to track the rate of spread or conduct other research about the virus. This story highlights the most widely used coronavirus datasets.

Global Coronavirus Datasets

  1. 2019 Coronavirus Data: This dataset is a simple reformatting of the Johns Hopkins University dataset into organized CSV files. It consists of 9 CSV files containing data from different countries, including China, the United States, and Australia.
  2. 150 Million Covid-19 Tweets: This dataset contains over 150 million tweets related to COVID-19, beginning on March 11, 2020. The tweets span all languages, with English, Spanish, and French being the most represented.
  3. COVID-19 cases book: This downloadable workbook includes a starter dashboard and an embedded connection to trusted COVID-19 activity data. The data is sourced from the European Centre for Disease Prevention and Control and from The New York Times (which aggregates data from state and local governments and health departments in the United States).
  4. Novel Coronavirus Data: The data is compiled by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) from various sources, including the World Health Organization (WHO). JHU CSSE maintains the data in its 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository on GitHub.
  5. Coronavirus Genome: A simple TXT file containing the complete genome sequence of SARS-CoV-2, the virus that causes COVID-19.
  6. The Comprehensive COVID dataset: Multiple CSV files containing worldwide coronavirus case data, updated every 24 hours. The 6 CSV files contain country-wise data on Total Deaths, Total Recovered, Active Cases, Serious Situations, and Total Tests conducted.
  7. COVID-19 cases worldwide: This dataset contains the number of novel coronavirus cases by country and is updated regularly.
  8. Dimensions COVID: This repository contains all clinical trials, publications, and datasets relevant to the coronavirus from the Dimensions scholarly research database.
  9. CORD-19: From the Allen Institute for AI, CORD-19 is a free, open dataset of scholarly articles about the coronavirus family. It has grown over successive releases, from over 29,000 articles (over 13,000 with full text) to over 45,000 articles.
  10. COVID-19 Tweets: This repository contains an ongoing collection of tweet IDs associated with the novel coronavirus (SARS-CoV-2) and COVID-19.
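Several of the datasets above, notably the JHU CSSE repository, publish time series in a wide layout: one row per province/state or country and one column per date. The sketch below assumes the column order `Province/State, Country/Region, Lat, Long`, followed by date columns such as `1/22/20`, which matches the JHU CSSE global time-series files as of mid-2020; check the headers of the file you actually download. It sums provinces into per-country totals using only the Python standard library. The function name `country_totals` is hypothetical, and the sample rows use placeholder names and made-up numbers.

```python
import csv
import io
from collections import defaultdict


def country_totals(csv_text):
    """Sum a wide, JHU-style time-series CSV per country.

    Assumes the first four columns are Province/State, Country/Region,
    Lat, Long, and that every remaining column is a date such as "1/22/20".
    Returns {country: [(date, cumulative_count), ...]} in column order.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[4:]  # assumption: date columns start at index 4
    sums = defaultdict(lambda: [0] * len(dates))
    for row in reader:
        if not row:
            continue  # skip blank lines
        country = row[1]
        for i, value in enumerate(row[4:]):
            sums[country][i] += int(value or 0)  # treat empty cells as 0
    return {c: list(zip(dates, counts)) for c, counts in sums.items()}


# Illustrative input only: placeholder names and made-up counts.
sample = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
Province A,Country X,0.0,0.0,10,20
Province B,Country X,0.0,0.0,1,2
,Country Y,0.0,0.0,0,0
"""

totals = country_totals(sample)
print(totals["Country X"])  # [('1/22/20', 11), ('1/23/20', 22)]
```

In practice you would pass the text of a downloaded CSV file (for example, read with `open(path).read()`) instead of the inline sample.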

India Coronavirus Datasets

  1. COVID-19 Data: This dataset comprises individual patient data, hospital bed counts, and up-to-date case numbers. Its 7 CSV files include information on Age Details, Hospital Beds, Statewise Testing Details, and ICMR Testing Labs.
  2. Coronavirus Data India: This dataset contains day-to-day, state-wise case counts and raw patient data, with the latest numbers updated regularly. State-wise test counts, as well as daily, national-level, and district-level data, are also available.

United States Coronavirus Datasets

  1. USA Covid-19 Daily Cases: From Harvard University, this dataset contains daily COVID-19 cases along with a United States base map that includes state- and county-level data.
  2. COVID-19 USA: This dataset includes information on confirmed coronavirus cases in the United States.
  3. Tracking Corona: From BNO News, this resource contains map data and timeline information for COVID-19 cases in the United States. State-wise data is available and is updated regularly, so the latest figures are easy to obtain.
  4. NY Dataset: The New York Times has made one of the most comprehensive datasets of coronavirus cases in the United States publicly available in response to requests from researchers, scientists, and government officials. The Times releases a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time.
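Because the New York Times files report cumulative counts, daily new cases have to be derived by differencing consecutive days within each state. This minimal sketch assumes a state-level file with the columns `date,state,fips,cases,deaths` and ISO-formatted dates (`YYYY-MM-DD`); treat those column names as assumptions and confirm them against the repository. The function name `daily_new_cases` is hypothetical, and the input rows are invented.

```python
import csv
import io
from collections import defaultdict


def daily_new_cases(csv_text):
    """Convert cumulative per-state case counts into daily new cases.

    Assumes columns date,state,fips,cases,deaths with ISO dates, so that
    sorting the date strings sorts them chronologically.
    Returns {state: [(date, new_cases), ...]}.
    """
    by_state = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_state[row["state"]].append((row["date"], int(row["cases"])))

    result = {}
    for state, series in by_state.items():
        series.sort()  # chronological order within the state
        previous = 0   # the first reported day counts all cases so far as new
        daily = []
        for date, cumulative in series:
            # Keep negative differences: they usually signal a data revision.
            daily.append((date, cumulative - previous))
            previous = cumulative
        result[state] = daily
    return result


# Illustrative input only: made-up numbers for a hypothetical "State A".
sample = """date,state,fips,cases,deaths
2020-03-01,State A,99,5,0
2020-03-02,State A,99,8,0
2020-03-03,State A,99,15,1
"""

print(daily_new_cases(sample)["State A"])
# [('2020-03-01', 5), ('2020-03-02', 3), ('2020-03-03', 7)]
```

The sketch deliberately does not clip negative differences to zero, since a drop in a cumulative count is worth inspecting rather than hiding.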

Italy Coronavirus Datasets

  1. Cases By Region: This dataset includes comprehensive data on coronavirus cases by region.
  2. Cases By Deaths: This dataset shows coronavirus deaths by region.
  3. Regional COVID data: This includes cumulative COVID-19 data per region (hospitalized with symptoms, in intensive care, total hospitalized, home isolation, total positives, change in total positives, new positives, discharged/healed, deceased, total cases).
  4. HARVARD dataset: Published by Harvard University, this dataset covers hospitalized patients with symptoms, home confinement, total current positive cases, recovered patients, deaths, total positive cases, and tests performed.
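The regional fields listed for Italy are not independent of one another. Assuming they follow the definitions used in the Italian Civil Protection Department's regional data (an assumption; verify it against the documentation of the file you use), total hospitalized = hospitalized with symptoms + intensive care, total positives = total hospitalized + home isolation, and total cases = total positives + discharged/healed + deceased. Under the same assumption, "change in total positives" and "new positives" are day-over-day differences of total positives and total cases, respectively. The small consistency check below uses hypothetical English key names and made-up numbers.

```python
def identity_violations(r):
    """Return the names of the assumed identities that record r violates.

    Keys are hypothetical English names for the fields listed above;
    map them to the real column names of the file you download.
    """
    checks = {
        "total_hospitalized": r["hospitalized_with_symptoms"] + r["intensive_care"],
        "total_positives": r["total_hospitalized"] + r["home_isolation"],
        "total_cases": r["total_positives"] + r["discharged_healed"] + r["deceased"],
    }
    return [field for field, expected in checks.items() if r[field] != expected]


def daily_changes(today, yesterday):
    """Day-over-day changes for two consecutive records of one region."""
    return {
        "change_in_total_positives": today["total_positives"] - yesterday["total_positives"],
        "new_positives": today["total_cases"] - yesterday["total_cases"],
    }


# Made-up records for one hypothetical region on two consecutive days.
yesterday = {"hospitalized_with_symptoms": 9, "intensive_care": 2,
             "total_hospitalized": 11, "home_isolation": 6,
             "total_positives": 17, "discharged_healed": 4,
             "deceased": 1, "total_cases": 22}
today = {"hospitalized_with_symptoms": 10, "intensive_care": 2,
         "total_hospitalized": 12, "home_isolation": 8,
         "total_positives": 20, "discharged_healed": 5,
         "deceased": 1, "total_cases": 26}

print(identity_violations(today))       # []
print(daily_changes(today, yesterday))  # {'change_in_total_positives': 3, 'new_positives': 4}
```

A non-empty result from `identity_violations` points to either a reporting inconsistency or a misunderstanding of the field definitions.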

Canada Datasets

  1. Tracking Corona Data: From BNO News, this dataset includes map data and timeline information for COVID-19 cases in Canada. It covers cases, new cases, deaths, recoveries, and critical and serious patients.

Australia Datasets

  1. COVID-19 cases by location: Data on confirmed COVID-19 cases only, based on each person's location of usual residence, not necessarily where the virus was contracted.
  2. COVID-19 tests by location: Data on COVID-19 tests, based on where a person was tested and is undergoing public health management at the time of the test.
  3. COVID-19 cases by infection: COVID-19 cases by likely source of infection. This dataset is updated daily, except on weekends.

Germany Datasets

  1. COVID-19 Germany: This dataset covers the number of coronavirus cases reported in Germany, with daily recorded case counts by date.

China Datasets

  1. China Regions: This simple dataset contains GeoJSON data for regions in China and can be used to display coronavirus cases in China by region.
  2. Fatality Rate China: A small dataset showing the fatality rate of COVID-19 in China as of February 2020.
  3. Deaths And Recovered Data: This dataset can be downloaded in XLS or PPT format and includes the number of novel coronavirus infections, deaths, and recoveries in China by region. It is updated regularly.
  4. Age Distribution Of Cases: This graph shows the age distribution of coronavirus patients in China as of February 2020.
  5. Gender Distribution Of Cases: A simple dataset showing the gender distribution of coronavirus patients in China as of February 2020.
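Several of the China entries report fatality rates or death and recovery counts. A common crude estimate of the case fatality rate is deaths divided by confirmed cases; an alternative that uses recovery data is deaths divided by resolved cases (deaths plus recoveries). The sketch below computes both, in percent. It is not necessarily how the "Fatality Rate China" dataset was calculated, and during an ongoing outbreak the two estimates can differ substantially because many open cases have not yet resolved. The function names are hypothetical and the inputs are illustrative.

```python
def crude_cfr(deaths, confirmed):
    """Crude case fatality rate in percent: deaths / confirmed * 100."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    return 100.0 * deaths / confirmed


def resolved_cfr(deaths, recovered):
    """Fatality among resolved cases in percent: deaths / (deaths + recovered) * 100."""
    resolved = deaths + recovered
    if resolved <= 0:
        raise ValueError("need at least one resolved case")
    return 100.0 * deaths / resolved


# Illustrative numbers only, not real data.
print(crude_cfr(25, 1000))    # 2.5
print(resolved_cfr(25, 475))  # 5.0
```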

