Published in Starschema Blog

The COVID Tracking Project is shutting down in a week. What next?

The COVID Tracking Project has been one of the most successful citizen-driven data collection projects in history. Driven by The Atlantic and supported by an army of volunteers, it has collected information on testing and case counts, often beating federal and state authorities to the punch. Yet sustaining such a project over the long run is difficult, especially when it depends primarily on volunteer engagement. And so, after a year, the COVID Tracking Project is shutting down on 07 March.

The good news is that if you have been accessing the COVID Tracking Project’s data via the Starschema COVID-19 Data Set, whether through Snowflake Data Marketplace or through the flat files shared via AWS S3, you should see replacement tables emerging that cover the same ground. Below is a brief overview of data sources that you can use to replace data from the COVID Tracking Project.

Case counts and bed utilization

We have integrated several data products from the US Department of Health and Human Services and the CDC to provide information about case counts and healthcare resource utilization. We provide the following data sets:

  • CDC_INPATIENT_BEDS_ALL: all-cause inpatient bed usage, forecast (including upper and lower bounds of the 95% confidence interval)
  • CDC_INPATIENT_BEDS_COVID_19: inpatient bed usage for COVID-19 cases, forecast (including upper and lower bounds of the 95% confidence interval)
  • CDC_INPATIENT_BEDS_ICU_ALL: all-cause intensive care bed usage, forecast (including upper and lower bounds of the 95% confidence interval)
  • CDC_REPORTED_PATIENT_IMPACT: reported (actual) patient impact data, including critical care availability, admissions for suspected and confirmed COVID-19 by paediatric vs adult cohort, and a number of other indicators
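As a rough illustration of how the forecast bounds could be put to work, the sketch below flags days on which a reported value falls outside a forecast's 95% interval. The field names (`date`, `point`, `lower`, `upper`) and the helper `days_outside_interval` are hypothetical and do not reflect the actual column names of the CDC_INPATIENT_BEDS_* tables, so check the table schemas before adapting it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BedForecast:
    """One forecast row. Field names are hypothetical, not the real schema."""
    date: str      # ISO date string
    point: float   # forecast bed usage
    lower: float   # lower bound of the 95% confidence interval
    upper: float   # upper bound of the 95% confidence interval


def days_outside_interval(forecasts: List[BedForecast],
                          reported: Dict[str, float]) -> List[str]:
    """Return the dates whose reported value lies outside [lower, upper].

    Dates without a reported value are skipped rather than flagged.
    """
    flagged = []
    for row in forecasts:
        actual = reported.get(row.date)
        if actual is None:
            continue  # nothing reported for this date
        if not (row.lower <= actual <= row.upper):
            flagged.append(row.date)
    return flagged
```

A check like this is only a starting point: roughly one in twenty observations is expected to fall outside a 95% interval even when the forecast is well calibrated.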

Policy measures

A new table, CDC_POLICY_MEASURES, includes highly granular data on a range of policy measures for counties and states, including:

  • stay-at-home orders,
  • large gathering bans,
  • restrictions on specific industry sectors and venues, and
  • non-essential workers legislation.

This supplements our table KFF_US_STATE_MITIGATIONS, which only lists state-level measures. Both tables are updated ‘as needed’, but are constantly monitored for changes.

Testing and diagnostics

The CDC_TESTING table contains information about tests conducted, including positive, negative and pending/inconclusive tests. Even as the emphasis of public health interventions moves from testing to prophylaxis by way of vaccines, diagnostic testing remains an important bellwether for pathogenic dynamics. Like all CDC data, this table is updated every day, although there may be a lag of up to 24 hours due to reporting times by county, state and territorial health authorities.

Vaccination

With two vaccines available on the US market and ACIP’s recent endorsement of the Johnson & Johnson single-dose vaccine, the spotlight is now on vaccine penetration: the fraction of the entire population that has been vaccinated.
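The definition of vaccine penetration can be written down as a small helper. This is a minimal sketch: the function name `vaccine_penetration` is illustrative, and which count goes in the numerator (for example, people with at least one dose versus people who have completed a two-dose course) is a choice left to the analyst.

```python
def vaccine_penetration(vaccinated: int, population: int) -> float:
    """Return the fraction of the population that has been vaccinated.

    The result is a ratio between 0.0 and 1.0 when vaccinated <= population.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    if vaccinated < 0:
        raise ValueError("vaccinated must be non-negative")
    return vaccinated / population
```

Multiply the result by 100 to express it as a percentage.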

It is crucial to track vaccination rates for a thorough assessment of a population’s at-risk status. The OWID_VACCINATIONS table provides detailed information on vaccine allocation, vaccines administered and, most importantly, the number of persons who have received two doses of the currently marketed two-dose vaccines (both Pfizer and Moderna), who can be deemed to be immune.

Table updates

We have also made some back-end updates to tables. In particular, if you have been using WHO data, this should come as good news: frequent changes to the WHO’s reporting format had made our WHO data less reliable from time to time. This has now been fixed, and you can look forward to a steady stream of the WHO’s regular reports.

Next steps

If you have been using the COVID Tracking Project data set to power your visualizations and analytics, you may have to point them at the replacement tables. We have endeavoured to announce these tables early enough to give you time to make the changes you need. As always, if we can assist you in any way to make the most out of the Starschema COVID-19 Data Set, please don’t hesitate to let us know.
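When planning a migration, it may help to keep the replacement tables in one place. The sketch below simply restates the tables named in this post, grouped by topic. The dictionary and the `tables_for` helper are illustrative conveniences and not part of the data set itself.

```python
from typing import Dict, List

# Replacement tables named in this post, grouped by topic.
REPLACEMENT_TABLES: Dict[str, List[str]] = {
    "case counts and bed utilization": [
        "CDC_INPATIENT_BEDS_ALL",
        "CDC_INPATIENT_BEDS_COVID_19",
        "CDC_INPATIENT_BEDS_ICU_ALL",
        "CDC_REPORTED_PATIENT_IMPACT",
    ],
    "policy measures": ["CDC_POLICY_MEASURES", "KFF_US_STATE_MITIGATIONS"],
    "testing and diagnostics": ["CDC_TESTING"],
    "vaccination": ["OWID_VACCINATIONS"],
}


def tables_for(topic: str) -> List[str]:
    """Look up replacement tables for a topic, ignoring case and padding."""
    key = topic.strip().lower()
    if key not in REPLACEMENT_TABLES:
        raise KeyError(f"no replacement tables listed for topic: {topic!r}")
    return list(REPLACEMENT_TABLES[key])  # copy so callers cannot mutate the map
```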


Chris von Csefalvay

VP of Special Projects at Starschema, clinical computational epidemiologist, rower, Golden Retriever dad, Fellow of the Royal Society for Public Health.
