Photo by Adam Nieścioruk on Unsplash

A Data Scientist’s Guide to Navigating COVID-19 Response Projects

by Chinmay Palande

Opex Analytics
Jun 19, 2020


If you are a data scientist in a business that was recently impacted by COVID-19 (read: all businesses), you’ve probably been tasked with or tempted to work on a coronavirus response project. Over the last few months, the number of resources related to COVID-19 has skyrocketed. From the original situation reports by the WHO and Johns Hopkins’s collection of resources to the mobility data made available by Apple and Google, there is ample raw material from which to extract insight.

Consequently, scientists, experts, and other interested parties have contributed a lot to our collective understanding in the interim. This curated list of Tableau dashboards and this blog post summarizing popular COVID-19 visuals are great resources for different visualizations. Many experienced researchers have modeled the spread of COVID-19, with plenty of input from epidemiologists and statistics/machine learning professionals. Some of these resources might suit your needs or meaningfully inform your work, so check them out before fitting another model or building another dashboard.

Given the work that’s already out there, you may be unsure of how you can best support your organization. Everyone else has already done work with the publicly available data — your advantage is the access you have to your organization’s proprietary data. By combining external resources and your business’s internal data, you are in a unique position to help your company adapt.

In this post, I’ll discuss some lessons I’ve learned about quickly contributing to your organization’s COVID-19 response strategy.

Photo by Markus Spiske on Unsplash

Identify Relevant Resources

Curating a list of data sources relevant to your business could be a great starting point for a COVID-19 response strategy.

When looking for external data, a good place to start is with information aggregators like the COVID-19 Healthcare Coalition and Corona Data Scraper. From here, you can find and access resources relevant to your business. For example, if you work in the shipping industry, there are websites that keep track of border closures and port statuses across the globe.

For internal data, consult your colleagues to identify all internal data sources that might be useful. Information on closed stores or factories, counts of employees infected at each of your facilities, and sales/shipment/demand history could be valuable inputs.

When considering whether to add a given data source to your final list, you need to think through its usability and reliability. In the early days of the pandemic, many data sources claimed to have accurate data on COVID-19 cases, but as time went on and the data scaled dramatically, not all of them survived. Before you include any data in your analysis, make sure it comes from a reputable, reliable source that is actively maintained.

Furthermore, make sure you use each data source only in ways its terms allow. Most good public data sources come with a “Terms of Use” section on their websites, and this is what you need to scrutinize. For example, the Johns Hopkins COVID-19 data repository explicitly mentions that the use of their data for commercial reasons is strictly prohibited, whereas Our World in Data provides their data with a Creative Commons license.

Photo by boostinjay on Unsplash

Combine COVID-19 Data with Proprietary Data for Your Business

Combining COVID-19 information with your business’s internal data is a relatively easy yet underappreciated way to inform your organization’s response. If you know whether your shipments are positively or negatively correlated with COVID-19 cases, you can probably create more accurate forecasts (more on that below). Examining how the COVID-19 outbreak within a region affected closures of, or exposures within, your facilities could result in more informed short-term decisions about company operations. Putting the two data sets together might reveal trends that surprise you!

To combine internal and COVID-19 data, you have to ensure that your sources link up correctly, by both time and location, with minimal sacrifice of detail. Most COVID-19 data is broken up by day, so wherever possible, aggregate your business data by day as well. This shouldn’t be too difficult.
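As a minimal sketch of the daily alignment described above, the snippet below rolls timestamped business records up to one value per day and region, then joins them to daily case counts on the same keys. The function names, the record layout, and every number are made up for illustration; real data would have its own schema.

```python
from collections import defaultdict
from datetime import date, datetime

def daily_totals(records):
    """Sum (iso_timestamp, region, units) records into {(date, region): units}."""
    totals = defaultdict(float)
    for timestamp, region, units in records:
        day = datetime.fromisoformat(timestamp).date()
        totals[(day, region)] += units
    return dict(totals)

def join_with_cases(totals, cases):
    """Inner join: keep only the (date, region) keys present in both inputs."""
    return {
        key: {"units": units, "new_cases": cases[key]}
        for key, units in totals.items()
        if key in cases
    }

# Hypothetical shipment records and hypothetical daily case counts.
shipments = [
    ("2020-04-01T09:15:00", "region_a", 120.0),
    ("2020-04-01T16:40:00", "region_a", 80.0),
    ("2020-04-02T11:05:00", "region_a", 150.0),
]
cases = {
    (date(2020, 4, 1), "region_a"): 10,
    (date(2020, 4, 2), "region_a"): 14,
}

joined = join_with_cases(daily_totals(shipments), cases)
```

The inner join silently drops days that are missing from either side, so it is worth checking how many rows were lost before analyzing the result.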

Spatial joining, however, might be a different story. Many of the main US-based COVID-19 data sources are gathered at the county level. However, businesses typically use their own custom geographical regions to collect and analyze their data.

One easy way to overcome this is to aggregate everything at a national level. However, you may lose valuable information that only exists at the state or county level, like local trends. The best way to maximize information retention is, of course, to combine the data at the most granular level available. For example, if you have ZIP code-level information on the outlets you operate, you can use crosswalk files provided by HUD to map ZIP codes to counties. (N.B. Keep in mind that a ZIP code can span multiple counties, so you may need to make some more assumptions to get to a good mapping.)
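One possible way to handle ZIP codes that span several counties is to split each ZIP code’s value across counties in proportion to a share taken from the crosswalk. The sketch below assumes each crosswalk row has been reduced to a (ZIP code, county, share) triple whose shares sum to 1 per ZIP code; the codes, the shares, and the function name are all hypothetical.

```python
from collections import defaultdict

def allocate_to_counties(zip_values, crosswalk):
    """Split each ZIP-level value across counties using crosswalk shares.

    zip_values: {zip_code: value}
    crosswalk:  iterable of (zip_code, county, share) rows (assumed layout)
    """
    county_totals = defaultdict(float)
    for zip_code, county, share in crosswalk:
        if zip_code in zip_values:
            county_totals[county] += zip_values[zip_code] * share
    return dict(county_totals)

# Hypothetical crosswalk: Z1 sits entirely in C1, Z2 straddles C1 and C2.
crosswalk = [
    ("Z1", "C1", 1.0),
    ("Z2", "C1", 0.75),
    ("Z2", "C2", 0.25),
]
zip_sales = {"Z1": 400.0, "Z2": 200.0}

county_sales = allocate_to_counties(zip_sales, crosswalk)
```

Proportional allocation is only one assumption you could make; assigning each ZIP code wholly to the county with the largest share is a simpler alternative when outlets cannot be meaningfully split.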

On the other hand, if you have precise location information (longitude and latitude) for the outlets you serve or the facilities you operate, you can figure out which county contains each of your locations, and then map county-level external variables accordingly. A combination of open-source tools like PostGIS and spatial files provided by the Census Bureau is useful for this. (Check out this easy tutorial to get started with PostGIS and spatial databases.)
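To illustrate what the point-to-county lookup does under the hood, here is a small pure-Python sketch of a point-in-polygon test (ray casting). It is not how PostGIS is used in practice, where a spatial join would do this work; the rectangular “counties” and all names below are hypothetical stand-ins for real boundary files.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point toward +longitude.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def assign_county(lon, lat, county_shapes):
    """Return the first county whose polygon contains the point, else None."""
    for county, polygon in county_shapes.items():
        if point_in_polygon(lon, lat, polygon):
            return county
    return None

# Hypothetical, simplified county outlines (two adjacent rectangles).
county_shapes = {
    "county_a": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "county_b": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}

store_county = assign_county(1.0, 1.0, county_shapes)
```

This treats coordinates as flat x/y values and ignores holes and multi-part shapes, which is why a spatial database is the better choice for real boundaries.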

Photo by Wesley Tingey on Unsplash

Gauge the Impact of COVID-19 on Your Business

Before visualizing or predicting anything, you might want to start by assessing the impact of COVID-19 on your business.

There have been many reports of empty shelves and interrupted supply chains. If you’re a product company, you might have experienced something like the once-omnipresent toilet paper shortages. But exactly how many stores were affected? Precisely when did the spike in demand start? Has demand come back down to normal, or has it dipped to abnormally low levels as customers consume their stockpiles?

These disruptions are bound to have distinct impacts on different products, product types, locations, and channels.

To focus your damage-control efforts, you need an easy way to figure out which segments of your business have been hit the hardest. A good way to start is with a simple year-over-year (YoY) calculation to determine which items have been affected by the disruption. If YoY sales have gone up for any product-channel-region combination by more than some threshold (maybe ~30% over the last month), you can consider that combination affected and take appropriate action (e.g., increase supply, increase order frequency, etc.). However, this simple analysis has its limits: it could fail if the underlying data has different trends at finer levels of detail (whether by time or by geography), which might throw off the YoY numbers.
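The YoY check can be sketched in a few lines. The version below flags any segment whose sales moved by more than the threshold in either direction, which is a small extension of the “gone up” rule above so that sharp drops are caught too; the segment keys, the function name, and the sales figures are invented for illustration.

```python
def flag_yoy_changes(current, prior, threshold=0.30):
    """Return {segment: yoy_change} for segments beyond the threshold.

    current, prior: {segment: sales} for the same period this year and last.
    """
    flagged = {}
    for segment, sales_now in current.items():
        sales_then = prior.get(segment)
        if not sales_then:
            # No baseline (missing or zero), so YoY is undefined; skip it.
            continue
        change = (sales_now - sales_then) / sales_then
        if abs(change) > threshold:
            flagged[segment] = change
    return flagged

# Hypothetical monthly sales by (product, channel, region).
this_year = {
    ("paper_goods", "retail", "midwest"): 150.0,
    ("soap", "retail", "midwest"): 105.0,
    ("new_item", "retail", "midwest"): 40.0,
}
last_year = {
    ("paper_goods", "retail", "midwest"): 100.0,
    ("soap", "retail", "midwest"): 100.0,
}

affected = flag_yoy_changes(this_year, last_year)
```

Segments without a prior-year baseline are skipped rather than flagged, so new products need a separate check.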

Another way to determine the severity of a disruption is to look at the difference between forecasts and actual sales, and designate items with higher-than-usual forecast errors as affected. One upside of this approach is that it accounts for underlying trends as long as your forecasts account for them as well. However, this method relies on a strong assumption that your regular forecasts are still reliable in the post-COVID world. In times of disruption, typical forecasting methods often provide unreliable predictions, for obvious reasons. (This drawback can be partially overcome by assessing forecast error in relative terms [i.e., focusing efforts on product-channel-regions where forecasts and actuals differ most], but it’s far from an exact science.)
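One hedged way to make “higher-than-usual forecast error” concrete is to compare each segment’s recent mean absolute percentage error (MAPE) with its own pre-disruption MAPE. The choice of MAPE, the ratio of 2.0, the function names, and the numbers below are all assumptions made for this sketch.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, ignoring periods with zero actuals."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts) if a != 0]
    return sum(errors) / len(errors) if errors else 0.0

def flag_by_forecast_error(history, recent, ratio=2.0):
    """Flag segments whose recent MAPE exceeds ratio x their historical MAPE.

    history, recent: {segment: (actuals, forecasts)}
    """
    flagged = []
    for segment, (actuals, forecasts) in recent.items():
        baseline = mape(*history[segment])
        current = mape(actuals, forecasts)
        if baseline > 0 and current > ratio * baseline:
            flagged.append(segment)
    return flagged

# Hypothetical pre-disruption and recent (actuals, forecasts) per segment.
history = {
    "segment_a": ([100.0, 100.0], [90.0, 110.0]),
    "segment_b": ([100.0, 100.0], [90.0, 110.0]),
}
recent = {
    "segment_a": ([200.0, 180.0], [100.0, 100.0]),
    "segment_b": ([100.0, 105.0], [95.0, 100.0]),
}

disrupted = flag_by_forecast_error(history, recent)
```

Comparing each segment with its own baseline keeps chronically hard-to-forecast items from dominating the list.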

You could also consider more involved techniques like change detection. This process is used to detect if and when a structural change in a time series has occurred, and automatically considers the underlying trends in the data. Once you know when the change occurred, you can improve your short-term forecasts for affected items by training on post-disruption data or overweighting observations after the detected change point. (You can read this paper to get a better understanding of change detection.)
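To give a feel for the idea, here is a deliberately simple sketch that looks for a single shift in the mean of a series by trying every split point and keeping the one that minimizes total squared error. It is not the method from the paper mentioned above, and unlike fuller change detection techniques it does not model trend or seasonality; the function name and the series are made up.

```python
def find_mean_shift(series, min_segment=3):
    """Return the index that best splits the series into two constant-mean
    segments, or None if no split reduces the squared error."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_index, best_cost = None, sse(series)
    # Require at least min_segment points on each side of the split.
    for i in range(min_segment, len(series) - min_segment + 1):
        cost = sse(series[:i]) + sse(series[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index

# Hypothetical weekly demand with a jump starting at index 5.
demand = [10.0, 11.0, 9.0, 10.0, 10.0, 20.0, 21.0, 19.0, 20.0, 20.0]

change_index = find_mean_shift(demand)
post_change = demand[change_index:] if change_index is not None else demand
```

Because almost any split lowers the squared error a little, a real application would also need a test of whether the detected shift is large enough to matter.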

Photo by Vladislav Babienko on Unsplash

Forecast Shipments/Sales/Demand

Forecasting after a major disruption isn’t impossible — just more difficult. You can improve short-term demand forecasts for tactical response with external data. If you can find a data source that correlates with post-disruption patterns in your business, you’re onto something. If you can find data that’s a leading indicator of your shipments, sales, or demand, you’ve hit the jackpot!

The retail industry in the United States was severely impacted by the state-imposed restrictions that attempt to meet CDC guidelines. However, these restrictions were imposed in different areas at different times, were followed with different levels of compliance, and covered different behaviors and activities. It’s not easy to capture the when, where, and how much of these restrictions for forecasting (or other) purposes.

One proxy could be the number of reported COVID-19 cases or deaths over a given time span in a certain region. But this data likely won’t capture the full complexity of the impact of COVID-19. The mobility data provided by Apple and Google or SafeGraph’s retail foot traffic data might provide a better representation of the impact of local restrictions. If you find patterns within the data suggesting that increased mobility is correlated with shipments two weeks in the future, that puts you in a strong position to provide two-week-ahead forecasts. (Read this other blog post of ours to see other ways to improve forecasting in times of disruption.)
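A quick way to look for a leading indicator is to correlate the external series with your own series shifted by a range of lags and see which lag lines up best. The sketch below does this with a hand-rolled Pearson correlation; the function names and both weekly series are hypothetical, with the shipments series constructed so that it follows mobility by two periods.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def best_lead(indicator, target, max_lag):
    """Correlate indicator[t] with target[t + lag] for lag = 0..max_lag.

    Both series must have the same length and time alignment.
    """
    scores = {}
    for lag in range(max_lag + 1):
        xs = indicator[: len(indicator) - lag]
        ys = target[lag:]
        scores[lag] = pearson(xs, ys)
    best = max(scores, key=lambda lag: abs(scores[lag]))
    return best, scores

# Hypothetical weekly series: shipments echo mobility two weeks later.
mobility = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]
shipments = [0.0, 0.0] + [2.0 * m for m in mobility[:-2]]

lead_weeks, lag_scores = best_lead(mobility, shipments, max_lag=3)
```

A strong correlation at some lag is only a starting point; with short post-disruption histories it is easy to find spurious lags, so the relationship should be validated on held-out weeks before it drives a forecast.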

Conclusion

Adaptability is paramount during COVID-19. We can react more quickly by using existing public resources and focusing on short-term efforts that are specific to our organization.

