We need accurate Covid-19 testing data for a faster, safer reopening of the economy

Published in

Ataccama

6 min read · May 19, 2020

1. The most difficult decision politicians will now be making, repeatedly

Governments across the globe are struggling to balance restrictive measures that prevent the uncontrollable spread of the virus against the need to reopen the economy. These are extremely challenging decisions to make, and they will likely need to be recalibrated on a continuous (rather than one-time) basis.

The epidemic is not over. Even in countries that have managed to control the spread of the virus, it could quickly gain speed again. The situation will need to be monitored closely, and measures reintroduced or tightened again if necessary. We need precise, real-time data not only to monitor the impact of reopening, but also for longer-term planning based on precise predictive models.

2. We can’t do it without data

There are several key metrics that will drive decisions around the speed of reopening. Some are clinical, such as the number of patients in hospitals and ICUs and the length of hospitalizations, particularly in comparison to the health system capacity in each state or country. Those metrics and the underlying data are relatively easy to obtain, as most developed countries have reliable information systems in hospitals. Moreover, the health data that needs to be collected about Covid-19 does not differ from that of any other diagnosis.

Covid-19 R0 rate for Germany. Note the orange and red peaks above 1, after which measures had to be tightened again. Source: Robert Koch Institute, @rki_de, https://r0.c8l.ca/en/

Aside from these factors, there are indicators that reflect the immunity of the population and whether the virus is spreading more quickly or more slowly, such as the R0 coefficient. Calculating such numbers requires other data to be collected and processed, often outside of established health systems and processes. This data covers the tests themselves and the consolidated results for everyone who has been tested. It is vital to obtain and evaluate this data as quickly as possible, ideally in real time.

3. Key data on testing is missing or completely unreliable

There are multiple reasons why gathering, processing, and evaluating the testing data is very unreliable under the current circumstances:

Covid-19 testing locations often don’t actually process the tests, but just collect swabs or blood samples. These testing locations are also the first point at which PII data is recorded. Some of them are drive-through or temporarily built tent stations, and therefore not set up for rigorous data collection.
The quality of the captured data is probably very low, given the circumstances. Health professionals wear full PPE while taking notes on paper or in combination with a computer device, patients are nervous, etc.
Collected samples are sent to a lab together with the data, which is then re-entered into the lab's systems.
If a test is negative, no other tests are usually performed for the tested person. But sometimes that person undergoes additional tests for a variety of reasons. The data is once again captured and stored — often in another place.
For healthcare workers and certain other professions, regular testing is performed. Ideally, the subsequent tests are assigned to the initially collected PII, but this isn’t guaranteed.
If a patient tests positive, they will subsequently be tested multiple times, often in various other labs or institutions. The PII and new test results are recorded in many additional systems.
Due to the urgency of the situation, many labs with multiple different systems are being used to process tests.

The Covid-19 testing process explained. Source: https://www.labelmaster.com/covid-19-test-kit-shipping-supplies

These are just a few of the many complications of recording PII and test result data in the current situation. It is safe to assume that the quality of the PII data around Covid-19 is much lower than PII data in standard healthcare or other systems.

Aside from typos and incorrect dates, addresses, and ID numbers, Covid-19 testing data contains a high percentage of duplicates, anywhere from 10% to 50%.
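To make the duplicate problem concrete, here is a minimal Python sketch of how records for the same person, entered slightly differently at two labs, could be grouped together. The field names, the sample records, and the choice of match key (normalized name plus date of birth) are assumptions made for illustration only. Real matching would need fuzzier rules, and this is not a description of how any particular product does it.

```python
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip accents, and keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.lower() if c.isalnum())


def match_key(record: dict) -> tuple:
    """Hypothetical match key: normalized names plus date of birth."""
    return (
        normalize(record["last_name"]),
        normalize(record["first_name"]),
        record["birth_date"],
    )


def group_duplicates(records: list) -> list:
    """Group records that share the same match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return list(groups.values())


# Made-up sample records: the first two describe the same person.
records = [
    {"first_name": "Jana", "last_name": "Nováková", "birth_date": "1980-02-14", "lab": "A"},
    {"first_name": "JANA ", "last_name": "Novakova", "birth_date": "1980-02-14", "lab": "B"},
    {"first_name": "Petr", "last_name": "Svoboda", "birth_date": "1975-07-01", "lab": "A"},
]

groups = group_duplicates(records)
# Share of records that are duplicates of an already seen person.
duplicate_share = 1 - len(groups) / len(records)
print(f"{len(records)} records, {len(groups)} individuals, "
      f"{duplicate_share:.0%} duplicates")
```

An exact key like this only catches differences in case, accents, and spacing. Typos in names or dates would need approximate matching on top of it.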

Even more problematic is the process of gathering the data from labs and integrating it quickly and centrally, as this data resides in tens or even hundreds of labs, systems, and data formats.
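As a rough illustration of the integration problem, the sketch below maps two invented lab export formats onto one common record layout. Everything specific in it is an assumption: the two formats, the column names, the result codes, and the target field names are hypothetical and only stand in for the variety found across real labs.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical export from "lab A": semicolon-separated, DD.MM.YYYY dates.
LAB_A_CSV = "patient_id;sampled;result\n8002141234;14.05.2020;POS\n"
# Hypothetical export from "lab B": JSON, ISO dates, spelled-out results.
LAB_B_JSON = '[{"id": "7507015678", "sample_date": "2020-05-15", "outcome": "negative"}]'

# Map each lab's result codes onto one shared vocabulary.
RESULT_MAP = {
    "pos": "positive",
    "positive": "positive",
    "neg": "negative",
    "negative": "negative",
}


def from_lab_a(text: str) -> list:
    """Convert lab A's CSV rows into the common layout."""
    rows = csv.DictReader(io.StringIO(text), delimiter=";")
    return [
        {
            "person_id": row["patient_id"],
            "sample_date": datetime.strptime(row["sampled"], "%d.%m.%Y").date().isoformat(),
            "result": RESULT_MAP[row["result"].strip().lower()],
            "source": "lab_a",
        }
        for row in rows
    ]


def from_lab_b(text: str) -> list:
    """Convert lab B's JSON items into the common layout."""
    return [
        {
            "person_id": item["id"],
            "sample_date": item["sample_date"],
            "result": RESULT_MAP[item["outcome"].strip().lower()],
            "source": "lab_b",
        }
        for item in json.loads(text)
    ]


unified = from_lab_a(LAB_A_CSV) + from_lab_b(LAB_B_JSON)
for record in unified:
    print(record)
```

Each new lab format needs its own small adapter like these two. With tens or hundreds of labs, writing and maintaining them by hand is what makes this slow.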

4. Only a few governments have the data they need

In many countries, statistics shared with the public and used for decision making are not based on detailed data. Instead, they are based on “summary reports” calculated by labs or health authorities on a daily basis: so many tests performed, resulting in so many negatives and positives. As a result, statistics from most countries (with some exceptions) report only the number of samples, not the number of individuals tested. This is a major problem, especially when one of the key metrics for decisions about reopening the economy is the number of new positive cases relative to the number of individuals tested. Because of duplicates alone, this indicator may be off by 10% to 50% at any given time.
The data also needs to be available fast, with a delay of less than 24 hours, and at a detailed level, because a potential second wave might start with local outbreaks.
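A short worked example shows how counting samples instead of individuals distorts the indicator. All numbers below are invented for illustration, and the sketch assumes that new positive cases are counted per person while the reported test total is counted per sample.

```python
def positivity(positives: int, denominator: int) -> float:
    """Share of positives relative to the chosen denominator."""
    return positives / denominator


# Illustrative assumptions, not real figures.
samples_reported = 10_000      # what a daily summary report contains
duplicate_share = 0.30         # assumed share of samples from already tested people
new_positive_cases = 350       # assumed to be counted per individual

# Estimated number of distinct people behind the reported samples.
individuals_tested = round(samples_reported * (1 - duplicate_share))

per_sample = positivity(new_positive_cases, samples_reported)
per_individual = positivity(new_positive_cases, individuals_tested)

print(f"Individuals tested:      {individuals_tested}")
print(f"Positives per sample:     {per_sample:.1%}")
print(f"Positives per individual: {per_individual:.1%}")
```

Under these assumptions, the rate computed from samples understates the rate per individual tested, and the size of the gap depends directly on the unknown share of duplicates.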

Statistics on Covid-19 testing by country. Only 5 out of the “top” 25 countries report the number of cases (it is still unclear whether this means the number of individuals tested). Source: https://en.wikipedia.org/wiki/COVID-19_testing#Testing_accuracy

5. It is possible, with advanced data management and governance

Getting the collection and processing of testing and related PII data into reasonable shape will take years if done the traditional way, by integrating lab systems, hospital systems, and central health data processing institutions.

But there’s a better way: using state-of-the-art, automated data management and integration technology, such as Ataccama ONE. It is used by hundreds of global organizations in healthcare and other industries to integrate, cleanse, and deduplicate data in heterogeneous data landscapes. It can be deployed in a matter of weeks, allowing governments to set up an environment with the following functions:

A secure application (portal) for labs to provide their testing data, in batches, in real time, or daily
AI-driven profiling and validation functionality for operators to quickly identify data issues
Automated and manual data cleansing and data quality monitoring
Matching and merging of duplicate PII records
A master data management hub to store, manage, and provide clean and consistent data for each person tested, including every swab collection test and all test result information
Proper access controls and data protection features to protect this very sensitive data
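To give a sense of what the validation step in such an environment might check, here is a minimal, generic Python sketch of rule-based checks on a single test record. It is not Ataccama ONE code and does not reflect its API. The field names, the allowed result values, and the rules themselves are assumptions chosen for illustration.

```python
from datetime import date

# Assumed vocabulary for test results.
ALLOWED_RESULTS = {"positive", "negative", "inconclusive"}


def parse_iso(value):
    """Return a date for an ISO string (YYYY-MM-DD), or None if unusable."""
    try:
        return date.fromisoformat(value)
    except (TypeError, ValueError):
        return None


def validate(record: dict, today: date) -> list:
    """Return a list of human-readable issues found in one record."""
    issues = []
    birth = parse_iso(record.get("birth_date"))
    sampled = parse_iso(record.get("sample_date"))
    if birth is None:
        issues.append("birth_date missing or not ISO formatted")
    if sampled is None:
        issues.append("sample_date missing or not ISO formatted")
    elif sampled > today:
        issues.append("sample_date is in the future")
    if birth and sampled and birth > sampled:
        issues.append("birth_date is after sample_date")
    if record.get("result") not in ALLOWED_RESULTS:
        issues.append("result is not a known value")
    return issues


# Made-up records: one clean, one with three typical problems.
good = {"birth_date": "1980-02-14", "sample_date": "2020-05-14", "result": "positive"}
bad = {"birth_date": "14.02.1980", "sample_date": "2020-06-01", "result": "POS"}

today = date(2020, 5, 19)
print(validate(good, today))
print(validate(bad, today))
```

Records that fail such checks could be routed to an operator for manual cleansing, while clean records continue on to matching and merging.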

Covid-19 testing chart. Canada is one of the very few countries able to collect, process, and report the **number of individuals tested** as opposed to the number of test samples taken. Source: https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection.html

Using modern, secure master data management and data quality technology for Covid-19 testing data, health authorities will be able to monitor and model the pandemic based on real, high-quality, granular data, not on estimates. This will allow governments to make the right decisions, balancing the restrictions needed to save lives against reopening the economy as fast as possible. Without this sort of data, we will see authorities reacting in random, often extreme ways that might damage the economy and still not be optimal for people's health.