We need accurate Covid-19 testing data for a faster, safer reopening of the economy

1. The most difficult decision politicians will now be making, repeatedly

Governments across the globe are struggling to balance restrictive measures that prevent uncontrolled spread of the virus against the need to reopen the economy. These are extremely difficult decisions, and they will likely need to be recalibrated continuously rather than made once.

The epidemic is not over. Even in countries that have managed to control the spread of the virus, it could quickly gain speed again. The situation will need to be monitored closely, and measures reintroduced or tightened if necessary. We need precise, real-time data not only to monitor the impact of reopening, but also for longer-term planning based on accurate predictive models.

2. We can’t do it without data

Several key metrics will drive decisions about the speed of reopening, including clinical ones such as the number of patients in hospitals and ICUs and the length of hospitalizations, particularly relative to the health system capacity in each state or country. These metrics and the underlying data are relatively easy to obtain, as hospitals in most developed countries have reliable record-keeping systems. Moreover, the health data that needs to be collected about Covid-19 patients does not differ from that for any other diagnosis.

Covid-19 reproduction number (R) for Germany. Note the orange and red peaks above 1, after which measures had to be tightened again. Source: Robert Koch Institute, @rki_de

Aside from these factors, there are indicators that reflect the immunity of the population and whether the virus is spreading more quickly or slowly, most importantly the reproduction number: the average number of people each infected person goes on to infect. (Strictly, R0 is its value in a fully susceptible population; the figure tracked day to day is the effective reproduction number, R.) To calculate it, other data needs to be collected and processed, often outside established health systems and processes: data about tests, and consolidated results for everyone who has been tested. It is vital to obtain and evaluate this data as quickly as possible, ideally in real time.
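
As a rough illustration of why timely, deduplicated case counts matter for this metric, the sketch below uses one simple ratio estimator: assuming a generation time of about four days, R is approximated by the number of new cases in the latest four-day window divided by the number in the preceding four-day window. This is a deliberate simplification (no correction for reporting delays, testing volume, or duplicates), not the method any particular health authority uses; the function name and the case counts are hypothetical.

```python
from typing import Sequence


def estimate_r(daily_new_cases: Sequence[int], generation_days: int = 4) -> float:
    """Ratio of new cases in the latest window to those in the preceding window.

    Simplified sketch: assumes a fixed generation time and complete,
    deduplicated daily counts, with no correction for reporting delays.
    """
    window = generation_days
    if len(daily_new_cases) < 2 * window:
        raise ValueError("need at least two full windows of daily counts")
    recent = sum(daily_new_cases[-window:])
    previous = sum(daily_new_cases[-2 * window:-window])
    if previous == 0:
        raise ValueError("no cases in the previous window; the ratio is undefined")
    return recent / previous


# Hypothetical daily counts of new positive cases, oldest first.
cases = [100, 110, 120, 130, 140, 150, 160, 170]
print(round(estimate_r(cases), 2))
```

With these counts the estimate is about 1.35. Because the ratio is computed directly from daily positives, duplicate records or late-arriving lab batches that inflate or deflate particular days shift the estimate, and with it any decision keyed to the threshold of 1.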

3. Key data on testing is missing or completely unreliable

There are multiple reasons why gathering, processing, and evaluating testing data is so unreliable under the current circumstances:

  • Covid-19 testing locations often don’t process the tests themselves; they just collect swabs or blood samples. These locations are also the first point at which personally identifiable information (PII) is recorded. Some of them are drive-through or temporary tent stations, and therefore not set up for rigorous data collection.
  • Given the circumstances, the quality of the captured data is probably very low: health professionals wear full PPE while taking notes on paper or on a computer, patients are nervous, and so on.
  • Collected samples are sent to a lab together with the data, which is then re-entered into the lab’s systems.
  • If a test is negative, the person is usually not tested again. Sometimes, though, that person undergoes additional tests for a variety of reasons, and the data is once again captured and stored, often in a different place.
  • For healthcare workers and certain other professions, regular testing is performed. Ideally, the subsequent tests are assigned to the initially collected PII, but this isn’t guaranteed.
  • If a patient tests positive, he or she will subsequently be tested multiple times, often in other labs or institutions. The PII and new test results are recorded in many additional systems.
  • Due to the urgency of the situation, many labs running many different systems are being used to process tests.

The Covid-19 testing process explained. Source:

These are just a few of the many complications of recording PII and test results in the current situation. It is safe to assume that the quality of Covid-19 PII is much lower than that of PII in standard healthcare or other systems.

Aside from typos and incorrect dates, addresses, and ID numbers, Covid-19 testing data contains a high percentage of duplicates, anywhere from 10%–50%.
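
A minimal sketch of what deduplication involves, assuming hypothetical record fields (name, birth date, national ID): each record is reduced to a normalized matching key, and distinct keys approximate distinct individuals. This uses exact matching after normalization only; it will not catch a misspelled surname or a transposed digit, which is why dedicated matching engines add fuzzy comparison and manual review on top.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class TestRecord:
    # Hypothetical fields; real lab feeds will differ.
    name: str
    birth_date: str  # ISO format, YYYY-MM-DD
    national_id: str


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and collapse repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def match_key(record: TestRecord) -> tuple:
    """Prefer the national ID (digits only); fall back to name + birth date."""
    digits = "".join(c for c in record.national_id if c.isdigit())
    if digits:
        return ("id", digits)
    return ("name_dob", normalize_name(record.name), record.birth_date)


records = [
    TestRecord("Jana Nováková", "1980-04-02", "800402/1234"),
    TestRecord("jana  novakova", "1980-04-02", "8004021234"),  # same person, retyped ID
    TestRecord("Petr Svoboda", "1975-11-30", ""),
    TestRecord("PETR SVOBODA", "1975-11-30", ""),  # same person, no ID recorded
    TestRecord("Eva Dvořák", "1990-01-15", ""),
]

distinct_people = {match_key(r) for r in records}
print(len(records), "records,", len(distinct_people), "distinct individuals")
```

In this toy data, five records collapse to three individuals, a 40% duplicate rate.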

Even more problematic is the process of gathering the data from labs and integrating it quickly and centrally, as this data resides in tens or even hundreds of labs, systems, and data formats.
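
Integrating feeds from many labs means mapping each format onto one common schema before anything can be matched or counted. The sketch below assumes two hypothetical lab formats (a semicolon-separated CSV with day-first dates and POS/NEG codes, and a JSON export with ISO dates); all field names are invented for illustration.

```python
from __future__ import annotations

import csv
import io
import json
from datetime import date, datetime

# Hypothetical common schema: name, birth_date, sample_date, result ("positive"/"negative").
RESULT_MAP = {"pos": "positive", "positive": "positive", "neg": "negative", "negative": "negative"}


def from_lab_a(csv_text: str) -> list[dict]:
    """Lab A (hypothetical): semicolon-separated CSV, dates as DD.MM.YYYY, results POS/NEG."""
    rows = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    return [
        {
            "name": row["patient_name"].strip(),
            "birth_date": datetime.strptime(row["dob"], "%d.%m.%Y").date(),
            "sample_date": datetime.strptime(row["taken"], "%d.%m.%Y").date(),
            "result": RESULT_MAP[row["result"].strip().lower()],
        }
        for row in rows
    ]


def from_lab_b(json_text: str) -> list[dict]:
    """Lab B (hypothetical): JSON array, ISO 8601 dates, results spelled out."""
    return [
        {
            "name": item["name"].strip(),
            "birth_date": date.fromisoformat(item["birthDate"]),
            "sample_date": date.fromisoformat(item["sampledOn"]),
            "result": RESULT_MAP[item["outcome"].strip().lower()],
        }
        for item in json.loads(json_text)
    ]


lab_a_feed = "patient_name;dob;taken;result\nJana Novakova;02.04.1980;14.05.2020;NEG\n"
lab_b_feed = '[{"name": "Petr Svoboda", "birthDate": "1975-11-30", "sampledOn": "2020-05-14", "outcome": "Positive"}]'

combined = from_lab_a(lab_a_feed) + from_lab_b(lab_b_feed)
for record in combined:
    print(record["name"], record["birth_date"], record["result"])
```

Malformed dates raise ValueError and unrecognized result codes raise KeyError, so bad rows surface during loading rather than silently entering the statistics; a real pipeline would route them to a review queue instead of failing the whole batch.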

4. Only a few governments have the data they need

In many countries, the statistics shared with the public and used for decision making are not based on detailed data. Instead, they are based on “summary reports”: daily counts of tests performed and the resulting negatives and positives, compiled by labs or health authorities. As a result, statistics from most countries (with some exceptions) report only the number of samples, not the number of individuals tested. This is a major problem, especially since one of the key metrics for decisions about reopening the economy is new positive cases relative to the number of individuals tested. Because of duplicates alone, this indicator may be off by 10%–50% at any given time.

The data also needs to be available fast, with less than a 24-hour delay, and at a detailed level, because a potential second wave might start with local outbreaks.

Statistics on Covid-19 testing by country. Only 5 of the “top” 25 countries report the number of cases (it is still unclear whether this means the number of individuals tested). Source:
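
The difference between counting samples and counting individuals can be shown with a small, hypothetical sample-level table in which each record already carries a person key from a prior matching step. A person is counted as positive if any of their samples was positive.

```python
from collections import defaultdict

# Hypothetical sample-level records: (person_key, result).
# person_key is assumed to come from a prior matching/deduplication step.
samples = [
    ("p1", "positive"), ("p1", "positive"), ("p1", "negative"),  # confirmed case, retested
    ("p2", "negative"),
    ("p3", "negative"), ("p3", "negative"),  # healthcare worker, screened twice
    ("p4", "negative"),
]

# Per-sample view: what a summary report of "tests performed" shows.
sample_positivity = sum(result == "positive" for _, result in samples) / len(samples)

# Per-person view: positive if any of the person's samples was positive.
ever_positive = defaultdict(bool)
for person, result in samples:
    ever_positive[person] |= result == "positive"
person_positivity = sum(ever_positive.values()) / len(ever_positive)

print(f"{len(samples)} samples, {len(ever_positive)} individuals")
print(f"positivity per sample: {sample_positivity:.1%}, per individual: {person_positivity:.1%}")
```

Here 7 samples correspond to 4 individuals, and positivity is 28.6% per sample but 25.0% per individual, because the confirmed case was retested. With a different mix, for example frequent negative screening of healthcare workers, the per-sample figure would instead be too low, so the error cannot be corrected after the fact without person-level data.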

5. It is possible, with advanced data management and governance

Getting the collection and processing of testing and related PII data into reasonable shape will take years if done the traditional way: integrating lab systems, hospital systems, and central health data processing institutions.

But there’s a better way: using state-of-the-art, automated data management and integration technology such as Ataccama ONE. It is used by hundreds of global organizations in healthcare and other industries to integrate, cleanse, and deduplicate data in heterogeneous data landscapes. It can be deployed in a matter of weeks, allowing governments to set up an environment with the following functions:

  • A secure application (portal) for labs to provide their testing data in batches, in real time, or daily
  • AI-driven profiling and validation functionality for operators to quickly identify data issues
  • Automated and manual data cleansing and data quality monitoring
  • Matching and merging of duplicate PII information
  • A master data management hub to store, manage, and provide clean and consistent data for each person tested, including every swab collection test and all test result information
  • Proper access controls and data protection features for this very sensitive data

Covid-19 testing chart. Canada is one of very few countries able to collect, process, and report the number of individuals tested, as opposed to the number of test samples taken. Source:
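
The matching-and-merging and master data hub functions listed above can be sketched as grouping matched test records per person and deriving one master record from them. The person keys are assumed to come from a prior matching step, and the survivorship rule used here (keep the most frequently seen spelling of the name) is a hypothetical example, not how any specific product works.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Observation:
    # Hypothetical: one incoming test record after cleansing and matching.
    person_key: str
    name: str
    sample_date: date
    lab: str
    result: str


@dataclass
class MasterRecord:
    person_key: str
    name: str
    tests: list = field(default_factory=list)  # Observations in date order

    @property
    def latest_result(self) -> str:
        # Builder below always supplies at least one test.
        return self.tests[-1].result


def build_master_records(observations):
    """Group observations per person; keep the most frequent name spelling."""
    grouped = defaultdict(list)
    for obs in observations:
        grouped[obs.person_key].append(obs)
    masters = {}
    for key, obs_list in grouped.items():
        obs_list.sort(key=lambda o: o.sample_date)
        name = Counter(o.name for o in obs_list).most_common(1)[0][0]
        masters[key] = MasterRecord(key, name, obs_list)
    return masters


observations = [
    Observation("p1", "Jana Novakova", date(2020, 5, 14), "Lab A", "positive"),
    Observation("p1", "Jana Novakva", date(2020, 5, 20), "Lab B", "positive"),  # typo
    Observation("p1", "Jana Novakova", date(2020, 5, 27), "Lab C", "negative"),
    Observation("p2", "Petr Svoboda", date(2020, 5, 15), "Lab A", "negative"),
]

masters = build_master_records(observations)
jana = masters["p1"]
print(jana.name, len(jana.tests), jana.latest_result)
```

The misspelled second record is folded into Jana's master record, which holds all three tests in date order across three labs, with the most recent result negative.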

Using modern, secure master data management and data quality technology for Covid-19 testing data, health authorities will be able to monitor and model the pandemic based on real, high-quality, granular data, not on estimates. This will allow governments to make the right decisions, balancing the restrictions needed to save lives against reopening the economy as fast as possible. Without this sort of data, we will see authorities reacting in arbitrary, often extreme ways that might damage the economy while still not being optimal for public health.

A blog about AI in data management & data governance—to improve time to market and reduce manual effort

Michal Klaus

Performance enthusiast - both in my hobbies, business and technology. CEO of Ataccama Corp.