Reopening the Covid-19 World: A Data-Driven Government Response is Required
Podcast & transcript
Governments around the globe rely on data to make smart decisions about reopening their economies quickly and safely, and those decisions are far from easy to make. What’s more, they require constant reevaluation and recalibration. In many cases, unfortunately, the data being used is unreliable.
Tune in to the podcast or browse the transcript below to learn more about:
- What can be done to better understand and react to the spread of the Coronavirus
- The quality of Covid-19 testing data, and whether governments have sufficient data to manage the reopening of the economy
- The importance of precise, real-time data to monitor short- and long-term impacts of reopening (predictive models)
- How the availability and quality of Covid-19 testing data can benefit from advanced, automated data management & governance
Apple Podcast: https://apple.co/2UyOyPb
Ron Powell is an independent analyst, consultant and editorial expert. He’s also Executive Producer of Fast Forward on The World Transformed. With extensive knowledge and experience in business intelligence, big data, analytics and data warehousing, he maintains an expert channel and blog on the BeyeNETWORK, which he founded in 2004.
Welcome to Fast Forward on the World Transformed. This program presents conversations with thought leaders who are shaping our future through new ideas and new technologies.
In this edition of Fast Forward, we will be discussing what is required by governments around the world to successfully reopen their economies after the devastating impacts of Covid-19. As we all know, this needs to be done quickly and safely. Our future depends on it.
I’m Ron Powell and I am pleased to introduce our very special guest for today’s program: Michal Klaus. Michal Klaus is the CEO at Ataccama. Ataccama transforms businesses worldwide by providing them with self-driving data management and governance solutions. Previously, he was an international consultant with expertise in enterprise architecture, data management, agile methodologies, ETL, databases, and IT strategy.
Michal, welcome to Fast Forward on the World Transformed.
Thanks, Ron, it’s a pleasure to be here!
The Covid-19 crisis has altered the entire world, and governments around the globe are anxious to reopen their economies while avoiding a second wave of the deadly Coronavirus. What is your perspective on this situation?
That’s a great question, Ron. Although, in light of the current protests following the death of George Floyd, everything else somehow feels less important.
But let’s try. Let’s talk about the pandemic and how reliable and high-quality data can help fight it. After a very scary initial period, the situation is definitely beginning to improve in lots of countries.
At the same time, many places still have major restrictions in place, which makes life difficult and has damaged the economy. Just look at the unemployment rates. Every government is under pressure to reopen the economy as fast as possible. But reopening in itself will not bring people back to shops or restaurants. People need to feel safe.
In factories or many industries, especially those which produce physical goods, working from home is not an option. Not everyone is Elon Musk, who opened the Tesla factory in Fremont, despite objections from local officials. Governments need to balance lifting measures with containing the spread of the virus. And as I said, they need to do it in a way that makes people feel safe.
These decisions are extremely difficult, extremely challenging, because you’re balancing, on one hand, the health and even lives of people, and on the other hand, the economy. And even though the weather is getting better and people are tired of staying home and social distancing, the Coronavirus hasn’t gone anywhere. It’s still around. And actually, it’s even more present than two months ago when almost everyone was panicking. Any steps taken towards reopening will need to be monitored very closely so that they can be quickly tightened again if necessary, should the virus spread increase.
We’re definitely concerned about reopening the economy quickly. The longer we wait, the tougher it will be for countries to get back to where they need to be. Aside from a vaccine for the Coronavirus (which may take months or years to develop), what can we do now to be more proactive when it comes to containing the spread of the virus?
Neither of us is an epidemiologist or a scientist working on vaccines, so I can’t comment on that. But we are in the data business, and it’s been clear from day one that managing this pandemic is largely about data. What is needed is high-quality, detailed, and ideally real-time data to monitor the impact of reopening the economy.
Also, in the longer term, we need high-quality data to feed predictive data models.
If we look at when this pandemic first broke out, many countries were looking at the data from different perspectives, and they were all making different decisions. The response was disjointed. Some countries had a very effective response with good results, whereas other countries had a very difficult time. Do you feel that governments have sufficient data to manage the reopening of the economy? And if not, what data is missing? And why do you think it is missing?
Well, some governments do have sufficient data, but they are really a minority. Most governments, including in the U.S. and other major economies, simply do not have sufficient data. And if I were to describe what’s missing, we’d need to dive a little deeper into what kind of data is needed.
There are several key metrics that will drive decisions around the speed of reopening, including clinical ones such as the number of patients in hospitals and ICUs, and the length of hospitalizations, particularly in comparison to the health system capacity of each state or country. Luckily, these metrics and the underlying data are relatively easy to obtain, as most developed countries have reliable hospital information systems. Moreover, the health data that needs to be collected about Covid-19 does not differ from that of any other diagnosis.
However, aside from these factors, there are indicators which reflect the immunity of the population and how quickly or slowly the virus is spreading, expressed as the reproduction number (R0). In order to calculate such numbers, other data needs to be collected and processed, often outside of established health systems and processes. The information we are talking about here is data about tests, with consolidated results for everyone who has been tested. It is vital to obtain and evaluate this data as quickly as possible, ideally in real time.
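As a toy illustration of the kind of calculation such consolidated data feeds, here is a deliberately crude way to approximate a reproduction-number-like ratio from daily case counts. The serial interval and case figures are made-up assumptions for this sketch; real epidemiological models are far more sophisticated.

```python
# Crude sketch: compare case counts in the most recent serial interval
# to the one before it. A ratio above 1 suggests the spread is speeding up.
SERIAL_INTERVAL = 5  # assumed average days between successive infections

def crude_r_estimate(daily_cases, serial_interval=SERIAL_INTERVAL):
    """Ratio of current to lagged case counts as a rough R proxy."""
    if len(daily_cases) < 2 * serial_interval:
        raise ValueError("need at least two serial intervals of data")
    current = sum(daily_cases[-serial_interval:])
    previous = sum(daily_cases[-2 * serial_interval:-serial_interval])
    return current / previous if previous else float("inf")

cases = [30, 28, 35, 40, 42, 50, 55, 60, 58, 70]  # invented daily counts
print(round(crude_r_estimate(cases), 2))
```

Note that this only works at all if each case is counted once per person, which is exactly why the consolidated, deduplicated test data discussed here matters.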
So, we hear a lot about the importance of testing. Now, with more testing, we can get ahead of the spread of the virus. Do you think that the current Covid-19 testing data is sufficient, and is it reliable?
In short, it isn’t. By the way, I just read a very good article about the poor quality of Covid-19 testing data. It was published by researchers at the Johns Hopkins School of Medicine. They have been publishing very useful visualizations and stats about the spread of Covid-19 since the first days when the virus appeared in China. But no, we don’t have sufficient or reliable data on testing. And I’ll explain why.
First, it’s about the data capture: There are multiple reasons why gathering, processing, and evaluating testing data is very unreliable. Covid-19 testing locations often do not actually process the tests on site, but just collect swabs or blood samples. These testing locations are also the first point at which PII (personally identifiable information) data is recorded. The quality of the captured data varies, and can be very low, given the circumstances. Imagine being a health professional wearing full PPE while taking notes on paper or in combination with a computer device, in front of nervous patients, etc.
Second, it’s about the data collection: collected samples are sent to a lab together with the data, where the data is re-entered into the lab’s systems. If a test is negative, no other tests are usually performed for the tested person. But sometimes that person undergoes additional tests for a variety of reasons. The data is once again captured and stored, often in another place.
And here is where it gets complicated. Some individuals are tested multiple times, and we have no reliable way of knowing if data from multiple tests belongs to the same individual.
For healthcare workers and certain other professions, regular testing is performed. Ideally, the subsequent tests are assigned to the initially collected PII, but this is not guaranteed. If a patient tests positive, he or she will subsequently be tested multiple times, often in various other labs or institutions. The PII information and new testing results are recorded in many additional systems. Due to the urgency of the situation, many labs with multiple different systems are being used to process tests.
And these are just a few of the many complications of recording PII and test result data in the current situation. It is safe to assume that the quality of the PII data around Covid-19 is much lower than PII data in standard healthcare or other systems. Aside from typos and incorrect dates, addresses, and ID numbers, Covid-19 testing data contains a high percentage of duplicates, anywhere from 10–50%.
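To make the duplicate problem concrete, here is a minimal sketch of how test records with inconsistently entered PII might be grouped as likely duplicates. The field names and the normalization rule are hypothetical; production matching engines use far richer fuzzy-matching logic than this.

```python
# Minimal duplicate detection: build a crude match key from normalized
# name tokens plus date of birth, then group records sharing that key.
import re
from collections import defaultdict

def normalize(record):
    """Sort lowercase name tokens so 'Jane Doe' and 'DOE, JANE' match."""
    tokens = sorted(re.findall(r"[a-z]+", record["name"].lower()))
    return ("".join(tokens), record["dob"])

def group_duplicates(records):
    groups = defaultdict(list)
    for r in records:
        groups[normalize(r)].append(r)
    return [g for g in groups.values() if len(g) > 1]

records = [
    {"name": "Jane Doe", "dob": "1980-02-14", "result": "negative"},
    {"name": "DOE, JANE", "dob": "1980-02-14", "result": "positive"},
    {"name": "John Smith", "dob": "1975-07-01", "result": "negative"},
]
print(len(group_duplicates(records)))  # one group of likely duplicates
```

Even this toy version shows why the numbers get confused: the first two rows look like two tested people until the PII is normalized.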
Even more problematic is the process of gathering the data from labs and integrating it quickly and centrally, as this data resides in tens or even hundreds of labs, systems, and data formats.
When you talk about the quality of the data and inaccuracy percentages of 10% or even 50%, that is a tremendous data quality issue. Without quality data, it’s hard to make quality decisions.
Yes, absolutely. And as we mentioned previously, when we were in a kind of state of war with the Coronavirus, no one cared about the details. But now, months later, it would be great to have real-time, detailed data which you can slice and dice. You ideally want to be able to look at the statistics per region, or stats based on demographic data. Everything. Any and all kinds of detailed analytics would be helpful.
But this is looking back. Looking forward, you’d want the same, in order to feed predictive models. And there it’s not enough to have the rough numbers, right? It’s not enough to know we tested 20,000 samples. You need to know how many people you tested, how many were negative, how many were positive, which locations, where they live, where they work, etc.
We don’t have this data, and without it, the decisions that governments now need to make around reopening while keeping people safe are being made as if blindfolded. This is especially true at the moment, when the situation is improving in many countries. Reopening is happening, and there is a high risk of new local outbreaks that start small. Just tracking the number of tests performed won’t help. If you see the number of positive cases increase by 50 in a day, you don’t know whether they were spread over 50 locations or concentrated in one; if it’s the latter, a new local outbreak is almost certainly underway. Detailed data is crucial to preventing it.
So you need this granular data because it really comes down to the number of people that are infected versus the number of tests performed, because the same person could be tested 10 times, 20 times, 30 times. And that just confuses the numbers.
Absolutely. When I first started to look at the statistics, I happened to be in Sydney in February. Because of the proximity to the original outbreak, I started to watch the numbers. After a couple of weeks, it struck me that it’s a massive data management problem.
So, what can be done to improve both the availability and reliability and quality of the data needed?
In general, we have the tools. The tools are there. We have the concepts. We have the best practices. Together, these are called data quality, master data management, and data governance. There is a new generation of technology which does all of this with less effort and faster. We at Ataccama have a platform that does it, called Ataccama ONE. In fact, we are offering it to governments for free to help in the fight against Covid-19.
We can’t help with the collection of the data. You still need to have the health professional collecting it on a computer or on paper. But from that point on, there are technologies that can help. Ataccama ONE is one such technology which lets this data be quickly submitted to a central processing location. It’s a fast, easy-to-use, secure way of submitting this data without any need for programming or technical work. Once the data is submitted, the platform profiles the data and uncovers any data quality issues right away.
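As a rough sketch of what that profiling step involves, the snippet below scans submitted records for missing and malformed values and tallies them into a report. The field names and the single date-format rule are assumptions made for illustration; real profiling tools check many more rules automatically.

```python
# Toy data profiling: count missing and invalid values per field so that
# quality issues surface immediately after submission.
import re

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # expected date-of-birth format

def profile(records, fields=("name", "dob", "result")):
    report = {f: {"missing": 0, "invalid": 0} for f in fields}
    for r in records:
        for f in fields:
            value = r.get(f)
            if not value:
                report[f]["missing"] += 1
            elif f == "dob" and not DATE_RE.match(value):
                report[f]["invalid"] += 1
    return report

records = [
    {"name": "Jane Doe", "dob": "1980-02-14", "result": "negative"},
    {"name": "", "dob": "14/02/1980", "result": "positive"},
]
print(profile(records))
```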
On top of that, wherever possible, it will fix the data quality problems. Issues that cannot be fixed automatically or with machine learning can be sent to (what in the data space is known as) data stewards, or people who can correct typos and help identify other similar problems in the data. Once you have your input data in reasonably good shape, at a high quality, there is another step called deduplication, or mastering.
Say that you have 10 or 20 samples or tests taken. They each have associated PII information. The PII information is not exactly precise, not exactly correct. The same sophisticated technology will figure out that those 10 or 20 tests actually belong to one person.
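A minimal sketch of that mastering step, assuming the matching has already assigned each test record a shared person key, might look like the following. The match keys and field names are hypothetical.

```python
# Mastering sketch: collapse per-test rows that share a match key into a
# single person-level golden record with consolidated test counts.
from collections import defaultdict

def master(matched_records):
    """matched_records: iterable of (match_key, test_record) pairs."""
    people = defaultdict(lambda: {"tests": 0, "positive": 0})
    for key, rec in matched_records:
        person = people[key]
        person["tests"] += 1
        if rec["result"] == "positive":
            person["positive"] += 1
    return dict(people)

tests = [
    ("doejane|1980-02-14", {"result": "negative"}),
    ("doejane|1980-02-14", {"result": "positive"}),
    ("smithjohn|1975-07-01", {"result": "negative"}),
]
golden = master(tests)
print(len(golden))  # two people behind three test records
```

The point of the exercise: three test rows become two people, which is exactly the difference between counting tests and counting infected individuals.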
All of this information, in a high quality, easy-to-use format is then stored in what’s called a data management hub, which can be provided to authorized consumers. And there’s one other very important thing which we can do when dealing with pandemics such as Covid-19. We can anonymize the data. It’s then possible to make it available to any researcher or other qualified person. This is how you can get many statistics, even predictive models.
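One simplified way to picture that anonymization step is pseudonymization: replacing the PII fields with a salted hash, so a person’s tests remain linkable to each other without identifying the person. The field names and salt handling below are illustrative assumptions, and genuine anonymization typically layers on further safeguards such as generalization or k-anonymity.

```python
# Pseudonymization sketch: strip PII fields and replace them with a
# salted hash that still links all records belonging to one person.
import hashlib

SALT = b"keep-this-secret-and-rotate-it"  # illustrative only

def pseudonymize(record, pii_fields=("name", "dob", "address")):
    anon = {k: v for k, v in record.items() if k not in pii_fields}
    key = "|".join(str(record.get(f, "")) for f in pii_fields)
    anon["person_id"] = hashlib.sha256(SALT + key.encode()).hexdigest()[:16]
    return anon

record = {"name": "Jane Doe", "dob": "1980-02-14",
          "address": "1 Main St", "result": "positive", "region": "North"}
print(pseudonymize(record))
```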
All of this can be done quite easily. The technology is there. The know-how exists. And if governments apply this technology and know-how, they’ll take off the blindfold. They will have very precise, reliable, real-time data, which will help them monitor the reopening of the economy on a very local basis, prevent outbreaks, and allow people to feel safe again.
Well, that is terrific. We’ve talked a lot about AI and The World Transformed in several Fast Forwards that we’ve done recently, and the ability to use artificial intelligence and machine learning to help drive the quality of data, as well as the overall governance of information. You know, I believe this is definitely a much more coordinated way for governments to move forward with reopening their economies and make sure that the issues surrounding Covid-19 are minimized. Michal, I want to thank you for being with us today.
Thank you, Ron. If I may add one last thing. I used the word coordinated. Actually, it would be ideal if this was coordinated globally. And it’s not only applicable to Covid-19. Many scientists are saying that sooner or later there will come another virus like this, but even more deadly. So in a way, we’ve been given a chance and opportunity to use this as a preparation exercise for what’s coming. And because we do have this kind of technology, this kind of approach available, we can actually stop the next virus from becoming a pandemic.
So thank you, Ron, for the great questions, and let’s hope the world will transform for the better.
I hope any government out there that is looking for a way to handle its data and perform better in these efforts will take you up on your offer to use the software at no charge.
That is going to do it for this edition of Fast Forward on the World Transformed. We hope you will join us again as we continue to explore a future that is unfolding before us in unexpected ways and at a breathtaking pace. To learn more about Ataccama, go to www.ataccama.com. To learn more about this program, visit www.worldtransformed.com. Thanks for listening.