It’s Time to Get Real About COVID Apps

Beating the Novel Coronavirus Will Require Smarter Contact Tracing AND Smarter Thinking about Privacy

Fighting Covid with Data
May 14, 2020

American policymakers are struggling to find a way to keep the novel coronavirus in check while allowing Americans to return to work, see their families and friends, attend church services, travel, and leave the house. During this period of tumult, the U.S. and most other countries are looking enviously at South Korea, the only country that was once on track for significant spread and has since managed to control the outbreak without lockdowns.

By any public health measure, South Korea has done a far better job of protecting the health of its residents than the U.S. has. To use one rough estimate, America’s per capita COVID-19 death rate has been forty-two times greater than South Korea’s. At one time the differences between the U.S. and Korean response could have been chalked up to differences in testing, but as testing becomes more widely available here, it is increasingly obvious that one of the greatest impediments to duplicating South Korea’s success is data: good data about who may have become infected and is therefore at heightened risk of asymptomatic spread.

South Korea avoids lockdowns by making data-driven decisions about whom to test, trace, isolate, and quarantine. Automated risk scoring, contact tracing, and notifications can allow public health officials to quickly update risk assessments, notify high-risk individuals, get them to self-quarantine before infecting others, get them tested as quickly as possible, and, if they test positive, isolate them, disinfect surfaces they have likely contaminated, and repeat the process based on their contacts. Tomas Pueyo produced a comprehensive summary of the benefits of an automated contact tracing system during the post-lockdown “Dance” phase of the pandemic’s management.

Yet, despite the prominence of high tech in the U.S., most public commentary argues that COVID apps will not be helpful and will not sufficiently protect privacy. Americans, too, seem to be unsure whether smartphone location tracking will add much value to the U.S. response to COVID-19. Thus, instead of leading with proposals for the sort of data tracking that could help state and local public health authorities manage COVID-19 without reinstating lockdowns, lawmakers are doing the exact opposite: proposing legislation that will make contact tracing even harder. Strong restrictions on the collection and use of personal data will all but ensure the U.S. will waffle between public health crises and draconian lockdowns for as long as the novel coronavirus remains a threat.


This is a mistake. Predictions about the lack of efficacy and the lack of privacy are based on tentative, haphazard efforts that states and private companies like Apple and Google have begun to develop in a vacuum of federal support. Digital technologies can and should be an integral tool to magnify the benefits of ramped-up testing and manual contact tracing. Moreover, policymakers should start preparing for the rollout of a data-driven COVID response now. As was the case with lockdowns, delays will add to both the health and economic tolls.

This essay makes the case for a data repository run by the Centers for Disease Control that automatically (or, at least, by default) collects proximity and location data for the sole purpose of public health risk assessments and contact tracing. After showing why a system with maximum participation is necessary, we lay the legal and technical foundation for a smart Test & Trace program with meaningful privacy restrictions to reduce the risk of misuse. Because broad surveillance technologies stand to save tens, if not hundreds, of thousands of American lives, policymakers and privacy advocates should ask first what extant technologies can do to aid the response to the public health crisis, and then determine how to capture most of that value while protecting the privacy of all Americans.

I. All the Ways to Fail

To have a productive conversation about tech-supported risk scoring and contact tracing, we need to acknowledge a few facts about this particular virus. There are many good resources on the basics of viral epidemiology, and at this point most engaged people are familiar with the concept of R₀ (a product of transmission probability, contact rate, and the duration of infectiousness). With no changes in behavior, COVID-19 seems to have an R₀ of close to 3, meaning that each infected person infects three other people on average in a population that is initially entirely susceptible. The virus’s effective reproduction number (Rₜ) can be driven down by social and behavioral changes, by therapeutic remedies that reduce viral shedding, or by the reduction in the portion of the population that remains susceptible as the epidemic unfolds, “susceptibles” being the unburnt “fuel” that ultimately drives all epidemics forward. There is simply no escaping that this virus is highly infectious and will, without a vaccine, likely infect the vast majority of populations in all countries throughout the world. Even now, as we are reaping the benefits of weeks of lockdowns and self-moderation, many states that are itching to reopen, or that already have (like Georgia), have an Rₜ over 1.0, meaning that growth there continues to be exponential.
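The definitions above reduce to simple arithmetic. The following Python sketch uses hypothetical parameter values (a 5% chance of transmission per contact, six contacts a day, ten infectious days) chosen only so that R₀ lands near 3. It also assumes a common simplification in which Rₜ scales proportionally with both the susceptible share of the population and the fraction of contacts that behavior change leaves in place.

```python
def basic_reproduction_number(p_transmit: float, contacts_per_day: float,
                              days_infectious: float) -> float:
    """R0 as the product of per-contact transmission probability,
    daily contact rate, and duration of infectiousness."""
    return p_transmit * contacts_per_day * days_infectious


def effective_reproduction_number(r0: float, susceptible_share: float,
                                  contact_reduction: float = 0.0) -> float:
    """Rt under a simplified model: fewer contacts (distancing, masks) and
    fewer susceptible people both scale R0 down proportionally."""
    return r0 * susceptible_share * (1.0 - contact_reduction)


def cases_after(generations: int, initial_cases: float, r: float) -> float:
    """Expected new cases after a number of transmission generations."""
    return initial_cases * r ** generations


# Hypothetical inputs, for illustration only.
r0 = basic_reproduction_number(0.05, 6, 10)
rt = effective_reproduction_number(r0, susceptible_share=0.95, contact_reduction=0.6)
print(f"R0 = {r0:.2f}, Rt = {rt:.2f}")
print(f"New cases in generation 10, starting from 100: {cases_after(10, 100, rt):.0f}")
```

Even with contacts cut by 60%, Rₜ stays above 1.0 in this toy example, so case counts keep compounding from one generation to the next.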

Let’s consider the ways that the virus can spread. Suppose we track what happens to every person who is currently infected with COVID-19 and has just become contagious at a specific time, t₀. Imagine we could observe all the people who come into meaningful contact with these COVID-19+ individuals during the period of likely viral shedding. Many of these people will likely become infected and contagious themselves. This is the group of “ALL CONTACTS” represented in the accompanying diagram.

The size of the group is reduced through physical distancing and mask-wearing measures. But as lockdown restrictions are lifted, we know that this group of “all contacts” will increase for each infected person, so controlling the spread of the virus will require us to find, test, and isolate that person’s contacts before they transmit the disease to others.

As the diagram shows, there are many possible points of failure in the process. The original case (the index case) might never be identified as high-risk, making testing and contact tracing impossible in the absence of routine, significantly more accurate testing. Or the index case may be quarantined and tested so late that any contacts who caught the virus have already exposed others. Even when an index case is identified as being at high risk of having contracted COVID-19, public health authorities may still have difficulty reconstructing events and identifying who else is at risk based on their exposure to the index patient. Moreover, even with perfect memory of personal contacts, some transmissions will occur to “contacts” who were not actually near the infected person, and instead made contact with a common surface or shared a larger enclosed space with circulated air. Those at significant risk who are never identified, or who are identified but not notified in time, will continue to spread the virus if they caught it.
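Because each point of failure compounds the others, it helps to treat the chain as a product of probabilities. This is a minimal Python sketch that assumes (unrealistically) that the stages succeed or fail independently; the stage names and success rates are hypothetical, not estimates from the literature.

```python
from math import prod

# Hypothetical success rates for each link in the Test & Trace chain.
STAGES = {
    "index case identified as high risk": 0.8,
    "index case tested and isolated in time": 0.8,
    "contact reconstructed by tracers": 0.7,
    "contact notified before transmitting": 0.8,
}


def chance_contact_reached(stages: dict[str, float]) -> float:
    """Probability that an infected contact is found and notified in time,
    assuming each stage succeeds independently of the others."""
    return prod(stages.values())


weakest = min(STAGES, key=STAGES.get)
print(f"Chance an infected contact is reached in time: {chance_contact_reached(STAGES):.0%}")
print(f"Weakest link: {weakest}")
```

Four individually respectable links of 70 to 80% combine into just over a one-in-three chance that any given infected contact is caught in time.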

The odds we face are daunting.

Assuming, optimistically, that public health authorities were able to detect and isolate 50% of likely new COVID-19 infections immediately, we could still have exponential growth unless people fastidiously comply with mask and physical distancing guidelines. Indeed, epidemiological modeling suggests that, in order to suppress the exponential spread of the virus, 60% of the people who came into contact with a COVID+ case would need to immediately begin a self-quarantine. If there is even a one-day delay from the time of infection, that figure grows to 70%. At two days’ delay, it’s 80%. That puts tremendous pressure on a Test & Trace program. Without timely information and action, Rₜ will remain unacceptably high, and the virus will remain uncontrollable without new lockdowns.
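The thresholds cited above can be written as a simple lookup. In this Python sketch the table holds only the three figures quoted in this section, and the function refuses to guess for longer delays because the text gives no number for them.

```python
# Share of a COVID+ case's contacts who must begin self-quarantine to suppress
# exponential spread, keyed by delay in days (figures quoted in the text).
REQUIRED_QUARANTINE_SHARE = {0: 0.60, 1: 0.70, 2: 0.80}


def suppresses_spread(share_quarantined: float, delay_days: int) -> bool:
    """True if a tracing program meets the quoted threshold for its delay."""
    if delay_days not in REQUIRED_QUARANTINE_SHARE:
        raise ValueError(f"no threshold quoted for a {delay_days}-day delay")
    return share_quarantined >= REQUIRED_QUARANTINE_SHARE[delay_days]


# A program that quarantines 65% of contacts succeeds only if it acts immediately.
for delay in (0, 1, 2):
    print(delay, suppresses_spread(0.65, delay))
```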

Here are two useful graphical representations. The first shows the importance of minimizing delay and the second, the relationship between the effective Rₜ and the overall effectiveness of contact tracing:

Quantifying intervention success. Heat map plot showing the exponential growth rate of the epidemic r as a function of the success rate of instant isolation of symptomatic cases (x axis) and the success rate of instant contact-tracing (y axis). Positive values of r (red) imply a growing epidemic; negative values of r (green) imply a declining epidemic, with greater negative values implying faster decline. The solid black line shows r=0, i.e., the threshold for epidemic control. The dashed lines show uncertainty in the threshold due to uncertainty in R₀ (see figs. S15 to S17). The different panels show variation in the delay associated with the intervention — from initiating symptoms to case isolation and quarantine of contacts

Source: Science Magazine, “Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing” (figure 3)

Source: Tomas Pueyo, Coronavirus: How to Do Testing and Contact Tracing (Chart 23)

Other models are less pessimistic because they use different assumptions about new behavioral patterns that will emerge in light of what we now know about the virus. But until we know more about how the Rₜ changes in light of eased restrictions, a post-lockdown system should aim for getting 80–90% of the contacts of index cases to quarantine quickly.

II. The U.S. Is Still Failing

Although the case and death rates have plateaued in the U.S., we have not beaten back the virus the way that South Korea has. And even with its aggressive contact tracing program, South Korea still struggles with occasional mini-outbreaks, as it has recently. The threat of a rebound in outbreaks here in the U.S. is therefore very high, since we do not yet have an adequate management system in place.

Our daily case rate requires us to have a plan for quickly quarantining the infected. We can do this one of three ways: (1) mass quarantines (in other words, more lockdowns); (2) mass deployment of accurate antigen testing so that every individual can take an antigen test nearly every day; or (3) wide scale use of data to target our testing supplies and quarantines to those most at risk of having contracted the disease.

So far, most of the American commentary on option (3) has been superficial. This is perplexing given the alternatives: option (1) is painful and option (2) might be impossible. It is worth at least considering how a data-driven distribution of supplies and targeted quarantines would work. To be successful, we would need:

  • Reliable measures of risk that an individual has contracted COVID-19;
  • Smart recommendations communicated to the individual based on that estimated risk (e.g., limiting movement, getting a COVID-19 test, or immediately self-quarantining);
  • Near instantaneous tracing of all people who may have come into close contact with infected or high risk individuals or surfaces; and
  • Swift updating of each individual’s risk scores and recommendations.
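The four requirements above fit together as a loop: score risk, recommend, trace, and re-score. This Python sketch shows one pass of that loop under stated assumptions: the `Resident` record, the recommendation thresholds, and the per-contact transmission probabilities are all hypothetical, each exposure is treated as independent of the others, and only direct contacts of a confirmed case are updated.

```python
from dataclasses import dataclass, field


@dataclass
class Resident:
    """One participant; `risk` is the estimated probability of infection."""
    resident_id: str
    risk: float = 0.0
    # (other resident's id, estimated chance this contact transmitted the virus)
    contacts: list[tuple[str, float]] = field(default_factory=list)


def recommend(risk: float) -> str:
    """Map an estimated risk to guidance. Thresholds are hypothetical."""
    if risk >= 0.5:
        return "self-quarantine now"
    if risk >= 0.1:
        return "get a COVID-19 test and limit movement"
    return "no action"


def flag_index_case(residents: dict[str, Resident], index_id: str) -> dict[str, str]:
    """Mark a confirmed case, re-score each traced contact, and return new guidance."""
    index = residents[index_id]
    index.risk = 1.0
    guidance = {index_id: recommend(index.risk)}
    for other_id, p_transmit in index.contacts:
        other = residents[other_id]
        # Independent exposures: P(still uninfected) multiplies across exposures.
        other.risk = 1.0 - (1.0 - other.risk) * (1.0 - p_transmit)
        guidance[other_id] = recommend(other.risk)
    return guidance


residents = {
    "A": Resident("A", contacts=[("B", 0.6), ("C", 0.15)]),
    "B": Resident("B"),
    "C": Resident("C", risk=0.02),
}
print(flag_index_case(residents, "A"))
```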

A noisy, error-riddled system will not be able to greatly improve our public health outcomes. Those whose risks are erroneously estimated to be low (false negatives) will spread the virus through the community, and those whose risks are erroneously estimated to be high (false positives) will lose faith in the efficacy of the risk scoring system and stop acting on its recommendations. The costs of both types of error fall unevenly on healthcare workers and on financially vulnerable Americans who will feel compelled to work in high-contact environments. These workers will insist on accuracy in both directions: few false positives that keep them from working, and few false negatives that make their workplaces unsafe. Much of this essay will therefore be devoted to describing how to get the most sensitivity and specificity out of a system while respecting reasonable privacy concerns.
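Sensitivity and specificity have standard definitions. One more quantity, the share of alerts that go to people who are actually infected (positive predictive value), explains why false positives erode trust. This is a short Python sketch with hypothetical audit counts: 1,000 residents, of whom 40 are infected.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of infected people whom the system flags as high risk."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of uninfected people whom the system correctly leaves unflagged."""
    return true_neg / (true_neg + false_pos)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Share of high-risk alerts that went to people who were actually infected."""
    return true_pos / (true_pos + false_pos)


# Hypothetical audit: 40 infected (32 flagged), 960 uninfected (96 flagged).
tp, fn, fp, tn = 32, 8, 96, 864
print(f"sensitivity {sensitivity(tp, fn):.0%}, specificity {specificity(tn, fp):.0%}")
print(f"alerts that were warranted: {positive_predictive_value(tp, fp):.0%}")
```

Even at 90% specificity, three out of four alerts in this example go to people who were never infected, because uninfected people vastly outnumber infected ones. That is the arithmetic behind users losing faith in an error-prone system.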

A. Human Contact Tracers Are Necessary but not Sufficient

The traditional, gumshoe style of contact tracing — interviewing the newly infected to reconstruct their activities since infection — is the necessary foundation. We support proposals like Sen. Elizabeth Warren’s to dramatically increase the staff available for these efforts.

Human contact tracing can reach the run-of-the-mill transmissions between friends and family members, and these account for more than half of known COVID cases. But human tracing will not be sufficient for finding the contacts of “superspreaders,” which will be necessary if we want to avoid intermittent spikes and lockdowns. Patients who test positive will inevitably have lapses in memory or reasons not to want to disclose their activities to human interviewers. More importantly, the patient will not be able to identify strangers who shared a space long enough for transmission during the period of viral shedding. And the time required for human interviewers to receive information from the patient and track down their contacts will add costly hours (if not days) to the notification process, during which their contacts could unknowingly put others at risk. In short, the virus will continue to spread, making containment difficult and resulting in preventable deaths.

The good news is that no one source of data needs to achieve the 70–90% target mark for finding and alerting contacts on its own. A gap in human contact tracing can be mitigated, in many cases, by smartphone-based systems. The key question is how effective such automated contact tracing needs to be as a supplement to, rather than a replacement for, our other resources. From the point of view of professional contact tracers, the sort of data that smartphones routinely create would greatly enhance the quality and the speed of their work.
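If the methods miss different contacts, their coverage compounds. This is a minimal Python sketch that assumes each method’s misses are independent of the others. In practice misses are likely correlated (for example, people without smartphones may also be harder to reach by interview), so the result is an optimistic upper bound. The coverage figures are hypothetical.

```python
def combined_coverage(*coverages: float) -> float:
    """Chance a contact is found by at least one method,
    assuming the methods miss contacts independently."""
    missed_by_all = 1.0
    for coverage in coverages:
        missed_by_all *= 1.0 - coverage
    return 1.0 - missed_by_all


human_interviews = 0.50  # hypothetical share of contacts found by tracers
smartphone_data = 0.40   # hypothetical share found by automated tracing
print(f"{combined_coverage(human_interviews, smartphone_data):.0%}")
```

Neither source reaches 70% alone, but under the independence assumption together they do.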

B. The Apple/Google Proposal Is Helpful but not Sufficient

Apple and Google, whose two operating systems are used on nearly all of the world’s smartphones, are collaborating on a method to facilitate contact tracing between and among smartphone users through Bluetooth Low Energy signals. Each phone can detect and locally store identifiers (keys) for other phones that come within range. But the system can provide only a crude estimate of the likelihood of coming within the 6-foot range in which transmission is believed to be most likely. Once stored, the data can be accessed by an app selected or developed by the user’s local public health authority. If someone tests positive for COVID-19, they can use the app to share the keys their own phone has broadcast with a central database. As the app runs on other people’s phones, it will check that central database once a day and display an alert if a key from the database matches one of the keys that their phone had previously detected and stored. All data except the positive test keys are kept on individuals’ phones.
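The decentralized matching described above can be modeled in a few lines. This is a toy Python sketch, not the Apple/Google Exposure Notification API: the real system derives rotating identifiers from daily keys using cryptographic functions, whereas here each broadcast is simply a random token, and the `Phone` class and its method names are hypothetical.

```python
import secrets


class Phone:
    """Toy model of decentralized exposure matching."""

    def __init__(self) -> None:
        self.own_keys: list[str] = []      # keys this phone has broadcast
        self.heard_keys: set[str] = set()  # keys heard from nearby phones; never leave the device

    def broadcast_key(self) -> str:
        key = secrets.token_hex(16)  # random identifier, not linked to the owner
        self.own_keys.append(key)
        return key

    def hear(self, key: str) -> None:
        self.heard_keys.add(key)

    def check_exposure(self, diagnosis_keys: set[str]) -> bool:
        # Matching happens on the phone; the server never learns the result.
        return bool(self.heard_keys & diagnosis_keys)


central_diagnosis_keys: set[str] = set()

alice, bob, carol = Phone(), Phone(), Phone()
bob.hear(alice.broadcast_key())  # Alice and Bob were near each other
carol.hear(bob.broadcast_key())  # Bob and Carol were near each other

# Alice tests positive and chooses to upload the keys her phone broadcast.
central_diagnosis_keys.update(alice.own_keys)

print(bob.check_exposure(central_diagnosis_keys))
print(carol.check_exposure(central_diagnosis_keys))
```

Note what the server never sees: whether Bob matched, when the contact happened, or where. That is the privacy strength of the design and also the source of the limitations listed below.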

Apple and Google have promised to reserve their system for the exclusive use of public health authorities, and to require that individual users opt in, both to participate in the program and again before the app shares any data from their phone. Specifically, users would need to toggle a permission in their settings and download their public health authority’s app.

It is laudable that Apple and Google are doing their part to support contact tracing, but their system has so many technical and human-factors limitations that it will not greatly improve the efficacy of contact tracing and targeted testing without a bolder government intervention. Those limitations include:

  • Insufficient Smartphone Deployment. Nearly 20% of American adults have no smartphone. Also, the Apple/Google system won’t work on many older smartphones. These limitations are particularly relevant for the most vulnerable communities: the low-income and elderly.
  • Insufficient Participation. Once the proximity detection function is added to smartphone operating systems, users will have to both (1) proactively enable the proximity detection function and (2) download their public health authority’s app in order to participate in a contact tracing program. If a user waits to activate the proximity detection or to download the app until they receive a positive COVID-19 test result, the system will not be able to look back in time to trace their contacts. Other countries that have relied on residents to opt into a contact tracing app have had very low participation rates: 20% in Singapore and 10% in Australia.
  • Poor Access to Testing Information. Even though current law already requires positive test results to be recorded in public health databases, users do not have to upload their keys to the central database. Some users simply won’t provide that information, while others will delay, costing valuable time.
  • Fragmented Implementation. The Apple/Google system has no issues with interoperability between the two companies’ operating systems; any public health authority’s database can draw data from both kinds of phones. But it doesn’t solve the problem of interoperability among health authorities, which could each have their own app and central database. Some states have already created their own contact tracing apps, and there is no standard for data interoperability. Adoption rates will be low even if just one app is involved. They will be even lower if people need to download a different app every time they cross a state line. And having to track contacts across multiple apps and databases will cause additional delay, if it works at all.
  • Won’t Identify Places that Need Decontamination. The Apple/Google system identifies only contacts with people who may need to be quarantined or tested, not places that may need to be decontaminated as quickly as possible.
  • Waiting Too Long to Notify Users. The Apple/Google system will fail to identify risk in time. In addition to the problems caused by low participation, the system relies on either a positive test result or a symptoms-based positive diagnosis by a health authority as the sole signal to trigger a notification for that person’s contacts. An effective system would need to automatically estimate risk and push out recommendations to those who have been in contact with a high-risk individual even before the high-risk individual has been tested.
  • Too Many Notifications. By design, the Apple/Google system will collect only proximity data reflecting the interaction of individuals’ phones. The system will not collect geolocation data, and it will not release time-stamped proximity data to users or to public health officials to supplement traditional contact tracing. As a stand-alone alert system, the Apple/Google technology will perform poorly because it will not allow users or public health authorities to use other information such as the exact time and place of an interaction to help differentiate between high-risk and low-risk events.

Consider this explanatory illustration from the Apple/Google FAQ:

If you received such an alert, knowing that it could have been generated by the owner of some phone that came within some unspecified (and unspecifiable) distance of your own, knowing only which day that contact might have occurred, how would you respond? Public health authorities’ apps will be able to choose the conditions under which an alert is sent, but with such limited, siloed data and guaranteed error, the only choice available to public health authorities is to trade false positives for false negatives.
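The trade-off in the last sentence can be seen directly. In this Python sketch, each exposure event carries only one crude proximity score (the scores and infection outcomes are invented for illustration). With no time or place data to separate the cases, moving the alert threshold simply converts false negatives into false positives and back again.

```python
# Invented exposure events: (crude proximity score from 0 to 1, was the contact infected?)
EVENTS = [
    (0.9, True), (0.8, False), (0.7, True), (0.6, False),
    (0.5, True), (0.4, False), (0.3, False), (0.2, True),
]


def errors_at(threshold: float, events: list[tuple[float, bool]]) -> tuple[int, int]:
    """Count (false positives, false negatives) if alerts fire at score >= threshold."""
    false_pos = sum(1 for score, infected in events if score >= threshold and not infected)
    false_neg = sum(1 for score, infected in events if score < threshold and infected)
    return false_pos, false_neg


for threshold in (0.25, 0.55, 0.85):
    fp, fn = errors_at(threshold, EVENTS)
    print(f"threshold {threshold}: {fp} false positives, {fn} false negatives")
```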

Several leading privacy advocates (including Bruce Schneier and others) have pointed out these shortcomings to argue that the privacy risks of automated contact tracing are not worth its meager returns. From that narrow criticism (that this system isn’t worth its costs), they have generalized to condemn all possible tracking systems. There is a certain irony in this: Google and Apple, companies long criticized for knowing “everything” about us, now cannot make even a marginal contribution to managing our contact tracing.

The popular criticism of automated contact tracing has misdiagnosed the problem: The Apple/Google program first maximizes privacy, and then does what little it can for public health within those constraints.

We should reverse the frame and ask: How can we save tens (maybe hundreds) of thousands of American lives with the help of readily available technology? And then we should ask how to bake in privacy protections that minimize the risks of abuse and scope creep.

III. We Need to Try Smart Test & Trace

Americans live with a popular, if unstated, assumption that we cannot recreate the success of South Korea. But there is nothing magical about the data-driven management that has been used to contain the virus in that country. Moreover, it is hard to accept broad and superficial observations that South Koreans tolerate more government oppression when they remain free to go to work and spend time with their loved ones while most Americans languish in compulsory isolation and growing economic despair.

To be sure, some aspects of South Korean public health management do not need to be replicated: the detailed location history logs of COVID+ cases are published publicly (enabling reidentification of specific individuals), and quarantines are enforced by police visits to anybody who turns off their phone. These aspects of the South Korean response are some of the least important for suppressing the spread of COVID-19. We should be able to borrow the other, more valuable aspects of the South Korean system without duplicating the heavy-handed enforcement and broad public disclosures that tend to trouble Americans. South Korea broadcasts where every infected person has been (bad for privacy) and expects individuals to evaluate their own infection risks (an ineffective public health strategy). We will be better served by a system that is smart enough to allow algorithms and human contact tracers to work together to formulate tailored warnings and deliver them only to those Americans who may have been infected. This sort of technology-assisted narrowcasting of warnings can preserve the patient’s privacy by communicating only what is necessary (typically, when and where their own potential exposure event took place). This system would achieve most of the public health benefits made possible by automated contact tracing without many of the privacy costs associated with South Korea’s system.


The important aspects of a smart, privacy-protective system are: (a) collecting multiple types and sources of data; (b) pooling them into a data repository; (c) analyzing the data to make reliable, targeted risk scores and customized public health recommendations; and (d) maximizing participation in the program. Each of these aspects could be accomplished in a variety of ways. Because time is of the essence and the Apple/Google system is already in development, we use it as a launching pad.

A. Collecting Location and Proximity Data

South Korea uses multiple independent sources of information — geolocation, credit card data, CCTV, facial recognition, and old-fashioned interviews — in order to better trace contacts and predict the risk of transmission for each person. The U.S. response will also need to use multiple types of data in order to reach the 70–80% tracing threshold required to avoid exponential spread of the virus. However, we do not need to use all the same types of data that South Korea does. We can make use of the diversity of information that comes from human interviews, geolocation data, and the Bluetooth proximity detection system in development by Apple and Google.

Geolocation and proximity detection each have some advantages over the other, so that each can make significant contributions to reducing error (both false positives and false negatives) in a Test & Trace program. Both geolocation and proximity detection will have trouble determining whether people were close enough to each other to pose a credible risk of transmission, but fortunately their errors only partly overlap. This is good, as each can help fill the gaps left by the other.

The South Korean system uses approximate geolocation data derived from cellular signals, information the wireless carriers already collect. We propose to use a more precise source of location data: the Global Positioning System (GPS), which can generally fix a receiver’s position to within about 15 feet if it has a clear signal from multiple GPS satellites. It works less well in cities with tall buildings, and it is largely ineffective for indoor positioning. Even with a good signal, it is roughly three times more accurate horizontally than vertically. Thus, if two people are on different floors of the same building, a system that relies exclusively on GPS will be much less likely to know that the risk of transmission between the two people is negligible. But despite these limitations, GPS location data are generally accurate enough to figure out who was in (roughly) the same place as an infected person. They can also help identify which buildings and venues should be closed down and disinfected.
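This is a short Python sketch of why GPS alone struggles with the different-floors case. It assumes the accuracy figures above (about 15 feet horizontally, roughly three times worse vertically), treats them as simple error bounds on each fix, and uses the standard haversine formula for horizontal distance. The 40-foot altitude gap in the example (about four floors) is hypothetical.

```python
import math

EARTH_RADIUS_FT = 20_902_231   # mean Earth radius (about 6,371 km) in feet
H_ERROR_FT = 15.0              # horizontal accuracy quoted above, with a clear signal
V_ERROR_FT = 3 * H_ERROR_FT    # vertical accuracy roughly three times worse


def horizontal_distance_ft(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance between two GPS fixes, in feet."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_FT * math.asin(math.sqrt(a))


def cannot_rule_out_contact(fix1, fix2, contact_ft: float = 6.0) -> bool:
    """True if, given each fix's error, two (lat, lon, altitude_ft) fixes
    might have been within contact_ft of each other."""
    (lat1, lon1, alt1), (lat2, lon2, alt2) = fix1, fix2
    horiz = horizontal_distance_ft(lat1, lon1, lat2, lon2)
    vert = abs(alt1 - alt2)
    # Each fix can be off by its error bound, so the true gap may be smaller by twice that.
    min_horiz = max(0.0, horiz - 2 * H_ERROR_FT)
    min_vert = max(0.0, vert - 2 * V_ERROR_FT)
    return math.hypot(min_horiz, min_vert) <= contact_ft


same_spot_four_floors_apart = ((40.0, -75.0, 0.0), (40.0, -75.0, 40.0))
about_365_ft_apart = ((40.0, -75.0, 0.0), (40.001, -75.0, 0.0))
print(cannot_rule_out_contact(*same_spot_four_floors_apart))
print(cannot_rule_out_contact(*about_365_ft_apart))
```

Under these assumptions, GPS cannot exclude contact between people four floors apart, which is exactly the gap that Bluetooth proximity data can help close.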

The Bluetooth system in development by Google and Apple can overcome some of these limitations. It works underground and inside buildings, since it requires only that devices can send Bluetooth signals to each other — not to satellites. And it is usually more sensitive to the physical closeness of two people, especially vertically, and thus would generally be less likely to make the mistake of thinking that two people separated by several floors are in contact. But because it was designed to work without collecting location data, the Apple/Google system cannot be used to identify potentially infected spaces for decontamination or to infer from other contextual information (e.g., location) whether two people were likely to have interacted with each other. And because timestamps will not be made available, it will miss valuable information that can help COVID+ individuals and human contact tracers better determine which recorded contacts are at meaningful risk.

An ideal program will have to use both GPS and proximity data. At a minimum, each proximity event would need to be supplemented with the geolocation data from both devices. If we leverage some of the advantages of the Bluetooth system and add geolocation to it, we could very likely estimate coronavirus risk as well as, if not better than, the South Korean system without using the other privacy-invasive sources of information used by their system, like CCTV footage and facial recognition.
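One way to picture the fused record is a Bluetooth proximity event that also carries both devices’ GPS fixes. This Python sketch rests on explicit assumptions: the schema, the 55 dB attenuation cutoff, and the 15-minute duration cutoff are hypothetical choices for illustration, and the GPS tolerance reuses the roughly 15-foot accuracy figure quoted above. Events on which the two sources disagree are routed to a human contact tracer rather than decided automatically.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

NEAR_ATTENUATION_DB = 55.0            # hypothetical: attenuation at or below this suggests closeness
GPS_ERROR_FT = 15.0                   # horizontal GPS accuracy quoted in the text
MIN_DURATION = timedelta(minutes=15)  # hypothetical duration cutoff


@dataclass
class FusedContact:
    """A Bluetooth proximity event enriched with both devices' GPS fixes."""
    start: datetime
    duration: timedelta
    attenuation_db: float            # Bluetooth signal loss; lower usually means closer
    gps_gap_ft: float                # horizontal distance between the two GPS fixes
    place_id: Optional[str] = None   # venue matched from GPS, if any


def classify(contact: FusedContact) -> str:
    if contact.duration < MIN_DURATION:
        return "low risk"
    near_by_bluetooth = contact.attenuation_db <= NEAR_ATTENUATION_DB
    # Both fixes can be off by GPS_ERROR_FT, so allow twice that before ruling proximity out.
    near_by_gps = contact.gps_gap_ft <= 2 * GPS_ERROR_FT
    if near_by_bluetooth and near_by_gps:
        return "high risk"
    if near_by_bluetooth or near_by_gps:
        return "review"  # sources disagree: refer to a human contact tracer
    return "low risk"


def places_to_disinfect(contacts: list[FusedContact]) -> set[str]:
    """Venues tied to high-risk contacts; Bluetooth alone could not supply these."""
    return {c.place_id for c in contacts if c.place_id and classify(c) == "high risk"}


t = datetime(2020, 5, 10, 9, 0)
events = [
    FusedContact(t, timedelta(minutes=20), 50.0, 10.0, "cafe-12"),  # both sources say near
    FusedContact(t, timedelta(minutes=30), 75.0, 5.0, "office-9"),  # GPS near, Bluetooth far
    FusedContact(t, timedelta(minutes=5), 50.0, 10.0, "bus-44"),   # too brief
]
print([classify(e) for e in events])
print(places_to_disinfect(events))
```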

B. Centralized Data Repositories

The Apple/Google system works in isolation from the interview-based contact tracing and other measures that public health authorities use. The system enables an app to check in with a database at least once a day to receive “Diagnosis keys” of anyone “confirmed to have a positive diagnosis of COVID-19.” The app then looks for matches between the diagnosis keys and the log stored on the phone of keys the device has come into contact with. If the app finds a match, the user will receive an alert, but the match will never be reported to public health authorities.

The Apple/Google system is radically decentralized: each user’s phone does all the work to identify potential matches between the keys of those who have tested positive and the keys of devices it has come near enough to collect. This decentralization makes a lot of sense for other functions, but not this one. We need a system that pools individual-level data from many sources in order to make sufficiently swift and accurate evaluations of risk. This, in turn, will allow public health authorities, informed by location and proximity data, to better counsel the patients who test positive, better allocate scarce testing resources, and more appropriately advise individuals about the need for quarantines. Centralized databases could be maintained at the state or territorial level, if necessary, and would still offer enormous advantages over the Apple/Google system. But as interstate travel resumes, it will be more valuable to have a national data repository (e.g., housed at the Centers for Disease Control) than a patchwork of databases siloed along state lines.
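To illustrate what pooling buys, here is a Python sketch of a central repository built on the standard library’s `sqlite3` module. The table layout, the person identifiers, and the three-day look-back window are hypothetical. The point is that records from Bluetooth, GPS, and interviews land in one place and can be queried together, with timestamps and places intact, the moment a case is confirmed.

```python
import sqlite3
from datetime import datetime, timedelta

db = sqlite3.connect(":memory:")
# Timestamps are stored as ISO-8601 text so that string comparison orders them correctly.
db.execute(
    "CREATE TABLE contacts (person_a TEXT, person_b TEXT, seen_at TEXT, place_id TEXT, source TEXT)"
)
db.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?, ?, ?)",
    [
        ("alice", "bob", "2020-05-10T09:00", "cafe-12", "bluetooth"),
        ("carol", "alice", "2020-05-11T18:30", "gym-3", "gps"),
        ("alice", "dave", "2020-05-02T12:00", "park-7", "interview"),  # outside the window
        ("erin", "bob", "2020-05-11T08:00", "bus-44", "bluetooth"),    # does not involve alice
    ],
)


def contacts_of(index_id: str, onset: datetime, lookback: timedelta = timedelta(days=3)):
    """All recorded contacts of a confirmed case, from every source, within the look-back window."""
    start = (onset - lookback).isoformat(timespec="minutes")
    end = onset.isoformat(timespec="minutes")
    return db.execute(
        """
        SELECT CASE WHEN person_a = ? THEN person_b ELSE person_a END AS contact,
               seen_at, place_id, source
        FROM contacts
        WHERE (person_a = ? OR person_b = ?) AND seen_at BETWEEN ? AND ?
        ORDER BY seen_at
        """,
        (index_id, index_id, index_id, start, end),
    ).fetchall()


for row in contacts_of("alice", onset=datetime(2020, 5, 12, 0, 0)):
    print(row)
```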

Regardless of whether there is a national data repository, the actual risk models and apps used with the data will be within the purview of state, territorial, or local public health authorities. The data repository will allow public health authorities to trace contacts much more quickly and accurately, which will likely be the determining factor in whether the virus continues to spread exponentially. A recent report by Johns Hopkins described the assistance of even relatively crude digital tracing technologies as an effective force multiplier for human contact tracers.

C. Smart Notifications

An effective Test & Trace system must give credible guidance to individuals about their risk of infection and best course of action. It has the potential to reduce losses at all of the points of failure by giving public health authorities enough data to make accurate predictions of risk and to better target notifications to each individual.

A smart Test & Trace program will start, at t₀, with a much better chance of directing a COVID+ individual to get a timely COVID test. And it will also update, at t₀, the risk of all individuals who were in contact with that individual (or with a surface they may have contaminated) so that they can receive better information about their individual level of risk and better guidance about whether they should get a test or restrict their movement. If data from diverse sources are pooled into a single database and constantly analyzed with the best prediction algorithms, an app can provide rapid, high-quality notifications.
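Updating risk for people who shared a contaminated space, not just those who were near the index case, requires matching visits by place and time. This Python sketch works under stated assumptions: the visit log is hypothetical, and the three-hour period during which a space is treated as risky after the index case leaves is a placeholder, not an epidemiological estimate.

```python
from datetime import datetime, timedelta

LINGER = timedelta(hours=3)  # placeholder: how long a space stays risky after the index case leaves

# Hypothetical visit log: (person, place, arrival, departure)
VISITS = [
    ("alice", "cafe-12", datetime(2020, 5, 10, 9, 0), datetime(2020, 5, 10, 10, 0)),
    ("alice", "gym-3", datetime(2020, 5, 10, 18, 0), datetime(2020, 5, 10, 19, 0)),
    ("bob", "cafe-12", datetime(2020, 5, 10, 11, 30), datetime(2020, 5, 10, 12, 0)),
    ("carol", "cafe-12", datetime(2020, 5, 10, 14, 0), datetime(2020, 5, 10, 15, 0)),
    ("dave", "gym-3", datetime(2020, 5, 10, 17, 0), datetime(2020, 5, 10, 18, 15)),
]


def shared_space_exposures(index_id, visits, linger=LINGER):
    """Return (people exposed via a shared space, places to decontaminate)."""
    index_visits = [(place, arr, dep) for who, place, arr, dep in visits if who == index_id]
    exposed = []
    for who, place, arr, dep in visits:
        if who == index_id:
            continue
        for i_place, i_arr, i_dep in index_visits:
            # Exposed if this visit overlaps the index case's stay plus the linger period.
            if place == i_place and arr <= i_dep + linger and dep >= i_arr:
                exposed.append((who, place))
                break
    places = sorted({place for place, _, _ in index_visits})
    return exposed, places


print(shared_space_exposures("alice", VISITS))
```

Carol arrives after the placeholder window closes and is not flagged. Dave overlaps Alice at the gym only briefly but is flagged; a real risk model would also weigh duration and ventilation.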

Public health authorities will ultimately decide how best to use the data in order to craft customized guidance for their residents. We can imagine a range of apps — some with color coding, some with more detailed information about the time, date, and location of potential exposure. Some may even allow an individual user to dial in their risk tolerance so that, for example, immunocompromised individuals or their families would be advised to restrict movement or avoid contact with vulnerable family members based on a lower threshold of risk than others.
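This is a sketch of how an app might let users dial in their own tolerance. The base thresholds and the color scheme are hypothetical. A `tolerance` below 1.0 lowers both thresholds, so a more cautious user (or an immunocompromised one) is told to act at a lower estimated risk.

```python
def color_code(risk: float, tolerance: float = 1.0) -> str:
    """Map an estimated infection risk to a color.

    Base thresholds (0.5 for red, 0.1 for yellow) are hypothetical. A tolerance
    below 1.0 scales both thresholds down, so a cautious user is warned sooner.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    if risk >= 0.5 * tolerance:
        return "red"     # e.g., self-quarantine and avoid vulnerable family members
    if risk >= 0.1 * tolerance:
        return "yellow"  # e.g., get tested and limit movement
    return "green"


print(color_code(0.08))                 # default tolerance
print(color_code(0.08, tolerance=0.5))  # immunocompromised user or household
```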

A system with more data will be easier to understand and more informative for users than the Apple/Google system that processes only proximity data and test results, and does so only on each user’s device in isolation from other information.

In principle, a smart Test & Trace system should be able to work something like this:

Source: Science Magazine, “Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing” (figure 4)

D. Maximizing Participation

The South Korean system relies on compulsory participation. This is not purely a cultural fact; near-universal participation is a practical necessity of a program that avoids mass lockdowns.

Recall that an effective program will have to identify and notify 70% of the contacts made by a person within a day of becoming infectious in order to slow the spread of the virus. To achieve that, nearly everybody has to be traceable. After all, each contact consists of two people: both the index case and the person who came into contact with her. If either one has not permitted data to be collected and is not traceable through interview-based reconstruction, the system will miss them. As a result, the effectiveness of the system as a whole is the square of the participation rate: if only 70% of the community participated, fewer than half of all human contacts would be recorded (0.70 × 0.70 = 0.49). To reach the 70% overall success threshold, nearly 84% of the population must participate (√0.70 ≈ 0.837).
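The squaring argument is easy to check. This Python sketch assumes participation is independent across the two people in a contact (real-world participation will likely cluster by age, income, and smartphone ownership) and ignores contacts that interviews recover. It also applies the same arithmetic to the opt-in rates reported earlier for Singapore and Australia.

```python
import math


def share_of_contacts_recorded(participation: float) -> float:
    """A contact is recorded only if both people participate."""
    return participation ** 2


def participation_needed(target_share: float) -> float:
    """Participation rate at which the recorded share reaches the target."""
    return math.sqrt(target_share)


print(f"70% participation records {share_of_contacts_recorded(0.70):.0%} of contacts")
print(f"Recording 70% of contacts needs {participation_needed(0.70):.1%} participation")
for place, rate in (("Singapore", 0.20), ("Australia", 0.10)):
    print(f"{place}: {rate:.0%} opt-in records {share_of_contacts_recorded(rate):.0%} of contacts")
```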

For this reason, both the European Union models for contact tracing and the Apple/Google plan are inadequate. Both would require users to repeatedly opt into any public health-tracing program. Opt-in requirements virtually guarantee that these apps will have limited efficacy due to low levels of participation.

We recognize that the stark reality of effective COVID-19 risk-monitoring runs headlong into distrust of Big Tech, and the trend in privacy policy over the last decade to increase and reinforce user choice. But in the context of the pandemic, where surveillance has the potential to release us from the double bind of health crisis and lockdowns, a choice-centered approach to privacy is reckless.

III. Privacy that Lets Us Live

Journalists and academics have criticized the various “Liberate” protests that have been cropping up around the country and that seem to have had some influence on decisions about relaxing stay-at-home orders. The criticism is sound to the extent that society does, and should, prefer to curtail some liberties in order to save the lives (and, thus, the liberties) of others. Decisions that would normally be the prerogative of each individual have been compressed by COVID-19 into a knot of interpersonal conflicts of interest. The usual rules for working, traveling, worshiping, and going about one’s day have been upended by the unique and stressful circumstances of managing the virus.

Yet ordinary expectations of personal privacy have not received the same acid wash of scrutiny: polling suggests that most commentators, including technologists, are still unwilling to think through socially responsible tradeoffs among privacy, human life, other liberties, and economic costs. Even outside the social sphere of privacy advocates, most Americans take for granted that any COVID-19 response must comply with long-standing privacy laws even though some of our most fundamental rights have been suspended for weeks. For example, during one of the popular and highly respected “Grand Rounds” broadcasts produced by the University of California, San Francisco (UCSF) School of Medicine, Dr. George Rutherford claimed that the U.S. could not engage in South-Korean-style collection simply because it would “edge against certain privacy laws we have in this country.”

Making a Test & Trace system efficacious does not have to mean giving up on privacy, broadly conceived. It only requires letting go — in this extraordinary context and for now — of a particular conception of privacy that essentializes individual choice. Most of the privacy threats raised about contact tracing could be guarded against just as well (better, in fact) by strong and verifiable restrictions on how every person’s data is accessed, used, and deleted. In other words, we can protect privacy without obstructing a critical tool to manage the public health crisis by using mechanisms other than user choice.

A. User Choice and Externalities

American policy experts assume, and the European Commission has expressly required, that a digital tracking program would be unethical unless each individual has a choice, free from duress, about whether or not to participate. Given the particular characteristics of COVID-19, that ethical premise is dubious.

Consider the following hypothetical: Esther and Edmond, both essential workers at a grocery store, must interact with strangers regularly. Both have downloaded their state’s COVID-19 tracking app. Peter can work from home and, because he has a pre-existing health condition, he is reluctant to leave his house unless he has confidence that the risk of viral spread is well-managed. Marc can also work from home, but feels relatively confident about his health. Marc has a strong personal taste for privacy, so he has not opted into using a phone-based contact tracing system.

Marc has contracted the virus but has no symptoms and is unaware he is infected. Had he been part of an automated Test & Trace program like the one we have described, he might have been alerted that he was at high risk of having contracted the virus from another COVID+ individual and advised to self-quarantine and/or get tested. But because he chose not to participate in a voluntary program, he had no knowledge of his risk and infected Esther during a prolonged interaction at the grocery store.

The next day, Marc developed a mild headache and body aches. His doctor recommended a COVID-19 test, which came back positive. Because Marc had not downloaded the COVID app, neither he nor the public health authorities have a record of his contact with Esther. Thus, he had no way of communicating his test status to Esther or others who came in close contact with him. (A general notification to the grocery store would not provide actionable information.) In any event, even if Marc had opted into the app before infecting Esther, he would have chosen not to report his keys when he learned his test results. Because Esther did not know about her heightened risk of exposure, she continued to work, thereby putting Edmond at risk. Peter, meanwhile, has decided to stay home and avoid contact with his family and friends because he rightly infers that people have no reliable way of evaluating their individual risk of exposure. Thus, Marc’s choice affects not only the health of Esther and Edmond, but the liberty of Peter, too.

This illustration helps show why it doesn’t make sense, given the particular characteristics of this virus, to treat each individual’s privacy choices as a matter for individual control. As with lockdowns, the decision must be made at a collective level. A user choice conception of privacy must give way to other societal interests. To quote one employee of the Electronic Privacy Information Center (EPIC), a preeminent privacy advocacy organization, responding to the risks of COVID-19, “privacy is something that’s constantly weighed against other things.”

None of this is new to public policy. For decades, common law rules and statutory restrictions (e.g., HIPAA) have treated communicable disease as a special circumstance requiring the suspension of personal control of sensitive data. HIPAA generally requires patients to consent to the sharing of their personal information, but not when a person is diagnosed with any of several dozen highly infectious (or less infectious but highly dangerous) diseases. On the contrary, state law requires such diagnoses to be reported to central databases.

Even Europe’s new privacy law, the GDPR, allows for the collection and processing of data without consent in the context of a public health emergency. (For reasons that baffle us, the European Commission has opted to advise member states not to use this exception, insisting that users must opt in to any automatic data collection for COVID-19 management.)

B. Disproportionate Burden for Minority and Low-Income Americans

Public health data from places with COVID-19 outbreaks consistently show that minorities are both more likely to contract coronavirus and more likely to have severe or fatal cases of the disease. For example, here is a per capita breakdown by racial group of known COVID-19 cases in New York City from a few weeks ago:

Data from other regions show that the case fatality rate for African-Americans has been 2.6 times higher than that for whites, and European countries have reported similar trends.

The higher case rate can be explained in part by the fact that members of racial minority groups are disproportionately economically disadvantaged, and thus more likely to work in low-paid essential service jobs where physical distancing is not possible.

Source: “British BAME Covid-19 death rate ‘more than twice that of whites,’” The Guardian

We note these disproportionate burdens because they show that user choice will cause externalities that are distributed unevenly and inequitably, shouldered by the most disadvantaged Americans.

C. Privacy Advantages of Automatic Collection

In some contexts, automatic collection of information imposes one kind of privacy cost while relieving another. Many people take advantage of the Google search bar to ask questions that they would not feel as comfortable talking about with another person (even a trusted fiduciary such as a doctor). And people are more willing to share sensitive data with doctors than with their families. Privacy, as it’s actually practiced, is dependent on context.

In manual contact tracing of transmissible diseases, patients just given a potentially devastating diagnosis will bear additional psychic burdens from having to reconstruct their whereabouts for interviewers and, given limited interviewer manpower, may have to alert all interested parties themselves. Because so many people may be reluctant to have these conversations, or may delay having them, state law often affirmatively requires doctors and patients to disclose the risk of communicable disease to others whom they may have infected. New York, for example, requires doctors to report the relevant contacts of their patients who have tested positive for HIV to the public health authorities, and to notify the patient’s past partners about the risks if the patient does not do so herself.

Testing positive for COVID-19 does not have the stigma that HIV and other sexually transmitted diseases do, but there are many scenarios where a patient might not want to revisit their actions. For example, patients might not want to share details about counseling or love affairs with a person — even a stranger conducting an interview by phone. The contacts of the patient may also prefer automatic data-collection precisely because it is impersonal and does not require them to make difficult decisions. Those who may have been exposed to the virus might want to be able to access a test or justify a home quarantine without having to divulge the details of the potential route of transmission to another human being. Automated systems can make use of location and proximity data while also giving individuals the moral license to avoid talking about it.

Absent a system such as ours, many employers, schools, universities and other institutions will feel the need to create their own testing and contact tracing systems for entrants to their premises and employees. These systems will lack legal mandates to minimize data collection and to restrict access and repurposing. It is better to get more efficacy and more privacy through legislation governing a single, unified system.

D. Americans Want More and Better Use of Tracking Information

Although privacy advocates have expressed consternation about any data collection system that does not involve affirmative consent, most Americans seem to understand the tradeoffs and approve of automatic data collection. A study performed by University of Washington researchers has found that most Americans feel neutrally or positively about the prospect of sharing location data with the government for the purposes of COVID-19 management.

Source: COVID-19 Contact Tracing and Privacy: Studying Opinion and Preferences

According to a recent Pew report, black and Hispanic Americans, who typically have higher levels of distrust of the government and of privacy-invasive technology, are notably more likely to say that it is acceptable for the government to track cellphones as part of the COVID-19 response.

This is probably explained by their heightened probability of having severe complications from the virus, reinforcing the idea that a socially responsible COVID-19 response will take decision-making out of the hands of individuals when their choices can cause undue harm to other people.

We suspect there would be even more support for a Test & Trace system if Americans were confident that the program would work well enough to allow the economy to reopen without causing a second wave of exponential growth in disease rates, and if the data would be protected from exploitation by the government, companies, or malicious hackers.

E. Fourth Amendment Reasonableness

One remaining issue is whether the Fourth Amendment would prohibit the automated collection of location histories performed in coordination with public health authorities. Prof. Alan Rozenshtein has laid out the relevant cases and criteria to determine whether a data collection program would violate the Fourth Amendment, with appropriate attention to the “special needs doctrine.” That doctrine is specifically designed to allow random or mass surveillance that would ordinarily violate Fourth Amendment rights under circumstances where the social costs of non-detection are severe and alternative solutions aren’t available. For example, the special needs doctrine allows police to set up DUI checkpoints and stop every person who comes through.

The special needs doctrine incorporates an analysis of the importance of the social interest at risk, the seriousness of the privacy intrusion, and the efficacy of the surveillance program in order to assess whether the benefits of a surveillance program are worth the costs. In this way, the doctrine functions a lot like the typical strict scrutiny that has been applied when other constitutional rights are implicated by the coronavirus response (such as the First Amendment freedom of assembly and free exercise of religion). Commentators like Rozenshtein have suggested that a Test & Trace program might not survive scrutiny, particularly after the backlash and questionable payoff of programs like the telephonic metadata surveillance enabled by the USA Patriot Act.

We disagree. As long as there is reason to believe that a data-driven Test & Trace program could help control the spread of COVID-19 (and the experience of South Korea gives credence to such a belief, at least for now), the scale of the threat we face today will easily justify, under Fourth Amendment scrutiny, an order that compels Apple, Google, and other companies to share useful data. This is particularly true if the order is paired with legal structures that safeguard the data from misappropriation, including use for criminal law enforcement.

Consider, for example, the other threats that have been balanced against Fourth Amendment rights:

  • Number of Americans who died in 9/11 attacks: 2,977
    (justified FISA Amendments Act and USA PATRIOT Act)
  • Number of Americans who die from gun violence each year (non-suicide): 14,200
    (justified gun registries and licensing)
  • Number of Americans who have died so far from COVID-19 despite lock-downs: 80,000+ (as of May 12, 2020)
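For scale, here is a short Python calculation comparing the COVID-19 toll with the other figures in the list. It uses only the numbers listed above and treats 80,000 as a lower bound.

```python
# Figures as listed above (COVID-19 deaths as of May 12, 2020).
other_threats = {
    "9/11 attacks": 2_977,
    "gun violence per year (non-suicide)": 14_200,
}
covid_deaths = 80_000  # lower bound ("80,000+")

ratios = {label: covid_deaths / n for label, n in other_threats.items()}
for label, r in ratios.items():
    print(f"COVID-19 deaths vs. {label}: {r:.1f}x")
# COVID-19 deaths vs. 9/11 attacks: 26.9x
# COVID-19 deaths vs. gun violence per year (non-suicide): 5.6x
```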

Even without an automated Test & Trace system, the realities of COVID-19 will require human contact tracers to engage in many invasive uses of information. If a grocery store is informed that a customer who later tested positive had visited the store, CCTV footage will be used to identify staff and other customers who interacted with the infected customer. Credit card transaction records might be accessed in an attempt to identify other customers who were shopping at the same time. This process is time-consuming and error-prone compared to the fully automated system we are suggesting, and these ad hoc uses of data would have no special statutory protection against misuse. Lawmakers should prepare now for a data collection system that will greatly improve health outcomes by incorporating legal protections and privacy-by-design features. We describe the pillars of a robust privacy regime that does not rely on user choice in the next part.

IV. Elements of a Responsible Test & Trace Program

Congress, the President, and state governments have an opportunity to create the regulatory backdrop for a Test & Trace program that might be effective enough to allow us to manage the virus without intermittent lockdowns. Here are the pieces we think are wise, if not indispensable:

A. Universal Access to Smart Devices

We believe a rapid effort to make sure every American has a smart device capable of contact tracing would be well worth the investment. Some 19% of American adults (49 million people) do not have a smartphone. On top of that, many “smart” devices are too old (pre-2012) to support contact tracing via Bluetooth Low Energy.

That means roughly 25–35% of American adults (64–90 million) will not be able to participate in Test & Trace without help. They are disproportionately elderly and economically disadvantaged. It is very important to include these groups in a Test & Trace program because the elderly are at heightened risk of complications from COVID-19, and the economically disadvantaged are more likely to work in jobs where physical distancing is impossible.
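The head counts above are consistent with a base of roughly 257 million U.S. adults (49 million / 0.19 ≈ 258 million). The sketch below converts the percentages into counts under that assumed base, which is our inference from the article's own figures rather than a cited census number.

```python
# Assumption: approximate U.S. adult population (millions) implied by the
# article's own figures; not an official census number.
ADULTS_MILLIONS = 257

def adults_in_millions(share: float) -> int:
    """Convert a share of American adults into a rounded head count (millions)."""
    return round(share * ADULTS_MILLIONS)

print(adults_in_millions(0.19))                            # 49 (no smartphone)
print(adults_in_millions(0.25), adults_in_millions(0.35))  # 64 90 (cannot participate unaided)
```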

Congress can greatly expand participation in a Test & Trace program, and thus its effectiveness, by passing another stimulus/economic relief bill focused on the provision of a smartphone (and possibly also a smartwatch) as well as mobile data service to every U.S. resident for as long as data collection is deemed necessary — on the condition that each participant uses all public health authority apps for their regions. The first priority should be the provision of smart devices to any U.S. resident who currently does not have one and would like one. A less ambitious program could be created quickly by expanding the FCC’s longstanding Lifeline program.

Smartwatches (currently owned by only 1 in 6 American adults) offer several advantages over smartphones. Most relevant for contact tracing is that the signal strength from these devices, worn outside clothing, will provide a better measure of proximity than that from smartphones, which are often kept in purses and pockets (making signal strength an unreliable measure of proximity). Smartwatches are also cheaper than smartphones and would require only a basic data plan. They are also more practical for young children, who will also need effective contact tracing before they can return to school, playgrounds, and socializing with their friends.

Ideally, the U.S. government should purchase a smartwatch with a distinctive color or style so that others can immediately see that the wearer is participating in the program. Much like face coverings and gloves, a distinctive smartwatch will allow people to immediately identify those who are, and are not, taking precautions to protect each other. As the smartwatch becomes part of the common wardrobe of socially responsible Americans, social norms will help crowd in the decision to participate.

Achieving anything near universal access to smart devices would require another large stimulus package (in the range of hundreds of billions of dollars), but it would be pulling double duty by removing some of the financial pressures on Americans and providing the technological requirements for effective management of COVID-19. Any investment in increasing deployment of smart devices would relieve some of the financial burden of hiring the army of human contact tracers by making each tracer more efficient.

B. COVID-19 Test & Trace System

The same federal legislation that funds the Universal Access to Smart Devices program should also include four other pieces: a central database, an oversight body, and two legal mandates to collect and upload data to that database.

Creation of a Data Repository. The Centers for Disease Control and Prevention (CDC) should be required to create and administer a data repository, and to maintain it so that state and local public health authorities can seamlessly access and work with the data to generate risk scores and communicate with individuals.

Creation of an Oversight Board. Congress should create an oversight board, equipped with experts in technology, privacy, public health, and economics as well as citizen representation, that supervises the administration of Test & Trace programs. Specifically, this Board will ensure compliance with the privacy standards described below, ensure the efficacy of the programs, and evaluate the impact on civil liberties. The Board will assess the continued need for the program based on its contributions to public health and freedom of movement and its costs and risks. The Board will produce reports related to the efficacy, costs, and burdens to civil liberties generated by the program.

Requirement to Collect and Upload Location and Proximity Data. In conjunction with privacy protections (described below), Congress should order Apple and Google to configure the operating systems on all supported smart devices so that they (a) automatically send and receive Proximity IDs and signal strength via Bluetooth, and record any Proximity IDs received along with a timestamp (as their current system does, though five-minute intervals may not be short enough); (b) automatically include the device’s location with recorded Proximity IDs; (c) automatically include the periodic location tracking data that Apple and Google already collect for advertising and other purposes, separate from, and in addition to, the location data included with logged Proximity IDs; and (d) automatically upload those data to the Data Repository at least daily.
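Items (a), (b), and (d) can be pictured as a simple log-and-upload loop on the device. The Python sketch below is hypothetical: the record fields, the JSON encoding, and the 24-hour window are our assumptions, and it does not reflect how Apple's or Google's operating systems actually store or transmit data. Item (c), the separate periodic location trail, would be a second stream built the same way.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class ProximityRecord:
    """One logged encounter (hypothetical field names)."""
    received_id: str     # Proximity ID heard from a nearby device, item (a)
    rssi_dbm: int        # received signal strength, item (a)
    timestamp: datetime  # when the ID was received, item (a)
    lat: float           # device location at that time, item (b)
    lon: float

def daily_batch(records: list[ProximityRecord], now: datetime) -> str:
    """Select the last 24 hours of records and serialize them for the
    daily upload in item (d); the transport and endpoint are unspecified."""
    cutoff = now - timedelta(days=1)
    recent = [r for r in records if r.timestamp >= cutoff]
    return json.dumps(
        [{**asdict(r), "timestamp": r.timestamp.isoformat()} for r in recent]
    )
```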

Requirement to Flag Proximity Keys When Someone Tests Positive. The system should be designed so that when someone tests positive or is clinically diagnosed as having COVID-19, testing administrators can flag the patient’s proximity ID key in the central database as belonging to an infected individual.
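Once flagged keys sit alongside uploaded encounter logs, finding potential exposures becomes a simple lookup. The toy in-memory `Repository` below is a hypothetical stand-in for the CDC-run Data Repository. A real system would add risk scoring, authentication, and the access controls this proposal calls for.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Encounter:
    observer_id: str   # Proximity ID of the device that logged the encounter
    observed_id: str   # Proximity ID it heard
    timestamp: datetime

class Repository:
    """Toy stand-in for the central Data Repository (hypothetical)."""

    def __init__(self) -> None:
        self.encounters: list[Encounter] = []
        self.flagged: set[str] = set()

    def upload(self, encounters: list[Encounter]) -> None:
        self.encounters.extend(encounters)

    def flag_positive(self, proximity_ids: set[str]) -> None:
        # Called by a testing administrator after a positive test or diagnosis.
        self.flagged |= proximity_ids

    def exposed_ids(self, since: datetime) -> set[str]:
        """IDs seen near a flagged ID at or after `since`, from either side."""
        exposed: set[str] = set()
        for e in self.encounters:
            if e.timestamp < since:
                continue
            if e.observed_id in self.flagged:
                exposed.add(e.observer_id)
            if e.observer_id in self.flagged:
                exposed.add(e.observed_id)
        return exposed - self.flagged
```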

With this data repository and system of data-collection in place, each state and local public health authority can choose or create a risk-assessment and notification app, and state governments, employers, or retailers have the option of creating mandates or requirements for individuals to download and use the app. An official nation-wide app, perhaps issued by the CDC, would be advisable, but not strictly necessary so long as other governmental apps can interoperate seamlessly through the shared national Data Repository.

C. Data Protection

Given the compulsory nature of data collection under the Test & Trace program, it is essential that a strong system of protections govern all data collection and use. Essential privacy protections include:

Transparency by Design. Every aspect of the system specifications (but not the user data) should be open to public scrutiny. All software should use open source code so its functioning can be understood and no features can be hidden; all access to the Data Repository should be logged and the access logs should be publicly accessible. The purpose of any non-routine access should also be logged. And all risk models used by public health authorities’ Test & Trace apps should be made public so that the epidemiological premises on which the system rests can be scrutinized and debated. These transparency requirements ensure that the public debate about the privacy and security protections for the program and the continuing need for legal mandates will be well-informed.
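One way to make a public access log hard to quietly rewrite is a hash chain, in which each entry commits to the one before it. The Python sketch below is a hypothetical design, not a specified part of the proposal, and uses only the standard library. Anyone holding a previously published copy of the log can detect later edits or deletions.

```python
import hashlib
import json
from datetime import datetime, timezone

FIELDS = ("accessor", "purpose", "when", "prev")

def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class AccessLog:
    """Append-only log; each entry stores the hash of the previous entry."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def record(self, accessor: str, purpose: str, when: datetime) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"accessor": accessor, "purpose": purpose,
                "when": when.isoformat(), "prev": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that each entry links to its predecessor."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: entry[k] for k in FIELDS}
            if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```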

Data Storage. Data should be stored in a manner that makes the raw data difficult to decipher by an unauthorized user. Modern cryptography technologies (e.g., blockchain) allow sensitive databases to require multiple digital keys for access and to log all access for audit purposes.
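The article cites blockchain as one example. A simpler, well-established way to make access require several keys is threshold secret sharing, and the sketch below swaps in Shamir's scheme. The key protecting the repository is split into n shares (held, say, by the CDC, the Oversight Board, and others, a hypothetical allocation); any k shares rebuild it, while fewer reveal nothing about it. This is illustrative only; a deployment should use an audited cryptographic library.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this

def split(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares, any k of which reconstruct it (Shamir)."""
    if not 0 <= secret < PRIME or not 1 <= k <= n:
        raise ValueError("invalid parameters")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]

def combine(shares: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For example, `split(key, k=3, n=5)` lets any three of five custodians jointly unlock the data, while no two can do so alone.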

Data Minimization. Data should be uploaded and stored in the COVID-19 Data Repository with randomly assigned unique proximity IDs that are generated every fifteen minutes. No names or common direct identifiers should be transmitted or stored. The only data uploaded by devices should be time-stamped geolocation and proximity data, and the data in the repository should be deleted after 60 days or when the program ends, whichever comes first.
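The Data Minimization rules reduce to two mechanical pieces: an ID that is random and short-lived, and a retention cutoff. A hypothetical Python sketch follows. The class and function names are ours, and the rotation is timed from first use rather than aligned to the clock, which is one of several reasonable choices.

```python
import secrets
from datetime import datetime, timedelta
from typing import Optional

ROTATION = timedelta(minutes=15)  # new Proximity ID every fifteen minutes
RETENTION = timedelta(days=60)    # maximum retention in the repository

class ProximityIdRotator:
    """Hands out a random Proximity ID and replaces it every 15 minutes.
    No names or direct identifiers are involved (hypothetical helper)."""

    def __init__(self) -> None:
        self._current: Optional[str] = None
        self._issued_at: Optional[datetime] = None

    def current_id(self, now: datetime) -> str:
        if self._issued_at is None or now - self._issued_at >= ROTATION:
            self._current = secrets.token_hex(16)  # 128 random bits
            self._issued_at = now
        return self._current

def purge_expired(records: list[dict], now: datetime, program_ended: bool) -> list[dict]:
    """Drop everything once the program ends; otherwise keep records under 60 days old."""
    if program_ended:
        return []
    return [r for r in records if now - r["timestamp"] < RETENTION]
```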

Data Security. The CDC and all entities that supply data to or from the Data Repository should use best practices for the security of highly sensitive data, and for the authenticated deposit and retrieval of data.

Use Limitation. The sole purposes of the collection and use of any identified or reasonably identifiable personal information are (a) to model and estimate the risk that a person is infected with COVID-19; (b) to notify a person about their estimated risk; (c) to request or order a person to immediately self-quarantine and/or complete an antigen test; (d) to distribute scarce medical resources including antigen and antibody tests; (e) to aid in the design of public health authorities’ studies of stratified or random samples of a population; and (f) to generate reasonably de-identified data for the purposes of statistical study (including quality control and improvement of the technology). No entity, including Apple and Google, may use the data collected or uploaded for the Test & Trace program for their private benefit.

Access Limitation. Data should only be accessed by Public Health Authorities (PHAs) and by PHA-authorized apps. Data should only be accessed for one of the specified uses described above, and raw data may not be redisclosed. Law enforcement use of Test & Trace data must be explicitly prohibited (with the possible exception of enforcing quarantine orders if heavy-handed enforcement appears in some jurisdictions to be necessary). To guard against unauthorized access, each use of the Repository should be logged and subject to audit by the Oversight Board, and the access logs released to the public.
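The use and access limitations can be enforced in software as well as in law. In the hypothetical Python sketch below, a request succeeds only if it comes from an authorized PHA or PHA-authorized app and names one of the listed purposes (a) through (f); every attempt, including denials, is logged. Law enforcement requests fail because that purpose is simply not on the list. The possible quarantine-enforcement exception is not modeled.

```python
from enum import Enum

class Purpose(Enum):
    """Permitted uses from the Use Limitation principle, items (a) through (f)."""
    ESTIMATE_RISK = "a"
    NOTIFY = "b"
    ORDER_QUARANTINE_OR_TEST = "c"
    ALLOCATE_TESTS = "d"
    DESIGN_SAMPLES = "e"
    DEIDENTIFIED_STATS = "f"

def authorize(requester: str, authorized: set[str], purpose: str,
              access_log: list[tuple[str, str, bool]]) -> bool:
    """Grant access only to an authorized requester for a listed purpose,
    and record every attempt (requester, purpose, outcome)."""
    try:
        Purpose(purpose)
        purpose_ok = True
    except ValueError:
        purpose_ok = False  # e.g. "law_enforcement" is not a listed purpose
    allowed = requester in authorized and purpose_ok
    access_log.append((requester, purpose, allowed))
    return allowed
```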

Accountability and Oversight. In addition to the Transparency by Design principles, the Oversight Board, described in Section B above, would continually reevaluate the privacy-related risks and continued need of the program.

Enforcement. Penalties for violations of the law should be at least as severe as those under the Electronic Communications Privacy Act (ECPA): in general, up to five years in prison, fines of up to $250,000, and a private right of action for intentional violations.

Qualified Immunity. Apple and Google should be immunized from liability only to the extent that they comply with these privacy protections. The same should apply to testing facilities and any other entities that supply data to, or extract data from, the COVID-19 database.

Aligning Political Incentives with Privacy Protection. The costs of data collection will be high for the government because our design requires the federal government to bear the costs of all participants’ mobile data plans. In addition to serving as a form of relief to the American public, these costs also create an incentive for the government to end the program and stop collecting data as quickly as it responsibly can.

D. The Test & Trace Laboratory of Democracy

The means of ensuring individual participation in Test & Trace should be left up to individual states, localities, businesses, employers, and individuals (with the understanding, of course, that many Americans will be incentivized to participate in the program in order to receive their free smartwatch and get their mobile data service bills paid).

Governors and mayors across America are struggling to decide when and how much to re-open. These are hard questions and the right answer will vary in many ways depending on local conditions. As a practical matter, it may not be necessary to mandate that every resident participate in the program. Social pressure and requirements imposed by the healthcare sector, retail stores, and employers may do enough to encourage the broad participation that would be necessary for an effective program. The more important thing is that, with the infrastructure in place, state and local leaders have the option of requiring participation as a condition for leaving the house. The greater power to issue stay-at-home orders must, logically, include the lesser power to condition the relaxation of mass quarantine on participation.

E. Automatic Sunset of Legal Mandates

Perhaps the most important privacy protection is ensuring that all federal and state legal mandates last only as long as the need for the program. The mandate to automatically upload information could expire as each individual acquires generally accepted immunity (e.g., has produced antibodies from a previous infection, if the public health authorities agree the antibodies are likely to confer immunity, or has received an FDA-approved vaccination). But given the scientific uncertainty about how durable and strong immunity is, both at preventing re-infection and at reducing the contagiousness of those who are re-infected, it will likely make sense to continue mandating that such data be logged on users’ phones so that it could be retrieved if necessary. Overarching mandates could expire when one of the following occurs: (a) enough of the population has been vaccinated to achieve herd immunity; (b) absent an effective vaccine, herd immunity is achieved through natural spread; (c) routine, accurate mass testing is available and effective at containing the virus, as determined by the Oversight Board; (d) the prognosis of the virus is less severe because of treatment or a lucky mutation; or (e) the efficacy of the program proves to be poor.
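The sunset rules are simple enough to state as code, which also makes them easy to publish and audit. This is a hypothetical Python sketch: the field names are ours, and each boolean stands for a determination (by public health authorities or the Oversight Board) that the code does not attempt to make.

```python
from dataclasses import dataclass

@dataclass
class Person:
    has_protective_antibodies: bool  # counts only if PHAs agree antibodies confer immunity
    vaccinated: bool                 # with an FDA-approved vaccine

def must_upload(p: Person) -> bool:
    """The individual upload mandate lapses once the person has generally
    accepted immunity (on-device logging may continue, as suggested above)."""
    return not (p.has_protective_antibodies or p.vaccinated)

@dataclass
class Conditions:
    herd_immunity_by_vaccination: bool  # (a)
    herd_immunity_natural: bool         # (b)
    mass_testing_contains_virus: bool   # (c), as determined by the Oversight Board
    prognosis_less_severe: bool         # (d)
    program_efficacy_poor: bool         # (e)

def mandates_expire(c: Conditions) -> bool:
    """Overarching mandates expire when any one condition holds."""
    return any([
        c.herd_immunity_by_vaccination,
        c.herd_immunity_natural,
        c.mass_testing_contains_virus,
        c.prognosis_less_severe,
        c.program_efficacy_poor,
    ])
```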

V. Invitation for Feedback

The virus has circulated long enough that we now know some of its key characteristics: it is quite contagious, quite deadly, and can spread asymptomatically, and it has only just begun to ravage the U.S. Right now, the coronavirus is far more efficient at finding people than we are. Tracking technology can help tip the balance in our favor.

We do not pretend to have all the answers, to have anticipated all the possible futures, or to have fully understood all aspects of this enormously complex problem. We welcome constructive feedback on all aspects of this piece: public health needs, technological implementation, practical alternatives, the return on investment of competing options, and, especially, privacy issues and additional privacy safeguards. We even welcome hostile feedback if it can help illuminate which of our facts or assumptions are wrong. But to stubbornly refuse to engage with a proposal that leverages our impressive communications technologies would be a colossal failure of imagination.

The stakes could not be higher. The virus is very good at finding people. We need to become good at it, too.

Fighting COVID With Data is a working group that combines epidemiological, public health, and technical expertise with privacy law. We study how to maximize the value of data to fight COVID-19 while also addressing real concerns about the potential misuse of data. This article was co-authored by:

Jane Bambauer (University of Arizona College of Law)
Berin Szóka (JD, TechFreedom)
Adam Marcus (JD, CISSP)
Daniel Barth-Jones (Columbia University Mailman School of Public Health)
James Cooper (George Mason University Antonin Scalia Law School)

Contact us on Twitter to offer your thoughts or to get involved.
