Covi-ID: Contact Tracing in Emerging Markets

Jul 9, 2020

This is the second article in a series of three. The first article, “How Technology can Help Fight COVID-19 in Emerging Markets”, discusses five important technologies to fight COVID-19 and how they can be used in emerging markets. In the third article, “Has South Africa given up on contact tracing?”, I go into more detail about the lack of contact tracing in South Africa.

Here, I want to give an update on Covi-ID, a privacy-preserving, free and open-source health credential management system tailored for resource-constrained countries. Covi-ID allows users to create personal data wallets in which they can record geolocation receipts issued by a verifier such as a security guard, taxi operator, or teacher: someone tasked with safeguarding public health. The wallet is accessed through a QR code that can either be stored on a user’s phone or printed. By using the Enigma SafeTrace trusted execution environment, Covi-ID can guarantee a strong notion of user privacy. Feeding the data of COVID-19-positive patients, with their consent, into the PathCheck Foundation’s COVID SafePlaces platform makes Covi-ID an integral part of an efficient COVID-19 response in resource-constrained countries. All our code is open source and available on GitHub.

TL;DR

Research shows that, for contact tracing to be able to curb the spread of COVID-19, you need to identify 50% of the positive cases and trace 60% of their contacts in less than three days. This makes manual contact tracing, in particular in resource-constrained countries, extremely challenging.
Consequently, several tools have been identified as necessary for an efficient fight against COVID-19. Five stand out among them: symptom screening, exposure notification, contact tracing, hotspot detection, and health credentials. I discuss these in more detail in the first article of this series.
Technology implementing these five tools already exists, but it relies on smartphones or on privacy-invasive centralized systems. However, smartphone penetration in resource-constrained countries is low: only about 50% of South Africans have a smartphone.
As citizens become increasingly concerned about privacy, the legality of privacy-invasive solutions will be challenged and their adoption further limited. Public health authorities frequently emphasize the need to prioritize the efficiency of crisis response measures over user privacy. However, this is a false dichotomy, as I argue here.
In this article, I give an update on Covi-ID, a privacy-preserving, free and open-source health credential management system. Users can create personal data wallets that are accessed through a QR code, which can be either printed or stored on a mobile phone. Using the dedicated Covi-ID verifier app, verifiers such as security guards, taxi operators, teachers, and anyone else tasked with safeguarding public health in a physical space can issue geolocation receipts to these wallets.
The advantage of the Covi-ID platform is that it is radically inclusive: even users who do not have smartphones can access their personal data wallets simply by showing a QR code to a verifier.
The Covi-ID platform guarantees user privacy by using the Enigma SafeTrace trusted execution environment, which builds on Intel’s Software Guard Extensions (SGX). This ensures that user data cannot be used in any way without active user consent. In particular, Covi-ID makes it impossible to sell, lease, or share user data. This safeguard against abuse and catastrophic failure greatly reduces cybersecurity and legal risks.
The system I propose is uniquely suited for public health officials in resource-constrained countries because Covi-ID is integrated with the COVID SafePlaces platform for manual contact tracers. The system is completely free and publicly available on GitHub.

An Efficient Privacy-Preserving Toolkit to Fight COVID-19

The Edmond J. Safra Center for Ethics at Harvard University has outlined a number of useful principles for digital tools to combat COVID-19 in its papers “Outpacing the Virus: Digital Response to Containing the Spread of COVID-19 while Mitigating Privacy Risks” (Hart et al. (2020)) and “Immunity Certificates: If We Must Have Them, We Must Do It Right”. These papers are concerned with contact tracing and health certificates specifically, but the principles outlined therein apply to all five tools discussed above. For Bluetooth-based contact tracing, Hart et al. (2020) recommend a system in which most data resides on the consumer device, and de-identified data is stored on a server only if it consists of random tokens protected by secure keys. Google / Apple Exposure Notification (GAEN) would clear this threshold, although it is not clear how Bluetooth Low Energy (BLE)-based contact tracing can interoperate with other forms of contact tracing. For GPS-based contact tracing, Hart et al. (2020) recommend options where all personal data resides on the consumer device only.
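
To make “random tokens and secure keys” more concrete, here is a minimal Python sketch of rotating ephemeral identifiers derived from a secret daily key. It is only loosely modeled on GAEN: the real specification derives its keys with HKDF and AES, whereas this sketch uses HMAC-SHA256 from the standard library, and the names `new_daily_key`, `ephemeral_id`, `ids_for_day` and the 10-minute interval are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

INTERVAL_SECONDS = 600  # assumed 10-minute rotation interval (144 per day)


def new_daily_key() -> bytes:
    """Fresh random secret that stays on the device unless the user shares it."""
    return secrets.token_bytes(16)


def ephemeral_id(daily_key: bytes, unix_time: int) -> bytes:
    """Short-lived identifier broadcast during the interval containing unix_time."""
    interval = unix_time // INTERVAL_SECONDS
    mac = hmac.new(daily_key, interval.to_bytes(4, "little"), hashlib.sha256)
    return mac.digest()[:16]  # truncated to 16 bytes, small enough for a BLE payload


def ids_for_day(daily_key: bytes, day_start: int) -> set:
    """Recompute every identifier of one day, e.g. after a positive user publishes her key."""
    return {ephemeral_id(daily_key, day_start + i * INTERVAL_SECONDS) for i in range(144)}


# A nearby phone records the identifier it heard; later it checks published keys locally.
key = new_daily_key()
t = 1_594_300_000
heard = ephemeral_id(key, t)
print(heard in ids_for_day(key, t - t % 86400))  # True
```

In such a design, only the daily keys of users who test positive and consent are uploaded; every other phone recomputes the identifiers locally and compares them with what it heard, so the server never learns who met whom.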

Given that many consumers in emerging markets do not have a smartphone that would allow them to store data on their device, I argue that the most natural extension of these principles is a data storage system that places full control over personal data in the hands of the users. Specifically, and due to its sensitive nature, a user’s geolocation data should be controlled by the user. In other words, the user should be able to control who has access to her data and for what purpose. And once the data has been used for a specific purpose, the user should be able to revoke access without loss of privacy. When data resides on a user’s phone, this is somewhat easier to achieve, albeit far from trivial. Executing arbitrary code over decentrally stored data is still not possible. Luckily, in the case of contact tracing, finding possible exposures can be done in a decentralised fashion.
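
A minimal sketch of such a decentralised exposure check: the phone downloads a public list of hotspots and compares it with its private location history locally, so nothing leaves the device. Representing locations as coarse cell strings (geohash-like) and times as hour buckets, as well as the names `Visit` and `check_locally`, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    cell: str   # coarse location cell, e.g. a geohash prefix (assumed encoding)
    hour: int   # hours since the Unix epoch


def check_locally(private_history: list, published_hotspots: set) -> list:
    """Return the visits that coincide with a published (cell, hour) hotspot.

    Runs entirely on the user's device: only the hotspot list is downloaded,
    and the private history is never sent anywhere.
    """
    return [v for v in private_history if (v.cell, v.hour) in published_hotspots]


history = [Visit("kd8x2", 442860), Visit("kd8x7", 442861)]
hotspots = {("kd8x7", 442861), ("kd9a1", 442870)}
print(check_locally(history, hotspots))  # [Visit(cell='kd8x7', hour=442861)]
```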

The good news is that even when the data is stored in a central data lake, it is possible to adhere to the privacy principles outlined above, as long as the data is controlled by the users. This means that a conventional SQL database such as PostgreSQL is not sufficient; alternative means of data storage are necessary. I discuss one such means, a trusted execution environment, below.

Figure 1: A centralized data storage model where code is executed on joint data, owned by the database operator.

In the context of COVID-19 applications, two data storage models are prevalent. In a centralized data storage model (Figure 1), all data is stored in a single place controlled by a central database operator (e.g. the Department of Health or, worse, a private contractor). The database operator automatically becomes the owner of data stored in this fashion because ultimate control over the data (residual control rights, in economic parlance) resides with the database operator. In the “self-sovereign” data store (Figure 2), users store data themselves, usually on their smartphones (or on so-called data pods, provided e.g. by Tim Berners-Lee’s Inrupt). This model is recommended by many privacy advocates and, with some additional security features, is also at the heart of GAEN. In this model, users own their data until they share it with a third party.

Figure 2: Decentralized data storage with code executed on individual data controlled by their owners.

The problem with the centralized data layer in the context of privacy is that there is no privacy for users because a database operator can, at any time, make a copy of the user’s data — unbeknownst to the user, and in the case of cyber security breaches possibly even unbeknownst to the database operator. It is important to emphasize this point: even if the data is kept “safe”, i.e. is tightly controlled by the database operator, there is no privacy for the user because she is not able to “limit the extent to which information about her is shared with others” as per Westin’s definition.

The self-sovereign data store, on the other hand, has two problems. First, it severely limits the type of computations that can be executed on the data. If the data stored in this way is a user’s health status, it is, for example, feasible to compute the total number of COVID-19-positive cases. However, if the goal is to find out whether a user has been in contact with a COVID-19 patient, this data store is only feasible if users automate access to their data. On a smartphone, this might amount to actively approving that a contact tracing system checks data stored locally. Not only does this require a lot of communication between the device and a central system controller (e.g. to communicate all new beacons of COVID-19-positive users in the past 24 hours), it also requires users to actively consent to the computation. The moment users automate their consent, they hand control of their data over to an algorithm and thus forfeit their privacy. Second, the self-sovereign data store only provides privacy until users share their data with a third party. The moment this happens, the data is beyond the control of the user and there is no more privacy. In other words, revoking access to the data is not possible because the data can easily be copied once a third party has had access to it. Legal restrictions exist to prevent the abuse of data, but a lack of policing and enforcement of these restrictions severely impedes this model of data control.

Consequently, a data layer for privacy-preserving COVID-19 applications requires a different data storage model. For this purpose, I propose to use trusted execution environments (TEEs) such as Intel’s Software Guard Extensions (SGX), for which Enigma provides the SafeTrace API. A TEE is an area within a computer’s main processor that is separated from the rest of the computer to ensure that the code and data stored within it are tamper-proof (see Figure 3). Data passed to the TEE is encrypted with the TEE’s public key so that it is never transmitted in plain text and can only be accessed within the TEE. The TEE then autonomously executes the prescribed code on the data sent to it by the users and returns the outcome, again encrypted, to a third party. A TEE can also prove to the users which code has been executed (a process known as remote attestation), so that, provided the users consented to that code, they retain ownership of their data.
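
The following Python sketch simulates the idea behind attestation at a purely conceptual level: before handing over data, the user’s app compares a measurement of the enclave code (here simply a SHA-256 hash) with the value of the audited open-source build. Real SGX attestation relies on hardware-signed quotes and is not reproduced here; `SimulatedEnclave`, `AUDITED_MEASUREMENT`, and `submit_if_trusted` are hypothetical names, and encryption is omitted.

```python
import hashlib

# Stand-in for the enclave code published on GitHub and audited by the community.
AUDITED_CODE = b"def add_receipt(wallet, receipt): wallet.append(receipt)"
AUDITED_MEASUREMENT = hashlib.sha256(AUDITED_CODE).hexdigest()


class SimulatedEnclave:
    """Toy stand-in for a TEE that reports a measurement of the code it runs."""

    def __init__(self, code: bytes):
        self._code = code
        self.received = []

    def measurement(self) -> str:
        # In SGX the measurement (MRENCLAVE) is computed by the hardware and
        # conveyed in a signed quote; here it is simply self-reported.
        return hashlib.sha256(self._code).hexdigest()

    def receive(self, payload: bytes) -> None:
        self.received.append(payload)


def submit_if_trusted(enclave: SimulatedEnclave, payload: bytes) -> bool:
    """Send data only if the enclave runs exactly the audited code."""
    if enclave.measurement() != AUDITED_MEASUREMENT:
        return False
    enclave.receive(payload)
    return True


honest = SimulatedEnclave(AUDITED_CODE)
tampered = SimulatedEnclave(AUDITED_CODE + b"\nsend_to_advertiser(wallet)")
print(submit_if_trusted(honest, b"receipt"))    # True
print(submit_if_trusted(tampered, b"receipt"))  # False
```

The design choice this illustrates is that consent attaches to a specific, inspectable piece of code: any change to that code changes the measurement, and a user’s app would then refuse to send data.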

Figure 3: Privacy through a trusted execution environment (thick dark line) which prevents undue code being executed on the data stored within the TEE.

This feature enables a different privacy model in which users can store data in a centralised data store but grant third parties access only through clearly defined and unchangeable APIs. If these APIs include a deletion function, users can even revoke access to their data. TEEs have two limitations. The first is that a TEE can currently only store up to 4GB of data and code. This is more a hypothetical limitation than a practical one because very few applications exceed this limit. The second limitation is that TEEs can be hacked, just as any computer system can be hacked. Some researchers have proposed algorithmic solutions to privacy that would overcome this issue, but for the time being a TEE does not fully eliminate the cybersecurity risk of centralised databases.
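
A minimal sketch of the “clearly defined and unchangeable APIs” idea: the store exposes only a small, fixed set of operations, one of which deletes a user’s data and thereby revokes access. The class and method names are hypothetical. Python cannot truly hide attributes, so the name-mangled dictionary below only signals intent; in the described system, the enclave boundary is what enforces it.

```python
class WalletStore:
    """Data store that can only be used through a fixed set of operations.

    Third parties never get the raw dictionary; they can only call the methods
    below, so every possible use of the data is known in advance.
    """

    def __init__(self):
        self.__wallets = {}  # name-mangled to discourage direct access in this sketch

    def create(self, user_id: str) -> None:
        self.__wallets.setdefault(user_id, [])

    def add_receipt(self, user_id: str, receipt: tuple) -> None:
        self.__wallets[user_id].append(receipt)

    def count_receipts(self, user_id: str) -> int:
        # Example of an aggregate query that reveals no locations.
        return len(self.__wallets.get(user_id, []))

    def delete(self, user_id: str) -> bool:
        """Revoke access: afterwards no operation can return the user's data."""
        return self.__wallets.pop(user_id, None) is not None


store = WalletStore()
store.create("u1")
store.add_receipt("u1", (-33.96, 18.46, 1_594_300_000))
print(store.count_receipts("u1"))  # 1
store.delete("u1")
print(store.count_receipts("u1"))  # 0
```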

A trusted execution environment provides a flexible data store that can support all the relevant applications mentioned above. Here, I use Covi-ID as an example to outline what a privacy-preserving toolkit to effectively fight COVID-19 can look like.

Covi-ID is an open platform for COVID-19 tools developed at the University of Cape Town in conjunction with South African entrepreneurs. At its core, it is a privacy-preserving data store for all data relevant in fighting COVID-19. Covi-ID provides an open source API deployed within a trusted execution environment. For this example I have used the enclave provided by Enigma’s trusted execution environment which builds on Intel’s SGX. The main differentiating factor of the system outlined in this paper is that it is open and inclusive, i.e. it can be deployed as a complement to existing systems, not as a replacement. Consequently, Covi-ID can augment existing manual contact tracing efforts and render them more effective.

Given the typically low smartphone penetration rate in a resource-constrained country like South Africa (see the blog post I wrote about privacy-preserving technology to fight COVID-19 in emerging markets), I am cognisant that various tools have to work together with each other and with existing systems, e.g. those used by manual contact tracers, to effectively fight COVID-19. Below, I outline four user journeys to describe the different pieces of an efficient toolkit.

User Journey 1: Signing up to the platform

Appropriate data management is the foundation of any efficient system. Users can generate a personal data wallet through the free and open-source Covi-ID web app (see Figure 4 below). To do so, a user goes to https://app.coviid.me or any other website that hosts the web app (which could be hosted on premises by a public health authority), enters her first name, last name, phone number, and, depending on the jurisdiction, her ID or passport number, and uploads a picture. She then receives a one-time pin (OTP) via SMS on this phone number, with which she gives active consent. Once the OTP has been entered successfully, the Covi-ID API creates a personal data wallet: an entry in a centralized database that is encrypted so that only Enigma’s trusted execution environment can read it, implementing the data storage layer discussed above. The QR code generated for the user can be sent to the user’s phone, printed (e.g. in an internet cafe, bank branch, or post office), or even pre-printed. In this last flow, a user starts the collection of personal information by having her QR code scanned by a verifier (more on this below), and then enters her personal information on the verifier’s device.

Either way, the Covi-ID platform links a user’s personal data wallet to a unique identifier embedded in the QR code. This identifier is encrypted so that only the TEE, which holds the corresponding private key, can decrypt it, ensuring that only the TEE is able to process any information related to a user’s unique identifier.

Figure 4: The user journey to generate a personal data wallet using Covi-ID.
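
The sign-up flow can be sketched as follows. This is a hypothetical illustration, not Covi-ID’s actual API: `send_sms` is a stub, and the six-digit OTP, the five-minute expiry, and the `coviid:` QR payload prefix are assumptions. In the described system, the wallet would be stored inside the TEE rather than in a Python dictionary.

```python
import secrets
import time

OTP_TTL_SECONDS = 300  # assumed five-minute validity
pending = {}           # phone -> (otp, expiry, profile)
wallets = {}           # unique identifier -> wallet


def send_sms(phone: str, text: str) -> None:
    """Stub standing in for an SMS gateway."""
    print(f"SMS to {phone}: {text}")


def start_signup(profile: dict) -> None:
    """Store the profile provisionally and send a one-time pin to the given phone."""
    otp = f"{secrets.randbelow(10**6):06d}"
    pending[profile["phone"]] = (otp, time.time() + OTP_TTL_SECONDS, profile)
    send_sms(profile["phone"], f"Your Covi-ID code is {otp}")


def confirm_signup(phone: str, otp: str):
    """Create the wallet only if the user proves consent with a valid OTP.

    A wrong or expired code invalidates the pending sign-up; returns the QR
    payload string on success and None otherwise.
    """
    entry = pending.pop(phone, None)
    if entry is None:
        return None
    expected, expiry, profile = entry
    if time.time() > expiry or not secrets.compare_digest(otp, expected):
        return None
    wallet_id = secrets.token_urlsafe(16)  # random, reveals nothing about the user
    wallets[wallet_id] = {"profile": profile, "receipts": []}
    return f"coviid:{wallet_id}"           # to be rendered as a QR code
```

Whether the resulting QR code is sent to a phone or printed makes no difference on the server side, which is why the same flow can serve users without smartphones.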

User Journey 2: Receiving geolocation receipts and medical credentials

The second user journey assumes that a user already has her personal data wallet and the QR code necessary to access it. In this journey, shown in Figure 5 below, a user shows her QR code to, e.g., a security guard or taxi operator, who uses the Covi-ID verifier app to scan the QR code and issue a geolocation receipt to the Covi-ID API, consisting of the geolocation and timestamp obtained from the verifier’s smartphone. The Covi-ID verifier app uses the TEE’s public key to encrypt all data in transit. The payload is delivered to the TEE, which then decrypts the geolocation receipt as well as the user’s unique identifier embedded in the QR code. The TEE then adds the geolocation receipt to the user’s personal data wallet, which it can access because the personal data wallet is just an entry in a database that is encrypted with the public key of the TEE.

This setup ensures that no code other than what the user agreed to can be executed by the TEE. All the ways in which user data can be used are hard-coded into the TEE and cannot be altered without the user’s consent. This implements a notion of privacy without hampering the core functionality necessary to fight COVID-19. At the same time, the data store can also record a user’s health status (e.g. whether she has recently been tested for COVID-19 and what the outcome was), as well as a history of her symptoms. This is important in countries where symptom tracking is part of the COVID-19 response. Combining these different types of data is not possible within the GAEN framework, where privacy is predicated on anonymity.

Figure 5: Receiving a geolocation receipt from a verifier who uses the Covi-ID verifier app.
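
A geolocation receipt could be represented roughly as below. The field names, the `issue_receipt` and `store_receipt` helpers, the `coviid:` prefix, and the JSON serialisation are assumptions for illustration. In Covi-ID, the payload would be encrypted with the TEE’s public key before leaving the verifier’s phone; that step is only marked by a comment here.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GeolocationReceipt:
    wallet_id: str    # identifier decoded from the scanned QR code
    verifier_id: str  # registered verifier who issued the receipt
    lat: float
    lon: float
    timestamp: int    # Unix time taken from the verifier's phone


def issue_receipt(qr_payload: str, verifier_id: str, lat: float, lon: float) -> bytes:
    """Verifier app side: build the receipt from the scan and the phone's location."""
    wallet_id = qr_payload.split(":", 1)[-1]  # strip the assumed "coviid:" prefix
    receipt = GeolocationReceipt(wallet_id, verifier_id, lat, lon, int(time.time()))
    # A real app would encrypt these bytes for the TEE at this point.
    return json.dumps(asdict(receipt)).encode()


def store_receipt(payload: bytes, wallets: dict) -> None:
    """TEE side: decode the receipt and append it to the matching wallet."""
    receipt = GeolocationReceipt(**json.loads(payload))
    wallets.setdefault(receipt.wallet_id, []).append(receipt)


wallets = {}
store_receipt(issue_receipt("coviid:abc123", "guard-7", -33.9249, 18.4241), wallets)
print(len(wallets["abc123"]))  # 1
```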

User Journey 3: Testing positive for COVID-19

A more complex user journey is the case where a user tests positive for COVID-19. This is where all components of the system need to come together to effectively curb the spread of the virus. The flow is depicted in more detail in Figure 6 below and assumes the existence of a manual contact tracing process. Such a process is in place, e.g., in South Africa, where the National Department of Health manages contact tracing information, and similar processes exist in many countries worldwide. Given the low smartphone adoption rate in emerging markets, manual contact tracing is still the most prevalent method of contact tracing there. Health authorities coordinate contact tracing efforts, often relying only on manual contact tracers who interview patients who test positive for COVID-19. In some cases, these manual efforts are already supplemented with technology, e.g. cell phone geolocation data obtained from mobile network operators (MNOs).

In this setup, a user gets tested, e.g. after being referred to a test laboratory. Since COVID-19 is a reportable disease, laboratories already report every positive case to a central register. In South Africa, this register sits with the National Institute for Communicable Diseases. Once a case is reported, manual contact tracers start their job, which usually entails calling or otherwise interviewing the patient. With Covi-ID, the manual contact tracer starts the interview as she normally would. The additional step is that she can now query the Covi-ID API using the patient’s name and/or contact number. If the patient has a Covi-ID, this query automatically triggers a request for user consent: the Covi-ID system sends a one-time pin (OTP) to the patient, and only once the contact tracer has entered this OTP does the Covi-ID API consider consent as given and query the patient’s data. This data can include a history of healthcare information, such as self-reported symptoms, but most importantly, it includes the geolocation receipts collected by the user over the past three weeks. The Covi-ID API forwards these geolocation receipts to the SafePlaces platform, where a manual contact tracer can use, e.g., https://safeplaces.africa to reconstruct the patient’s geolocation history.

Figure 6: User journey when a user tests positive for COVID-19 and the information is relayed to manual contact tracers.
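
The consent step and the three-week window might look roughly like this. `request_consent`, `query_with_consent`, and the in-memory dictionaries are hypothetical; the sketch only shows that the query runs after the patient’s OTP has been entered, that each OTP works once, and that only receipts from the last 21 days are returned.

```python
import secrets
import time
from typing import Optional

WINDOW_SECONDS = 21 * 24 * 3600  # "the past three weeks"
consent_codes = {}               # wallet_id -> OTP sent to the patient


def request_consent(wallet_id: str) -> str:
    """Send a one-time pin to the patient (returned here only so the example can run)."""
    otp = f"{secrets.randbelow(10**6):06d}"
    consent_codes[wallet_id] = otp
    return otp


def query_with_consent(wallet_id: str, otp_from_patient: str, wallets: dict,
                       now: Optional[float] = None) -> list:
    """Return recent receipts, but only if the tracer enters the patient's valid OTP."""
    expected = consent_codes.pop(wallet_id, None)  # single use
    if expected is None or not secrets.compare_digest(otp_from_patient, expected):
        raise PermissionError("no valid consent for this query")
    now = time.time() if now is None else now
    return [r for r in wallets.get(wallet_id, []) if now - r["timestamp"] <= WINDOW_SECONDS]
```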

This is an important ingredient in the contact tracing interview which is used to create context around individual exposures. For example, a contact tracer can ask whether a certain location was indoors or outdoors. Via the Covi-ID platform, organizations can register verifiers to issue geolocation receipts. When registering, the organization can provide additional information about the location, such as whether it is indoors or outdoors, and whether social distancing measures can be enforced. This greatly helps the contact tracer to differentiate between different types of exposures, something that a pure GAEN app cannot do.
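
Verifier registration with location context might be modelled as follows. The `Location` fields mirror the examples in the text (indoors or outdoors, whether social distancing can be enforced); the concern labels and the `register_verifier` and `annotate` helpers are hypothetical and only show how a tracer could sort exposures, not a validated risk model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    name: str
    indoors: bool
    distancing_enforced: bool


verifiers = {}  # verifier_id -> Location registered by the organisation


def register_verifier(verifier_id: str, location: Location) -> None:
    verifiers[verifier_id] = location


def annotate(receipt: dict) -> str:
    """Attach a coarse, illustrative label to a receipt via its verifier's location."""
    loc = verifiers[receipt["verifier_id"]]
    if loc.indoors and not loc.distancing_enforced:
        label = "higher concern"
    elif loc.indoors or not loc.distancing_enforced:
        label = "moderate concern"
    else:
        label = "lower concern"
    return f"{loc.name}: {label}"


register_verifier("taxi-12", Location("Minibus taxi", indoors=True, distancing_enforced=False))
register_verifier("park-3", Location("Park entrance", indoors=False, distancing_enforced=True))
print(annotate({"verifier_id": "taxi-12"}))  # Minibus taxi: higher concern
print(annotate({"verifier_id": "park-3"}))   # Park entrance: lower concern
```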

Based on the patient’s geolocation history, the contact tracer can then create a hotspot map of COVID-19 cases using COVID SafePlaces. This is a list of geolocations and times, which is then shared with the Covi-ID platform. Covi-ID then uses Enigma’s SafeTrace to identify potentially exposed users by accessing their geolocation receipts. It is important to emphasize that the Covi-ID platform, using Enigma’s TEE, cannot do anything with user data other than what is described here. The data remains encrypted and cannot be decrypted by Covi-ID or anyone else except the TEE. This prevents any abuse of the sensitive personal data. The hotspot map created in this way can also be used to inform government policies regarding the disinfection of hotspots, or possibly their targeted closure.
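
The matching step can be illustrated with a simple rule: a receipt counts as a potential exposure if it lies within a given distance of a hotspot and within a given time window of it. The 50 m radius, the one-hour window, and the function names are assumptions for illustration; the criteria actually used by SafeTrace may differ.

```python
import math

EARTH_RADIUS_M = 6_371_000
RADIUS_M = 50    # assumed spatial threshold
WINDOW_S = 3600  # assumed temporal threshold


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_exposed(wallets: dict, hotspots: list) -> set:
    """Return wallet ids with at least one receipt close to a hotspot in space and time.

    wallets: wallet_id -> list of (lat, lon, unix_time); hotspots: list of (lat, lon, unix_time).
    In the described system, only the resulting ids would leave the enclave.
    """
    exposed = set()
    for wallet_id, receipts in wallets.items():
        for lat, lon, t in receipts:
            if any(abs(t - ht) <= WINDOW_S and haversine_m(lat, lon, hlat, hlon) <= RADIUS_M
                   for hlat, hlon, ht in hotspots):
                exposed.add(wallet_id)
                break
    return exposed


hotspots = [(-33.9249, 18.4241, 1_594_300_000)]
wallets = {
    "near": [(-33.9250, 18.4242, 1_594_301_000)],  # about 14 m away, 17 minutes later
    "far": [(-33.9600, 18.4600, 1_594_300_000)],   # several km away
    "late": [(-33.9249, 18.4241, 1_594_320_000)],  # same spot, over 5 hours later
}
print(find_exposed(wallets, hotspots))  # {'near'}
```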

Interoperability of different solutions

A privacy-preserving system for contact tracing can easily be integrated with less privacy-preserving solutions such as geolocation via cell phone triangulation, as shown in Figure 7. I use the example of South Africa, but similar systems exist in other countries. This is the most complex and most realistic case, in which several systems co-exist. The advantage of using COVID SafePlaces is that it can act as a platform to collect, clean, aggregate, and then disseminate critical information from and to other systems. In Figure 7, in addition to Covi-ID and geolocation receipts as a data source, I also show how geolocation data collected via cell phone triangulation (e.g. for users with feature phones) can feed into the system. Furthermore, by using COVID SafePlaces and COVID SafePaths, the proposed system would automatically include exposure data collected via BLE for smartphone users.

Figure 7: A holistic system of contact tracing and how it could be set up in South Africa.
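
Because SafePlaces acts as the hub, each source has to be brought into a common shape before aggregation. The sketch below normalises three hypothetical input formats (a Covi-ID receipt, a coarse MNO cell-tower record, and a smartphone GPS point) into one record type that keeps track of origin and spatial accuracy. All field names and accuracy values are assumptions; this is not the SafePlaces data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    lat: float
    lon: float
    time: int          # Unix time in seconds
    accuracy_m: float  # rough spatial uncertainty
    source: str


def from_coviid(r: dict) -> Point:
    return Point(r["lat"], r["lon"], r["timestamp"], 20.0, "coviid")  # assumed GPS-level accuracy


def from_mno(r: dict) -> Point:
    # Cell triangulation is low resolution, so the tower's uncertainty radius is kept.
    return Point(r["tower_lat"], r["tower_lon"], r["ts"], float(r["radius_m"]), "mno")


def from_safepaths(r: dict) -> Point:
    return Point(r["latitude"], r["longitude"], r["time_ms"] // 1000, 10.0, "safepaths")


PARSERS = {"coviid": from_coviid, "mno": from_mno, "safepaths": from_safepaths}


def merge(records: list) -> list:
    """Normalise (source, record) pairs and sort them into a single timeline."""
    return sorted((PARSERS[src](rec) for src, rec in records), key=lambda p: p.time)
```

Keeping the accuracy next to each point lets a contact tracer weigh a 1.5 km cell-tower fix differently from a verifier’s GPS reading when building the hotspot map.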

In such a holistic system, data is collected in step (0) at three points: first, via geolocation receipts through the Covi-ID platform, feeding into Enigma’s TEE; second, through low-resolution data recorded automatically via cell phone triangulation into a national data lake hosted at the Department of Health, with the help of MNOs like Vodacom and MTN; and third, through exposures among smartphone users recorded by COVID SafePaths. Now assume User A gets tested, possibly because symptom screening indicated she might need a test. The test is done in a laboratory, and if the result is positive, the lab reports it to the national manual contact tracing platform (or another location designated by a national health authority).

Once a user tests positive, the manual contact tracing process starts. The health authority (HA) contacts User A and asks her to volunteer her data in step (4). In the example shown in Figure 7, User A has a smartphone and uses COVID SafePaths. She volunteers her geolocation data, as well as exposures recorded via BLE, to the contact tracer, who uses COVID SafePlaces to aggregate this data into a hotspot map. These hotspots are then communicated to the COVID SafePaths system as well as to Covi-ID in step (5). COVID SafePaths automatically checks whether there was an exposure (either through geolocation or from the BLE exposure list) and notifies the user in accordance with the appropriate notification protocol. This can be an automatic notification on the phone or a more manual process. At the same time, the Covi-ID API receives the updated hotspot list and uses Enigma’s SafeTrace to find exposures in step (6). These exposures are then returned to SafePlaces and the manual contact tracer in step (7). Users who do not have automatic exposure notification (like User E) are then notified in step (8).
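
Steps (5) to (8) amount to fanning one hotspot list out to the connected systems and routing Covi-ID’s results to the contact tracer. The sketch below illustrates that control flow only: the callables are toy stand-ins, not real SafePlaces, SafePaths, or SafeTrace APIs, and `run_steps_5_to_8` is a hypothetical name.

```python
from typing import Callable, Iterable, List


def run_steps_5_to_8(
    hotspots: list,
    publish_to_safepaths: Callable[[list], None],
    coviid_match: Callable[[list], Iterable[str]],
    notify_manually: Callable[[str], None],
) -> List[str]:
    """Illustrative control flow for steps (5) to (8) in Figure 7."""
    # Step (5): share the hotspot list; SafePaths phones check for exposures on the
    # device and notify their users themselves, so no identities come back here.
    publish_to_safepaths(hotspots)
    # Step (6): Covi-ID matches the hotspots against geolocation receipts inside the TEE.
    exposed = sorted(set(coviid_match(hotspots)))
    # Steps (7) and (8): exposed users are returned to the tracer, who contacts them.
    for user in exposed:
        notify_manually(user)
    return exposed


published, called = [], []
exposed = run_steps_5_to_8(
    hotspots=[(-33.9249, 18.4241, 1_594_300_000)],
    publish_to_safepaths=published.append,
    coviid_match=lambda hs: ["User E", "User E"],  # toy stand-in for SafeTrace
    notify_manually=called.append,
)
print(exposed, called, len(published))  # ['User E'] ['User E'] 1
```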

This holistic system provides the most extensive user coverage to ensure a critical mass of users is included in contact tracing and exposure notification efforts.

Conclusion

A toolkit to effectively fight COVID-19 globally inevitably must address the low smartphone penetration of resource-constrained countries. I outline five technologies that have proven to be effective in fighting COVID-19 in different settings: symptom tracking, contact tracing, exposure notification, hotspot maps, and health credentials. Not only must these tools work together to be most effective, they also need to be widely accessible, which in resource-constrained countries implies that they need to work for users without smartphones.

The interface between the tools mentioned above is often not well defined, and the existing framework, in particular in resource-constrained countries, is patchy at best. To address this shortcoming, I have developed Covi-ID, a privacy-preserving, free and open-source health credential management system. The Covi-ID platform gives users without smartphones the ability to register a personal data wallet that can hold geolocation receipts. These can then be fed into the COVID SafePlaces platform, which provides a unified backend for data collection from various sources and an ideal tool for officials tasked with safeguarding public health.

This whitepaper shows that an effective toolkit to fight COVID-19, even in the most challenging environments, does not have to compromise on user privacy. Instead, guaranteeing privacy, interoperability, and inclusivity are cornerstones of an effective response to COVID-19.