Should Contact Tracing Apps Use Your Location?

Ian Varley · Published in The Startup · May 13, 2020

This week, two million Australians downloaded a COVID-19 app, and India has mandated a contact tracing app for its entire working population. Many other groups are working on apps that will make use of the new APIs from Apple and Google. The age of “COVID apps” is squarely upon us.

The proper name for most of these apps is “Exposure Notification”¹, because that’s all they can do: tell you if your phone was near another phone owned by someone who turned out to be contagious. Notably, this protocol does not use location data in any way; the system doesn’t know where you are, it just knows that you were within a couple meters of some other (anonymous) person who later tested positive.

This sharp division (between exposure notification and location tracking) was fully codified this week, as Apple and Google clarified in no uncertain terms that apps won’t be able to use the location APIs at the same time as this new BLE exposure notification API. This puts a damper on a number of important apps being developed, such as MIT’s SafePaths app, where the team has written a thorough exposition of why it prefers to combine BLE with location.

Is this the right choice? It depends. In this essay, I’d like to examine why (I think) Apple and Google have taken this strong stance, and then ask whether that’s the right choice or not. Are there valid uses for location in exposure notification apps? And what are the implications of doing so?

Why Is Location Data Controversial?

First, let’s be clear about one thing up front: location data is inherently dangerous. If you don’t keep location data private, it’s a privacy nightmare.

Why? Because high-accuracy location data can be used both to figure out your identity (by identifying your most common location as your home) and to connect that identity with everything else you do: who you meet, where you worship, where you shop, etc. And as I’ve written previously, bad things happen when this private data isn’t properly protected; people can have their lives ruined, or even be put in mortal danger.

Location data is also difficult to properly anonymize (see this great example if you’re not sure why); it’s a very rich data source, and even if you think it has been carefully redacted and blurred, there’s always the chance that it’s still amenable to attack. (As another example, see this Wired article on how fitness data on Strava shows the location of military bases, and more.)

BLE data, conversely, is much sparser; it tells you that two phones were in proximity at some time during a day, but that doesn’t let you easily determine who those phones belonged to (especially since the identifiers change constantly throughout the day), or where this encounter took place. While there are plenty of concerns around BLE data (like forgery, false positives, etc), if you look at it strictly from a privacy standpoint, the mass compromise of BLE data wouldn’t be nearly as catastrophic as the mass compromise of detailed location data.
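To make the “identifiers change constantly” point concrete, here’s a toy sketch of the idea (illustrative only — this is not the actual Apple/Google construction, which uses its own specified key-derivation scheme): each phone derives short-lived pseudonyms from a daily random key, and matching only becomes possible if a diagnosed user chooses to publish their daily keys.

```python
import hashlib
import hmac
import secrets

def daily_key() -> bytes:
    """A fresh random key, generated once per day and kept on the phone."""
    return secrets.token_bytes(16)

def rolling_id(key: bytes, interval: int) -> bytes:
    """The pseudonym broadcast during one ~10-minute interval.
    (Toy construction: HMAC of the interval number under the daily key.)"""
    return hmac.new(key, interval.to_bytes(4, "big"), hashlib.sha256).digest()[:16]

# Phone A broadcasts rolling IDs all day; Phone B just records what it hears.
key_a = daily_key()
heard_by_b = {rolling_id(key_a, i) for i in (42, 43)}  # two intervals of proximity

# If A later tests positive, A publishes only key_a. B re-derives that day's
# IDs and checks for overlap -- learning *that* an encounter happened, but not
# who A is or where the encounter took place.
rederived = {rolling_id(key_a, i) for i in range(144)}  # 144 ten-minute intervals/day
exposed = not heard_by_b.isdisjoint(rederived)
```

Note that nothing in the broadcast data links back to a person or a place; the published daily key is the only bridge, and the user controls whether it ever leaves the phone.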

This is why we’re seeing this consensus from so many privacy groups. This collected research doc says, “In terms of privacy protection, the best case scenario is: decentralized Bluetooth-only like TCN, PACT, DP3T, and the upcoming Google/Apple APIs.” Likewise, this open letter from academics states: “Bluetooth-based solutions for automated contact tracing are strongly preferred when available.” And the ACLU bluntly states: “The use of data that is difficult or impossible to anonymize (such as location data) should be avoided.”

Is There A Case For Location Data?

Now let’s look at it from the other perspective. What good could come from tracking location (assuming that you can do it in a privacy-preserving way)?

It’s important to distinguish between two very different activities: recording location data (locally on someone’s device) and sharing location (centrally or publicly). You can record location data without sharing it, and this still has important uses, while being much better for privacy².

Recording Location

There are two main benefits of recording location data locally on your phone: having context for exposure notifications to avoid false positives, and assisting manual contact tracing.

Context For Avoiding False Positives

Let’s talk about false positives.

What’s an example of potential false positives with BLE data? I’ll volunteer one. My bedroom at home faces onto a street with moderate pedestrian traffic — a few dozen people walk by on an average day. And my phone sits on my bedside table, easily within Bluetooth range of anyone walking by. Does that mean I’ll be marked as “exposed” anytime someone contagious walks by outside my house?³

In the eyes of a BLE signal, false positives like this are everywhere: people sitting in the car next to us at a stoplight; people eating in the restaurant next door; people living on the floor above us in an apartment building.

So now, consider what it might be like to get an alert from a BLE contact tracing app. Your phone buzzes, and you see an alert like this: “You were exposed to someone with COVID-19 on Tuesday; you should self-isolate, and contact your doctor if you’re having symptoms”.

After freaking out a bit, you might have some other important questions. Like:

  • What time was this exposure, where was I, and what was I doing then?
  • Was I wearing a mask? Were the people I encountered wearing masks?
  • Was I really at risk, or is this a false positive (e.g. we were in nearby cars, different apartments, etc.)?
  • Was I by myself, or was I with companions (e.g. family members) who were also exposed?

If you can’t answer these questions about your context, you’ll inevitably fill in the gaps based on your own mental state. Some people might assume they’re definitely infected (which may be good for public health overall, but amplifies the economic impact of false positives). Others might skew optimistic, thinking “Well, I was wearing a mask all day, and I don’t remember bumping into anyone, so this is probably a false alarm,” which might mean missing real exposures.

It would be much better if you could actually answer these questions, by having (local to your phone) more details about when and where each encounter happened. Then, instead of a notification saying “You were exposed”, the notification could say “You may have been exposed; please answer some questions to determine your risk level.” And then, the questions could be things like:

  • Were you in the presence of anyone outside your own family at this time?
  • Were you wearing a mask?
  • Was everyone around you wearing a mask?
  • Did you wash your hands after this encounter?

The answers to these questions would then tell you whether the exposure was likely to be a transmission event, in which case you should isolate.
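A hypothetical app might turn those answers into guidance along these lines (the triage logic and thresholds here are invented for illustration; a real app would follow guidance from health authorities):

```python
def risk_level(outside_family: bool, i_wore_mask: bool,
               others_wore_masks: bool) -> str:
    """Hypothetical triage logic for a flagged BLE encounter."""
    if not outside_family:
        return "low"       # likely a false positive (e.g. a neighboring apartment)
    if i_wore_mask and others_wore_masks:
        return "moderate"  # masked encounter: monitor for symptoms
    return "high"          # unmasked close contact: self-isolate
```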

(Remember, in these scenarios, we’re not just talking about our current pandemic reality where everyone is strictly quarantined; we’re talking about a hopeful future where the level of infected people drops significantly, and people actually go out to restaurants and coffee shops again; at that point, COVID-positive reports will be both less frequent and more noteworthy, so being flooded with false positives is theoretically less of a concern).

Does tracking location increase the privacy risk for someone transmitting a positive diagnosis? Yes it does, and that’s the rub; it would mean that people you encountered now know specifically where and when they encountered you, which the protocol from Apple and Google does not allow (it only records the day and duration of exposure, not the exact time). This means that in some subset of cases, they’ll be able to deduce who exposed them (if, for example, you had a one-on-one meeting at that time). This is a tradeoff we should think about carefully.

(And, note that generally speaking, if you know that you were contagious when you encountered someone, the ethical thing to do is to inform them of that; that doesn’t mean, though, there aren’t times when you might ethically choose not to do so, such as if you had reason to believe that they might retaliate against you.)

So, context is both important … and problematic. On one hand, BLE encounter notifications without this context might erode public trust due to a high rate of false positives. On the other hand, BLE notifications with this context might (rightly) cause people to second-guess how private they’ll remain if they submit a positive diagnosis, especially if they’ve had one-on-one interactions with people whom they would prefer not to learn of it.

Assisting Manual Contact Tracing

Another solid reason to store location data locally is to assist with manual contact tracing. While digital contact tracing and exposure notification apps have promise, just about everyone agrees that we also need a massive investment in “old school” contact tracing: the kind where trained medical professionals interview a diagnosed carrier, and then call up anyone who they might have come in contact with. This is a slow process, and only as good as the memory of the person being interviewed. (I don’t know about you, but I’d have trouble recalling exactly what time I went to a coffee shop 5 days ago.)

For this reason, it’d be useful to have a “memory aid” app that stores my location data (locally). If I do contract the virus, and talk to a contact tracer, I can quickly scan through the timeline view of such an app and say “I was at the coffee shop from 1:52pm until 2:40pm”, etc. This is both faster for contact tracers, and more correct than just relying on my own memory.
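At its core, a memory aid like this is just an append-only log kept on the device; here’s a minimal sketch (the class and method names are invented):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Visit:
    start: datetime
    end: datetime
    place: str  # e.g. a reverse-geocoded label, stored only on the device

class LocalTimeline:
    """An on-device visit log; nothing here is ever uploaded."""

    def __init__(self) -> None:
        self._visits: list[Visit] = []

    def record(self, visit: Visit) -> None:
        self._visits.append(visit)

    def on_day(self, day: datetime) -> list[Visit]:
        """Everything a contact tracer might ask about for one day."""
        return [v for v in self._visits if v.start.date() == day.date()]

timeline = LocalTimeline()
timeline.record(Visit(datetime(2020, 5, 8, 13, 52),
                      datetime(2020, 5, 8, 14, 40), "coffee shop"))
# Five days later, a contact tracer asks about May 8th:
visits = timeline.on_day(datetime(2020, 5, 8))
```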

Now, of course, lots of people already have this capability with apps like Google Maps or Gyroscope. But these commercial apps are decidedly not privacy preserving; they store your location data centrally on corporate servers. You might trust these companies to keep your data private, and that’s your choice. But that’s very different from asking the entire population to do so. (In an early conversation I had about contact tracing, one of the first sentences out of a coworker’s mouth was “I will never give Google my location data, period.”)

Note that BLE data alone is totally useless for manual contact tracing, for precisely the reasons I mentioned above: it doesn’t know who you are, or where you’ve been, nor does it know “who” you’ve been in contact with–only the cryptographically secure random numbers that their phones have broadcast to you. For contact tracers, there’s simply nothing there to work with.

Sharing Location

The above examples don’t require location data to leave your phone, ever. But there are some plausible benefits from sharing location data, if it can be done in a privacy-preserving way.

Aggregate Data

One thing that shared location data can do is inform your general understanding of risk level in a slightly larger radius, such as on a county-by-county level. This kind of information, usually displayed in a heat map, is a signal I can use to modify my behavior (“Hey, let’s cancel our lunch plans, the virus has been spiking near here lately, and I’d feel safer staying home”).

Heat maps also have utility to society at large; local businesses can use them when considering whether to reopen, or whether to send service technicians to a certain part of town. Health officials can use them for trending, to understand hospital and equipment readiness levels. Government officials can use them to fine-tune their enforcement of various safety practices (like mask wearing, social distancing, etc.).

A heat-map-based approach doesn’t have the same “chain-breaking” power as contact tracing or exposure notifications. But it does have an edge in terms of the adoption it requires. With exposure notification, you get the negative effect of fractional multiplication; if 12% of the population installs an exposure notification app, then you’ll only catch 1.44% of actual encounters (0.12 * 0.12 = 0.0144), because both sides of the encounter have to have the app for it to work. Conversely, with a heat map approach, if 12% of the population installs the app, you still get a good representative picture of infections in each geographical area.
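The arithmetic above generalizes: with an adoption rate of p, only p² of encounters have the app on both sides.

```python
def encounter_coverage(adoption: float) -> float:
    """Fraction of encounters where *both* phones run the app."""
    return adoption ** 2

assert round(encounter_coverage(0.12), 4) == 0.0144  # the 12% example from the text
encounter_coverage(0.60)  # even 60% adoption covers only ~36% of encounters
```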

Aggregate location data doesn’t have to be collected at high accuracy; it’s fine to record data with an accuracy to the nearest 1 kilometer (or for more rural regions, perhaps 5km). The specifics of this require thought and research, but the general idea is sound.
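Coarsening coordinates for aggregation is straightforward; here’s a sketch assuming ~1 km cells (one degree of latitude is about 111 km, so 0.009° ≈ 1 km; longitude cells shrink toward the poles, which a real implementation would correct for):

```python
def to_grid_cell(lat: float, lon: float,
                 cell_deg: float = 0.009) -> tuple[int, int]:
    """Quantize a coordinate to a grid cell roughly 1 km on a side
    at the equator; the aggregate heat map only ever sees the cell."""
    return (int(lat // cell_deg), int(lon // cell_deg))

# Two points ~100 m apart fall into the same cell:
a = to_grid_cell(30.2672, -97.7431)
b = to_grid_cell(30.2675, -97.7434)
```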

Note, on the other hand, that aggregate location data doesn’t really require collecting actual location at all; asking someone who has symptoms to simply supply their county or zip code gets the same result, if they’re not traveling. So this isn’t really a strong argument in favor of collecting data with GPS (other than that doing so is easier and more reliable than asking someone to enter it).

Location-Based Exposure Notification — Yea Or Nay?

So, here’s an even more controversial question: can (or should) we use shared location data for exposure notifications, in the same way that BLE encounters are used?

How would it work? The basic idea is similar to the BLE approach, except that instead of registering proximity by direct messages between phones, we’d register proximity based on GPS trails that overlap. The app would record my GPS location locally (just like it would when assisting with manual contact tracing), without sharing anything. Then, if I test positive, my app could upload my last 14 days of location history (not attached to any other personally identifying info) to a central server, and then other users with the same app could download that location history and cross-check it against their own location data to see if we crossed paths. This is essentially the original form that the MIT PrivateKit app took.
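In its simplest (and decidedly non-private) form, that cross-check is just an intersection of coarse space-time buckets; here’s a sketch with invented bucket sizes (~10 m cells and 5-minute windows):

```python
def to_buckets(trail, time_bucket_s: int = 300, cell_deg: float = 0.0001):
    """Reduce a GPS trail of (unix_ts, lat, lon) points to a set of
    space-time buckets (~10 m cells, 5-minute windows -- illustrative only)."""
    return {(int(ts // time_bucket_s), int(lat // cell_deg), int(lon // cell_deg))
            for ts, lat, lon in trail}

def paths_crossed(my_trail, published_trail) -> bool:
    """True if any bucket of my trail overlaps the published trail."""
    return not to_buckets(my_trail).isdisjoint(to_buckets(published_trail))

my_trail = [(1000, 30.26725, -97.74315)]
published = [(1100, 30.26726, -97.74316)]  # same ~10 m cell, same 5-min window
```

This plain version requires downloading the diagnosed user’s raw trail, which is exactly the privacy problem discussed next.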

There are two big obvious challenges for this approach: accuracy and privacy.

Accuracy. The first big challenge is that the accuracy of location services is much worse than that of BLE. Commercial location services (including GPS, Wi-Fi SSID vectors, etc.) work well enough for navigating your car down a street, but they cannot accurately tell whether you were standing close enough to someone to be exposed to the virus. At best, location data can (mostly) accurately tell you that you’re within tens of meters of someone. (It’s even worse in urban canyons like Manhattan, as this Foursquare article points out).

This level of accuracy is not granular enough for direct, person-to-person exposure notification. In the words of this collected research doc, “Research has demonstrated that solutions based on sharing geolocation (i.e., GPS) to discover contacts lack sufficient accuracy.” And this makes some intuitive sense; the fact that someone was contagious in a 1-block radius of me does not mean that they had any chance of infecting me; for a transmission, we have to actually share some airspace (within a couple meters) at the same time.

Bluetooth, conversely, is specifically designed to work only at short distances; it simply can’t make big errors (like saying we were close when we were really a block apart), because the signal doesn’t stretch that far. (It additionally has an attenuation measure, i.e. signal strength, that gives you a secondary clue about how close to someone you actually were, which can be used in risk calculations.) So while there are other ways to get a false positive with Bluetooth (like sitting in neighboring cars at a traffic light), it gives you much more accurate information about how physically close two people are than GPS does.
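Attenuation-based distance estimates typically use a log-distance path-loss model; here’s a rough sketch (the calibration constants are assumptions — a reference power of about −59 dBm at 1 m is a common BLE figure but varies per device, and real readings are noisy enough that apps bucket the result rather than trust an exact distance):

```python
def estimate_distance_m(rssi_dbm: float, tx_power_dbm: float = -59.0,
                        path_loss_exponent: float = 2.0) -> float:
    """Log-distance path-loss model: rssi = tx_power - 10*n*log10(d),
    solved for d. tx_power is the expected signal strength at 1 meter."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))

estimate_distance_m(-59)  # reading equals the 1 m reference -> ~1 m
estimate_distance_m(-79)  # 20 dB weaker -> ~10 m (free-space exponent n=2)
```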

Privacy. The second challenge is that if location data is even in the neighborhood of accurate enough for exposure notification, that means it’s also probably accurate enough to figure out who you are. Shared location data, in general, is extremely privacy-challenging.

The MIT SafePaths app relies on a medical professional to manually redact personal information (like your home location, work location, secret lover’s location, etc) before sharing. If this sounds error prone, you’re right; humans make mistakes. It’s also a one-way door; once you’ve shared the data, if you later realize you missed a category of locations you’d rather keep private (like your place of worship), it’s too late.

It’s plausible that more automated ways will be designed to redact this information in the future–for example, using AI. If you’re always at the same place overnight, chances are that it’s either your work or your home. This is speculative, but could turn out to at least speed up the work of a health professional in doing redaction.

Another option for preserving privacy is using encryption. There are (very cool) cryptographic techniques, like private set intersection, that can be used to compute if two encrypted GPS trails crossed, without revealing to either party the complete set of GPS coordinates. Another scheme, private proximity learning, allows two users to learn that they’re close to each other, without revealing any other location data.
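For intuition, here’s a toy Diffie-Hellman-style private set intersection: each party blinds its hashed items with a secret exponent, and because exponentiation commutes, the double-blinded values match exactly when the underlying items match. (This is a textbook sketch with toy parameters, not a vetted implementation.)

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized, NOT a secure group for real use

def _hash_to_group(item: str) -> int:
    """Hash an item into [1, P-1]."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % (P - 1) + 1

def _secret_key() -> int:
    """Random exponent coprime to P-1, so exponentiation stays invertible."""
    while True:
        k = secrets.randbelow(P - 2) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def blind(items, key):
    """First blinding: h(x)^key mod P, hiding the items from the other party."""
    return {pow(_hash_to_group(x), key, P) for x in items}

def reblind(blinded, key):
    """Second blinding by the *other* party's key, yielding h(x)^(a*b)."""
    return {pow(v, key, P) for v in blinded}

# Alice and Bob each hold a set of space-time cells from their GPS trails.
alice = {"cell:3363:-10861:t42", "cell:3363:-10861:t43"}
bob = {"cell:3363:-10861:t43", "cell:9999:123:t7"}

a, b = _secret_key(), _secret_key()
# Each double-blinded value is h(x)^(a*b); values are equal iff the items were.
common = reblind(blind(alice, a), b) & reblind(blind(bob, b), a)
# len(common) == 1: one shared cell, but neither side sees the other's raw cells
```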

This isn’t a panacea, however. The first problem is scalability; you might be able to compare two encrypted paths, but as the number of paths grows, so does the computational cost (as this paper on similar techniques for comparing address books shows). Comparing hundreds or thousands of paths might be computationally infeasible; clever solutions may yet emerge, but they’re beyond the current state of the art.

Further, any technique that supports finding an intersection in a secure way must be exposed as a protocol (rather than a dataset you can access in its entirety), because you need to limit the total number of queries; otherwise an adversary could attack it by brute force, querying for every point on earth. Thus, collecting this data requires strong server-side protections that other datasets (like BLE encounters) don’t.

And sadly, these two methods (redaction or encryption) are pretty much the only tools we’ve got if our goal is discovering when two people may have interacted. You can’t simply blur the locations or upload lower resolution data, because accuracy is the whole point of exposure notification.

Which is why, at this time, exposure notification using location isn’t the top choice for most apps, and many in the industry are adamant that we should not use location data to perform automated exposure notification.

Surface Transmission?

By the way, it has also been suggested that catching surface transmissions (via fomites) would be a good case for using location-based exposure alerting instead of BLE. For example, if someone sneezes on a doorknob that I touch a few minutes later, BLE encounter data certainly won’t show that as an interaction, whereas location trail data plausibly could.

But it seems that this isn’t practical; as that same combined research doc says, “with COVID-19, surface and aerial transmissions occur, but too diffusely to contact trace; instead, this risk needs to be mitigated by general sanitation (hand washing, avoid touching faces) and cleaning of venues where cases were confirmed.”

The Message Matters

One final important consideration around location data is perception.

As we’ve seen in the response from various leaders, and in surveys of people’s willingness to use contact tracing apps, there’s already a big perception problem with exposure notification as a concept, because people don’t trust that tech companies are really going to protect their privacy. (And, let’s be honest: they’re not wrong, the industry doesn’t have a great track record.)

So simple, black-and-white statements about what an app does might be a big advantage, in terms of uptake. It’s much more effective to say:

“This app doesn’t record your location, period.”

Compared to:

“This app records your location, but doesn’t share it unless you choose to, and then only using privacy preserving techniques.”

If you have to say all of that, what the general population hears is that you doth protest too much, and that you’re probably putting them under surveillance. No thanks.

So from that perspective, regardless of the advantages of actually using location data, and the prospects for doing so safely, the categorical decision by Apple & Google makes some sense. If they can’t get the public (and elected officials) on board with this technology, the whole issue is moot.

Conclusion

So on the whole, while I get the rationale from Apple and Google on the choice to disallow location services, I ultimately think that choice should be left up to the individual health authorities, not mandated a priori as a part of the platform.

In particular, the ability to store location data locally in an app (without sharing it) would help avoid false positives, and allow people to use the same app for both BLE encounters and manual contact tracing.

But at the same time, this is uncharted territory, and I do appreciate Apple and Google for erring on the side of tighter privacy.

In any event, we won’t really understand the consequences (both intended and unintended) until we see these apps in the real world. Until then, here’s hoping that whatever form they take, it gives us traction in getting us back to normal faster.

I’m Ian Varley, a software architect living in Austin, TX. I am not a trained epidemiologist, nor do I play one on TV; I’m just a concerned layperson trying to help people sort through a mass of information. The statements in this article are entirely my opinion, not the opinion of my employer. As much as possible, I’ve tried to include only statements that are backed up by other sources, linked throughout. If you see any misleading or inaccurate information, please comment!

Footnotes

¹ Props to Harper Reed, who was, I think, the first person to make this naming distinction between “Contact Tracing” and “Exposure Notification / Alerting”.

² Yes, private data on your phone can be taken by force, demanded by government officials, etc. But this is true of all your other private data too, like photos, text messages, etc., so it’s not specifically a downside of contact tracing apps; as long as app usage is voluntary in the first place, people can judge their own risk of these things happening, and act accordingly. For people in western democracies, the overwhelming majority can rely on the fact that data on your phone is private as long as the software you’re running is designed to keep it so.

³ There will likely be other ways to avoid false positives built into these apps, such as the ability to “snooze” or “pause” recording interactions during certain hours. (It would also be nice to “geofence” interactions, but you can’t do that without location services either.)

Acknowledgments

Thanks for early feedback to Aaron Taylor, Ignacio Manzano, Shivan Sahib, and Tom DuBois.
