Fixing Privacy in Location-based Advertising

A group of researchers at the University of Washington recently published a very interesting study on ADINT (advertising-based intelligence collection), which became the basis of Andy Greenberg’s story in Wired, “It Takes Just $1,000 to Track Someone’s Location with Mobile Ads.”

The gist of the matter goes like this (#TLDR):

  1. A user’s Advertising ID (cookie for browsers, or e.g. IDFA for native apps) can be easily correlated with a specific person using traffic sniffing or intelligent ad tags.

To some extent, this is the location-oriented version of the work done by Aleksandra Korolova, who demonstrated how micro-targeted ads violated user privacy on the Facebook platform (which Facebook later fixed).

Why is this important?

The advertising industry is in a constant pursuit of a balance between two competing objectives: (a) improving relevancy of ads shown to specific users, and (b) ensuring privacy of those very users.

Let’s analyze this a bit.

Anonymity of Cookies

The Web is built around cookies. The very notion of a cookie (or IDFA, for that matter) exists to establish a basic level of anonymity: because the cookie is a randomly generated number, there should be no practical way for an attacker to re-identify the actual person by looking at the cookie alone. The advertising industry’s position has been that targeting a list of specific cookies is not a privacy issue because those are depersonalized by nature. Except…

Except that the barrier to establishing a deterministic connection between a cookie and an individual happens to be surprisingly low for a skillful person and does not require special-purpose tools. Basically, a script kiddie can do it.

This means that the security threat model of the advertising ecosystem (and consequently the engineering, data handling, and business practices) needs to be revisited with a fundamentally different assumption:

The supposedly anonymized cookie is a sufficiently personal identifier that can be tracked back to a specific individual.

Targeting Precision

Whatever the level of cookie anonymity really is, the ability of an attacker to target a single person, either via an intersection of demographic segments and location or by specifying the cookie explicitly, is a critical part of the problem. So, if the advertiser weren’t allowed to configure a campaign that effectively targets a single user, there would be no problem, right? Almost.

What if the advertiser specifies 2 cookies? Is that good enough? What about 3? What about 5? At what point is an audience size large enough to ensure user anonymity?

Reporting Granularity

Campaign targeting defines who the ad impressions are served to; however, the real intel collection happens in reporting.

Let’s say a campaign targets 4 people, of whom two are male and two are female. Male #1 is 30 years old. Male #2 is 50 years old. As long as the system provides reports broken down by gender and age bands, or allows the advertiser to generate reports for a particular combination of characteristics, the potential attacker can collect information about a specific single user even though the campaign itself was targeting 4. The logic here can be extrapolated to a larger number of users: using a combination of multiple attributes or behavioral segments, the analyst can narrow down the data set to the person being tracked.
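The narrowing described above can be sketched in a few lines of Python. Everything here is hypothetical: the four-person audience, the field names, the 10-year age bands, and the report helper. The ages of the two women are invented purely so the example runs. The point is that a breakdown by gender and age band returns groups of size one even though the campaign targeted four people.

```python
from collections import Counter

# Hypothetical audience of a campaign targeting 4 people.
# The men's ages come from the example above; the women's ages are made up.
audience = [
    {"user": "male_1", "gender": "M", "age": 30},
    {"user": "male_2", "gender": "M", "age": 50},
    {"user": "female_1", "gender": "F", "age": 30},
    {"user": "female_2", "gender": "F", "age": 50},
]


def age_band(age):
    """Map an age to a 10-year band label, e.g. 30 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def report(rows, dimensions):
    """Count users per combination of the requested breakdown dimensions."""
    counts = Counter()
    for row in rows:
        view = {"gender": row["gender"], "age_band": age_band(row["age"])}
        counts[tuple(view[d] for d in dimensions)] += 1
    return dict(counts)


by_gender = report(audience, ["gender"])
by_gender_and_age = report(audience, ["gender", "age_band"])
```

Broken down by gender alone, every group has two members. Add the age band and every group shrinks to a single person, so any metric reported for that group is a metric about one identifiable user.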

The granularity of (a) campaign targeting and (b) campaign performance reporting are equally important with regard to the privacy of consumers receiving ad impressions, and they have to be analyzed together as a single system.

Location Privacy

Technically, to an advertising system the raw location is just a pair of numbers known as “latitude” and “longitude” (or the “lat/long” pair) that is processed to understand if the user visits a particular venue or geographical area. While logically these can be abstracted as yet another segment or attribute that the user belongs to, and that can be analyzed by the attacker, there is definitely a high level of sensitivity specifically about user location. You are where you go. Use your imagination.
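As a rough illustration of how a lat/long pair turns into “the user visited this venue”, here is a minimal Python sketch of a circular geofence check using the haversine formula. The venue record, its coordinates, and the 200 m radius are hypothetical, and real systems may well use polygons or other geometry instead of circles.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_venue(lat, lon, venue):
    """True if the point falls inside the venue's circular geofence."""
    distance = haversine_m(lat, lon, venue["lat"], venue["lon"])
    return distance <= venue["radius_m"]


# Hypothetical venue: a 200 m geofence around an arbitrary point.
venue = {"name": "example_venue", "lat": 47.6550, "lon": -122.3080, "radius_m": 200}

inside = in_venue(47.6550, -122.3080, venue)   # the center of the geofence
outside = in_venue(47.6650, -122.3080, venue)  # about 1.1 km to the north
```

Once a raw coordinate has been reduced to a boolean like this, the venue behaves like any other segment the user belongs to, which is exactly why it can be combined with other attributes to single someone out.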

When an advertising platform receives user location from one of the user’s apps or websites that the user visits, this happens because the user either explicitly agreed to (opted in for) or didn’t mind (didn’t opt out from) sharing of their location. So, this is it, right? They agreed to share their location, they had a choice, they knew what they were doing, issue closed? Not so fast.

Certain online services and mobile operating systems leave the user no practical choice by connecting continuous location sharing with availability of highly desired services or features. Have you seen any of the map applications insist that you enable continuous location sharing only to enable you to see the last few searches you’ve done in it?

In effect, mobile users are often cornered into agreeing to share their location.

Whether it’s an operating system or an application that asks the user to share location, there’s always either an explicit or an implicit promise that the user’s privacy will be protected. The usual terms for this are “depersonalization” and “aggregation”, where the promise is that the specific location of an individual user will not be shared with a third party.

Beyond that, there’s a common sense “Do No Evil” assumption by the consumer, whereby the service operator is expected to know what they are doing, act in good faith, and protect the consumer’s interests.

The bottom line here is: it is not sufficient for the service operator (e.g. a social network platform, or a DSP) to simply give the user the choice of not providing their location. Location privacy requires specific engineering and data handling practices purposely thought through and built into the system.

k-Anonymity to the Rescue

To properly address the issue, we need to have an idea of the level of anonymity we need to provide and balance that with the level of detail in the reports that professional advertisers have come to expect in recent years. I.e., do we report every single user, or groups of 10, or groups of 100, or 1000? This is known as the k-Anonymity question.

“A release of data is said to have the k-anonymity property if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appear in the release.”
— Wikipedia.

In other words, at the time of campaign targeting OR report generation, if any combination of filtering parameters (e.g. gender, age band, time of day, location) produces a group of users with the size smaller than k, the result is reported as “not enough data”.
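A minimal Python sketch of that rule applied to report generation follows. The row layout, the `user_id` field, the helper name, and k = 10 are assumptions made for illustration, not a description of any particular platform.

```python
K = 10  # hypothetical minimum group size
NOT_ENOUGH_DATA = "not enough data"


def k_anonymous_report(rows, dimensions, k=K):
    """Count distinct users per combination of dimensions.

    Any group with fewer than k distinct users is suppressed.
    """
    groups = {}
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        groups.setdefault(key, set()).add(row["user_id"])
    return {
        key: len(users) if len(users) >= k else NOT_ENOUGH_DATA
        for key, users in groups.items()
    }


# Hypothetical data: 12 men and 3 women in the same age band.
rows = [{"user_id": i, "gender": "M", "age_band": "30-39"} for i in range(12)]
rows += [{"user_id": 100 + i, "gender": "F", "age_band": "30-39"} for i in range(3)]

result = k_anonymous_report(rows, ["gender", "age_band"])
```

The sketch counts distinct users rather than impressions, since a single user can generate many impressions and would otherwise make a tiny group look large.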

So, how big is k, really? Different countries and businesses will have different requirements; however, from a practical standpoint, a value of 10 can be assumed as the minimum.

Here’s what it means to implement k-anonymity in a typical advertising system:

  1. No campaign configuration will be accepted if its projected target audience is smaller than k.
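On the targeting side, the same threshold can be checked before a campaign is accepted. The sketch below is in Python and assumes the platform can produce the set of user identifiers a configuration would reach; the function name, the exception, and k = 10 are hypothetical.

```python
K = 10  # hypothetical minimum audience size


class AudienceTooSmall(ValueError):
    """Raised when a campaign's projected audience is below the k threshold."""


def validate_campaign(targeted_user_ids, k=K):
    """Accept a campaign only if it would reach at least k distinct users."""
    audience_size = len(set(targeted_user_ids))  # duplicates do not count
    if audience_size < k:
        raise AudienceTooSmall(
            f"projected audience {audience_size} is smaller than k={k}"
        )
    return audience_size
```

Under this rule a campaign listing 2, 3, or 5 explicit cookies is rejected outright, and padding the list with repeats of the same cookie does not help because only distinct users are counted.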

Had these principles been applied in the platforms chosen by the University of Washington researchers, it wouldn’t have been possible to identify an individual user’s location, as providing such information would violate the k-anonymity principle.

Location: Mutually Exclusive Granularity

Do we need to do anything specifically about location targeting, though? Yes, we do.

The level of targeting or reporting where one can see specific locations of an individual user at specific points in time is, let’s be honest, nothing but user tracking. We do not need individual user location tracking for digital advertising to work.

Advertisers typically look at large numbers within a particular context. Behaviors of each individual user are not interesting and analyzing those is prohibitively expensive for any business. What matters is trends among large groups of users, or within specific locations and venues.

As an example, if the advertiser wants to serve ads to users within a particular geographical area, large or small, they won’t care about gender and age of each individual user, or at what specific point in time, or what specific coordinates within the area each user is located. What they need is statistically significant data they can act on. “What percentage is male vs female?” “How did that change with each hour?” “What are the trends?” Nowhere in this discussion is a question of tracking individual consumers.

Overall, advertisers do understand privacy, and generally agree to sacrifice granularity in one dimension in order to understand the others better. I.e., if they want to know precise location, they don’t need to know specific time or specific users. If they do want to know specific time and location, they accept that users are grouped into statistical buckets, and that in some cases results can’t be produced due to privacy. This is what we call “mutually exclusive granularity”.
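One way to picture mutually exclusive granularity is as a budget on how many report dimensions may be fine-grained at once. The Python sketch below is an illustration under assumptions, not a description of a production system: the three dimensions, the two granularity levels, the priority order, and the `max_fine` parameter are all hypothetical.

```python
from dataclasses import dataclass, replace

# Hypothetical granularity levels for each report dimension.
FINE, COARSE = "fine", "coarse"


@dataclass(frozen=True)
class ReportRequest:
    location: str  # FINE = exact venue, COARSE = wide area
    time: str      # FINE = hourly, COARSE = whole campaign period
    users: str     # FINE = per-attribute breakdown, COARSE = totals only


def enforce_granularity(request, max_fine=1,
                        priority=("location", "time", "users")):
    """Keep at most max_fine FINE dimensions, in priority order.

    Every other dimension that was requested as FINE is coarsened.
    """
    fine_kept = 0
    result = request
    for dim in priority:
        if getattr(result, dim) == FINE:
            if fine_kept < max_fine:
                fine_kept += 1
            else:
                result = replace(result, **{dim: COARSE})
    return result


everything = ReportRequest(location=FINE, time=FINE, users=FINE)
location_only = enforce_granularity(everything)             # max_fine=1
location_and_time = enforce_granularity(everything, max_fine=2)
```

With `max_fine=1` a request for precise location loses its fine-grained time and user breakdowns; with `max_fine=2` precise time and location survive while users are reported only in aggregate. Either way, the groups that remain would still be subject to the k threshold.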

The exact engineering and architectural aspects of processing precise user location while preserving consumer privacy are a topic for a separate post. However, k-anonymity and mutually exclusive granularity are the fundamental pieces. It is definitely technologically achievable. In fact, that’s exactly what we’ve done at Cinarra. Where there’s a will, there’s a way.

Takeaway

  • Fundamental explicit and implied assumptions behind the current location advertising tech, specifically around user privacy, are being challenged.

Thanks for reading!

Founder of Cinarra. Entrepreneur and technologist. Fitness enthusiast. USA, Russia, Singapore, Japan.