Google’s Privacy Sandbox Dilemma (part 2/2)

Published in Weborama, Nov 17, 2021 (6 min read)

In the first part of this article, I explained why blocking Device Fingerprints is mandatory for Google if they want their Privacy Sandbox to be a success. In this second part, I’ll describe why I believe this is actually a dilemma and one of the most challenging problems of the Sandbox.

The problem with FLoCs (for Google and its Sandbox)

The fingerprinting situation really gets interesting when you look at FLoCs.

To replace user-centric behavioral profiling based on third-party cookies, the Sandbox aims to provide an API called FLoC (Federated Learning of Cohorts). The idea is simple: each Chrome browser locally computes a “FLoC ID” from its navigation history, in such a way that two browsers with similar histories end up with the same FLoC ID. That ID is designed to be k-anonymous: it is only exposed when enough browsers share it.
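To make the idea concrete, here is a minimal sketch, assuming a SimHash-style locality-sensitive hash (the kind of technique reportedly used in Chrome’s FLoC origin trial). The function name `simhash_cohort`, the 8-bit ID size and the sample domains are my own assumptions for illustration; this is not Chrome’s actual implementation, and it leaves out the k-anonymity step entirely.

```python
import hashlib

def simhash_cohort(domains, bits=8):
    """Map a browsing history to a small cohort ID (SimHash-style sketch).

    Hypothetical illustration: each visited domain "votes" on every bit of
    the ID, so histories that share most domains tend to share most bits.
    """
    votes = [0] * bits
    for domain in set(domains):  # each domain counts once, order is irrelevant
        digest = hashlib.sha256(domain.encode("utf-8")).digest()
        h = int.from_bytes(digest[:4], "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    cohort = 0
    for i, vote in enumerate(votes):
        if vote > 0:  # bit i is set when more domains voted 1 than 0
            cohort |= 1 << i
    return cohort

history_a = ["news.example", "cats.example", "piano.example"]
history_b = ["piano.example", "news.example", "cats.example"]
# Same set of domains, so both browsers land in the same cohort.
print(simhash_cohort(history_a) == simhash_cohort(history_b))
```

The point of the design is that the ID is computed locally from the history alone, so no server needs to see the list of visited sites.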

But what if the IP address is still there? If you have the IP and the FLoC ID at the same time, you may already have a pretty solid fingerprint. Combine them with the browser’s major version and the operating system, and you can bet the resulting fingerprint is close to unique and stable for some time (basically until the FLoC changes, which means at least 7 days).
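As a rough sketch of how little it takes, the snippet below hashes those four signals into a single identifier. The function name, the field order and the sample values (documentation-range IP, made-up cohort numbers) are assumptions for illustration only; real fingerprinting scripts collect many more signals.

```python
import hashlib

def fingerprint(ip, floc_id, browser_major, os_name):
    """Combine four passive signals into one stable pseudo-identifier."""
    raw = "|".join([ip, str(floc_id), str(browser_major), os_name])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]

# Hypothetical visitor seen on two unrelated sites during the same week.
site_a = fingerprint("203.0.113.7", 12345, 95, "Windows")
site_b = fingerprint("203.0.113.7", 12345, 95, "Windows")
print(site_a == site_b)  # True: the ID follows the user across domains

# Once the FLoC ID is recomputed, the identifier changes.
next_week = fingerprint("203.0.113.7", 54321, 95, "Windows")
print(site_a == next_week)  # False
```

Nothing here is stored on the device, which is exactly why the user cannot inspect or delete it.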

So, by adding FLoCs, Google would add to the fingerprinting surface while the goal of the Privacy Sandbox is precisely to reduce it. Sounds like a paradox.
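The paradox can be illustrated by counting anonymity sets, that is, how many users share the same combination of attributes. This is a toy sketch: the four users, their documentation-range IPs and their cohort numbers are invented for the example.

```python
from collections import Counter

def anonymity_set_sizes(users, keys):
    """Count how many users share each combination of the given attributes."""
    return Counter(tuple(user[key] for key in keys) for user in users)

# Hypothetical population: three users share cohort 42.
users = [
    {"ip": "198.51.100.1", "floc": 42},
    {"ip": "198.51.100.1", "floc": 17},
    {"ip": "198.51.100.2", "floc": 42},
    {"ip": "198.51.100.3", "floc": 42},
]

by_floc = anonymity_set_sizes(users, ["floc"])
by_both = anonymity_set_sizes(users, ["ip", "floc"])

print(by_floc[(42,)])                  # 3 users hide behind cohort 42 alone
print(by_both[("198.51.100.1", 42)])   # 1 user once the IP is added
```

The cohort alone may be k-anonymous, but each extra attribute splits the crowd further, and even two users behind the same IP become distinguishable thanks to their FLoC IDs.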

Recent discussions have suggested making the FLoC sticky to websites (so that, among other things, a fingerprint built using the FLoC would not be the same from one domain to another), but that would also degrade the business value of the FLoC: behavioral profiling would not be up to date on all the domains a user visits, so campaign performance would decrease.

All this to say that it seems as long as the real IPs are out there, an alternative to cookies will persist via fingerprints.

So Google has no choice but to get rid of the IP before deploying FLoCs… Or they have to drop FLoC from the Sandbox, which would degrade the viability of the open web a bit more.

We could conclude that Google has to get rid of the IP if they want the Sandbox to be a success.

If not, fingerprints will persist and cross-domain identifiers will continue to exist in an even worse environment: you cannot “delete your fingerprint” as easily as you can delete (and explore) your cookies. It’s a passive, invisible and silent ID that sticks to your navigation. In many ways it is far worse (privacy-wise) than a plain old third-party cookie that you can see and purge whenever you want.

But they have planned for that, you will tell me: look at Gnatcatcher!

Yes, that’s the point.

Getting rid of the IP?

How do you do that? The web is built on HTTP, which itself runs on top of TCP/IP. The IP address is a key element of that stack; it’s a core component of the way our devices communicate with websites.

You just cannot remove it. It’s like saying you want a product delivered without giving any postal address. Not possible. You have to be somewhere for the product to get to you. The same goes for the web page you’re requesting, with those funny videos of cats playing piano.

To solve this problem, Google has planned to implement a massive network of proxies used by default by the Sandbox.

If you cannot get the IP out of the protocol, you can still bounce from one to another before getting to the website you’re visiting. Think of how VPNs or Tor work. It’s not exactly the same, but clearly, the concept is close.

The feature is called Near-Path NAT (which is basically a way to keep the servers operating websites passively blind to the user’s IP).

With Near-Path NAT, your Chrome instance would stop accessing the web directly and would behave as if it had proxies hardcoded: it would ask a relay to fetch the data you want on your behalf, keeping your IP hidden from the web server (only the relay knows it, and the web server sees the relay’s IP in place of yours).
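The relay principle can be modeled in a few lines. This is a toy, in-memory sketch, not the Near-Path NAT design itself: the `Request`, `WebServer` and `Relay` classes, the IPs and the URL are all hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass

@dataclass
class Request:
    source_ip: str
    url: str

class WebServer:
    """Records the source IP of every request it receives."""
    def __init__(self):
        self.seen_ips = []

    def handle(self, request):
        self.seen_ips.append(request.source_ip)
        return f"content of {request.url}"

class Relay:
    """Forwards requests under its own IP; only the relay sees the client IP."""
    def __init__(self, relay_ip, server):
        self.relay_ip = relay_ip
        self.server = server
        self.client_ips = []

    def forward(self, request):
        self.client_ips.append(request.source_ip)
        outbound = Request(source_ip=self.relay_ip, url=request.url)
        return self.server.handle(outbound)

server = WebServer()
relay = Relay("192.0.2.10", server)
relay.forward(Request("203.0.113.7", "https://cats.example/piano"))

print(server.seen_ips)   # ['192.0.2.10']  -> the site only sees the relay
print(relay.client_ips)  # ['203.0.113.7'] -> the relay knows who asked
```

Note where the knowledge ends up: the website loses the client IP, but the relay operator gains a view of every request, which matters a lot for what follows.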

The execution problem there is twofold:

the scale: we’re speaking of all Chrome instances on the planet, hence approximately 69% of desktop web traffic and 63% of mobile web traffic.
the performance: for this to work, those proxies have to be as close as possible to web users; otherwise, for boring speed-of-light reasons, navigation speed would suffer dramatically.
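The speed-of-light argument can be put into numbers with a back-of-the-envelope calculation. The sketch assumes light travels at roughly 200,000 km/s in optical fiber and ignores routing, queuing and processing delays; the distances are made up for the example.

```python
FIBER_KM_PER_MS = 200.0  # light in optical fiber: roughly 200,000 km/s

def added_rtt_ms(client_to_relay_km, relay_to_server_km, direct_km):
    """Extra round-trip time caused by detouring through a relay.

    Only propagation delay is counted; real-world overhead would be higher.
    """
    detour_km = client_to_relay_km + relay_to_server_km - direct_km
    return 2 * detour_km / FIBER_KM_PER_MS

# Relay close to the user: barely noticeable.
print(added_rtt_ms(50, 1000, 1000))    # 0.5 ms

# Relay on another continent: every round trip pays for the detour.
print(added_rtt_ms(6000, 6000, 1000))  # 110.0 ms
```

Since loading a page involves many round trips, a penalty of that size per trip adds up quickly, which is why the relays have to sit near the users.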

“Ideally an IP privatizer service like this would be operated as close to on-path as possible, so as to minimize the performance impact of increasing the round trip time. ISPs are in a strong position to operate such a service and in fact some ISPs NAT the majority of their traffic for other reasons.”

You can see in this extract that Google’s Privacy Sandbox team is not yet sure how they will proceed. They acknowledge that they need a wide network of proxies to operate Near-Path NAT for performance reasons, and they imply this network would be run by a third party, not by Google.

As a matter of fact, it is easy to see that, no matter how powerful and large Google is, they cannot operate this infrastructure themselves, if only because of the huge monopoly situation this would cause.

Imagine if all Chrome instances on the globe got routed through Google servers when they accessed the web.

You get the picture?

It’s not the Internet anymore, it’s Googlenet.

Routing all the traffic of Chrome browsers on the planet through such an infrastructure would mean swallowing all of the requests made by Chrome users on the globe. They would be in a technical position where they could limit or block any kind of resource they want on the web. They could literally spy on everything that is done online, even when there is absolutely no link whatsoever with their properties.

They would pretty much be in the same position as your ISP: seeing all of your traffic, and being able to limit your bandwidth usage depending on what you’re accessing, if they wish.

It’s obvious they cannot operate that proxy network, and I’m also certain that it’s the last thing Google is thinking about, for those very reasons.

The dilemma

So here we have our dilemma (or Google’s).

The IP has to disappear, or the Sandbox will fail.

For the IP to disappear, an infrastructure of a global scale is needed to route ~65% of the web traffic.

Google cannot operate that infrastructure without transforming the Internet into the biggest Extranet we’ve ever seen.

The Mozilla Foundation, in their report about the FLoC implementation, doubts Google will even manage to mask IPs:

“Some information, like IP addresses, is fairly unique, stable, and difficult to remove. The best available techniques for removing IP addresses involve proxies or VPNs, which incur additional cost. Lassey has proposed a proxying technique called “near-path NAT” that is intended to reduce the cost, but to our knowledge there are no extant deployments and so operational feasibility is unknown.”

This cost is not to be neglected. Look at Apple (clearly a spearhead on privacy measures with Safari): since the latest update of iOS and macOS, they have offered a similar approach to masking a user’s IP, called iCloud+ Private Relay. The interesting thing here is that this feature is not free; you have to subscribe to a paid iCloud account to get it.

Masking the IP is not cheap, even for Tech giants.

A Dilemma?

At this point, we believe this is the biggest challenge Google is facing with the Privacy Sandbox development.

Either they manage to make the IP disappear, but that will be at the cost of altering what we call the web. Or they don’t, and then, it will not be a Privacy Sandbox, it will be a Privacy Sieve.

Written by Alexis Sukrieh