Undead cookies: How the web never forgets

HTTP cookies are so 1995. A web cookie (yum) is a small piece of data sent from a website and stored in your browser. You may not know it, but if you fire up your network inspector and check out your request headers, you'll see cookies sent along with EVERY request to the site that set them.

Screenshot from Chrome going to Wikimedia.org site on HTTP Cookie (yeah, very meta)

Their creator, Lou Montulli, intended them to store state on each user’s computer, since HTTP is fundamentally stateless. The motivating case was a shopping cart: how do you remember what’s in there unless you can “track” the products?
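To make the round trip concrete, here is a minimal sketch using Python's standard http.cookies module. The cookie name cart_id and its value are made up for illustration; a real shop would pick its own.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header for the first response.
response_cookie = SimpleCookie()
response_cookie["cart_id"] = "abc123"      # hypothetical cart identifier
response_cookie["cart_id"]["path"] = "/"
set_cookie_header = response_cookie["cart_id"].OutputString()

# Browser side: remember the name/value pair from Set-Cookie...
browser_jar = SimpleCookie()
browser_jar.load(set_cookie_header)
# ...and echo it back in the Cookie header of later requests to that site.
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in browser_jar.items()
)

# Server side again: parse the Cookie header to recover the stored state.
incoming = SimpleCookie()
incoming.load(cookie_header)

print("Set-Cookie:", set_cookie_header)
print("Cookie:", cookie_header)
print("Recovered cart:", incoming["cart_id"].value)
```

The server never has to remember anything between requests here; the browser carries the state back each time, which is exactly what makes the same mechanism useful for tracking.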

I’m Not Dead Yet, Cookie Respawning

Two decades later, we are no longer tracking products but people, or more specifically the browser you are using. In response, people have “cleared their cookies”, but that’s only the tip of the iceberg. Samy Kamkar published the Evercookie library, which not only sets a cookie but actively resists deletion by copying itself in different forms on your machine and resurrecting itself like some privacy-stealing zombie if it notices that some of the copies are missing or expired. Here are twelve different places it can shove that identifying data:

  1. Standard HTTP cookies (yawn)
  2. Local Shared Objects (Flash cookies, not so popular these days)
  3. window.name caching (2–32MB but gets cleared easily)
  4. Silverlight Isolated Storage (good old Microsoft)
  5. Internet Explorer userData storage
  6. HTML5 Session Storage
  7. HTML5 Local Storage
  8. HTML5 Global Storage
  9. HTML5 Database Storage via SQLite
  10. HTTP ETags
  11. Web History
  12. Canvas fingerprinting
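
The respawning idea can be sketched in a few lines. This is an illustration rather than the library's actual code: the store names and the identifier are invented, plain dictionary entries stand in for real browser storage, and picking the most common surviving value is an assumption about how conflicts might be resolved.

```python
from typing import Dict, Optional

def respawn(stores: Dict[str, Optional[str]]) -> Optional[str]:
    """Recover the ID from any surviving store and copy it back into all of them."""
    survivors = [value for value in stores.values() if value is not None]
    if not survivors:
        return None  # every copy was wiped, so a fresh ID would have to be issued
    # Assumption: prefer the most common surviving value in case one copy is stale.
    identifier = max(set(survivors), key=survivors.count)
    for name in stores:
        stores[name] = identifier  # resurrect the deleted copies
    return identifier

# Hypothetical storage locations, all holding the same tracking ID.
stores = {
    "http_cookie": "u-42",
    "local_storage": "u-42",
    "etag": "u-42",
    "window_name": "u-42",
}

# The user "clears cookies", which only removes some of the copies.
stores["http_cookie"] = None
stores["local_storage"] = None

recovered = respawn(stores)
print(recovered, stores)
```

As long as a single copy survives, the identifier comes back everywhere on the next page load.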

HTTP ETags, Web History, and Canvas Fingerprinting

The last three places are so deviously creative that they bring a smile to my face. ETags are used for HTTP cache validation: the server labels each asset with a tag, often a collision-resistant hash of its content, and your browser sends that tag back to ask whether the content needs to be refreshed. In July 2011, researchers at UC Berkeley discovered that Hulu.com and KISSmetrics were using these unique “fingerprints” not just for content but for tracking browsers.
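
Here is a rough sketch of how an ETag can double as an identifier. The function handle_request and the way the ID is generated are hypothetical; the If-None-Match request header, the ETag response header, and the 304 status are standard HTTP.

```python
import secrets
from typing import Dict, Tuple

def handle_request(headers: Dict[str, str]) -> Tuple[int, Dict[str, str], str]:
    """Return (status, response_headers, visitor_id) for a tracking resource."""
    etag = headers.get("If-None-Match")
    if etag:
        # The browser is revalidating its cached copy and hands back our tag,
        # which identifies the visitor even if cookies were cleared.
        return 304, {"ETag": etag}, etag.strip('"')
    # First visit: mint a unique tag instead of hashing the content.
    visitor_id = secrets.token_hex(8)
    response_headers = {
        "ETag": f'"{visitor_id}"',
        "Cache-Control": "private, no-cache",  # store it, but revalidate every time
    }
    return 200, response_headers, visitor_id

# First request carries no validator, so the server assigns an ID.
status, response_headers, first_id = handle_request({})
# The next request replays the cached ETag, and the same ID comes back.
status_again, _, second_id = handle_request({"If-None-Match": response_headers["ETag"]})
print(status, status_again, first_id == second_id)
```

Clearing cookies does not help here, because the tag lives in the browser cache rather than the cookie jar.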

If web history caching is enabled, any data can be Base64 encoded and stored as a trail of visited URLs. For example, let’s use the identifier “hi”, which is encoded as “aGk=”. JavaScript code would then access the following URLs in the background, one per prefix of the encoded value, ending with a “-” terminator:

google.com/evercookie/cache/a
google.com/evercookie/cache/aG
google.com/evercookie/cache/aGk
google.com/evercookie/cache/aGk=
google.com/evercookie/cache/aGk=-

When checking for our identifier, the code tries each possible Base64 character appended to google.com/evercookie/cache/ to see which URL is in the browser history, starting with “a”, and extends the match one character at a time until it reaches the terminating “-”. You can read more about this on Jeremiah Grossman’s blog.
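
The write and read sides of this trick can be simulated as follows. It is only a sketch: a Python set stands in for the browser history, and a membership test stands in for whatever history-sniffing method a real script would use.

```python
import base64
import string
from typing import List, Set

BASE = "google.com/evercookie/cache/"   # URL prefix from the example above
ALPHABET = string.ascii_letters + string.digits + "+/="  # Base64 characters
END = "-"                               # terminator marking the end of the value

def urls_to_visit(identifier: str) -> List[str]:
    """One URL per prefix of the encoded identifier, plus the terminator."""
    token = base64.b64encode(identifier.encode()).decode() + END
    return [BASE + token[:i] for i in range(1, len(token) + 1)]

def recover(history: Set[str]) -> str:
    """Rebuild the identifier by probing which prefix URLs were visited."""
    token = ""
    while True:
        if BASE + token + END in history:
            return base64.b64decode(token).decode()
        for character in ALPHABET:
            if BASE + token + character in history:
                token += character  # this prefix was visited; keep extending it
                break
        else:
            raise LookupError("no identifier stored in history")

history = set(urls_to_visit("hi"))  # simulated browser history
print(sorted(history, key=len))
print(recover(history))
```

Each step costs at most one probe per Base64 character, so reading back even a long identifier takes only a few hundred checks.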

Canvas fingerprinting made waves in 2014 thanks to the paper “The Web Never Forgets”. The BrowserLeaks site has a great page that calculates your signature using the HTML5 canvas element. This works because the same canvas image may be rendered differently on different computers due to:

  • different web browsers,
  • different image processing engines,
  • different image export options,
  • different compression levels,
  • final checksums that may differ even for pixel-identical images,
  • different fonts across operating systems, and
  • different algorithms and settings for anti-aliasing and sub-pixel rendering.
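
The fingerprint itself is usually just a hash of whatever the canvas produced. The sketch below fakes the rendering step: the two byte strings are invented 2x2 RGBA pixel buffers standing in for the output of the same drawing commands on two machines, where a real script would read the canvas back in the browser (for instance with toDataURL()).

```python
import hashlib

def canvas_fingerprint(pixels: bytes) -> str:
    """Hash the rendered pixel data into a short, stable signature."""
    return hashlib.sha256(pixels).hexdigest()

# Hypothetical renderings of the same drawing on two machines.
machine_a = bytes([0, 0, 0, 255,  120, 120, 120, 255,
                   0, 0, 0, 255,  255, 255, 255, 255])
# Machine B anti-aliases one edge pixel slightly differently.
machine_b = bytes([0, 0, 0, 255,  121, 120, 120, 255,
                   0, 0, 0, 255,  255, 255, 255, 255])

print(canvas_fingerprint(machine_a))
print(canvas_fingerprint(machine_b))
```

A one-value difference in a single pixel is enough to produce a completely different signature, while the same machine yields the same hash on every visit.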

Lower-level protocol identifiers

Good news for us: web browsers have found ways to mitigate and defend against most of the “evercookie” type of storage techniques. Unfortunately, people have a lot of free time. The Chromium team published this article, which pretty much says that anything your browser does to communicate with the outside world is traceable. This includes:

  • Origin Bound Certificates (aka ChannelID): persistent self-signed certificates that identify the client to an HTTPS server
  • The supported set of cipher suites, which can be used to fingerprint TLS/SSL handshakes
  • TLS session identifiers and session tickets
  • Long-lived HTTP Strict Transport Security (HSTS) headers
  • DNS caches, which can store small amounts of information (8–9 cached hostnames would be sufficient to uniquely identify every computer)
  • Network configurations like client IP, TCP/IP and TLS stack fingerprints, ephemeral source port numbers, local network IP addresses behind NATs, X-Forwarded-For proxy headers, and, with active probing, the list of open ports on the local host.
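
Several of these (cached DNS entries, HSTS flags) boil down to storing one bit per hostname: a hostname that has been touched reads as 1, an untouched one as 0. The sketch below shows only that encoding. The hostnames under tracker.example are made up, and a set stands in for whatever side channel (lookup timing, an HTTPS upgrade) would reveal the cached state in practice.

```python
from typing import List, Set

def hostname_for_bit(index: int) -> str:
    """Hypothetical hostname that carries one bit of the identifier."""
    return f"bit{index}.tracker.example"

def hostnames_to_prime(identifier: int, n_bits: int) -> List[str]:
    """Hostnames to touch so that every 1 bit leaves a cached entry behind."""
    return [hostname_for_bit(i) for i in range(n_bits) if (identifier >> i) & 1]

def read_identifier(cached: Set[str], n_bits: int) -> int:
    """Rebuild the identifier from which hostnames appear to be cached."""
    return sum(1 << i for i in range(n_bits) if hostname_for_bit(i) in cached)

primed = hostnames_to_prime(0b101101, n_bits=8)
print(primed)
print(bin(read_identifier(set(primed), n_bits=8)))
```

The number of hostnames sets the number of distinguishable values: n hostnames give 2^n possible identifiers.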

Device fingerprinting

You love your mobile device or MacBook Pro, religiously clear your cookies, and only use “safe browsing”, but your browser is still leaking enough identifying bits of information to create a unique device-specific fingerprint. Without leaving cookies behind, the EFF (panopticlick.eff.org) can use the following browser characteristics to pick you out of a line-up:

  • screen size and color depth
  • browser plugin details
  • time zone
  • HTTP_ACCEPT Headers
  • language
  • system fonts
  • platform
  • USER_AGENT

Panopticlick results for Firefox

When device fingerprinting is used in combination with other methods, you can only rely on the goodwill of sites not to employ this technique, or hope that a unicorn will magically appear and grant you a wish.
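
To see why a handful of mundane attributes adds up, here is a small sketch that combines them into one hash and converts "one browser in N shares this value" into bits of identifying information. The attribute values are invented examples, and hashing a sorted JSON dump is just one reasonable way to combine them.

```python
import hashlib
import json
import math
from typing import Dict

def device_fingerprint(attributes: Dict[str, str]) -> str:
    """Combine browser characteristics into one signature, independent of key order."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def bits_of_information(share_of_browsers: float) -> float:
    """Identifying bits revealed by a value that this share of browsers has."""
    return -math.log2(share_of_browsers)

# Hypothetical values for a few of the characteristics listed above.
browser = {
    "screen": "1440x900x24",
    "time_zone": "UTC-8",
    "language": "en-US",
    "platform": "MacIntel",
    "fonts": "Arial,Helvetica,Menlo",
}

print(device_fingerprint(browser))
# A value shared by 1 in 1024 browsers reveals 10 bits.
print(bits_of_information(1 / 1024))
```

Bits from independent attributes add up, so a few moderately rare values are enough to single out one browser among millions.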

The Undead Network

One last parting gift: if third-party cookies are allowed, you’re in a heap of trouble, since trackers can use cookie syncing (simpler explanation here). In brief, even if you secure everything about your browser (Tor) and use it within a vanilla VM, once you visit one advertising site in the network, any other site you visit within that network (which can often include thousands of major websites) will be able to track you.
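
Cookie syncing is simple at its core: one tracker redirects your browser to another with its own ID for you in the URL, and the second tracker files that next to the ID in its own cookie. A minimal sketch, with made-up host names, parameter names, and IDs:

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlparse

def sync_url(partner_host: str, own_id: str) -> str:
    """Tracker A builds a redirect that carries its ID for this browser to tracker B."""
    query = urlencode({"partner": "tracker-a", "partner_uid": own_id})
    return f"https://{partner_host}/sync?{query}"

def handle_sync(url: str, own_cookie_id: str, match_table: Dict[str, str]) -> None:
    """Tracker B sees its own cookie plus A's ID in the URL and records the pair."""
    params = parse_qs(urlparse(url).query)
    match_table[params["partner_uid"][0]] = own_cookie_id

match_table: Dict[str, str] = {}
redirect = sync_url("tracker-b.example", own_id="A-123")
# The browser follows the redirect and attaches tracker B's third-party cookie.
handle_sync(redirect, own_cookie_id="B-987", match_table=match_table)
print(redirect)
print(match_table)
```

After that single redirect, both trackers can trade whatever they know about you under a shared key.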

Sleep well tonight!