Web Browser Uniqueness and Fingerprinting
People are online more than ever before. Even with the proliferation of smartphones and mobile applications, the web browser remains the most convenient means of accessing the internet. On the one hand, this lets anyone connected to the web retrieve information on the go. On the other hand, this convenience has the unfortunate side effect of being used to track users’ every move online as they hop from one website to another, and from one link to the next.
Tracking data, which can range from personally identifiable information such as names and addresses to more specific details such as location, monthly income, interests and other revealing demographic details, has long been used by companies and advertisers for various purposes, including targeted advertising.
One way companies do this is by storing cookies on a user’s computer. This lets the company follow the user’s actions on its own website, for example when adding items to a shopping cart. This is first-party tracking, and of the benign kind. But other kinds of cookies, called third-party tracking cookies, and web beacons, the invisible objects embedded in a web page or an email (this is why users should refrain from displaying images in emails from unknown senders), are also commonly employed to track the user’s activity across the web so as to target them with more relevant advertisements as they browse other websites.
Third-party web tracking, a practice in which the tracker is an entity other than the website directly visited by the user, goes beyond monitoring the user’s visit to the site. While aggregated tracking data is often associated with a unique advertising identifier (and not with a name), that distinction matters little when the tracker is a third party like Facebook or Google (which make most of their money from ads) that knows the user’s real identity, irrespective of whether the user has logged in. This has not only raised privacy concerns, but has also prompted the European Union to require “informed consent” from users before such tracking cookies are stored on their devices.
Users concerned about this loss of privacy have the option of deleting browser cookies before they exit the browser. But as tracking technologies evolve, this practice has become less than effective in shielding a user from being tracked. Although tracking can be mitigated to some extent by using “Incognito” or “Private” browsing to automatically clear cookies, or by disabling cookies altogether and using VPNs to mask online activity, these measures come with a trade-off in convenience and performance.
None of these attempts at anonymity, however, provides full protection against a different means of browser tracking called fingerprinting, a widely used technique to track an individual user whenever they visit a website. The information collected in this case includes the type of web browser used (Google Chrome, Mozilla Firefox, Apple Safari, Microsoft Edge, Opera etc.) and its version, the operating system (Windows, macOS, Linux etc.) and its version, screen resolution (mobile, tablet or desktop), supported fonts, browser plugins, timezone information, use of ad blockers, use of the “do not track” option, language and font preferences and the hardware configuration of the device.
These pieces of information may not be “personally” identifiable per se, but they are comprehensive enough to either fully or partially identify unique users (or devices) even in the absence of cookies. This is because there is enough entropy (or variation) among the aforementioned variables that only one in several thousand devices has the exact same combination of specifications as any given user’s. A fingerprint therefore relies on these slight differences between devices to generate a unique string from the collected information.
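As a rough illustration of how a set of collected attributes can be turned into a single string, the sketch below serializes the attributes in a fixed order and hashes the result. The attribute names, their values, and the choice of SHA-256 are all assumptions made for the example; real fingerprinting scripts collect these values in the browser and may combine them differently.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Return a hex digest summarizing a set of browser attributes."""
    # Sort keys so the same attributes always serialize to the same text.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    # SHA-256 is an illustrative choice, not the method of any specific tracker.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, for illustration only.
device_a = {
    "user_agent": "ExampleBrowser/1.0 (ExampleOS 10.0)",
    "screen": "1920x1080x24",
    "timezone_offset_minutes": -330,
    "fonts": ["Arial", "Courier New", "Times New Roman"],
    "do_not_track": True,
    "language": "en-US",
}

# The same device with a single extra font installed.
device_b = dict(device_a, fonts=device_a["fonts"] + ["Comic Sans MS"])

fp_a = fingerprint(device_a)
fp_b = fingerprint(device_b)

print(fp_a == fingerprint(dict(device_a)))  # same attributes, same fingerprint
print(fp_a == fp_b)                         # one changed attribute, new fingerprint
```

The point of the sketch is that no single attribute needs to be identifying: a small difference in any one of them is enough to change the resulting string.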
Panopticlick Browser Fingerprinting
Browser fingerprinting gained widespread attention in May 2010 when the Electronic Frontier Foundation (EFF) published a study called “How Unique is Your Browser?”. By studying over a million visits to their research website https://panopticlick.eff.org, the researchers found that 83.6% of the browsers seen had a unique fingerprint; among those with Flash or Java enabled, the figure was 94.2%. The study also showed that the distribution of fingerprints contained at least 18.1 bits of entropy, meaning that a browser picked at random had roughly a 1 in 286,777 chance of sharing its fingerprint with another browser.
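The relationship between bits of entropy and "1 in N" odds is a power of two, which the short sketch below works through. The helper names are hypothetical. Note that 2 raised to 18.1 comes out near 281,000 rather than exactly 286,777, presumably because the quoted entropy figure is rounded to one decimal place.

```python
import math


def bits_to_one_in_n(bits: float) -> float:
    """Number of equally likely fingerprints implied by `bits` of entropy."""
    return 2.0 ** bits


def one_in_n_to_bits(n: float) -> float:
    """Bits of entropy implied by a 1-in-n chance of sharing a fingerprint."""
    return math.log2(n)


implied_n = bits_to_one_in_n(18.1)          # a little under 281,000
reported_bits = one_in_n_to_bits(286_777)   # about 18.13 bits

print(round(implied_n))
print(round(reported_bits, 2))
```

Each additional bit of entropy halves the chance that two browsers look alike, which is why adding even a few more measurements can strengthen a fingerprint considerably.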
It is interesting to note that they were able to arrive at this conclusion by measuring just eight pieces of browser information (plugins, fonts, timezone, supercookies, cookies enabled, user agent, HTTP accept headers and screen resolution). “We implemented and tested one particular browser fingerprinting method. It appeared, in general, to be very effective, though… there are many measurements that could be added to strengthen it,” noted the authors in the paper.
Browser fingerprinting techniques improved further in 2012 with a new type of fingerprinting called Canvas Fingerprinting. The paper that introduced it, titled “Pixel Perfect: Fingerprinting Canvas in HTML5”, showed how an HTML5 canvas element can be exploited to obtain a consistent fingerprint without the user even being aware of it.
Canvas fingerprinting works by taking advantage of the fact that different graphics cards and different operating systems work slightly differently, meaning that even if different browsers are given the same instruction to draw a particular shape, they will draw slightly different shapes. Researchers Keaton Mowery and Hovav Shacham demonstrated that even rendering the simplest of fonts and shapes can lead to a reliable browser fingerprint.
“Even Arial, a font which is 30 years old, renders in new and interesting ways depending on the underlying operating system and browser. In the 300 samples collected for the text_arial test, there are 50 distinct renderings,” observed the authors after conducting a small-scale study with 294 participants from Amazon’s online crowdsourcing marketplace Mechanical Turk.
Once text or an image has been drawn, the contents of the canvas element can be converted to a string of Base64-encoded data, which is in turn passed through a hash function to return a short, fixed-length hash value, also called the fingerprint.
Thus, when a user visits a website that employs canvas fingerprinting, the site instructs the browser to render a hidden piece of text or graphic, which is subsequently hashed to generate a unique fingerprint. The fingerprint can then be shared with advertisers to identify users. Since every (re)visit from a user generates the same fingerprint, it can be used to uniquely identify them and monitor their activities for the purpose of targeted advertising.
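The encode-then-hash step can be sketched outside the browser as follows. In a real page the Base64 data would come from the canvas itself (for example via the canvas `toDataURL` method); here the pixel buffers are made-up stand-ins, and the function name and the use of SHA-256 are assumptions for illustration only.

```python
import base64
import hashlib


def canvas_fingerprint(pixels: bytes) -> str:
    """Base64-encode rendered pixel data, then hash it to a fixed-length value."""
    encoded = base64.b64encode(pixels)
    # SHA-256 is an illustrative choice; any stable hash function would do.
    return hashlib.sha256(encoded).hexdigest()


# Stand-in RGBA buffers for the "same" drawing rendered on two machines.
render_machine_1 = bytes([0, 0, 0, 255] * 16)

# On the second machine a single alpha value differs, as might happen
# with slightly different antialiasing.
tweaked = bytearray(render_machine_1)
tweaked[3] = 254
render_machine_2 = bytes(tweaked)

fp_1 = canvas_fingerprint(render_machine_1)
fp_1_revisit = canvas_fingerprint(render_machine_1)
fp_2 = canvas_fingerprint(render_machine_2)

print(fp_1 == fp_1_revisit)  # a revisit from the same machine matches
print(fp_1 == fp_2)          # a one-byte rendering difference does not
```

Because the hash is fixed-length, the site only needs to store and compare a short string rather than the full rendered image.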
While the paper only demonstrated that this type of fingerprinting was possible, it was not until July 2014 that it was discovered to be in use in the real world. In a paper titled “The Web never forgets: Persistent tracking mechanisms in the wild,” published by researchers at Princeton University and KU Leuven in Belgium, 20 distinct implementations of canvas fingerprinting were found by crawling the homepages of the 100,000 most popular websites.
It also found that of the 5,542 websites that were using canvas fingerprinting, 95% of them were using code written by a company called AddThis that offers tools for social media sharing. The most troubling aspect here is that neither the website owners nor the users were aware such a tracking mechanism was being bundled alongside a social sharing tool.
AddThis responded to the criticism by saying the fingerprinting was conducted as a research project to explore alternatives to browser cookies. “The test was completed, the code has been disabled, and this data was never used for personalization or targeted advertising,” it added in a blog post in an attempt to reassure users that their privacy had not been invaded. Even more interestingly, AddThis said the project was halted because the results were “unreliable” and that identification was not good enough to supplant cookies.
The browser fingerprinting techniques discussed above show that user (or device) tracking is possible as long as the user revisits the website from the same browser. This ensures that the device fingerprint remains consistent, making unique identification possible (even when browsing in “Private Browsing” or “Incognito” mode). But this constraint also makes it difficult to match the fingerprint left behind by a Chrome browser with one generated by visiting the same website from the same device in a different browser, say, Mozilla Firefox.
But thanks to a new browser fingerprinting technique proposed earlier this February by researchers from Lehigh University and Washington University, it’s now possible for websites to fingerprint users even if they use multiple browsers. The paper, titled “(Cross-)Browser Fingerprinting via OS and Hardware Level Features”, shows that the technique not only works across multiple browsers but is also more accurate than single-browser fingerprinting algorithms like AmIUnique.
The fingerprinting technique improves on existing fingerprintable features proposed in Panopticlick and AmIUnique, such as screen resolution (by taking into account the zoom level), font list and number of virtual CPU cores, and proposes new ones that draw on the operating system and hardware of the device.
The algorithm in general remains fairly similar to the canvas fingerprinting mechanism in that the server (the website) sends various rendering tasks, like drawing curves and lines, to a client (the user’s web browser). The results, both images and sounds, are then converted into hashes and sent to the server, where they are combined with a mask to generate a fingerprint. If the technique is used for single-browser fingerprinting, the mask is all ones; for cross-browser fingerprinting, it is computed from two different sub-masks.
The first sub-mask’s value depends on the browser, for example, the browser’s support for antialiasing, whereas the second sub-mask is generated for each browser pair, for example, Chrome v. Firefox or Chrome v. Windows Edge, based on a brute-force search approach that strikes a balance between cross-browser stability and uniqueness of a feature.
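The role of the mask can be illustrated with a toy example. This is only a sketch under stated assumptions: the two sub-masks are combined here with a bitwise AND, masked-out task hashes are simply blanked, and the result is summarized with SHA-256. The function names, task hashes and mask values are hypothetical and are not taken from the paper’s implementation.

```python
import hashlib
from typing import List


def combine_sub_masks(browser_mask: List[int], pair_mask: List[int]) -> List[int]:
    """Assumed combination rule: keep a task only if both sub-masks keep it."""
    return [a & b for a, b in zip(browser_mask, pair_mask)]


def masked_fingerprint(task_hashes: List[str], mask: List[int]) -> str:
    """Blank out the hashes of masked tasks, then hash what remains."""
    if len(task_hashes) != len(mask):
        raise ValueError("one mask bit is required per rendering task")
    kept = [h if bit else "" for h, bit in zip(task_hashes, mask)]
    return hashlib.sha256("|".join(kept).encode("utf-8")).hexdigest()


# Hypothetical per-task hashes from one device: curve, text, audio.
# The text task renders differently in the two browsers.
chrome_hashes = ["curve-abc", "text-chrome", "audio-xyz"]
firefox_hashes = ["curve-abc", "text-firefox", "audio-xyz"]

single_browser_mask = [1, 1, 1]  # all ones: every task contributes
cross_browser_mask = combine_sub_masks([1, 1, 1], [1, 0, 1])  # drops the text task

single_match = (masked_fingerprint(chrome_hashes, single_browser_mask)
                == masked_fingerprint(firefox_hashes, single_browser_mask))
cross_match = (masked_fingerprint(chrome_hashes, cross_browser_mask)
               == masked_fingerprint(firefox_hashes, cross_browser_mask))

print(single_match)  # the browsers differ when every task counts
print(cross_match)   # the browsers match once the unstable task is masked
```

The trade-off mentioned above shows up directly: every task that is masked out for stability across browsers is one fewer source of uniqueness between devices.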
The study collected 3,615 fingerprints from 1,903 users over a period of three months and found that the technique was able to successfully identify 99.2 percent of the users, which is much higher than AmIUnique’s success rate of 90.8 percent.
In addition, the researchers launched a website to demonstrate the fingerprint identification technique. In my own testing on Google Chrome and Apple Safari, the runs produced four different browser fingerprints, one each for Safari and Chrome in regular and Private/Incognito browsing modes, although the computer fingerprints themselves remained the same for a specific browser.
The discrepancy could be because the tool is still in development (as noted on the website). The results of the runs are summarized in Table 1.
Benefits v. Tradeoffs
Browser fingerprinting, by itself, is not an inherently bad practice. As the EFF’s Panopticlick paper mentions, capturing browser information (including precise version numbers of plugins) allows website owners to debug errors in their code and identify issues with browser compatibility that may otherwise slip through the cracks despite extensive testing before deployment.
Fingerprinting can also prove useful when authenticating users. Banks, for instance, can use this system-level information to identify a trusted device and alert the user if they are using a computer different from the one used on a previous visit. Because this could falsely flag a scenario where the user first logs in via Chrome and then via Safari (as the fingerprints would be different), a technique like cross-browser fingerprinting can come in very handy.
But these techniques, as we have seen in the AddThis case above, can be misused to track users online and serve customized ads without their explicit consent. Advertisers may argue that they want to provide users with more useful ads, that it’s better to see a relevant ad (which the user might click) than some random ad that is completely divorced from their web-browsing experience, and that they are not interested in users’ personally identifiable information. The reality, however, is that browser fingerprinting is a powerful technique for tracking users and can therefore have significant implications with regard to user privacy.
A different but more practical approach would be to minimize the amount of information shared with websites and third parties, and to treat fingerprinted information as personally identifiable information. This would mean, as the Panopticlick paper highlights, setting time limits on how long such data can be retained and associated with a user.
Interestingly, the lack of browser plugins also makes mobile browsers on iPhones and Android smartphones much more difficult to identify than their desktop counterparts. Apple, under CEO Steve Jobs back in 2010, famously decided not to include Flash on its phones, citing security and performance issues. Such a decision no doubt played a role in this case.
Every new technology comes with the promise of making our digital lives better, more seamless and more secure. But the potential abuse of that very technology for nefarious purposes is something every concerned party needs to be aware of. In times of near-ubiquitous online presence, browser fingerprinting has shown its value in preventing unauthorized access and reducing identity theft, but its misuse as a user-tracking tool for targeted advertising underscores the need for tighter regulation of such technology.