Kevin Bankston, Caroline Holland, Harlan Yu, Michelle De Mooy, and Prof. David Vladeck at New America Foundation, Washington, D.C., April 5, 2018.

How to Fix Facebook’s Data Leak

Dave Troy
Apr 6, 2018

One of the most challenging aspects of the recent revelations about Facebook’s controversial data leaks is that much of the uproar centers on events that occurred more than three years ago. At this point it is difficult to put the horses back in the barn and undo what happened between 2010 and April 30, 2015: the period when version 1.0 of Facebook’s Graph API let developers extract data about the friends of people who had agreed to use various applications.

Many have argued that Facebook should have acted far sooner and more decisively to sanction Aleksandr Kogan, Cambridge Analytica, and its parent company, SCL — to at the very least block them from the platform and make its policies clearer and more user-centric.

Instead, Facebook’s leadership dragged its feet and generally did what it could to make this story go away. From the company’s point of view, it did nothing wrong; if there was wrongdoing, it was on the part of researchers like Kogan and firms like Cambridge Analytica, which handled data extracted from Facebook in ways that violated its terms of service.

However, this interpretation no longer passes the sniff test for much of the public, especially in Europe, which has long had stricter data protection rules than the United States and whose new General Data Protection Regulation (GDPR) takes effect on May 25, 2018.

It is in this context that the U.S. Federal Trade Commission and Congress are weighing what actions to take to regulate Facebook effectively going forward. Facebook itself is adopting several measures aimed at preventing future abuses; however, most of these changes still don’t address the data leakage that has already occurred.

But here’s something that might work.

A New Deal for Data

One of the most powerful features of Facebook is the ability to target specific users with specific messages. Indeed, this is how Cambridge Analytica (and many legitimate advertisers) have reached audiences with incredible precision. Facebook offers many tools for targeting specific user demographics right on its own platform: you can select attributes like income, location, device type, job type, and interests. That’s really the power of Facebook’s platform, and used properly, it is an incredible tool for advertisers.

But to drill down to very specific audiences made up of hand-picked individuals, advertisers can upload their own lists of Facebook User ID numbers and target those people directly. You may not know it, but every Facebook user has a unique User ID number. It’s like your Social Security number, but for Facebook. Mine is 741590524. Using just that number, an advertiser can target me with whatever content they like: ads, promoted articles, or apps.

My user ID number has been the same for the entire time I have been on Facebook, more than ten years. That means it may have been harvested over the years, in any number of contexts, by app developers, advertisers, hackers, and data brokers with a range of intentions. Undoubtedly it appears in data sets used by Cambridge Analytica and others. That number is out of the barn and can’t really be put back in.

But what if that number simply stopped working, and could no longer be used to target me with ads on Facebook?

What if, given what we know now, every Facebook user were given a fresh start, and people who did not intentionally consent to have their data harvested by third parties could no longer be targeted by any advertiser using their old Facebook User ID number?

I proposed this recently and raised the idea yesterday at an event at the New America Foundation in Washington, D.C., called “Facebook After Cambridge Analytica: What Should We Do Now?”, where it sparked significant interest among attendees, including U.S. Federal Trade Commissioner Terrell McSweeny.

Here is how this might work for Facebook:

  1. Announce deprecation of legacy User ID numbers. The company notifies developers and advertisers that legacy User ID numbers will no longer be usable after 60 to 120 days.
  2. Adopt new User ID numbers for all users. This would be a relatively simple matter of just assigning every user a new unique number.
  3. Create a legally unambiguous migration process. Entities that have agreed to Facebook’s new terms of service would be given the opportunity to migrate their access from previously authorized legacy user IDs to the corresponding new user IDs under the new terms.
  4. Permanently disable all legacy User ID numbers. This might include tracking attempts to use invalidated ID numbers for ad targeting, investigating companies that make such attempts to ensure they comply with Facebook’s new data practices, and temporarily or permanently banning non-compliant firms from the platform.
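To make the four steps concrete, here is a minimal Python sketch of the bookkeeping a platform might do. Everything in it is hypothetical: `RenumberingRegistry`, its fields, and the 63-bit random IDs are illustrative assumptions, not Facebook’s actual systems or APIs. The key idea is that the old-to-new mapping stays private to the platform and is disclosed, user by user, only to entities that accepted the new terms and were previously authorized for that user.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RenumberingRegistry:
    """Hypothetical model of the four-step renumbering plan (not a real Facebook API)."""

    cutoff: date  # Step 1: announced date after which legacy IDs stop working
    legacy_to_new: dict[int, int] = field(default_factory=dict)  # private mapping, never published
    active_ids: set[int] = field(default_factory=set)  # every newly issued ID
    accepted_terms: set[str] = field(default_factory=set)  # entities that agreed to the new terms
    authorizations: dict[str, set[int]] = field(default_factory=dict)  # entity -> legacy IDs users authorized
    violations: list[tuple[str, int]] = field(default_factory=list)  # Step 4: logged legacy-ID attempts

    def assign_new_ids(self, legacy_ids: list[int]) -> None:
        """Step 2: give every user a fresh random ID unrelated to the old one."""
        legacy = set(legacy_ids)
        for old in legacy_ids:
            new = secrets.randbits(63)
            # Never reuse an issued ID or collide with any legacy ID.
            while new in self.active_ids or new in legacy:
                new = secrets.randbits(63)
            self.active_ids.add(new)
            self.legacy_to_new[old] = new

    def migrate(self, entity: str, legacy_ids: list[int]) -> dict[int, int]:
        """Step 3: translate only the legacy IDs this entity was previously authorized for."""
        if entity not in self.accepted_terms:
            raise PermissionError(f"{entity} has not accepted the new terms")
        allowed = self.authorizations.get(entity, set())
        return {old: self.legacy_to_new[old]
                for old in legacy_ids
                if old in allowed and old in self.legacy_to_new}

    def resolve_target(self, entity: str, user_id: int, today: date) -> int | None:
        """Step 4: accept new IDs; after the cutoff, reject legacy IDs and log the attempt."""
        if user_id in self.active_ids:
            return user_id
        if user_id in self.legacy_to_new:
            if today < self.cutoff:
                return self.legacy_to_new[user_id]  # grace period announced in step 1
            self.violations.append((entity, user_id))  # evidence for a compliance review
        return None
```

In this sketch, a compliant app authorized for user 100 gets back only that user’s new ID from `migrate`, while a broker replaying the old ID 741590524 after the cutoff gets `None` and lands in `violations`, which is the trail Facebook would need to investigate and sanction it.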

This approach is not without its problems. There will, of course, be some incentive to salvage the value of the “old” legacy (pilfered) data by building a mapping from the old numbering scheme to the new one, and the migration process will necessarily make this possible. However, Facebook can impose sanctions that discourage that activity and ban entities found to be doing it improperly.

Data brokers would be specifically targeted by this action. Invalidating the old user IDs would immediately render the datasets brokers buy and sell useless for targeting on Facebook. Those attempting to revive old datasets could be banned from Facebook, and offending brokers could be fined.

This process would intentionally pose a bit of a nightmare for Facebook. The company would have to push a slew of changes through its developer and application ecosystem, which would be costly. But Facebook should face costs, and those costs should not be just fines and civil penalties; they should produce real changes for consumers. Advertisers and developers who play by the rules and don’t use Facebook’s most advanced targeting features might see very little change; the company’s internal systems would simply be re-linked using the new ID numbers.

Another concern is that this problem will simply play out again, this time with the new user ID numbers. That is a real concern; however, the loophole Facebook closed back in April 2015 was by far the worst one exploited. With that hole plugged and improved policies in place, it will take quite a while for things to get as bad as they are right now.

Because of the changes made in 2015, it is no longer possible to extract a new, comparably large corpus of data that could readily be correlated with the old data, so the old datasets will eventually suffer from bit rot and become unusable.

And there are other possible improvements: periodically deprecate user ID numbers, or have all access to user data expire automatically. We have a lot of options going forward, but making this change now presents a significant, if imperfect, speed bump for anyone wishing to use improperly obtained data.
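One way to combine both ideas, offered purely as an illustrative assumption rather than anything Facebook has proposed: keep a stable internal ID, expose an ID that is re-derived every rotation period with a keyed hash (HMAC-SHA256), and attach an expiry to every data-access grant. The start date, 90-day rotation period, 30-day grant lifetime, and function names are all hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

# Hypothetical parameters; the article does not specify any values.
EPOCH_START = datetime(2018, 1, 1, tzinfo=timezone.utc)
ROTATION_PERIOD = timedelta(days=90)
GRANT_TTL = timedelta(days=30)


def rotation_epoch(now: datetime) -> int:
    """Index of the rotation period that `now` falls into."""
    return (now - EPOCH_START) // ROTATION_PERIOD


def public_id(internal_id: int, now: datetime, secret: bytes) -> int:
    """Externally visible ID for the current period.

    Keyed with a server-side secret, so outsiders cannot link one period's
    ID to the next; the platform can always recompute it from internal_id.
    """
    msg = f"{internal_id}:{rotation_epoch(now)}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def grant_is_valid(granted_at: datetime, now: datetime, ttl: timedelta = GRANT_TTL) -> bool:
    """Access to user data lapses automatically once `ttl` has passed."""
    return now - granted_at < ttl
```

Under these assumptions, an ID harvested on April 6, 2018 stays the same for the rest of its 90-day period and then changes; if the platform honors only the current period’s IDs, the harvested number simply stops working. Likewise, a grant issued that day would have to be renewed with fresh user consent after 30 days.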

If we really want to address our most acute problem, which is that Facebook is making our democracies vulnerable to manipulation using unethically obtained data sets in a major election year, deprecating these old datasets seems like a practical and immediately actionable option.


Postscript: Facebook allows advertisers to upload email addresses and phone numbers, in addition to user ID numbers, to create custom audiences. My guess is that the biggest attack vector is currently User ID numbers, but there would certainly be unscrupulous players who try to use other identifiers. My assumption is that this renumbering effort would be accompanied by much stricter controls on the use of alternative identifiers. I think this approach is a step towards a possible solution, not a panacea.


Dave Troy is CEO and co-founder of 410 Labs, a data analytics firm based in Baltimore, Maryland. He is also curator of TEDxMidAtlantic in Washington, D.C.

Lover of cities, designer, thinker, writer, entrepreneur, TED speaker, and data visualization geek. Comments? Contact me at davetroy@gmail.com
