Deceit By Design: Zuck’s Dirty Secret He Doesn’t Want You To Know

This weekend Zuckerberg called on governments around the world to usher in a new era of “true data portability,” one where people who “share data with one service [are] able to move it to another.” This “gives people choice and enables developers to innovate and compete,” but also “requires clear rules about who’s responsible for protecting information when it moves between services.” This sounds exactly like the Zuckerberg we knew from 2007 through 2014, when data portability was supposed to be an essential part of Facebook’s commitment to users and developers — and a major reason we all acquiesced in giving Facebook control over our personal data before it gave itself a monopoly over that data when it “locked down” the Platform in 2015.

Remarkably, these statements from Zuckerberg’s op-ed are almost identical to the ones launching Facebook Platform in 2007, where Facebook told the world that people now “have an unprecedented amount of choice” and “can share information and communicate with their trusted connections in ways that would never have been possible before Facebook opened its platform” because they “now have access to a virtually limitless set of applications from outside developers.” So, was the decade-long string of privacy scandals that followed the last time Zuckerberg ushered in an era of “true data portability” simply the result of a lack of “clear rules”? Is this all really the government’s fault for not adapting to new technology quickly enough? This is, in essence, what Zuckerberg’s op-ed implies. We think this is all humbug and tommyrot. And we can prove it.

We’ve written about how Facebook sold your data as part of a pay-to-play scheme to build its mobile advertising business. In doing so, we noted that Facebook transferred your data without privacy controls, both in secretive APIs and in public APIs. But we haven’t talked about how Zuckerberg weaponized “true data portability” the first time around to design the Platform in a way that deliberately skirted Facebook’s own privacy controls. In other words, we haven’t gotten technical about how exactly Facebook abused your trust and violated your privacy in order to be able to funnel your data en masse to build its monopoly.

This article relies on a technical investigation proving exactly how Facebook did this. It gets to the bottom of how Facebook’s internal systems facilitated the so-called “data breaches” and privacy scandals. It also provides the real answer as to how an app that signs up only a few thousand people can somehow make public billions of private data points. The answer it gives is a complete defense to people like Aleksandr Kogan and others who Facebook has used as scapegoats for its own illegal conduct. (Spoiler alert: the answer is more complicated than just pointing the finger at “friend data,” another scapegoat in Zuckerberg’s fraud.)

Over a decade ago, Facebook made a number of intentional design decisions with its Platform that violated privacy laws and which it never changed, even with all of the tweaks to its privacy pages and controls over the years. The FTC called out a few of the Platform’s design flaws in its 2011 Complaint and 2012 Consent Decree. But the most significant and malicious decision Facebook made in designing its privacy controls has never been brought into the public light — until now. Put simply, Facebook told developers all the data they accessed through Platform APIs was public, but up to 90% of it wasn’t! When we first suspected this back in 2013 and 2014, we had the same reaction you’re having right now: “That can’t be true.” So, we spent weeks coding some software that would get us a definitive and unimpeachable answer. We got the answer. It ain’t good for Facebook.

But before we dive into that, we need to lay some groundwork on Facebook’s privacy controls. When you share your data with Facebook, you can choose from a variety of settings. These settings have been tweaked over the years but at various times have included options to: (1) keep a piece of data completely private (“only me”); (2) share it with your Facebook friends (“friends”); (3) share it with your Facebook friends and their Facebook friends (“friends of friends”); (4) share it with a custom group of people (“custom”); or (5) share it with the world (“public”). We’ll call these privacy controls “Sharing Settings”.

Current Sharing Settings on Facebook

Separately, because of Facebook’s stated commitment to “true data portability” from 2007 through 2015, you could choose whether you and your friends could see your data in apps other than Facebook. This meant that if another software company offered an app that might meet your needs better than Facebook, you could bring your network into that app in a way that would prevent Facebook from gaining an unfair competitive advantage over that app. In other words, Facebook and the app would be situated on a level playing field with respect to the data you owned or had permission from someone else to access. You could also prevent certain apps or all apps from accessing your data. Or you could prevent your friends from accessing your data outside Facebook even though you let yourself do so. In theory, before 2015, you had full control and portability over your data. We’ll call these privacy controls “Portability Settings”.

It should go without saying that Facebook needed to design its Platform in a manner that would at all times respect both Sharing Settings and Portability Settings. We shouldn’t need a new regulatory regime to tell us the obvious. But this idea of Sharing Settings and Portability Settings having to play nice together in a single platform was relatively new and not very well understood by those responsible for policing Facebook. To this day, Facebook has maliciously over-complicated and obfuscated how these two settings actually work (or do not work) together in its Platform. In doing so, Facebook has exploited the technical ignorance and naivety of lawmakers and policymakers to get away with selling your data without your consent.

To make matters worse, when the inevitable privacy scandals ensued, Facebook then used them as an excuse to throw the baby out with the bath water and eliminate portability altogether in 2015 to cement its monopoly over your data. Facebook’s success for the past decade, and particularly its success building a mobile business since 2012, has been predicated upon its ability to weaponize the concept of data portability and then to use its own weaponization as an excuse to eliminate portability altogether. Both the weaponization and the coverup were only possible because of the universal failure of our governments to pierce the technical relationship between Sharing Settings and Portability Settings in Facebook Platform.

The FTC actually got pretty close in 2011 and 2012. In 2011, the FTC filed a complaint against Facebook, which was based in part on a few identified flaws in how Facebook managed its Portability Settings. First, the FTC recognized that Facebook had inexplicably separated the Portability Settings for data you could see about yourself in other apps (“user data”) from data your friends could see about you in other apps (“friend data”). Second, the settings to control user data were on the main privacy page but the settings to control friend data were basically on a hidden page. They were technically accessible through a hyperlink, but you never knew you were supposed to click on that link! Third, the friend data settings on that hidden page were defaulted to “on”. This combination of hiding the settings page and defaulting the setting to “on” was the equivalent of building a massive pipeline that could funnel your data out to the world without you even knowing it.

So, thanks to a very sketchy and deceptive interface design, Facebook tricked people into “consenting” to let their friends access all their data in apps other than Facebook, a huge benefit to certain developers who were Facebook’s largest ad buyers. The FTC found this to be a deceptive and misleading practice and ordered Facebook to stop separating and hiding certain Portability Settings in its July 27, 2012 decision. By the end of 2012, Facebook should have placed all the Portability Settings on the main privacy page. There was never any good reason to separate them in the first place except to facilitate the illicit sale of user data.

While the FTC identified the construction of the pipeline, it completely missed how Facebook built the spigot to control and enlarge the flow of data through it. It’s actually so simple and so bold it’s hard to believe: Facebook deliberately ignored your Sharing Settings in its Platform APIs. In other words, it made you think the Sharing Settings applied globally, but they in fact only applied on Facebook.com and in the Facebook mobile app. So, if you uploaded a photo to Facebook and set the Sharing Setting to, say, “friends,” only people you confirmed as your Facebook friends would ever be able to see that photo on Facebook.com or in the Facebook mobile app. All well and good. But, because you had falsely “consented” to letting your friends access your data in other apps, and because Facebook refused to properly manage your Sharing Settings in its APIs, Facebook would send that photo to a developer as if it were public. Every single time!

In other words, the Sharing Settings and the Portability Settings didn’t work together at all. The developer would never know that your intention was to let only your friends see that photo. The developer might naturally decide to display the photo publicly, because Facebook told the developer it was public! And, of course, the developer would look like it was violating your privacy, not Facebook. This is the movie of Facebook Platform that has played over and over for the past decade. Now, this is a really bold claim. We appreciate that it sounds too crazy to be true. This feeling of “too crazy to be true” has pretty much defined our experience in getting to the bottom of Facebook’s business practices over the past four years. So, let us share how we came to this conclusion, and then how we proved it beyond any shadow of doubt.

In 2013 and 2014, we began accessing various Platform APIs as a registered developer on Facebook Platform. We noticed in the API responses Facebook sent back that there was no privacy metadata associated with any data object. All the data was being shared as “public”. We tried this on a bunch of different APIs. For instance, you can look below to see the actual responses Facebook sends back when a developer retrieves a friend’s profile, photo albums, or tagged photos. You can verify that the responses do not include any information about the user’s Sharing Settings on the photos or the albums. (The only modification made to the responses below was to redact all personally identifiable information.) Our next question was simple: “Ok, so all the data Facebook is sending us is being treated as public. How do we determine if it actually is public?”

FRIEND PROFILE INFO
https://graph.facebook.com/REDACTED?fields=name%2Cfirst_name%2Clast_name%2Cusername%2Cgender&access_token=REDACTED
{"name":"REDACTED","first_name":"REDACTED","last_name":"REDACTED","username":"REDACTED","gender":"REDACTED","id":"REDACTED"}
FRIEND PHOTO ALBUM LIST
https://graph.facebook.com/REDACTED/albums?fields=id&limit=1000&access_token=REDACTED
{
  "data": [
    {
      "id": "REDACTED",
      "created_time": "REDACTED"
    },
    {
      "id": "REDACTED",
      "created_time": "REDACTED"
    },
    {
      "id": "REDACTED",
      "created_time": "REDACTED"
    }
  ]
}
FRIEND PHOTO ALBUM INFO
https://graph.facebook.com/REDACTED?fields=id%2Cfrom%2Cname%2Ccover_photo&access_token=REDACTED
{
  "id": "REDACTED",
  "from": {
    "name": "REDACTED",
    "id": "REDACTED"
  },
  "name": "REDACTED",
  "cover_photo": "REDACTED",
  "created_time": "REDACTED"
}
FRIEND ALBUM PHOTOS
https://graph.facebook.com/REDACTED/photos?fields=name%2Cfrom%2Ctags.fields%28id%2Cname%2Cx%2Cy%29%2Cimages&access_token=REDACTED
{
  "data": [
    {
      "from": {
        "name": "REDACTED",
        "id": "REDACTED"
      },
      "images": [
        {
          "height": 800,
          "width": 600,
          "source": "http://REDACTED.fbcdn.net/REDACTED.jpg"
        },
        ... OMITTED ...
      ],
      "id": "REDACTED",
      "created_time": "REDACTED",
      "tags": {
        "data": [
          {
            "id": "REDACTED",
            "name": "REDACTED",
            "x": 40.0,
            "y": 40.0
          },
          {
            "id": "REDACTED",
            "name": "REDACTED",
            "x": 50.0,
            "y": 50.0
          }
        ],
        "paging": {
          "next": "https://graph.facebook.com/REDACTED/tags?fields=id,name,x,y&access_token=REDACTED&limit=5000&offset=5000&__after_id=REDACTED"
        }
      }
    }
  ],
  "paging": {
    "next": "https://graph.facebook.com/REDACTED/photos?fields=name,from,tags.fields%28id,name,x,y%29,images&access_token=REDACTED&limit=25&after=REDACTED"
  }
}
FRIEND TAGGED PHOTOS
https://graph.facebook.com/REDACTED/photos?fields=name%2Cfrom%2Ctags.fields%28id%2Cname%2Cx%2Cy%29%2Cimages&access_token=REDACTED
{
  "data": [
    {
      "name": "REDACTED",
      "from": {
        "name": "REDACTED",
        "id": "REDACTED"
      },
      "images": [
        {
          "height": 800,
          "width": 600,
          "source": "http://REDACTED.fbcdn.net/REDACTED.jpg"
        },
        ... OMITTED ...
      ],
      "id": "REDACTED",
      "created_time": "REDACTED",
      "tags": {
        "data": [
          {
            "id": "REDACTED",
            "name": "REDACTED",
            "x": 40.0,
            "y": 40.0
          }
        ],
        "paging": {
          "next": "https://graph.facebook.com/REDACTED/tags?fields=id,name,x,y&access_token=REDACTED&limit=5000&offset=5000&__after_id=REDACTED"
        }
      }
    }
  ],
  "paging": {
    "previous": "https://graph.facebook.com/REDACTED/photos?fields=name,from,tags.fields%28id,name,x,y%29,images&access_token=REDACTED&limit=25&since=REDACTED&__previous=1",
    "next": "https://graph.facebook.com/REDACTED/photos?fields=name,from,tags.fields%28id,name,x,y%29,images&access_token=REDACTED&limit=25&until=REDACTED"
  }
}

Before we dove down that rabbit hole, we spoke with other developers who had lots of experience with Facebook Platform. We learned that Facebook’s response to this question, which we’ve seen Facebook give a number of times publicly now, and which it has also echoed in recent lawsuits, is that it handles these issues “upstream,” meaning that Facebook’s internal code weeds out data the developer shouldn’t receive before it reaches the developer. Like most of Facebook’s reactive PR language, this “upstream” response seems to make sense on the surface, but if you spend more than two minutes thinking about it, you realize the response is deliberately ambiguous.

The “upstream” response only makes sense if you interpret it as Facebook weeding out everything except public data. This kind of design would prevent people from fully controlling and porting their networks into other apps, but it would avoid massive systematic privacy violations. In other words, people would only be able to access public data in other apps and, if they wanted a complete experience, they would have to continue to use Facebook. This would make it much harder for companies to compete on a level playing field because Facebook could offer an inferior product and still convince you to use it because it was putting up a wall around some of the data. But if this were true, then there would be no need to include any privacy metadata in the APIs themselves. This design is anti-competitive and certainly doesn’t reflect “true data portability,” but it would explain why Facebook wasn’t sending privacy metadata in the Platform APIs.

But, if “upstream” just meant that Facebook was making sure the user had given the developer permission to access the data (i.e., Facebook was just checking the Portability Settings and not the Sharing Settings), then the developer would still need to know how the user wants that data to be treated, even though the developer has the user’s permission to access that data. Under this interpretation of Facebook’s “upstream” code, the failure to pass privacy metadata and ensure developers keep the metadata updated would result in massive systematic violations of the privacy of virtually every single Facebook user over many years. Surely, we thought, this could not be what’s happening! But it couldn’t hurt to check.
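To make the two readings concrete, here is a rough sketch in Python of what each interpretation of “upstream” filtering would imply. Every name and structure here is hypothetical (Facebook’s internal code is not public); the point is only the difference in what reaches the developer:

```python
def upstream_public_only(objects, requesting_app, owner):
    """Interpretation 1: weed out everything except public data.
    No privacy metadata is needed downstream, because by construction
    everything the developer receives really is public."""
    return [o for o in objects if o["sharing_setting"] == "public"]


def upstream_portability_only(objects, requesting_app, owner):
    """Interpretation 2: check only the Portability Settings (may this
    app see the owner's data at all?) and ignore the Sharing Settings.
    Every permitted object then reaches the developer stripped of its
    privacy state, so a 'friends'-only photo arrives looking public."""
    if requesting_app not in owner["apps_allowed"]:
        return []
    return [{k: v for k, v in o.items() if k != "sharing_setting"}
            for o in objects]
```

Under the first reading, a “friends”-only photo never leaves Facebook; under the second, it leaves Facebook indistinguishable from a public one. Our test was designed to tell these two apart.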

So, we started pulling our own data and the data of our close friends. We could see that Facebook’s API responses were returning more than just “public” data. So, we did something that to our knowledge no other developer has done. We built software that detected whether a particular piece of data was shared with the public or had a more restricted Sharing Setting and then automatically compared that result to the data we received from the Platform APIs. We felt we had to do this because we needed to know if Facebook was basically foisting this massive privacy issue on us while remarkably ignoring it in its own code and obfuscating it in its public relations. After all, it’s hard to challenge Facebook’s claim that it deals with these issues “upstream,” since no one but Facebook can see what goes on “upstream”.

We started with photos. We detected the privacy state of a photo by having an app user’s device attempt to access a photo on the Facebook web site without any access credentials (i.e. as an anonymous member of the public), at http://www.facebook.com/photo.php?fbid=PHOTO_ID_HERE.

If we detected the strings “<title>Content Not Found</title>”, or “cannot be displayed”, we knew the photo was not shared with the public (public == false). If, on the other hand, we saw the strings “fbPhoto”, or “m_photo”, we knew the photo was displayed using HTML Facebook employed at that time (public == true). This information was stored as metadata in a “public” field for each photo in a file called `facebook_photos_detected.json`, along with other metadata like the Facebook Photo ID, the URL of the photo, the Facebook User ID of the photo’s owner, and the names and User IDs of anyone tagged in the photo.
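The detection logic described above can be sketched as follows. The marker strings and the fields stored in `facebook_photos_detected.json` come straight from our description; the function names and exact record layout are illustrative reconstructions, not our production code:

```python
import urllib.request

# Marker strings we observed in Facebook's HTML at the time (2013-2014).
PRIVATE_MARKERS = ("<title>Content Not Found</title>", "cannot be displayed")
PUBLIC_MARKERS = ("fbPhoto", "m_photo")


def classify_photo_html(html):
    """Infer a photo's privacy state from the page HTML returned to an
    anonymous visitor. Returns True (public), False (not public), or
    None if neither marker is present (i.e. the page layout changed)."""
    if any(marker in html for marker in PRIVATE_MARKERS):
        return False
    if any(marker in html for marker in PUBLIC_MARKERS):
        return True
    return None


def detect_photo_privacy(photo_id):
    """Fetch the photo page with no access credentials (as an anonymous
    member of the public) and classify the result."""
    url = "http://www.facebook.com/photo.php?fbid=" + photo_id
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    return classify_photo_html(html)


def detection_record(photo, public):
    """Build the per-photo entry we stored in facebook_photos_detected.json:
    the detected 'public' flag plus the photo ID, URL, owner, and tags."""
    return {
        "public": public,
        "photo_id": photo["id"],
        "url": (photo.get("images") or [{}])[0].get("source"),
        "owner_id": photo.get("from", {}).get("id"),
        "tagged": [{"id": t["id"], "name": t["name"]}
                   for t in photo.get("tags", {}).get("data", [])],
    }
```

The crucial property of this check is that it needs no access token at all: it asks the same question a total stranger would ask, which is exactly what “public” is supposed to mean.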

The results shocked us to our core. Out of 58,098 photos Facebook sent us through its APIs in this initial test, only 6,201 were public. Over 89% of the photos — 51,897 photos — had some kind of privacy protection on them. But Facebook never told us this and it was otherwise impossible for us, or any other developer, to know what those privacy protections were!

We ended up having to implement the code we built for this test as part of our production stack in order to prevent the inevitable privacy violations. To put it simply, we had to do Facebook’s privacy job for it. Based on conversations with many other software companies, we suspect that 99% of developers never even realized this issue existed and just assumed it was all appropriately handled “upstream”. The remaining 1% likely didn’t take the time to build the code to correct for Facebook’s massive flaw because, well, who has the time. If you are a developer and you took the time to do what we did, or have data and responses you pulled from Facebook’s APIs that you are willing to anonymize and share, please contact us immediately!

None of this really made any sense to us. Why would Facebook do this? The solution was simple. Just pass the privacy metadata in the APIs and require developers to make periodic calls to keep the metadata updated. Problem solved. True data portability achieved! If any privacy violations occurred from that point on, the developer would clearly be liable for them and not Facebook. Requiring clients to periodically re-fetch metadata is a common pattern in APIs. Facebook’s failure to do this for years had placed us in an incredibly precarious position. We had raised capital, built an impressive machine learning platform, and were ready to go build a real business. And then we learned all our code was built on quicksand of Facebook’s own privacy violations and that we now needed to spend more time and money to do our best to mitigate the damage from them.
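To show how small the fix would have been, here is a hypothetical sketch of what we mean. The field names, the refresh window, and the helper functions are all our invention, not anything Facebook ever shipped; the point is that attaching the Sharing Setting to each object, plus a staleness check on the developer’s side, would have been enough:

```python
import time

# Hypothetical refresh window: how long a developer may rely on cached
# privacy metadata before re-fetching it from the platform.
METADATA_TTL_SECONDS = 24 * 60 * 60


def api_response_with_privacy(photo):
    """What a Platform response could have looked like if the API had
    simply included the owner's Sharing Setting with each object."""
    return {
        "id": photo["id"],
        "source": photo["source"],
        "privacy": {
            "setting": photo["sharing_setting"],  # e.g. "friends"
            "fetched_at": time.time(),
        },
    }


def must_refresh(obj, now=None):
    """Developer-side check: re-fetch privacy metadata once it is stale,
    so a later change from 'public' to 'friends' gets picked up."""
    now = time.time() if now is None else now
    return now - obj["privacy"]["fetched_at"] > METADATA_TTL_SECONDS
```

With something like this in place, a developer who displayed a “friends”-only photo publicly would have no excuse, and liability would sit where it belonged.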

We didn’t figure out what was really going on until much later. It just seemed obvious to us that all developers had the same long-term self-interest in building their businesses on top of a platform that had stable and reliable privacy controls. We learned later on that, for bigger companies in hotly contested software markets, access to more user data and more distribution through Facebook provided an immense competitive advantage that helped them win their respective software markets.

Under these conditions, short-term self-interest in accessing more user data at any cost could win out over the long-term goal of building a business on a stable privacy foundation. The companies close to Facebook that participated in its secretive tying arrangements were much more aware of the flaw, and thus better equipped to manage the risk of privacy violations from the data they had obtained without user consent. The remaining 35,000 developers, like Aleksandr Kogan, assumed they were allowed to access the data Facebook had sent them and didn’t think much about treating it as public, because Facebook had represented it as public.

We allege in our case that when Facebook engineers would figure this out, they would report it to their superiors and create “bug” reports for it. We further allege the “bug” reports were never addressed over many years. When pushed by employees, their superiors would disingenuously claim that developers can’t be expected to keep up with Facebook’s privacy settings, that it would somehow be too cumbersome to do so! That’s a very different answer from the “upstream” response and worlds away from providing a legitimate justification for the largest systematic violation of consumer privacy in history.

In the early days of Facebook, this intentionally flawed design made it absolutely necessary for large consumer software companies to build on Facebook Platform. The moment one company started juicing, the rest had to follow suit, or they wouldn’t be running marathons for long. For Facebook, this meant more tentacles in the world funneling more users into the Platform, while at the same time capturing more large buyers for Facebook’s ads. Once the desktop business collapsed in 2012 as smartphones began to dominate, the same intentionally flawed privacy design enabled Facebook to sweeten the pot for the companies it forced to buy mobile ads and give up their own user data.

In essence, the intentionally flawed design enabled Facebook to treat user data as a commodity without Facebook’s own users having a say. All the while, Facebook could use the confusion it had fostered around the combination of Sharing Settings and Portability Settings to place the blame on the speed of innovation, the fact that our privacy and regulatory apparatus hasn’t caught up with the pace of technology, and other lies that seem to make perfect sense until you actually take the time to investigate what’s going on. The very same lies Zuckerberg perpetrated this weekend in his op-ed.

Perhaps most remarkable in all this is that Facebook used this intentionally flawed design to shut down data portability altogether in 2015. By weaponizing “friend data” in this way, Facebook convinced the world that “friend data” itself was the problem, not the intentionally flawed way in which Facebook managed it. This enabled Facebook to wipe out virtually all its competition as it began to execute its strategy of controlling the time you spend on your phone.

Let’s be clear: friend data is not the problem. Friend data properly understood and implemented is a critical component of true data portability. The problem is the intentionally flawed design of the Platform and the years of lies covering it up. People need the ability to transfer their data and their friends — their entire network — to other applications. For instance, people have this ability in Apple’s platform. Without this power and this right, Facebook will continue to abuse its monopoly position and people won’t be able to do anything about it except shut off technology altogether, which most of us will never do.

Zuckerberg calling for a new era of true data portability is no different from the meth dealer demanding the keys to the drug locker. To even have a chance of getting away with it, he has to convince the world he was never a meth dealer in the first place. He has to find a way to cover up years of profiting off the addiction he fostered in every industry. In other words, he has to completely rewrite history. This is why Zuckerberg’s internal chats and his public statements and representations over many years were deleted. This is why on the eve of Parliament’s publication of a small number of documents from our case, Facebook removed the policy it had been using for years to arbitrarily and punitively destroy other software businesses. This is why Facebook’s favorite PR response these days is “we’re a very different company today than we were in 2016, or even a year ago.”

But rewriting history requires more than just creating a void. The void needs to be filled. Cue Zuckerberg taking to The Wall Street Journal to finally explain Facebook’s business model 15 years on. Cue Zuckerberg’s grand gesture only weeks later on how Facebook was shifting that business model he had just finally explained towards privacy and encryption. Cue Zuckerberg’s urgent plea this past weekend in The Washington Post for the government to regulate the old (and the new?) business models. It’s not clear. Facebook seems to be changing so fast it can’t even keep up with its own strategy. We can only imagine what it would be like for the people at Facebook who do the real work if they actually had to steer a new course for this half-a-trillion dollar ship every few weeks based on whatever the ship’s captain decided to write in a newspaper that day.

But don’t worry: nothing is actually changing. This is all a public relations game with a singular goal: rewrite history faster than the governments of the world care to preserve and remember it. If Facebook succeeds, it may just come out the other side of all these government investigations with only a few billion dollars in fines and a new regulatory regime it can exploit just like the last one — an incredibly small price to pay for maliciously selling the entire world’s most personal information without any privacy controls for more than a decade!