The next data scandal

Apr 15, 2018

Hardly a week goes by without a data scandal. The biggest story right now is Facebook and Cambridge Analytica, where the data of an estimated 87 million users may have been improperly accessed. The core of the story is not a data breach in the traditional sense, but a failure by Facebook to prevent others from mining their users’ data. Breaches are bad, but they are usually due to an oversight, a bug or a glitch. As humans (and it is usually a human factor that causes a breach) we make mistakes. Designing systems that allow data mining, whether by having loose safeguards on APIs or by inviting partners to access personal information, is different. Once a third party has the data, it’s uncontrolled. When Facebook learned that an authorised app had extracted users’ data and shared it with other parties, including Cambridge Analytica, they demanded that those companies certify they’d destroyed their copies. It appears Cambridge Analytica lied about having done so.

During the same period, Facebook were in discussions with medical institutions about matching health data with social and economic factors from anonymised user profiles, presumably without users’ explicit permission. The project never got past the planning stage and may never resume now that the Cambridge Analytica scandal has broken.

WhatsApp, a platform whose main selling point is secure messaging, has its own weakness in exposing data. Group chats can expose the messages you write, linked to your phone number. All a person needs is the group’s invite URL, which the group’s originator can share and which can be found with a simple web search. Anyone joining the group is not required to identify themselves.

Another recent controversy is the case of Panera Bread. This story could have been the classic breach, with Panera Bread leaving the back door wide open. Dylan Houlihan, a security researcher, discovered the vulnerability and warned the company. Unfortunately, Panera Bread appeared to ignore the warning. For 8 months. The company had left an unauthenticated API open to the world: anyone with a user’s ID could harvest that user’s personal details. To make matters worse, their user IDs were sequential, so you didn’t even need to get lucky; you could merely increment a number to mine every user in the database. Following a slew of negative press from the security community, the media and customers, they finally started taking their responsibility seriously and took their website and APIs offline. Panera Bread now appear to be addressing this and other security issues, but this episode will test their ability to recover.
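To make the failure concrete, here is a minimal Python sketch (not Panera’s code, and not any real API) of why sequential IDs combined with an unauthenticated lookup are so dangerous, and of two common safeguards. The names `harvest_sequential` and `UserStore` are hypothetical, invented for illustration. The real fix is the ownership check in `get`; random IDs from the standard library’s `secrets` module only make enumeration harder and are no substitute for authentication.

```python
import secrets
from typing import Callable, Dict, List, Optional


def harvest_sequential(lookup: Callable[[str], Optional[dict]], max_id: int) -> List[dict]:
    """Simulate an attacker against an unauthenticated endpoint with
    sequential IDs: simply try 1, 2, 3, ... up to max_id."""
    found = []
    for user_id in range(1, max_id + 1):
        record = lookup(str(user_id))
        if record is not None:
            found.append(record)
    return found


class UserStore:
    """Hypothetical user store with two safeguards:
    unguessable IDs and an ownership check on every read."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def add(self, profile: dict) -> str:
        # 16 random bytes from a cryptographically secure source,
        # URL-safe encoded (22 characters). Counting upwards won't find it.
        user_id = secrets.token_urlsafe(16)
        self._records[user_id] = profile
        return user_id

    def get(self, user_id: str, session_user: Optional[str]) -> Optional[dict]:
        # Authorisation: a caller may only read their own record.
        # Without a logged-in session (session_user is None) nothing is returned.
        if session_user is None or session_user != user_id:
            return None
        return self._records.get(user_id)


if __name__ == "__main__":
    # A leaky endpoint: sequential IDs, no authentication (illustrative data).
    leaky = {"1": {"name": "A"}, "2": {"name": "B"}, "3": {"name": "C"}}
    print(len(harvest_sequential(leaky.get, 3)))  # 3

    store = UserStore()
    alice = store.add({"name": "Alice"})
    print(store.get(alice, session_user=None))   # None
    print(store.get(alice, session_user=alice))  # {'name': 'Alice'}
    print(len(harvest_sequential(lambda i: store.get(i, None), 1000)))  # 0
```

Against the leaky lookup, a loop of a few lines recovers every record; against the guarded store, the same loop finds nothing, because no counter value matches a random ID and, even if one did, an anonymous caller is refused.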

Last year a software service provider, [24]7, suffered a malware attack, which exposed the credit card details of customers of Delta Airlines, Sears department stores and other large US brands. The breach happened in September 2017 but [24]7 did not notify the affected companies until mid-March 2018. It’s not known what caused the delay in notifying their clients.

Data breaches occur all the time. The ones we hear about are only the tip of the iceberg; there are many we never learn about. I’ve experienced some that top-level management covered up. Some are never discovered. What interests me is how companies design data accessibility into their architecture without the safeguards needed to ensure they don’t become the next headline. I’ve had my own discussions with companies about how they handle user data. The most recent were with Twitter, who offer multi-factor authentication but do not send the vital SMS code, and Pinterest, who offer no way to download or delete your personal information.

How we use and store our customers’ personal information is, or at least should be, at the fore. The EU General Data Protection Regulation (GDPR) becomes enforceable on 25 May 2018 with punitive fines, but reputational damage can destroy a company. Personal information is the world’s most valuable commodity, and because of that your data is being shared on a scale never seen before. I work for a company that deals with users’ location data, and we have a responsibility to control that data and use it only in ways the user, the owner of the data, would approve. I wouldn’t have it any other way. That means storing it securely, not disclosing it, keeping it accurate, and deleting it in its entirety when asked or when it is no longer needed. It takes work and not inconsiderable cost, but it is our responsibility as guardians of the data.
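Some of those duties can be sketched in code. The following is a minimal, hypothetical Python example, not the system my company uses: an in-memory `LocationStore` that can export everything held about a user, erase it on request, and purge points older than a retention period. The class, its methods and the 30-day retention in the example are illustrative assumptions.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Dict, List


class LocationStore:
    """Hypothetical store of users' location points."""

    def __init__(self, retention: timedelta) -> None:
        self.retention = retention
        self._points: Dict[str, List[dict]] = {}

    def record(self, user_id: str, lat: float, lon: float, at: datetime) -> None:
        # Timestamps should be timezone-aware so comparisons are unambiguous.
        self._points.setdefault(user_id, []).append({"lat": lat, "lon": lon, "at": at})

    def export(self, user_id: str) -> str:
        # Machine-readable copy of everything held about the user (portability).
        points = [
            {"lat": p["lat"], "lon": p["lon"], "at": p["at"].isoformat()}
            for p in self._points.get(user_id, [])
        ]
        return json.dumps({"user_id": user_id, "points": points})

    def erase(self, user_id: str) -> None:
        # Delete the user's data in its entirety, on request.
        self._points.pop(user_id, None)

    def purge_expired(self, now: datetime) -> int:
        # Delete points older than the retention period; return how many went.
        cutoff = now - self.retention
        removed = 0
        for user_id in list(self._points):
            kept = [p for p in self._points[user_id] if p["at"] >= cutoff]
            removed += len(self._points[user_id]) - len(kept)
            if kept:
                self._points[user_id] = kept
            else:
                del self._points[user_id]
        return removed


if __name__ == "__main__":
    now = datetime(2018, 5, 25, tzinfo=timezone.utc)
    store = LocationStore(retention=timedelta(days=30))  # assumed retention period
    store.record("u1", 51.50, -0.12, now - timedelta(days=40))
    store.record("u1", 51.51, -0.13, now - timedelta(days=1))
    print(store.purge_expired(now))  # 1
    print(store.export("u1"))
    store.erase("u1")
    print(store.export("u1"))  # {"user_id": "u1", "points": []}
```

A production system would also have to reach backups, logs and any copies shared with third parties, which an in-memory sketch like this cannot show.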

Who will be next?

All the news about breaches, malpractice and carelessness made me think: who is likely to be the next major data scandal? When it comes to a hack, it could be anyone. There are many undiscovered vulnerabilities, unpatched servers and opportunities to leave the back door open. More interesting is who is likely to architect their own downfall. Who words their policies permissively, in their own favour, and opens up their users’ data to external apps and advertisers?

LinkedIn comes to mind. It is the target of many memes, such as the one where someone stranded on a desert island spots a message in a bottle, only to find it’s a job ad from LinkedIn. A section of their User Agreement states that you provide LinkedIn “[a] worldwide, transferable and sublicensable right to use, copy, modify, distribute, publish, and process, information and content that you provide through our Services, without any further consent, notice and/or compensation to you or others.” Their API Terms of Use were last revised on February 12, 2015 (at the time of writing), which predates many of the recent data breaches and the bad press that have triggered policy changes at other Silicon Valley companies. Their API appears straightforward: an app developer gains permission from a LinkedIn user with OAuth2 and can then access data related to that user. Then there’s this vague, open-ended line in the Scope and Intent section:

Note, that upon prior approval, LinkedIn may enable you to access more LinkedIn data than is generally publicly available via the APIs.

It feels like a familiar story, as if we’re waiting for the announcement that a shadowy entity has duped us, syphoned off our data and is subjecting us to targeted propaganda and identity theft. How easy would it be for someone to create an app that uses your LinkedIn credentials to pull your profile, employment history, education and connections?

LinkedIn have already had their own breach, in which an attacker compromised 167 million accounts. A breach like that not only affects LinkedIn user accounts but also helps a hacker build a picture of each user to target other websites and services with secondary attacks. The value is in what they can derive from your information, especially when it is combined with other data sets. The business of advertising, propaganda and identity theft is concerned with vacuuming up massive amounts of data, which is exactly what’s happening, with increasing frequency and consequence.

Changing landscape

In the face of all the negative press, Mark Zuckerberg seems contrite. Facebook are busy repairing the damage by offering a data abuse bug bounty, partnering with non-profit researchers to study the effect of social media on elections, making political ads more transparent, reducing the information they expose and making their privacy tools easier to find. The question is, are they doing enough, or is the damage permanent? Being able to choose the audience for your posts and likes is fine, but it means nothing if you have no control over what advertisers and app developers can access. There are calls from the Mozilla Foundation and others to break up Amazon, Apple, Facebook and Google. Senators have accused Facebook of being a monopoly as Zuckerberg, ironically, has his privacy invaded by the scrutiny of journalists’ cameras.

As customers, we give up our details without much thought. Over time, terms & conditions and privacy policies change and their scope can expand, allowing your data to be used in ways you didn’t expect. I wouldn’t expect anybody to perform full due diligence before using a website or app, but I suspect the vast majority of people never even read the terms & conditions or privacy policy. The digital landscape is changing and we need to be more aware of how companies use and share our private data. More services are being offered free at the point of use, and companies aren’t going to voluntarily regulate their use of your data. GDPR will go a long way to help. The strict rules it imposes on the data of EU residents may extend to non-EU residents, as it’s simpler for companies to apply blanket policies. For example, a company may let all its users download their data if it has to do so anyway for EU residents under the right to data portability.

Individuals must own their data.

Written by Ben Mullard