The 2010s will be remembered as the first decade in which we, the people, paid for the pleasure of welcoming Big Brother into our lives. When George Orwell depicted an inescapable surveillance state — telescreens in every room monitoring every move, recording every sound, and reporting it all to the authoritarian leader — in his classic novel 1984, he probably never imagined that in 2018, folks would pay $600 (plus a recurring monthly fee) for the privilege of carrying a telescreen in their pockets.

China’s surveillance apparatus includes facial recognition technology connected to an expansive network of CCTV cameras and to camera-sunglasses that police officers wear; soon it will be connected to a flock of drones disguised as birds. The Chinese government has also announced that it will soon require facial scans at train stations as part of its growing dragnet.

Some Chinese citizens are even being required to install special software on their phones that tracks what they download. Footage from surveillance cameras, tweets, Facebook posts, and attempts to visit banned or otherwise “bad” websites can all affect a citizen’s “social credit score,” which has been used to prevent “risky” citizens from doing all kinds of things, from buying airline tickets to getting their kids into private school.

The recent expansion of the surveillance state is not limited to historically authoritarian regimes. The United States has had its own share of frightening tools at its disposal, including the NSA’s PRISM program, famously exposed by whistleblower Edward Snowden. Under PRISM, the NSA has been collecting and storing massive amounts of data about internet traffic. Information found in the Snowden dumps and subsequent revelations has implicated the NSA in repeated (and sometimes successful) attempts to undermine public encryption standards, such as the Diffie-Hellman key exchange. Doing so allows the agency to read huge swaths of information that was thought (and intended) to be private.

In the name of security, similar high-tech surveillance and data collection policies are being rolled out all across Europe. Worldwide authoritarianism is on the rise, and the yearly Freedom on the Net reports document a commensurate rise in online censorship. From the 2017 report:

Online content manipulation contributed to a seventh consecutive year of overall decline in internet freedom, along with a rise in disruptions to mobile internet service and increases in physical and technical attacks on human rights defenders and independent media.
Nearly half of the 65 countries assessed in Freedom on the Net 2017 experienced declines during the coverage period, while just 13 made gains, most of them minor. Less than one-quarter of users reside in countries where the internet is designated Free, meaning there are no major obstacles to access, onerous restrictions on content, or serious violations of user rights in the form of unchecked surveillance or unjust repercussions for legitimate speech.

Governments around the world continue to become more tech-savvy. In authoritarian regimes, this means better surveillance, more censorship, and more powerful disinformation tactics. Some people argue that spy agencies exist to protect their citizens, and that these advancements are therefore for the best. But even if we trust our own government, advancements in surveillance and cyber warfare are not without risk.

The extraordinary viruses Flame and Stuxnet were jointly developed by the U.S. and Israeli governments and share several components. Stuxnet is considered by some to be the most impressive piece of software ever created. Both worms automatically copied and installed themselves onto USB drives and other peripherals, and Flame went further, hijacking Microsoft’s update infrastructure to spread to computers worldwide while disguised as a legitimate Windows update. Flame could also turn on cameras and microphones, monitor web traffic, and much more.

These two viruses were unleashed worldwide with the stated goal of infecting and disabling Iranian nuclear facilities — at least that’s the story — but even if you trust your government to do the right thing with this software, having the code at all makes the government a target.

The Shadow Brokers are a hacker group who specialize in selling stolen exploits. Infamously, the Shadow Brokers stole and leaked critical exploit code from the NSA, code that was later used to build the WannaCry ransomware worm that infected hundreds of thousands of computers in 2017. Perhaps the NSA never intended to use those technologies for evil, but does intent matter if it can’t prevent its code from being stolen?

As if this weren’t enough, spying isn’t just for governments anymore. Facebook, Google, and so many others are engaged in broad-daylight surveillance efforts that are intended to help them sell advertisements, make more “data-driven” decisions, and “understand their customers.” Selling ads is a pretty banal goal, but the information captured can also be put to more nefarious ends, as the Cambridge Analytica scandal showed. What’s more, the data they collect is a high-value target for government agencies and non-state hacker groups seeking to dox, blackmail, or steal the identities of anyone whose data they can intercept.

As a society, we are only beginning to grapple with all the ways data can be weaponized. Yet we are in the middle of the big-data revolution. Machine learning algorithms that process huge datasets are in the limelight in Silicon Valley and beyond. Plucking valuable jewels out of the noise of everyday internet traffic is easier every day.

Unfortunately, the worst vulnerabilities are largely unseen, and the worst abuses of our data have yet to come. Worse, some of the most concerning attack surfaces are built into the fabric of the internet itself, putting us all at risk.

The horrifying scope of the problem

Maybe you think you are safe.

You use a password manager. You randomly generate strong and unique passwords for every website. You’ve got tape over your webcam. You’ve disabled JavaScript. You block ads. You never log in to unsecured WiFi access points. You encrypt. You use a prepaid burner phone. You use a VPN that uses Tor. And no, that room in your basement is not a “glorified tinfoil hat,” it’s a Faraday cage, thank you very much.

Unfortunately, even an extraordinary personal commitment to security is not enough to fully protect your data.

The truth is that modern web infrastructure has created a tangle of vulnerable systems. If your data passes through an insecure intermediary, you are now at risk — regardless of your own behavior. Your friends are storing the texts you sent them. Facebook stores pictures your friends took of you. Google stores your web search history and your location history and uses them to sell targeted advertising. Snap stores your Snaps. The list goes on.

Google provides users with options to turn this tracking off (deeply buried in its settings), but the only way to be sure your data never ends up on an insecure system is to never send it over the internet at all.

Just kidding. Even if you’ve never used the internet, the modern world has used it on your behalf.

Equifax — a service that literally no one opts in to — exposed over 145 million Social Security numbers. The population of the United States is roughly 325 million, which means there is about a 44% chance yours was stolen — in just this one attack. By the way, driver’s license numbers, dates of birth, credit card numbers, phone numbers, and tax identification numbers were also stolen in this data heist.

Anytime you engage in a financial transaction, there is a good chance it’s being entered into a database, and that database is likely connected to the internet. Governments are increasingly making information about citizens available online. Your friends, colleagues, and acquaintances also leave a trail of information online; tweets, Facebook posts, pictures, and messages about you — not even to you — can be used by organizations trying to track you.

Baratunde Thurston’s excellent and unnerving Data Detox in a Zillion Easy Steps captures a portion of the tangled web that makes up one’s digital footprint. Facebook shares data with third-party apps, whose developers can store it in their own databases. As we saw with Cambridge Analytica, that data can then be sold to a fourth party (and a fifth, and a sixth). That data might ultimately end up in the hands of a political action group, the Russian government, an advertising firm… anyone, really.

The tip of the iceberg

While we have precious little control over how our data is used by organizations, we have even less control over how our data is transmitted across the physical machinery of the internet.

Today, more than two quintillion bytes of data — that’s a two followed by 18 zeros — will be transmitted on the internet. Sometimes as radio waves. Sometimes as electrical signals. Sometimes as blasts of light, by way of laser or fiber-optic cables. On the way to its destination, your data will pass through several different computers, any of which could be logging your IP address, the type of requests you make, how frequently your traffic passes through it, the size of the data you send and receive, and the list of IP addresses you exchange data with. The Snowden dump revealed that the NSA is in fact logging this kind of information on a massive scale.

Information about which movies you watch, how often you Skype your mom, your favorite songs, and everything else you do on the internet is transmitted over potentially compromised infrastructure. Once it reaches its destination, your data is stored in other potentially compromised infrastructure. In many cases, ambitious third parties can snatch an alarming amount of this information out of thin air without being a major provider.

If your connection is not encrypted, anyone on the data path can readily log all the data you send and receive. Encryption goes a long way toward protecting the content of your messages. But unfortunately, encryption alone is not enough: even encrypted traffic leaks metadata, such as who you are talking to, when, and how much data you exchange. Combining encryption with anonymity-focused tools such as VPNs and mix-nets can help mitigate these metadata leaks. One such tool, Tor, attempts to hide the source and destination IP addresses by routing traffic through several proxies, each of which has only limited knowledge of the true source and destination. The technical details are vast, but a metaphor may help:

Imagine you’re writing a letter to your friend Tim. You wish to disguise your location, so you enlist a network of individuals — Alice, Bob, and Charlie — who will forward letters sent to them, no questions asked (they are the Tor network). You put your letter in an envelope addressed to Tim, then you put that envelope in an envelope addressed to Charlie, then you put Charlie’s envelope in an envelope addressed to Bob, then you put Bob’s envelope in an envelope addressed to Alice. Then you put that nested set of envelopes in the mail.

Someone spying on your letters, but not spying on Alice, will see that you sent a letter to Alice. Someone spying on your friend Tim will know that Charlie sent Tim a letter. Someone spying on Bob wouldn’t even know you and Tim know each other. It’s not a perfect metaphor, but this is roughly how Tor helps protect your anonymity online: by obfuscating the true source and destination of internet traffic.

Tor and other similar systems are still vulnerable to something called correlation attacks, in which an adversary who can watch both your connection and Tim’s connection uses metadata (timing and traffic volume, for instance) to deduce that the two of you are communicating. Pulling this off takes a sophisticated, powerful, and motivated adversary, but it’s worth noting that nothing will give you perfect anonymity on the internet.
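
To make the idea concrete, here is a tiny illustration of end-to-end correlation in Python. The traffic numbers and the one-second lag are invented for the example; a real attack has to cope with far noisier data at far larger scale.

```python
# A toy end-to-end correlation attack: the adversary logs only byte counts
# (metadata), yet matching bursts reveal that two endpoints are talking.
# All traffic numbers are invented for illustration.
from statistics import correlation  # Python 3.10+

your_outbound = [0, 512, 0, 2048, 128, 0, 4096, 0, 256, 1024]  # bytes/second leaving you
tims_inbound  = [0, 0, 512, 0, 2048, 128, 0, 4096, 0, 256]     # the same bursts, one second later

lag = 1                                                    # suspected network delay in seconds
r = correlation(your_outbound[:-lag], tims_inbound[lag:])  # align the series and compare
print(f"correlation at lag {lag}: {r:.2f}")                # values near 1.0 suggest the same flow
```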

The expanding attack surface

Just as there are more ways to transmit data than ever before, more machines are talking and listening to each other. The Internet of Things continues to grow as watches, toaster ovens, refrigerators, fish tanks, thermostats, and more get connected to the internet. Tiny computers have pervaded nearly every aspect of our modern lives, and every one of these devices is a potential security risk — whether as a device that creates or captures data worth stealing, or as a weak point an attacker can use to infiltrate your network.

Devices on the Internet of Things will also be vulnerable to clever attacks that cause our devices to behave in unexpected and unanticipated ways. For example, hackers have found ways to turn speakers into microphones and listen to you even on devices that are only supposed to make sound. In one particularly creepy study, scientists were able to listen to the sound of subjects typing their passwords into a keyboard and accurately guess those passwords 80% of the time. That was in 2005, by the way; the field of audio processing has advanced significantly since then. Imagine what else could be deduced from sounds you assumed were innocuous.

A fish tank thermometer was used to steal data from a casino. This wasn’t because the fish tank had direct access to anything sensitive — instead, the thermometer was a weak point in an otherwise secure casino network. While the details of the hack were not made public, attackers likely exploited a combination of poor security practices (e.g. casino employees did not think the thermometer needed to be secured the way other network devices did) and poor security infrastructure (e.g. the thermometer manufacturer’s hardware and software were easier to crack than other devices on the network).

Hackers have been able to break into the computer systems of cars and remotely kill the engine, slam on the brakes, turn the wheel, accelerate, and more. Every single device with an internet connection is a potential attack vector.

Exacerbating the problem is the fact that all of these devices, and the infrastructure connecting them, are powered by software that is often insecure. Take the Domain Name System (DNS): a crucial piece of internet infrastructure that is also a well-known privacy and security risk.

If you do not manually configure your computer’s DNS settings, you’re probably sending the list of every website you visit, when you visit it, and how frequently you visit it over public channels with zero encryption. By default, this information goes to your ISP, which then provides you with the IP addresses for the names you send, such as “google.com”. All of this information could be stored indefinitely in an NSA-operated data storage facility in Bluffdale, Utah (hypothetically speaking).
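
As a concrete illustration, here is a rough sketch of the kind of lookup a machine fires off by default: a bare-bones DNS query, hand-built and sent as plain UDP. The resolver address (8.8.8.8, a public resolver) and the hostname are example values only, and a real resolver library does considerably more.

```python
# A hand-built, bare-bones DNS query sent as plain UDP on port 53.
# Every byte of the question and the answer crosses the network unencrypted.
# 8.8.8.8 is an example public resolver; by default the query goes to your ISP.
import socket
import struct

def build_query(hostname: str) -> bytes:
    header = struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)  # ID, flags (recursion desired), 1 question
    qname = b"".join(bytes([len(label)]) + label.encode() for label in hostname.split("."))
    return header + qname + b"\x00" + struct.pack(">HH", 1, 1)   # name terminator, QTYPE=A, QCLASS=IN

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    sock.settimeout(3)
    sock.sendto(build_query("google.com"), ("8.8.8.8", 53))
    answer, _ = sock.recvfrom(512)
    print(answer.hex())  # the plaintext reply, containing google.com's IP addresses
```

Anyone between you and that resolver (your ISP, a coffee-shop router, a tap on a backbone link) can read both the question and the answer.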

DNS’s weaknesses have also been exploited in high-tech data heists. Rather than merely logging DNS information, attackers have combined weaknesses in DNS with weaknesses in one of the major protocols that powers internet traffic routing (the Border Gateway Protocol, or BGP) to steal login credentials through phishing. In one such attack, over $150,000 in cryptocurrency was stolen from users of the wallet service MyEtherWallet.

The attack was quite sophisticated. First, attackers maliciously rerouted internet traffic to infrastructure they controlled using a well-known class of exploit, a BGP hijack (sometimes reported as a BGP leak). Second, the redirected traffic was corrupted — specifically, DNS queries for MyEtherWallet.com were “poisoned” to send users to a phishing website. Third, when users unwittingly typed their passwords into the phishing site, attackers used the stolen credentials to drain their cryptocurrency.

BGP hijacks, DNS poisoning, and bugs like the Heartbleed vulnerability in the widely used encryption library OpenSSL are especially concerning because they affect all internet users and offer those users little to no recourse.

Data processing, politics, and prevention

As the scope of the available data has grown, so has our data-processing prowess. Big data is the new normal, and advances in machine learning have created even more thirst for massive datasets. The alarming amount of data that can be harvested from us is becoming more valuable with every step forward in artificial intelligence.

At the same time, we are slowly waking up to the idea that machine learning and data science are not objective. Datasets reflect the biases of the conditions under which they are collected. Algorithms and statistical models are built by people and companies with agendas, and those agendas are reflected in how the algorithms and models behave.

There are plenty of examples of algorithmic prejudice. Google’s search algorithm came under scrutiny when people discovered that searching for “unprofessional hairstyles” disproportionately showed images of black women (who had perfectly professional hairstyles). Google removed the “gorilla” classification from some of its image recognition algorithms after it classified several pictures of black men as gorillas. ProPublica reported on a deeply troubling example of algorithmic bias, where programs used as part of courtroom procedures for setting bail, determining sentencing, and granting parole consistently over-penalized black people and under-penalized white people.

These results suggest that being “data-driven” isn’t exactly the same as being “objective.” They call into question the applicability of current state-of-the-art machine learning algorithms, especially in areas already mired in racial prejudice, such as law enforcement. Even so, the idea that algorithms could be a solution to government discrimination persists among the most devout technology acolytes.

There are also some encouraging signs of emerging skepticism. Google’s Project Maven and Amazon’s Rekognition made headlines when it became public that the companies were selling these AI tools to the military and to law enforcement. Employees at both companies staged protests, wrote letters to corporate leadership, and threatened to quit. The solidarity of technology workers, who are in high demand, worked. Orlando’s police department dropped its Rekognition pilot, and Google declined to renew its Project Maven contract. The surveillance-industrial complex — well established as it is — is not unbeatable.

Europe has been leading the way in digital regulation. The right to be forgotten has already resulted in over 650,000 requests to have personal data removed from search results. The General Data Protection Regulation (GDPR) is hitting corporations in the pocketbook, with Facebook and Google facing potential fines in the billions of dollars.

Personal mitigation strategies, such as using a password manager, installing privacy tools like the EFF’s HTTPS Everywhere browser extension, and switching to a DNS provider like Cloudflare’s 1.1.1.1, which supports encrypted DNS protocols (sketched below), can help. Unfortunately, personal actions will never be enough. Combating the combined strength of large organizations like the NSA, Google, and other surveillance giants will take organized effort.
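
For illustration, here is a minimal sketch of a DNS lookup carried over HTTPS instead of plaintext UDP, using Cloudflare’s public DNS-over-HTTPS JSON endpoint; the URL and response fields below follow Cloudflare’s public documentation, so treat them as assumptions to verify rather than gospel.

```python
# The same kind of lookup, tunneled through HTTPS (DNS over HTTPS).
# An observer on the path sees an encrypted connection to Cloudflare,
# not which names were looked up.
import json
import urllib.request

url = "https://cloudflare-dns.com/dns-query?name=google.com&type=A"
request = urllib.request.Request(url, headers={"Accept": "application/dns-json"})

with urllib.request.urlopen(request, timeout=5) as response:
    records = json.load(response).get("Answer", [])

for record in records:
    print(record["name"], record["data"])  # the resolved IP addresses
```

In practice, most people are better served by enabling DNS over HTTPS in their browser or operating system settings rather than scripting it by hand.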

As technology continues to invade the most private aspects of our lives, we must continue to bring the more frightening aspects of this new reality into the light. We must demand that corporations and governments be more transparent about the data they collect. Coming to terms with our new interconnected reality will be a process, not a destination. The more we all learn about data, privacy, and surveillance, the more we can bring these problems into the mainstream consciousness, where they can actually be solved.