Do VPNs Actually Protect Your Privacy?

It depends on who’s snooping around…

Virtual Private Networks (VPNs) are among the tools most often mentioned in discussions of modern privacy on the web. While VPNs have many uses, a growing number of providers advertise privacy-preserving VPNs. For example, NordVPN’s website advertises, “Enjoy secure and private access to the Internet with NordVPN, encrypt your online activity to protect your private data from hackers or snoopy advertisers.”

How exactly does a VPN protect you from these “snoopy” entities?

When you use a virtual private network, you are in essence using a proxy to make requests on your behalf. With a “privacy-focused” VPN, your connection to the proxy is encrypted, which prevents someone snooping on your traffic from reading the messages you exchange with the VPN server. Typically, the VPN server’s role is to make web requests on your behalf, instead of having you make them directly.

Let’s say I want to visit MyDirtySecret.com, but I don’t want my ISP to know that I’m visiting that website; it’s a dirty secret after all. Normally, I have to make a request via my ISP that says, “I’m visiting MyDirtySecret.com,” or that at least reveals the IP address for that website. My ISP can easily log this information and, for example, sell it to advertisers later.

A VPN allows you to send an encrypted message to the VPN provider that says, “Can you fetch the website data for MyDirtySecret.com, and send it to me please?” Your ISP knows that you sent a message to your VPN, but because of the encryption they cannot read the content of your message. The VPN’s ISP could see that the VPN made a request for MyDirtySecret.com, but couldn’t necessarily correlate the VPN’s request for that website with your request to the VPN. Consider this diagram:

An internet request/response cycle with and without a VPN

Notice that in both of these cases requests spend time traveling through public web infrastructure. Many people have expressed concern (and given evidence) that some of the major pieces of that infrastructure include government-run tracking and logging tools that store massive amounts of traffic data and metadata. This metadata, by necessity, includes your IP address. In the first example, no matter where those logging nodes are within the internet, they can see that you’re sending data to MyDirtySecret, or that MyDirtySecret is sending data to you.

In the second example though, nodes that see your traffic before it reaches the VPN (e.g. your ISP) can only deduce that you’re sending messages to a VPN. Nodes that see your traffic after it reaches the VPN can only deduce that the VPN is communicating with MyDirtySecret; the data transmitted between the VPN and MyDirtySecret.com only has the addresses of the VPN and a MyDirtySecret.com server. The assertion is that because of this your privacy has been protected — your IP address does not appear in any packets side by side with MyDirtySecret’s IP address, and therefore your secret is safe.
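To make the two cases concrete, here is a minimal sketch of what a passive observer could read from packet headers alone. Everything in it is an assumption made for illustration: the `Packet` type, the helper names, and the addresses (taken from the reserved documentation ranges) are hypothetical, and real traffic carries far more metadata than a source and destination.

```python
from dataclasses import dataclass

# Hypothetical addresses from the reserved documentation ranges.
USER = "203.0.113.7"      # me
VPN = "198.51.100.20"     # the VPN server
SITE = "192.0.2.80"       # a MyDirtySecret.com server


@dataclass(frozen=True)
class Packet:
    """Only the header fields a passive observer can always read."""
    src: str
    dst: str


def direct_path():
    # Without a VPN, one packet carries both my address and the site's.
    return [Packet(USER, SITE)]


def vpn_path():
    # With a VPN, the trip is split into two hops with different headers.
    return [Packet(USER, VPN), Packet(VPN, SITE)]


def links_user_to_site(packets):
    """True if any single packet shows my address next to the site's."""
    return any({p.src, p.dst} == {USER, SITE} for p in packets)


print(links_user_to_site(direct_path()))  # True
print(links_user_to_site(vpn_path()))     # False
```

The sketch only checks individual packets, which is exactly the assumption the privacy claim rests on. It says nothing about an observer who can watch both hops at once.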

Or is it?

Threat Modeling: From Whom Am I Safe?

One of the most important concepts in software security is threat modeling. Threat modeling is the process of asking questions like, “Who do I need to protect myself from? What exactly am I protecting? How might my adversaries circumvent my protections?” Threat modeling is crucial because nothing will ever protect you from everything. Mitigating the most likely threats from the most likely actors is often the best you can do. Like so many other things, cybersecurity is a world of tradeoffs and prioritization; there are no silver bullets.

Let’s look at two very different threat models.

Scenario A: I am engaged in seditious activity, and I am afraid the NSA will find out what I’m up to, or identify the people I am communicating with.

Scenario B: I don’t want my ISP to know my browsing history because sometimes I watch embarrassing videos, and I don’t trust my ISP to keep that information secret.

In scenario A, my adversary is one of the most technologically sophisticated and powerful organizations on the planet. They have access to incredible resources, and are highly motivated to catch me because their job is (ostensibly) to catch seditious folks like me. I can expect them to use a variety of tools, react to information they learn about me, and track down leads. They can wiretap my home, have passive monitoring on major internet nodes, and can wiretap other entities if they learn that I am communicating with them.

In scenario B, my adversary is still economically and technologically sophisticated, but they are significantly less motivated to follow up on leads, and react to my changing behavior. Perhaps they’d prefer to know my browsing history so that they can sell it to advertisers, but they also take my money directly and have a lot of other business opportunities to chase down. Collecting my data is just “low hanging fruit” for my ISP, and if I move that fruit higher up the tree, they’ll move on to a shorter tree.

I contend that a VPN will help you with scenario B, but it might actually hurt you in scenario A. A friend of mine who is a software security engineer for Google put it this way: “The NSA wants to monitor traffic that has a high ratio of signal to noise,” where the signal here is data transfer related to illegal activity that the NSA cares about. My friend continued, “If the NSA doesn’t have monitoring set up on ALL the VPNs that advertise themselves as privacy-preserving, frankly I want my tax dollars back.”

Tapping the VPN’s inbound and outbound traffic for a correlation attack.

It’s hard to imagine a better honeypot for criminal activity than a service that advertises it can keep your identity safe. Using a collection of statistical tools and wiretaps, powerful agents like the NSA can execute a class of attack called a “correlation attack”. In this kind of attack, the NSA would process the metadata of traffic going in and out of the VPN, and use information such as the size and timing of that traffic to deduce which outbound VPN requests were associated with which inbound requests to the VPN — gluing the IP address of interest to the IP addresses being contacted by the VPN.

In other words, the NSA notices that I requested something from the VPN, and right after that the VPN requested something from MyDirtySecret.com. “Hmmmm,” they think, “I bet Tyler is making that request.” These attacks rely on statistics, probability, and traffic patterns, and can be improved with knowledge of the VPN’s behavior or the target’s behavior.
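The intuition above can be sketched as a toy matcher. This is a deliberately naive illustration, not how any real adversary is known to work: the `correlate` function, its thresholds, and the sample traffic records are all invented for the example, and practical attacks use far richer statistics over many flows.

```python
def correlate(inbound, outbound, max_delay=0.5, max_size_diff=100):
    """Guess which client triggered each outbound VPN request.

    inbound:  list of (client_ip, timestamp_seconds, size_bytes)
    outbound: list of (destination, timestamp_seconds, size_bytes)

    An outbound request is attributed to the inbound message that arrived
    most recently before it, provided the delay and the size difference
    both fall under the (hypothetical) thresholds.
    """
    matches = {}
    for dest, t_out, size_out in outbound:
        best = None  # (client_ip, delay) of the best candidate so far
        for client, t_in, size_in in inbound:
            delay = t_out - t_in
            close_in_time = 0 <= delay <= max_delay
            close_in_size = abs(size_out - size_in) <= max_size_diff
            if close_in_time and close_in_size:
                if best is None or delay < best[1]:
                    best = (client, delay)
        if best is not None:
            matches[dest] = best[0]
    return matches


# Invented metadata from taps on both sides of the VPN.
inbound = [
    ("203.0.113.7", 10.00, 620),
    ("203.0.113.9", 10.40, 1500),
    ("203.0.113.11", 12.00, 300),
]
outbound = [
    ("MyDirtySecret.com", 10.05, 580),
    ("example.org", 10.45, 1460),
    ("example.net", 12.03, 260),
]

for dest, client in correlate(inbound, outbound).items():
    print(f"{client} -> {dest}")
```

Note that the matcher never decrypts anything. Timing and size alone are enough to pair each client with a destination in this toy data, which is why encryption by itself does not defeat an observer who sees both sides of the VPN.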

VPNs, mixnets, and Tor are all potentially susceptible to these kinds of attacks. Previously, these attacks were much more difficult to enact against Tor at scale, but recent machine learning approaches to traffic correlation have blown the lid off of previous state-of-the-art correlation attack methods. I expect anonymity researchers to take cues from research into using Generative Adversarial Networks (GANs) to generate high-quality fake video. GANs could also be used to create the next generation of anonymizing tools that are able to confuse the latest correlation attack software, fighting fire with fire in the endless arms race that is cybersecurity.

Luckily, most of us are not the target of active investigation by the NSA. As a result, for a number of threat models, a VPN might be a great tool to help you avoid snooping and spying by bad actors like an ISP, or your creepy neighbor. On the other hand, it’s always worth spending some time threat modeling to figure out what you’re trying to keep private, and from whom it must be kept private. VPNs aren’t a panacea, and using one might actually make you an easier target depending on your threat model.