Peeling the Onion: Demystify How Tor Enables Anonymous Communications

Honghui Wang

Published in

Systems and Network Security

15 min read · Apr 4, 2020

By Honghui Wang, Sijie Yu, Yufei Zhang.

A Brief Introduction

Online tracking, surveillance, and censorship have become increasingly intrusive, and sometimes even frightening, with the boom in deep learning and big data. More than once, I have heard friends complain that their computers seem to see through them and know what is on their minds, or that they receive scam calls from people who seem to know everything about them. Being fully anonymous is sometimes not just convenient but necessary, especially in certain regions or for certain organizations.

The idea of communicating through multiple randomly selected intermediate nodes in order to hide the sender’s identity was proposed by the United States Naval Research Laboratory in 1995. They named this method onion routing, and it was used to protect U.S. intelligence communications online [3]. But if only intelligence agencies used this method, their traffic would be too conspicuous, so it was opened to the public.

Today, Tor is free and open-source software managed by its community, and the nodes in its network are all run by volunteers. Tor is more than just software. “It is a labor of love produced by an international community of people devoted to human rights” [4].

The Tor Browser

Tor is pretty easy to use. The Tor Project offers a one-stop solution: the Tor Browser, also known as the Tor Browser Bundle. It consists of many parts: Tor Browser, Tor Button, Tor Launcher, HTTPS Everywhere, NoScript, the Onion Proxy and a number of other add-ons. The browser itself is a modified version of Mozilla Firefox that has been hardened for privacy and security [5]. The Onion Proxy is the most important component, because it is what enables the use of the Tor network.

Let’s try it out

Here you can download the Tor Browser. Install it and open it. The first time you open it, you can adjust some settings and click Connect to start. The browser will fetch information about the Tor network, then open the default page shown below.

It is often recommended not to maximize the browser window, because the window size can leak information about your device. An interesting mindset, isn’t it? On the default page, the Tor Browser offers the search engine DuckDuckGo. It is a privacy-focused search engine that mainly indexes the regular web; addresses of hidden servers, which you cannot visit without the Tor network, are usually found through directories such as the Hidden Wiki. Hidden server URLs look like this: xxxxxxxxxxxxxxxx.onion, that is, 16 characters followed by .onion (the newer version 3 addresses are 56 characters long). Take the Hidden Wiki for example: http://hiddenwiki7wiyzr.onion/. This link cannot be opened in a normal browser but works in the Tor Browser. And when I go to https://whatismyipaddress.com/, I can see that my IP appears to come from the Republic of Moldova, which is exciting!

How Tor really works

The Tor network is an overlay network [1], meaning that it is built on top of the public Internet. It consists of many onion routers run by volunteers. In the original design, each onion router (OR) maintains an encrypted (TLS) connection to every other OR. Among those ORs, a few well-known ones are preselected as directory nodes. The directory nodes monitor the whole Tor network and keep their views of it consistent with one another. When the user opens the Tor Browser, its onion proxy (OP) first downloads a description of the whole network and then randomly selects three ORs to create a three-hop path through which the user will surf the Internet.

I believe a review of the encryption algorithms involved will help readers understand the principles of Tor more deeply and clearly. If you already know this material, or you simply trust that Tor is secure, feel free to skip the next two sections.

Knowledge Background: Symmetric encryption

Symmetric encryption means that encryption and decryption use the same key. Tor uses the symmetric encryption algorithm AES to encrypt the information sent between its nodes because it is fast.

The problem with symmetric encryption is that, for two parties to communicate, one party has to send the key to the other so that they share the same key. The moment one party sends the key, an eavesdropper can capture it, making the encryption useless. So we need a way for the two parties to share the same key without ever sending it.

Knowledge Background: Diffie-Hellman key exchange

Thanks to Diffie and Hellman, we can let two parties share the same key without sending it.

Merkle, left, Hellman, middle, and Diffie in 1977 [2]

Hellman and Diffie won the 2015 Turing Award (announced in 2016) for their critical contributions to modern cryptography. In fact, Merkle, the man on the left in the picture, was the first to do research on this topic. Why did only Hellman and Diffie win the Turing Award? That’s another story.

The Diffie-Hellman key exchange is based on modular exponentiation, the function shown below:

f(x) = gˣ mod p

Party A knows g, p, a, and gᵇ mod p, and Party B knows g, p, b, and gᵃ mod p.

Using the function above, they can both generate the same value:

(gᵇ mod p)ᵃ mod p = (gᵃ mod p)ᵇ mod p = gᵃᵇ mod p

This value is the key both parties share. Let’s see the details of this algorithm.

As shown above, party A first picks a large prime p and a base g (a generator modulo p), then sends them to party B. Then party A generates its private number a, and party B generates its private number b. They never share these two numbers with anyone. Party A calculates k_a = gᵃ mod p, party B calculates k_b = gᵇ mod p, and they exchange these values. In the end, party A calculates k_bᵃ mod p and party B calculates k_aᵇ mod p; both arrive at the same value, which they use as their AES key. Even if an eavesdropper knows g, p, k_a, and k_b, there is no known efficient way for him or her to figure out the key without knowing a or b. And because p is really big, using g, p, k_a or g, p, k_b to calculate a or b is infeasible with today’s computing power.
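The exchange can be tried out with a few lines of Python. This is only a toy sketch: the 64-bit prime and the base g = 5 are assumptions chosen for readability, far too small to be secure, and real systems use standardized groups and derive the AES key from the shared value rather than using it directly.

```python
import secrets

# Toy parameters, assumed for illustration only (not secure).
p = 2**64 - 59   # a 64-bit prime; real deployments use 2048 bits or more
g = 5            # assumed base; not checked to be a generator modulo p


def dh_keypair(g, p):
    """Return (private, public) where public = g^private mod p."""
    private = secrets.randbelow(p - 3) + 2   # random value in [2, p - 2]
    public = pow(g, private, p)              # fast modular exponentiation
    return private, public


# Each party keeps its private number and publishes only k_a or k_b.
a, k_a = dh_keypair(g, p)
b, k_b = dh_keypair(g, p)

# Each side combines its own private number with the other side's public value.
key_at_a = pow(k_b, a, p)   # (g^b)^a mod p
key_at_b = pow(k_a, b, p)   # (g^a)^b mod p

print(key_at_a == key_at_b)
```

Neither a nor b ever leaves its owner; only g, p, k_a and k_b travel over the wire.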

Knowledge Background: circuit and cell

Tor calls the connections between the client and Tor nodes circuits. A circuit is an encrypted communication channel that runs over TLS. It conceals the data and prevents an attacker from reading or modifying it, or from impersonating a Tor node.

The basic units transferred over those circuits are called cells. Tor fixes their size at 512 bytes each, so that every unit of communication looks almost the same, which makes it harder for an eavesdropper to figure out what is really going on. If the data does not fill a cell, Tor pads it; if it exceeds this size, Tor splits it across several cells. There are two types of cells: control cells and relay cells. Control cells are used to create, extend, truncate or destroy the circuits between nodes. Relay cells carry the user data.
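To make the fixed-size idea concrete, here is a minimal sketch of splitting data into 512-byte cells and padding the last one. The header layout (a 2-byte CircID, a 1-byte command and a 2-byte length) and the command value are simplifications assumed for this example; they are not Tor's exact wire format.

```python
import struct

CELL_SIZE = 512
# Assumed, simplified header: CircID (2 bytes), command (1 byte), data length (2 bytes).
HEADER = struct.Struct(">HBH")
PAYLOAD_SIZE = CELL_SIZE - HEADER.size   # 507 bytes of payload per cell
CMD_RELAY = 3                            # illustrative command code


def to_cells(circ_id, data):
    """Split data into fixed-size cells, zero-padding the last chunk."""
    cells = []
    for i in range(0, len(data), PAYLOAD_SIZE):
        chunk = data[i:i + PAYLOAD_SIZE]
        padded = chunk.ljust(PAYLOAD_SIZE, b"\x00")
        cells.append(HEADER.pack(circ_id, CMD_RELAY, len(chunk)) + padded)
    return cells


def from_cells(cells):
    """Reassemble the original data, dropping the padding."""
    out = bytearray()
    for cell in cells:
        _circ_id, _cmd, length = HEADER.unpack(cell[:HEADER.size])
        out += cell[HEADER.size:HEADER.size + length]
    return bytes(out)


request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n" * 30
cells = to_cells(1, request)
print(len(cells), {len(c) for c in cells})
```

Whatever the length of the request, every cell on the wire has the same size, so an observer cannot tell a short message from a fragment of a long one.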

Knowledge Background: Directory Servers

Among the nodes in the Tor network, some well-known and powerful ones are preselected as Directory Servers. They are in charge of tracking changes in network topology and node state. Whenever a node joins, changes or leaves, it reports to one of them. Through a consensus mechanism, all the servers maintain an up-to-date view of the Tor network. The list of these servers is preloaded into the client software to bootstrap each client’s view of the Tor network. The list contains the public key and IP address of each server so that attackers cannot impersonate them. From here you can find the list of all the Directory Servers.

First Phase: Create the path using the Tor nodes in the Tor network

When the user wants to use the Internet through the Tor network, the first step is to create a path through nodes in the Tor network, so that the exit node can access the Internet on behalf of the user without knowing who the user is.

First, the onion proxy (OP) in the user’s software fetches information about the whole Tor network from the Directory Servers. Then it randomly chooses three nodes to form the path.

As shown above, the client negotiates with the first node using the Diffie-Hellman key exchange. Together they create the red AES key, and they set up a TLS-encrypted circuit between them, which we denote here as CircID1.

Then, through the first node, the client and the second node create the blue key together. The first node creates the CircID2 circuit with the second node and associates CircID1 with CircID2.

In the same way, through the first and second nodes, the client and the third node create the green key together. Again, the second node creates the CircID3 circuit with the third node and associates CircID2 with CircID3.

Now the path composed of the CircID1, CircID2, and CircID3 circuits is complete, and the client holds the key it shares with each node on the path.
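The hop-by-hop construction can be sketched as a small simulation. Everything here is an assumption made for illustration: the `Node` class, the `create` and `extend` method names, the toy Diffie-Hellman parameters, and the fixed circuit IDs 1, 2 and 3. Unlike real Tor, the handshake is neither authenticated nor encrypted on its way through earlier hops.

```python
import hashlib
import secrets

P = 2**64 - 59   # toy prime, far too small for real use
G = 5            # assumed base


def derive_key(shared):
    """Turn the shared Diffie-Hellman value into a 32-byte key."""
    return hashlib.sha256(str(shared).encode()).digest()


class Node:
    def __init__(self, name):
        self.name = name
        self.keys = {}       # incoming CircID -> key shared with the client
        self.next_hop = {}   # incoming CircID -> (next node, outgoing CircID)

    def create(self, circ_id, client_public):
        """Finish the key exchange for a new circuit; return this node's half."""
        private = secrets.randbelow(P - 3) + 2
        self.keys[circ_id] = derive_key(pow(client_public, private, P))
        return pow(G, private, P)

    def extend(self, circ_id, next_node, next_circ_id, client_public):
        """Pass a handshake down the path, linking circuit IDs at the last hop."""
        if circ_id in self.next_hop:              # already linked: forward it
            node, out_id = self.next_hop[circ_id]
            return node.extend(out_id, next_node, next_circ_id, client_public)
        self.next_hop[circ_id] = (next_node, next_circ_id)
        return next_node.create(next_circ_id, client_public)


def build_path(nodes):
    """Client side: negotiate one key per hop, each through the previous hops."""
    keys = []
    for hop, node in enumerate(nodes):
        private = secrets.randbelow(P - 3) + 2
        public = pow(G, private, P)
        if hop == 0:
            reply = nodes[0].create(1, public)
        else:
            reply = nodes[0].extend(1, node, hop + 1, public)
        keys.append(derive_key(pow(reply, private, P)))
    return keys


n1, n2, n3 = Node("first"), Node("second"), Node("third")
red, blue, green = build_path([n1, n2, n3])
print(red == n1.keys[1], blue == n2.keys[2], green == n3.keys[3])
```

Note that the second and third nodes only ever talk to their neighbors, yet each ends up sharing a key with the client.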

Second Phase: Relay the request to the server

Once the client has created the path, it can use it to surf the Internet.

Below are the images showing how the client makes a request to a server.

First, the client divides its request into chunks and pads them. It then encrypts each chunk with the green AES key, then the blue one, then the red one, and constructs relay cells that start with CircID1. Every chunk gets its own cell. The client sends these three-layer-encrypted cells one by one to the first node. When the first node receives one of these cells, it checks the CircID and finds the corresponding red key to decrypt the cell, which leaves a two-layer-encrypted cell. At this point the first node knows only who sent the cell and what the next node is; it has no idea what the payload is or what the destination is. Because the first node associated CircID1 with CircID2 during the creation phase, it changes the CircID of the cell from CircID1 to CircID2 and sends it to the second node through the CircID2 circuit.

The second node receives the cell through the CircID2 circuit. It knows to use the blue key to decrypt the cell, which leaves a one-layer-encrypted cell (in fact, the node cannot tell how many layers remain, as the payload is still encrypted). It then changes CircID2 in the cell to CircID3 and sends the cell to the third node through the CircID3 circuit. The second node has no idea what is in the cell, where it originally came from, or what its destination is.

Finally, the third node receives the cell through the CircID3 circuit and uses the green key to decrypt it. Because the cell arrived on CircID3, the third node knows that it is the final hop of the path and that the decrypted cell is part of a normal packet. It waits for the remaining cells to arrive, combines them to form the normal packet, and sends it to the server the packet is addressed to. This exit node knows the content and the destination of the request, but has no idea who the original sender is.
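The peeling of layers can be sketched in Python. Since the standard library has no AES, the `keystream_xor` function below is a toy stand-in (a SHA-256-based keystream XORed with the data); it is not secure and is only an assumption for illustration, as are the three hard-coded keys and the `relay` helper.

```python
import hashlib


def keystream_xor(key, data):
    """Toy stream cipher standing in for AES; applying it twice undoes it."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))


# Hypothetical keys the client negotiated with the three nodes.
red, blue, green = b"red-key", b"blue-key", b"green-key"


def relay(cell, key, next_circ_id):
    """What one node does: strip its layer and rewrite the CircID."""
    _circ_id, payload = cell
    return (next_circ_id, keystream_xor(key, payload))


chunk = b"GET /index.html"

# Client: innermost layer first (green), outermost layer last (red).
payload = keystream_xor(red, keystream_xor(blue, keystream_xor(green, chunk)))
cell = (1, payload)                               # sent on CircID1

at_second = relay(cell, red, 2)                   # first node removes red
at_third = relay(at_second, blue, 3)              # second node removes blue
_circ, plaintext = relay(at_third, green, None)   # exit removes green

print(plaintext)
```

Only after the third layer is removed does the request become readable, and by then the cell has travelled two hops away from the client.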

Third Phase: Relay the response from the server back to the client

The nodes on the path relay the response from the server back to the client in much the same way as they relay the request.

When the third node receives a response from the server, it divides the response into chunks, pads them, and encrypts them with its green key. For every chunk, the third node puts CircID3 in front of the chunk to form a cell, then sends the cell back to the second node.

The second node encrypts the cells it receives with its blue key, changes their CircID from CircID3 to CircID2, and sends them on through the CircID2 circuit.

Likewise, the first node encrypts the payload of the cells it receives with its red key, changes their CircID from CircID2 to CircID1, and sends them back to the client through the CircID1 circuit.

When the client receives the cells, it decrypts them with the red key, then the blue key, then the green key, and combines the chunks to form the real packet: the response from the server!
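The return trip can be sketched the same way, with each node adding a layer instead of removing one. As before, `keystream_xor` is a toy stand-in for AES, and the keys, the 16-byte chunk size and the zero-byte padding are assumptions chosen to keep the example short.

```python
import hashlib


def keystream_xor(key, data):
    """Toy stream cipher standing in for AES; applying it twice undoes it."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))


# Hypothetical keys shared between the client and the three nodes.
red, blue, green = b"red-key", b"blue-key", b"green-key"
CHUNK = 16   # tiny chunk size for readability

response = b"HTTP/1.1 200 OK\r\n\r\nhello from the server"
chunks = [response[i:i + CHUNK].ljust(CHUNK, b"\x00")
          for i in range(0, len(response), CHUNK)]

# Third node: encrypt with green and label the cells with CircID3.
cells = [(3, keystream_xor(green, c)) for c in chunks]
# Second node: add the blue layer, rewrite CircID3 -> CircID2.
cells = [(2, keystream_xor(blue, p)) for _, p in cells]
# First node: add the red layer, rewrite CircID2 -> CircID1.
cells = [(1, keystream_xor(red, p)) for _, p in cells]

# Client: remove red, then blue, then green, and join the chunks.
plain = b"".join(
    keystream_xor(green, keystream_xor(blue, keystream_xor(red, p)))
    for _, p in cells
)
received = plain.rstrip(b"\x00")   # simplistic removal of the padding

print(received == response)
```

Only the client holds all three keys, so only the client can remove every layer.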

Hidden servers in the Tor network

Tor can protect not only the identities of clients but also the identities of servers. These protected servers are called hidden servers: no one can learn their IP addresses, yet anyone can visit them through the Tor network. I believe most people first heard about Tor because of the takedown of the notorious darknet black market Silk Road. It was a typical Tor hidden service, just like the one we tried earlier whose URL ends with .onion.

So how exactly does Tor host those hidden servers in the Tor network?

The idea is to pick some middlemen to do the communication on behalf of both the client and the server.

First, the server selects one or several Tor nodes to be its introduction nodes, connects to each of them through a three-hop Tor path, and advertises them along with its onion URL in the distributed hash table (DHT) in the Tor network.

A client that wants to visit the service first looks up the introduction node in the DHT using the service’s URL, which it obtained externally. The client then selects a Tor node as the rendezvous node for itself and the server, and connects to it through a Tor path. At the same time, the client sends the address of the rendezvous node to the introduction node through a three-hop Tor path.

From the introduction node, the server learns the rendezvous node. The server then builds its own three-hop Tor path to the rendezvous node and meets the client there.

We can see that, in the end, the client and the server are separated by a whole chain of relays. According to the Tor Project’s documentation there are six: three chosen by the client, the last of which is the rendezvous node, and three chosen by the server. Apart from the client and the server, no one knows what is going on.

This is how hidden services work. They offer strong anonymity on both sides but are inevitably slow.

Becoming part of the Tor network

The Tor network is built by volunteers all around the world. If you like the idea, you may want to contribute to it. Besides donating to the Tor Project, another direct way is to become part of the Tor network.

Hardware Requirement

There are three kinds of Tor nodes: guard nodes, middle nodes, and exit nodes.

The guard node is the first node the client talks to. The middle node is the node in the middle of the path. The exit node is the last node in the path.

The hardware requirements for middle and guard nodes are low because their workload is relatively light; any ordinary server with a public IP address and a stable network connection is fine.

Setting up an exit node is more complicated. To counter abuse of Tor, you need to monitor your server and configure your node carefully, for example by only allowing traffic to port 80. You also need to make sure your ISP allows you to run a Tor exit node, as abuse of exit nodes can lead to legal problems. Here are some templates Tor offers for dealing with abuse complaints from other companies.

Set up a Tor Node

Take an Ubuntu server as an example. First, install the server-side tor program.

apt update && apt install tor

Then open Tor’s config file and edit it according to your needs.

vim /etc/tor/torrc
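As a rough illustration, the torrc of a non-exit relay might contain lines like the following. The nickname, contact address and port are placeholder values assumed for this example; check the official relay guide before using them.

```
# /etc/tor/torrc (illustrative values only)
Nickname MyRelayNickname        # placeholder name shown in relay listings
ContactInfo you@example.com     # placeholder contact address
ORPort 9001                     # port on which the relay accepts Tor connections
ExitRelay 0                     # run as a guard/middle relay, not an exit
SocksPort 0                     # no local client proxy on this machine
```

After editing the file, the tor service has to be restarted for the changes to take effect.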

You can find more setup details here.

After setting up your Tor node, you can go to this URL and use the IP address to search for the running state of your Tor node.

Availability of Tor

Sometimes Tor is abused by malicious users, so in some regions it is banned. And because information about all the nodes can be fetched from the Directory Servers, banning the Tor network is pretty easy: just block all the nodes listed! But the Tor Project has a way to circumvent this.

Bridge node

Bridges are Tor nodes that help you circumvent censorship [7]. They are not listed in the public directory and are offered to users as stepping stones into the Tor network. Users can get bridges through this link. In reality, though, the authorities are blocking bridges too, so the Tor Project has to keep encouraging people to add new bridges to counter the blocking.

Another way to circumvent censorship is to use a VPN. Combining a VPN with Tor may further improve your security, anonymity, and availability.

No guarantee of 100% anonymity

Tor does not guarantee 100% anonymity. You need to be very careful not to leak your personal information when using it.

For example, in 2013, a Harvard student sent bomb threats to the university to postpone his exam, but was identified in the end even though he used Tor and Guerrilla Mail (a disposable temporary email service) [6]. By finding out who was using Tor at Harvard at the time the email was sent, the FBI identified the student.

And when a big organization runs many nodes in the Tor network, it may end up controlling both the guard node and the exit node of the same path. By comparing the timing and volume of traffic entering the guard node with the traffic leaving the exit node, it can link the user to the server he or she visits.
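A much simplified sketch of such a correlation is shown below. The per-second cell counts are made-up numbers and the user names are hypothetical; real attacks work on noisy data and use more sophisticated statistics than a plain Pearson correlation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up cells-per-second counts seen at a guard node for two users...
guard_traffic = {
    "alice": [0, 12, 30, 2, 0, 25, 40, 1],
    "bob": [5, 6, 4, 7, 5, 6, 5, 4],
}
# ...and at an exit node for one connection to some website.
exit_traffic = [0, 11, 29, 3, 0, 24, 41, 1]

scores = {user: pearson(series, exit_traffic)
          for user, series in guard_traffic.items()}
best = max(scores, key=scores.get)
print(best)
```

The observer never decrypts anything: matching the shape of the traffic at both ends is enough to guess who is talking to whom.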

What’s more, if you visit a website over HTTP instead of HTTPS, an eavesdropper at the exit node can see everything you send, such as the account name and password you entered. I found a great webpage showing the benefits of using HTTPS and Tor together. Below is a snapshot of it. As you can see, with HTTPS and Tor both turned on, eavesdroppers can learn very little about the user.

From https://www.eff.org/pages/tor-and-https

Here Wikipedia has an elaborate list of Tor’s weaknesses. If you want to use Tor for something you do not want others to know about, you had better read through all those caveats.

The other problems of Tor

Exit Node Abuse

The abuse of exit nodes is a big problem for Tor. It not only costs the volunteers running exit nodes a lot but also exposes them to legal trouble. This makes people less willing to contribute to Tor, and in turn Tor becomes less secure because it has fewer nodes. Even the authors of Onion Routing admitted that “preventing abuse of open exit nodes is an unsolved problem, and will probably remain an arms race for the foreseeable future” [1].

The performance issue of Directory Servers

Currently, the Directory Servers store all the information about the Tor network. As the Tor network grows even bigger, the Directory Servers may become the bottleneck of the system.

Conclusion

We have introduced almost every aspect of Tor, especially the core mechanisms by which it hides users’ information and enables hidden services. It is a great tool for protecting user privacy and for circumventing censorship and surveillance. But as we have also seen, you should never exploit it to do illegal things. When law-enforcement agencies spend time on analysis, even Tor has various vulnerabilities that can be used to identify you.