Understanding the Tor Network

Amanda_James_Min
Systems and Network Security
Apr 18, 2020

By Amanda Pak, Choongmin Bae and James Mctaggart.

The Onion Router, or Tor, is free, open-source software for anonymous Internet browsing. The Tor network masks a user’s identity by routing traffic through a randomly chosen series of Tor nodes, or relays. Data enters the Tor network at the entry node (guard relay), passes through one or more middle nodes, and leaves the network through an exit node. The data is wrapped in layers of encryption before it enters the network, and each node removes one layer. The entry node knows the IP address of the user but not which target server they are communicating with. Each middle node only ever knows the IP addresses of the previous node and the next node, so it learns nothing about the user’s IP address or the target server. The exit node knows the target server but not the user’s original IP address. Data sent from the exit node to the target server cannot be traced directly back to the user because the exit node appears to be the origin of the traffic.

But what does all this technical jargon mean and how does it work?

Well, let’s simplify things and compare our Internet traffic to a package sent in the mail. If I want to send a package to a friend, the simplest way to do that is to put my return address (IP address) on the package and ship it. Once my friend gets the package they can send it back because it’s not anonymous. It has my address right there on the package. This is roughly how regular Internet traffic works.

We could go one step further and we could ship our package to FedEx with instructions to forward the package onto our friend. Now our friend only knows that the package is coming from FedEx and not from us. The trouble with this approach is that FedEx knows that we sent a package and that our friend received it. This whole process is slightly more anonymous but we have to place our trust in FedEx. This is the Internet equivalent of using a VPN.

Now Tor takes things even further. To get our package delivered to our friend we would first ship our package to UPS and have UPS deliver it to FedEx. Then we would have FedEx deliver it to Canada Post and finally we would have Canada Post deliver the package to our friend. In this onerous chain of courier companies we managed to get our package delivered and only UPS is aware that we sent a package. Additionally only Canada Post knows that our friend received a package. Now if some malicious person is watching our mail they would only know that we are sending packages to UPS. This is roughly how the Tor network operates and handles private Internet communication. Everything gets transmitted from node to node (courier to courier) until the message reaches its destination. Additionally our return address/IP address is only known to that first node. After that, each subsequent node will only know the IP address of the previous node thus ensuring some anonymity.

Nodes and Relays

Now that we have a rough idea of what is going on in simple terms, let’s dive back into the more technical aspects of what the Tor network does. The main goal of using Tor is to prevent an attacker from learning which sites you visit by mapping the communication between you and the servers you contact. Tor does this by creating “an overlay network in which each node maintains a Transport Layer Security (TLS) connection to every other node.” (Koch). When we want to send communications via Tor, the client randomly chooses n nodes/couriers from the Tor network (typically three) and builds a circuit through them. (Chaabane).

The packet is encrypted n times and it is sent to the entry node, before being sent to a relay node and so on until it hits the exit node and the packet is relayed to its final destination. Typically only three nodes are used on the Tor Network as using more relay nodes slows down data transfer speeds.
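As a rough illustration of circuit building, the sketch below picks three distinct relays at random from a made-up relay list. The relay names and the uniform random choice are assumptions made for this example; the real Tor client weights relays by bandwidth and applies further constraints when it selects a path.

```python
import random

# Hypothetical relay names; real relays are published in the Tor consensus.
RELAYS = ["relay-a", "relay-b", "relay-c", "relay-d", "relay-e", "relay-f"]

def build_circuit(relays, hops=3, rng=random):
    """Pick `hops` distinct relays: entry (guard), middle node(s), exit."""
    if hops > len(relays):
        raise ValueError("not enough relays for the requested circuit length")
    return rng.sample(relays, hops)

circuit = build_circuit(RELAYS)
entry, *middle, exit_node = circuit
print("entry:", entry, "middle:", middle, "exit:", exit_node)
```

Calling `build_circuit` again gives a different combination of relays, which mirrors the way new circuits are built for later activity.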

Figure 1.1 — A simplified Tor Network. (Arvindpdmn)

In Figure 1.1, we can see that Tor builds a circuit of three nodes before reaching the requested destination. Further actions online will use circuits made up of a different combination of nodes, which helps prevent profiling attacks. To summarize, a user sends their encrypted data to the first node. That node removes the outer layer of encryption and sends the data to the next relay node. This next node removes the next layer of encryption and sends the data on to the next node. Finally, the exit node removes the last layer of encryption and sends the data to the destination.
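The layering can be sketched in a few lines of Python. This is a toy model only: the `keystream_xor` cipher, the key values, and the function names are assumptions invented for the example, and the cipher is neither secure nor the one Tor uses. It shows the client adding one layer per node and each node peeling off exactly one.

```python
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256-derived keystream.
    Illustration only; NOT the cipher Tor uses and not secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)

def wrap(message: bytes, keys) -> bytes:
    """Client side: add one layer per node; the exit node's layer is innermost."""
    for key in reversed(keys):
        message = keystream_xor(key, message)
    return message

def peel(cell: bytes, key: bytes) -> bytes:
    """Node side: remove exactly one layer using this node's key."""
    return keystream_xor(key, cell)

keys = [b"entry-key", b"middle-key", b"exit-key"]  # hypothetical per-hop keys
cell = wrap(b"GET /index.html", keys)
for key in keys:  # entry node first, then middle, then exit
    cell = peel(cell, key)
print(cell)  # b'GET /index.html'
```

Only after the exit node removes its layer does the original request become readable, which is why the entry and middle nodes never see the content or the destination.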

Tor is a distributed network, which means that it isn’t run by any one entity. Tor is made up of thousands of volunteers all over the globe who together host the relay nodes. Each node can be used as part of the circuit that a user will traverse to their destination. Each node just acts as a delivery courier, and anyone with a fast enough Internet connection can be involved. The distributed nature of Tor is what makes it such a robust and secure network, but at the same time the variance in quality between nodes can cause swings in performance depending on the relay nodes used. (Perry)

If you would like to be a part of the project it is fairly straightforward to host your very own relay node. You can get started at https://www.torproject.org/.

Encryption

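The onion routing design described by Reed, Syverson, and Goldschlag encrypts data using RSA, DES, and RC4. (The current Tor software has since moved to newer ciphers such as AES.) Figure 1.2 shows a single onion layer from that design.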

Figure 1.2 A single onion layer. Adapted [reprinted] from Anonymous connections and onion routing, by Reed, Syverson, & Goldschlag (1998).

The leading 0 is required for RSA encryption. The important fields of the layer are Back F and Forw F. Back F specifies the function used to encrypt data traveling back toward the source, whereas Forw F applies to data traveling in the forward direction. These function fields define which encryption is applied: 0 for plaintext (no encryption), 1 for DES in Output Feedback (OFB) mode, and 2 for RC4.

RSA is an asymmetric cryptographic system, meaning that it uses two different keys (a public key and a private key) for encryption and decryption. The encryption process is as follows: a sender, Alice, uses the public key of a receiver, Bob, to encrypt the message, and Bob decrypts the ciphertext with his private key. Key generation works as follows. First, choose two large prime numbers $p$ and $q$. Then calculate $n = pq$, which will be one component of the public key. Calculate the totient $\varphi(n) = (p-1)(q-1)$, and select an integer $e$ that meets the conditions $1 < e < \varphi(n)$ and $\gcd(e, \varphi(n)) = 1$; $e$ is the other component of the public key. Then find an integer $d$ satisfying $ed \equiv 1 \pmod{\varphi(n)}$. $n$ and $e$ are public, whereas $p$, $q$, and $d$ are private. To encrypt a message $m$, calculate $c = m^e \bmod n$. Conversely, decryption recovers the message as $m = c^d \bmod n$.
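A small worked example makes the RSA arithmetic concrete. The primes, exponent, and message below are toy values chosen for this sketch; real keys use primes hundreds of digits long, and real systems add padding that this textbook version leaves out.

```python
from math import gcd

p, q = 61, 53                 # toy primes (assumed for illustration)
n = p * q                     # public modulus: 3233
phi = (p - 1) * (q - 1)       # totient (p-1)(q-1): 3120
e = 17                        # public exponent
assert 1 < e < phi and gcd(e, phi) == 1
d = pow(e, -1, phi)           # private exponent: e*d = 1 (mod phi), Python 3.8+

m = 65                        # message encoded as an integer smaller than n
c = pow(m, e, n)              # encryption: c = m^e mod n
recovered = pow(c, d, n)      # decryption: m = c^d mod n
print(n, phi, d, c, recovered)  # 3233 3120 2753 2790 65
```

Anyone who knows `n` and `e` can produce `c`, but recovering `m` requires `d`, which in turn requires knowing the factors `p` and `q`.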

DES is a symmetric cryptographic scheme. It consists of three components: the round structure, the Feistel (F) function, and the key schedule.

The DES encryption process is as follows. First, it takes a 64-bit block of plaintext and performs the Initial Permutation (IP), then splits the result into two 32-bit halves. In each round, the right half is fed into the F function together with a subkey, and the output of the F function is XORed with the left half; the halves are then swapped. This completes one round, and the operation is repeated for the remaining 15 rounds. The final step is the Final Permutation (FP), which is the inverse of the initial permutation. Key scheduling takes a 64-bit key and derives 16 subkeys of 48 bits each, one for each application of the F function. The F function takes a 32-bit input and one 48-bit subkey. It expands the input to 48 bits, XORs it with the subkey, applies the substitution boxes (S1–S8), and finishes with a permutation (P).
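The round structure can be sketched without reproducing the DES tables. In the code below, `toy_f` and `toy_key_schedule` are stand-ins invented for this example (they use SHA-256 rather than the real expansion, S-boxes, and permutations), and the initial and final permutations are omitted. What it does show accurately is the 16-round Feistel pattern and why decryption is the same routine run with the subkeys in reverse order.

```python
import hashlib

def toy_f(half: int, subkey: int) -> int:
    """Stand-in round function (NOT the DES F function):
    mixes a 32-bit half with a 48-bit subkey and returns 32 bits."""
    data = half.to_bytes(4, "big") + subkey.to_bytes(6, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def toy_key_schedule(key: int, rounds: int = 16):
    """Stand-in key schedule: derive 16 subkeys of 48 bits from a 64-bit key."""
    base = key.to_bytes(8, "big")
    return [int.from_bytes(hashlib.sha256(base + bytes([i])).digest()[:6], "big")
            for i in range(rounds)]

def feistel(block: int, subkeys) -> int:
    """Run a 64-bit block through one Feistel round per subkey."""
    left, right = block >> 32, block & 0xFFFFFFFF
    for k in subkeys:
        # New left is the old right; new right is old left XOR F(old right, k)
        left, right = right, left ^ toy_f(right, k)
    # Final swap of the halves, so the same routine also decrypts
    return (right << 32) | left

subkeys = toy_key_schedule(0x133457799BBCDFF1)
plaintext = 0x0123456789ABCDEF
ciphertext = feistel(plaintext, subkeys)
decrypted = feistel(ciphertext, subkeys[::-1])  # reversed subkeys undo the rounds
print(hex(ciphertext), decrypted == plaintext)
```

Because each round only XORs one half with a function of the other, the round function itself never needs to be invertible, which is the main design advantage of a Feistel network.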

RC4 is a stream cipher composed of a Key Scheduling Algorithm (KSA) and a Pseudo-Random Generation Algorithm (PRGA). Figure 5 illustrates the algorithm structure.

However, as many vulnerabilities have been found, this encryption algorithm is rarely used in modern cryptosystems.
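For reference, the two stages of RC4 fit in a short function. The sketch below follows the standard published description of the algorithm; the key and plaintext are just sample values, and the code is meant for study rather than for protecting real data.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt `data` with RC4 (the operation is its own inverse)."""
    # Key Scheduling Algorithm (KSA): shuffle a 256-entry state using the key
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Pseudo-Random Generation Algorithm (PRGA): emit one keystream byte
    # per input byte and XOR it with the data
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())         # bbf316e8d940af0ad3
print(rc4(b"Key", ciphertext))  # b'Plaintext'
```

Since encryption is just an XOR with the keystream, running the same function again with the same key restores the plaintext.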

Finally, the Key Seed Material field in Figure 1.2 is used to create three keys (key1, key2, and key3). These keys are used to encrypt messages so that the identities of the source and destination are hidden. Specifically, key2 is used for backward encryption and key3 for forward encryption.

Eavesdropping

There are several techniques that can be used to deanonymize Tor users, and eavesdropping is one of them. Tor only encrypts traffic inside the Tor network, so anything passing from the exit node to the target server can be intercepted unless it is encrypted by another method. The intercepted exit traffic may not identify the user by IP address, since the address has been hidden behind multiple nodes, but it can expose other information such as the site visited, usernames, passwords, and data. In addition, an eavesdropper watching the link between the user and the entry node can see the user’s location/IP address, even though the contents are encrypted. If an attacker analyzes traffic going into and coming out of the Tor network, they can match the exit node traffic with the entry node traffic to reveal both the identity and the activity of a user. This is known as traffic confirmation, or end-to-end confirmation: the attacker simply has to confirm their hypothesis statistically by correlating the timing and volume of traffic at either end of the network. By matching the time and amount of data sent to the entry node with the time and amount of data leaving the exit node, an attacker can connect the Internet activity to a specific user with relatively high probability. To limit eavesdropping, Tor users should only visit HTTPS websites so that the information leaving the exit node is encrypted and not readable by eavesdroppers.
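The traffic confirmation idea can be illustrated with a simple correlation score. Everything in the sketch below is made up for the example: the flow names, the per-second byte counts, and the choice of Pearson correlation as the matching statistic. It is a simplified model of the reasoning, not a description of any specific published attack.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical bytes per second observed entering the network from one user
entry_flow = [1200, 0, 300, 4500, 0, 800, 2200, 0]

# Hypothetical bytes per second observed leaving three different exit nodes
exit_flows = {
    "flow-A": [500, 700, 600, 650, 550, 620, 580, 640],
    "flow-B": [1180, 10, 310, 4400, 20, 790, 2150, 5],
    "flow-C": [0, 3000, 0, 0, 2500, 100, 0, 900],
}

scores = {name: pearson(entry_flow, flow) for name, flow in exit_flows.items()}
best = max(scores, key=scores.get)
print(best)  # flow-B
```

The exit flow whose timing and volume track the entry flow most closely stands out, even though the observer never decrypts anything. This is why an adversary who can watch both ends of a circuit is a threat that layered encryption alone does not address.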

Figure 6. Information visible to eavesdroppers without HTTPS encryption (Torproject)

VPNs vs Tor

A VPN encrypts the connection between a user’s computer and the VPN server, which then forwards the traffic to the Internet, masking the user’s IP address. VPNs have a performance advantage over Tor since a user is only connecting through one server rather than multiple Tor nodes. They also encrypt all network traffic coming from a user’s computer, whereas Tor only protects requests made using the Tor browser. While a VPN provides a high level of privacy by not revealing a user’s IP address to any websites that are visited, the VPN provider itself knows the user’s IP address, so true anonymity cannot be achieved. Tor offers stronger anonymity by passing data between a number of randomly selected nodes, so no single node will ever know both the IP address and the Internet history of a user. However, as mentioned previously, an attacker who controls or observes both the entry and exit nodes may be able to de-anonymize the user, and Tor does not encrypt traffic between the exit node and the destination. Both Tor and VPNs can be used to provide a layer of anonymity and security, but each service has its own advantages based on cost, performance, and security.

Final Thoughts

In the modern age, where everything you do online is being monitored either by governments or by advertising tech companies like Google or Facebook, Tor represents a real and valuable tool to fight back and regain some shred of anonymity online. The distributed nature of the Tor network and its global team of volunteers provide us with a small slice of privacy. By working together to relay the heavily encrypted data packets, regular people can become anonymous online once again. But just like in real life, Tor can obfuscate our movements best when it is hiding in a large crowd. The more people that use Tor, and the more people that host relay nodes, the better the network becomes. So consider trying the Tor network, and maybe even hosting a relay node.

Citations

  1. Arvindpdmn. (2020, January 6). Tor Network. Retrieved from https://devopedia.org/tor-network
  2. Chaabane, A., Manils, P., & Kaafar, M. A. (2010). Digging into Anonymous Traffic: A Deep Analysis of the Tor Anonymizing Network. 2010 Fourth International Conference on Network and System Security. doi: 10.1109/nss.2010.47
  3. Koch, R., Golling, M., & Rodosek, G. D. (2016). How Anonymous Is the Tor Network? A Long-Term Black-Box Investigation. Computer, 49(3), 42–49. doi: 10.1109/mc.2016.73
  4. McCoy, D., Bauer, K., Grunwald, D., Kohno, T., & Sicker, D. (n.d.). Shining Light in Dark Places: Understanding the Tor Network. Privacy Enhancing Technologies Lecture Notes in Computer Science, 63–76. doi: 10.1007/978-3-540-70630-4_
  5. Mohammed, E. A., Areed, N. F., Takieldeen, A., & El-Awady, R. M. (2016). Hybrid Cryptographic Algorithm for LTE Data Confidentiality. International Journal of Engineering Research & Technology (IJERT), 5(12).
  6. Reed, M. G., Syverson, P. F., & Goldschlag, D. M. (1998). Anonymous connections and onion routing. IEEE Journal on Selected Areas in Communications, 16(4), 482–494.
  7. Perry, M. (n.d.). TorFlow: Tor Network Analysis. Retrieved from https://fscked.org/talks/TorFlow-HotPETS-final.pdf
  8. Wikipedia contributors. (2020, March 15). Data Encryption Standard. In Wikipedia, The Free Encyclopedia. Retrieved 01:25, April 1, 2020, from https://en.wikipedia.org/w/index.php?title=Data_Encryption_Standard&oldid=945661323
  9. Ramadhani, E. (2018). Anonymity communication VPN and Tor: a comparative study. Journal of Physics: Conference Series, 983, 012060. doi: 10.1088/1742-6596/983/1/012060
