Is Tor Really Anonymous?

SFU Cybersecurity
Systems and Network Security
11 min read · Mar 7, 2022

This blog is written and maintained by Dickson Lee and Jiawen Zhang in the School of Computing Science at Simon Fraser University.

Tor is one of the largest anonymity networks, intended to protect its users from unauthorized leakage of private information. But does Tor really work as intended? Can someone compromise Tor users’ privacy?

Tor Project

To help you better understand the potential threats on Tor, we are going to introduce the Tor project briefly.

What Is Tor?

The Tor Project is a non-profit organization that conducts research and development on privacy and anonymity networks. Tor is designed to stop organizations, including government agencies and corporations, from learning sensitive data about you, such as your location, or tracking your browsing habits [8]. The Tor Project became a 501(c)(3) non-profit in 2006, but the idea of “onion routing” was developed in the mid-1990s at the U.S. Naval Research Laboratory by Paul Syverson, Michael G. Reed, and David Goldschlag to protect U.S. intelligence communications online. It was further developed by the Defense Advanced Research Projects Agency (DARPA) and patented by the Navy in 1998 [14].

Why Do We Need Tor?

Nowadays, it is hard to preserve the privacy and anonymity of web activity. Even under encryption (TLS), TCP/IP packets leak metadata such as the client IP, the destination IP, the timing and size of packets, and the number of connections. Collecting this data can threaten users’ privacy: trackers can assemble it into a recognizable identifier, which can then be used to trace your traffic and track your behaviour on the Internet. An eavesdropper (an ISP, a government agency or a hacker) could find out 1) who the client is, and 2) what server they are accessing. The websites you visit can reveal your purchasing behaviour, political preferences, personal beliefs, Internet usage and any illegal activities. For example, by collecting your location information, an e-commerce site can apply price discrimination; location information can even threaten your physical safety.

How Does Tor Work?

Tor is a successful privacy-enhancing mechanism that works at the transport layer. Normally, a TCP connection you make on the Internet automatically reveals your metadata; Tor allows you to make a TCP connection without revealing who is talking to whom. Each time you access a remote server or browse a website, Tor protects your identity by routing the traffic through multiple randomly chosen relays before it arrives at the destination. Figure 1 shows the route of an Internet request through the Tor network, on which all Tor services heavily rely.

Fig. 1. Tor routes traffic through multiple relay servers before accessing the destination website. The request is encrypted multiple times, so the relays only know the previous node and the next node.

Tor Network

The most important component of the Tor project is the Tor network, an implementation of onion routing, which is a distributed overlay design for anonymizing TCP connections. The Tor network is a group of volunteer-operated servers that lets people improve their privacy and security on the Internet [6]; around 7,000 IPv4 relays and 2,000 IPv6 relays were online at the time of writing [5]. A Tor client joins the network by building a Tor circuit, a path through the Tor network consisting of randomly selected relays; this allows users to transmit data over public networks without compromising their privacy. Requests are encrypted multiple times, so each relay knows only its predecessor and successor: only the entry guard sees the source IP and only the exit relay sees the destination IP, and no single relay knows the full circuit [12].
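To make the layered encryption concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the cipher is a toy XOR keystream built from SHA-256 (not the cryptography Tor actually uses), and the hop keys and the request are made up. It shows the client wrapping a request in one layer per relay, and each relay removing exactly one layer, so only the exit relay ends up with the plaintext.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter. NOT Tor's real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer (XOR with the keystream is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def wrap(payload: bytes, hop_keys: list[bytes]) -> bytes:
    """Client side: apply the exit's layer first and the guard's layer last."""
    cell = payload
    for key in reversed(hop_keys):
        cell = xor_layer(key, cell)
    return cell

# Hypothetical per-hop keys; in a real circuit they are negotiated per relay.
hop_keys = [b"guard-key", b"middle-key", b"exit-key"]
request = b"GET / HTTP/1.1"

cell = wrap(request, hop_keys)
views = []                      # what each relay holds after removing its layer
for key in hop_keys:            # guard, then middle, then exit
    cell = xor_layer(key, cell)
    views.append(cell)
```

In this sketch `views[0]` and `views[1]` (guard and middle) are still unreadable, while `views[2]` (exit) equals the original request.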

Website Fingerprinting Attacks

Tor users’ privacy can be violated even when they use end-to-end encryption with a bug-free version of Tor Browser. A number of research studies show that Internet users’ privacy can be compromised via Network Traffic Analysis (NTA) even with Tor enabled. NTA is the process of intercepting, recording and analyzing network traffic to compromise users’ security and privacy [13]; the Website Fingerprinting (WF) attack is one such attack. The goal of a WF attack is to identify which webpage, out of a set of n monitored pages, the user is browsing over an encrypted and anonymized connection. Tor is susceptible to WF attacks, which allow a local, passive adversary to identify users’ web activities from patterns in their packet sequences [10].

Fig. 2. The threat model of a website fingerprinting attack (the adversary sits in between the client and the Tor network). The attacker first visits a set of websites to collect training data; later, he observes the victim’s traffic trace and tries to find a match in the training dataset using the trained model.

WF attacks exploit the fact that the web content of each webpage is different, so the traffic pattern it produces can be matched and, in turn, used to predict the webpage. The WF attack is usually framed as a classification problem. To conduct such an attack, the adversary first visits a set of monitored websites and records the packets of the data flows; traffic features such as packet timing and direction are then extracted from the collected traces, and a machine learning model is trained on these features. The attack itself takes place between the client and the Tor entry guard, or at the entry node itself (the first anonymization relay). When a Tor user connects to the Tor network, the adversary (eavesdropper) observes the victim’s traffic, captures the packets and parses them into Tor cells. The adversary then feeds the extracted features into the trained model to predict which webpage the victim is browsing. Figure 2 illustrates the threat model of the WF attack. Deploying a WF attack requires few resources, since many entities sit between the user and the Tor entry node, such as local network admins, ISPs and even the entry guard itself; deploying a malicious Tor node is feasible because Tor nodes are operated by volunteers.
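The train-then-match pipeline can be sketched in a few lines of Python. This is a deliberately tiny illustration, not any published attack: the traces are made up, the three features are our own choice, and a 1-nearest-neighbour rule stands in for the much stronger models used in the literature. Each trace is a list of cell directions (+1 for outgoing, -1 for incoming).

```python
def features(trace):
    """Summarise a trace of cell directions (+1 outgoing, -1 incoming)."""
    outgoing = sum(1 for d in trace if d > 0)
    incoming = sum(1 for d in trace if d < 0)
    # Direction changes roughly capture request/response bursts.
    changes = sum(1 for a, b in zip(trace, trace[1:]) if a != b)
    return (outgoing, incoming, changes)

def distance(f1, f2):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(f1, f2))

def predict(training, trace):
    """1-nearest-neighbour: label of the closest training trace."""
    target = features(trace)
    best = min(training, key=lambda item: distance(features(item[1]), target))
    return best[0]

# Made-up training traces for two hypothetical monitored sites.
training = [
    ("site-a", [1] + [-1] * 40 + [1] + [-1] * 35),
    ("site-a", [1] + [-1] * 42 + [1] + [-1] * 33),
    ("site-b", [1, -1] * 10),
    ("site-b", [1, -1] * 12),
]

# Trace the adversary observes between the victim and the entry guard.
victim_trace = [1] + [-1] * 41 + [1] + [-1] * 36
guess = predict(training, victim_trace)
```

Here the victim trace has the same shape as the "site-a" training traces (two large downloads), so `guess` comes out as "site-a" even though no payload was ever decrypted.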

Many WF attacks have been proposed in the last twenty years, leveraging machine learning and deep learning techniques such as SVM, k-NN and Deep Neural Networks (DNN). The most recent well-known WF attack with a high success rate is Deep Fingerprinting (DF), proposed by Sirinam et al. in 2018 [7]; subsequent studies have improved the model to achieve higher classification accuracy, but without a major breakthrough. DF uses a Convolutional Neural Network (CNN) with a sophisticated model design. It is recognized as an effective attack that achieves 98.3% classification accuracy against Tor without defenses, and it remains effective when lightweight, low-delay WF defenses are enabled: it achieves over 90% accuracy against WTF-PAD [4] and 49.7% accuracy against Walkie-talkie [11], outperforming its peers.

WF defenses have also been an active research area in the past few years, and many of them successfully defeat WF attacks. However, none has been accepted by the Tor Project team for further development, since most defenses impose a heavy bandwidth and latency overhead, which further increases webpage loading time in Tor Browser (loading is already notably slow due to the design of Tor). As a result, WF attacks remain effective and efficient in the wild.

Indirect Rate Reduction Attack

The Tor network is commonly used by clients to anonymize their network traffic. For instance, citizens of certain countries or employees of some companies may want to connect to remote servers via the Tor network, so that the authorities or their employers cannot track whom they are talking to even if the client network is being monitored. However, some research [2] has shown that there are ways to monitor communications over the Tor network.

The Indirect Rate Reduction Attack, introduced by Gilad and Herzberg in 2012 [3], is a timing attack for de-anonymizing communication between clients and a specific server through the Tor network. In this attack, the adversary, Eve, has a targeted client (or a group of targeted clients) and a predefined server, and her objective is to determine whether the targeted client(s) have established anonymized communication with that server through the Tor network. Eve is assumed to be able to eavesdrop on the client machine, but not on the server machine.

As mentioned in the previous section, around 9,000 relays were up and running at the time of writing. However, only 1,150 of them can act as exit relays. In addition, Tor clients do not choose exit relays uniformly at random; they pick an exit relay that can handle the most pending requests [1], which means that clients are likely to select an exit relay from a relatively small number of fast relays. As a result, Eve can narrow the targeted exit relays down to a manageable number.
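A rough Python sketch shows why non-uniform selection shrinks Eve’s search space. It assumes, as a simplification of the real path-selection rules, that a relay is chosen with probability proportional to its bandwidth; the relay names and bandwidth figures are invented for illustration.

```python
import random

# Hypothetical exit relays: 10 fast and 90 slow (arbitrary bandwidth units).
relays = {f"fast-{i}": 100 for i in range(10)}
relays.update({f"slow-{i}": 2 for i in range(90)})

def fast_share(relays):
    """Expected fraction of selections that land on the 'fast' relays."""
    total = sum(relays.values())
    fast = sum(bw for name, bw in relays.items() if name.startswith("fast"))
    return fast / total

def sample_exits(relays, n, seed=0):
    """Draw n exits with probability proportional to bandwidth (assumption)."""
    rng = random.Random(seed)
    names = list(relays)
    weights = [relays[name] for name in names]
    return rng.choices(names, weights=weights, k=n)

share = fast_share(relays)                       # 1000 / 1180
picks = sample_exits(relays, 10_000)
observed = sum(p.startswith("fast") for p in picks) / len(picks)
```

With these made-up numbers, 10% of the relays attract roughly 85% of the selections, so monitoring only the fast relays would already cover most circuits.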

To increase the chance of discovering traffic between the targeted client and server, Eve first uses certain techniques to check whether there are connections between the targeted exit relays and the targeted server, i.e. any anonymized connection to the server via those exit relays. These techniques are not Tor-specific, so they are not discussed in this article.

Fig. 3. The adversary observes the transfer rate between the targeted client and the entry node after slowing down the traffic between the exit relay and destination server.

After identifying the IP address and port number of the exit relays communicating with the targeted server, Eve sends packets to the exit relays that are spoofed with the targeted server’s IP address. On receiving the spoofed packets, the exit relays send duplicate acknowledgement packets to the targeted server, which triggers the TCP congestion control mechanism and in turn slows down the communication between them. If the targeted client has been communicating with the targeted server through that exit relay, Eve will observe a similar rate reduction in the communication between the targeted client and the entry node. Eve then repeats this step a number of times to obtain a more accurate result.
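The final decision step, checking whether the client’s rate drops whenever Eve probes, can be sketched as a simple correlation test. The code below is an illustration with made-up numbers, not the procedure from [3]: it assumes Eve records in which time slots she sent spoofed packets and the client-side throughput in each slot, and it uses a plain Pearson coefficient as the score.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

# 1 = Eve sent spoofed packets in this time slot, 0 = she stayed idle.
probe_schedule = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1]

# Made-up client-to-entry throughput samples (KB/s), one per time slot.
suspect_rate = [95, 40, 98, 35, 42, 97, 96, 38, 99, 41]    # dips when probed
unrelated_rate = [80, 82, 79, 81, 80, 78, 83, 80, 81, 79]  # no visible pattern

suspect_score = pearson(probe_schedule, suspect_rate)
unrelated_score = pearson(probe_schedule, unrelated_rate)
```

A strongly negative score, as for `suspect_rate`, suggests the client’s circuit runs through the probed exit relay; a score near zero suggests it does not. Repeating the probe over more slots makes an accidental match less likely.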

Raptor Attack

Tor network traffic is encrypted and anonymized inside a Tor circuit, but it is exposed on the links between the client and the entry node, and between the exit relay and the destination server. An adversary who can observe the traffic both before it enters the entry node and after it leaves the exit relay can easily de-anonymize Tor communications using traffic correlation attacks.

The Raptor attack [9], introduced by Sun et al., assumes a powerful adversary that operates at the level of autonomous systems (ASes), e.g., network service providers, state governments, or adversaries able to hijack autonomous systems. Raptor attacks combine three individual components: asymmetric traffic analysis, natural churn in Internet routing, and BGP interception attacks.

Fig. 4. Asymmetric traffic analysis. Blue lines indicate traffic from the client to the server and red lines indicate traffic from the server to the client

Internet routing paths are often asymmetric, which means that the routing path from A to B may differ from the path from B to A. In Fig. 4, packets sent from the client to the entry node travel via AS1, while traffic from the entry node back to the client goes through AS2. Conventional traffic analysis attacks require a pair of traffic flows in the same direction, e.g., client to entry and exit to server. By inspecting the sequence numbers and acknowledgement numbers in TCP headers, asymmetric traffic analysis can de-anonymize Tor communication even when the two flows are in different directions. For instance, AS2 can compromise the anonymity of the Tor traffic in Fig. 4.
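The reason TCP headers help is that an acknowledgement number grows with the bytes delivered in the opposite direction, so an observer who only sees ACKs can still reconstruct a byte-rate profile. The Python sketch below illustrates the idea with invented samples; the similarity score is our own simple choice rather than the metric used in [9], and sequence-number wraparound is ignored.

```python
def bytes_per_window(samples, window=1.0):
    """Turn (time_s, counter) samples into bytes per time window.

    The counter may be a TCP sequence number (data direction) or an
    acknowledgement number (reverse direction). The first sample only
    serves as the baseline.
    """
    totals = {}
    prev = samples[0][1]
    for t, counter in samples[1:]:
        slot = int(t // window)
        totals[slot] = totals.get(slot, 0) + (counter - prev)
        prev = counter
    n_slots = max(totals) + 1 if totals else 0
    return [totals.get(i, 0) for i in range(n_slots)]

def similarity(a, b):
    """1.0 for identical profiles, lower as they diverge (ad hoc measure)."""
    diff = sum(abs(x - y) for x, y in zip(a, b))
    total = sum(x + y for x, y in zip(a, b))
    return 1.0 - diff / total if total else 0.0

# Sequence numbers of data seen on the client -> entry link (made up).
client_seq = [(0.0, 1000), (0.4, 2460), (1.2, 3920), (1.6, 5380), (2.5, 5900), (3.1, 7360)]
# ACK numbers seen on the server -> exit link for a candidate flow (made up).
matching_ack = [(0.1, 500), (0.6, 1900), (1.4, 3300), (1.9, 4750), (2.7, 5250), (3.3, 6650)]
# ACK numbers of an unrelated flow (made up).
other_ack = [(0.2, 100), (0.8, 300), (1.5, 3300), (2.2, 3400), (2.9, 6400), (3.6, 6500)]

client_profile = bytes_per_window(client_seq)
match_score = similarity(client_profile, bytes_per_window(matching_ack))
other_score = similarity(client_profile, bytes_per_window(other_ack))
```

The candidate flow scores much higher than the unrelated one, even though the observer never saw the data packets on the server side, only the ACKs travelling the other way.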

Fig. 5. Changes of Internet routing path

Internet routing paths change over time whenever routers or links are added or removed, or routing policies change. This natural churn in Internet routing increases the chance that an adversary observes both the entering and the exiting traffic of a Tor circuit. For instance, no AS could compromise the original Tor traffic in Fig. 5, but after a natural change of the routing path from AS4 to AS2, AS2 can de-anonymize the Tor traffic. Over time, such routing changes increase the number of ASes that can compromise the anonymity of Tor traffic.
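The effect of churn can be expressed as simple set arithmetic: an AS is in a position to correlate traffic if it appears on both the client side and the server side of a circuit, and the set of such ASes only grows as paths change. The Python sketch below uses hypothetical AS-level paths loosely inspired by the scenario above; it is not a reproduction of Fig. 5.

```python
def can_correlate(client_side_ases, server_side_ases):
    """Return the ASes that appear on both ends of a circuit."""
    return set(client_side_ases) & set(server_side_ases)

# Hypothetical AS-level paths (assumptions for illustration only).
client_side = ["AS1", "AS2"]          # between the client and the entry node
server_side_snapshots = [
    ["AS3", "AS4"],                   # before the routing change
    ["AS3", "AS2"],                   # after the path moves from AS4 to AS2
]

exposed_over_time = set()
history = []
for server_side in server_side_snapshots:
    exposed_over_time |= can_correlate(client_side, server_side)
    history.append(sorted(exposed_over_time))
```

In this toy scenario `history` starts empty and gains AS2 after the routing change, which mirrors how churn widens the set of potential observers the longer a user stays connected.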

Do adversaries have to wait for a natural Internet routing change that favours the attack? The answer is no. An adversary could perform a targeted BGP interception attack to divert specific Tor traffic through ASes under their monitoring, and could then de-anonymize the Tor traffic with a 95% accuracy rate after merely 5 minutes of asymmetric traffic analysis [9]. By combining these techniques, the Raptor attacks present a serious threat to the security of the Tor network.

Conclusion

The increasing adoption of Tor by individual users and organizations has led to growing investigation of de-anonymizing attacks on Tor, and the complexity of the Tor network provides a wide attack surface, which makes the situation worse. Alongside the increasing number of effective and advanced attacks on Tor, many defenses against different types of attacks have come to the fore. However, as discussed in the website fingerprinting section, the Tor Project has accepted few of these proposals, since most of them have not been proven in the real world, and switching to a new infrastructure calls for great caution due to the cost of development and maintenance. Tor is not 100% anonymous even when correctly configured; besides the attacks selected for this post, various other attacks have been proven effective against Tor, and an adversary can leverage them to compromise Tor clients’ privacy. The increasing number of powerful attacks and the lack of efficient, feasible defenses underline the need to design effective countermeasures that mitigate the impact of attacks on Tor.

[1] Roger Dingledine and Nick Mathewson. Tor Path Specification. https://github.com/torproject/torspec/blob/master/path-spec.txt

[2] B. Evers, J. Hols, E. Kula, J. Schouten, M. den Toom, R.M. van der Laan, J.A. Pouwelse. Thirteen Years of Tor Attacks. https://github.com/Attacks-on-Tor/Attacks-on-Tor

[3] Yossi Gilad and Amir Herzberg. “Spying in the dark: TCP and Tor traffic analysis”. In: Privacy Enhancing Technologies. Springer. 2012, pp. 100–119.

[4] Marc Juarez, Mohsen Imani, Mike Perry, Claudia Diaz, and Matthew Wright. 2016. Toward an efficient website fingerprinting defense. In European Symposium on Research in Computer Security (ESORICS). Springer, 27–46.

[5] Tor Metrics. 2021. Servers. https://metrics.torproject.org/relays-ipv6.html [Accessed 2-April-2021]

[6] The Tor Project. [n.d.]. Tor. https://2019.www.torproject.org/about/overview.html.en

[7] Payap Sirinam, Mohsen Imani, Marc Juarez, and Matthew Wright. 2018. Deep fingerprinting: Undermining website fingerprinting defenses with deep learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. 1928–1943.

[8] Stuart Dredge. 2013. What is Tor? A beginner’s guide to the privacy tool. https://www.theguardian.com/technology/2013/nov/05/tor-beginners-guide-nsa-browser

[9] Sun et al. “RAPTOR: routing attacks on privacy in Tor”. In: 24th USENIX Security Symposium (USENIX Security 15). 2015, pp. 271–286.

[10] Wang, Tao. 2016. Website Fingerprinting: Attacks and Defenses. http://hdl.handle.net/10012/10123

[11] Tao Wang and Ian Goldberg. 2017. Walkie-talkie: An efficient defense against passive website fingerprinting attacks. In USENIX Security Symposium. USENIX Association, 1375–1390.

[12] Jack Wherry. 2020. What is Tor (Browser) & How does it work? https://cybernews.com/privacy/what-is-tor-andhow-does-it-work/

[13] AWAKE The NDR Security Division of ARISTA. 2019. Network Traffic Analysis. https://awakesecurity.com/glossary/network-traffic-analysis/

[14] Wikipedia contributors. 2021. Onion routing — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Onion_routing&oldid=985814018
