A pinch of privacy: Tor from within Go

Published in InterPlanetary Social Network, Jul 21, 2018

The internet is changing… slowly, seamlessly and unforgivingly. Over the past two decades we’ve witnessed the transformation of the original dream into centralized hubs under the guise of efficiency and convenience. Hand in hand with that, we’ve seen never-ending data breaches, power abuses and violations of privacy.

We’ve become so numb and oblivious to these central operators that seeing a Facebook “like” button track us on every website seems the norm of the 21st century. We don’t even blink while sharing our GPS position with a Bluetooth speaker, and gladly give up any information about ourselves and our friends as long as we get a pretty chart on who has the largest… traveled mileage.

Google location history dashboard

The mantra usually goes:

“We’re only tracking you to provide better ads” ~Service operator
“We’re only collecting metadata, nothing sensitive” ~State agency
“Who cares if the flashlight knows my GPS position” ~Average user

Metadata is worth more than data

Story time…

Etherscan is one of the most prominent Ethereum block explorers out there. If you are unfamiliar with it, or cryptocurrencies in general, suffice to say that it allows you to check your virtual bank account’s current balance, transactions, etc. For example, this is the Ethereum Foundation’s development fund.

All this information is (pseudonymous) public knowledge, accessible at many different places, even from your local machine if you put enough effort into it. Since the balances are public, it doesn't really matter where you get it from, as long as you trust the source; might as well go with the convenient option. And indeed, the information you access on Etherscan is completely worthless from a financial perspective… however, the fact that you accessed it is invaluable.

Why? Because you unknowingly associated the account with your IP address. Not only did you tell this to the operators of Etherscan, but you also inherently shared it with Google through the analytics service embedded in the website. At this point Google has the capability to cross-reference the Etherscan access data with their account sessions, mix in positional information from Android devices… and voilà, Google has a real-time updating map of all the Ethereum account holders… and all they needed was for you to access public data… once.

Now you understand why some people say data is worth more than oil… and why government agencies are fighting for unfettered access to metadata.

We kill people based on metadata. ~General Michael Hayden (NSA)

The vicious cycle

Developers are perfectly aware of the dangers of metadata and also perfectly aware of the capabilities of aggregators like Google, Facebook and Cloudflare. Why on Earth would a developer use Google Analytics, knowingly giving your valuable information away? For the exact same reason most people use public online services to check their cryptocurrency balances… convenience.

Each of us, developers and users alike, has far more things we would like to do than time to do them. Understandably, each of us will pick the tools that make our lives easier, which usually boils down to saving time or money. Out of all the possible tools and services though, why is it that the proper ones are inconvenient and expensive whereas the “leaky” ones are simple and free?

And the answer is… your data pays the developers. The more data a service collects, the more money it makes, the better its operators can make it. With no data collected, developers need to charge money and may forever have a financial disadvantage. The end result is a vicious cycle where leaky tooling makes more money, gets better and cheaper, eventually achieving monopoly.

The only way out of this mess is to bite the bullet and start refusing to give up personal data for instant gratification:

As a user, invest the time and effort to protect yourself.
As an operator, dare to build on inconvenient technologies.
As a developer, help make privacy protecting tools approachable.

There is an endless stream of projects that focus on privacy. For this article I’ll present one that usually gets a bad reputation, precisely because it’s that good at what it does… Tor.

The Onion Router

Tor is free software for enabling anonymous communication. The name is derived from an acronym for the original software project name “The Onion Router”. Tor directs Internet traffic through a free, worldwide, volunteer network consisting of thousands of relays to conceal a user’s location from anyone conducting network surveillance or traffic analysis. Using Tor makes it more difficult to trace Internet activity to the user: this includes visits to websites, online posts, instant messages, and other communication forms. Tor’s intended use case is to protect the personal privacy of its users, as well as their freedom and ability to conduct confidential communication by keeping their Internet activities from being monitored. ~Wikipedia: Tor (anonymity network)

Art by Molly Crabapple & Words by John Leavitt: *“Octopus Not So Great!”*

If you feel this is getting way over your head, fret not… Tor is black magic for most people, users and developers alike. There isn’t much information going around as to what Tor is or how it works, but there are plenty of horror stories about the “dark web” and its less than savory applications.

Speaking from personal experience, it took me a very long time to look into Tor specifically because of its bad reputation. Nobody wants to be part of, or aid, criminal activity. However, after bumping into Tor for the Nth time while researching solutions to decentralized privacy challenges, I took the plunge and discovered a completely different world than I expected.

Myths about Tor

Before getting into details however, it might help to bust some of the common myths that keep developers away from the technology.

Q: How can anyone trust Tor if it was developed by the US military?
A: The original concept behind Tor (onion routing) was indeed designed by the US Navy, but the goal was not a SIGINT tool, rather a private network for protecting their own communications. The actual software was developed by Roger Dingledine, Nick Mathewson and others as part of a non-profit, with all design documents and source code available for public inspection.

Q: Isn’t most of the network run by three-letter government agencies?
A: Whilst some agencies may occasionally run a few nodes, the Tor network was designed to protect communication privacy and participant identity even from itself. Government agencies generally have more easily available places to monitor internet traffic than infiltrating a global network of computers.

Q: Isn’t Tor uselessly slow for any meaningful internet activity?
A: The current network capacity of Tor is advertised at about 250Gbit/s and the current load at about 125Gbit/s, leaving plenty to spare for new use cases. Routing data multiple times through a global network of computers is indeed a significant hit on latency and bandwidth, but Tor should be usable for most applications that don’t have very high connectivity requirements.

Q: Isn’t the Tor network mostly used for illegal activities?
A: According to the current statistics, only around 1Gbit/s of the Tor traffic is onion services (i.e. origin and destination are both within the Tor network). As such, 99% of the Tor traffic is accessing publicly available websites. The most probable reasons for using Tor are either to get around state restrictions (e.g. the Great Firewall of China) or to maintain anonymity (e.g. whistleblowing). As for onion services, one of those with the highest traffic is Facebook, so illegal activity should be even lower than 1% given the many legitimate hidden services.
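As a quick sanity check of that 99% figure, here is the arithmetic using the two numbers quoted in these answers: roughly 1Gbit/s of onion-service traffic out of the roughly 125Gbit/s total load mentioned above.

```latex
\frac{1\ \text{Gbit/s}}{125\ \text{Gbit/s}} = 0.008 = 0.8\%
\qquad\Rightarrow\qquad 100\% - 0.8\% = 99.2\% \approx 99\%
```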

Q: Won’t government agencies start watching me if I use Tor?
A: Chances are you are already being watched. Section 702 of FISA allows the US government to spy on the internet communications of people both in the United States and abroad without a warrant so long as a “significant” purpose of the surveillance is to gather “foreign intelligence information”. Almost all of the websites nowadays have Google and Facebook integrations and are served via Cloudflare, all US companies streaming most of their data through the US.

A word of caution. If you think based on the above that Tor will protect you from prosecution for illegal activities, you should be aware that Tor is just another piece of software: it probably has bugs and its design has limits. Coupled with other vectors, a well-enough-funded attacker will probably be able to deanonymize you. Use Tor to protect your privacy from general snooping; don’t be stupid enough to think it will protect your shady business.

Anonymizing your connections

Now that we’ve got the common misconceptions out of our system, let’s take a closer look at how Tor actually achieves what it promises. We won’t go into the nitty-gritty cryptography details, rather just have a high level overview of the architecture.

Tor’s primary goal is to allow its user to access a remote website (or in general any TCP service) anonymously by hiding the source and destination addresses from prying eyes in the network. Tor achieves this via constructing a random communication circuit through a global network of volunteer computers and uses that circuit to stream the data back and forth (kind of like the internet).

Unlike regular internet routing, however, Tor encrypts the data multiple times, with intermediate nodes decrypting it layer by layer until the exit node has the plain-text data to forward to the destination server. Due to the way the encryption is set up, intermediate machines only know the previous and next hops, but not the entire circuit. This ensures an attacker can only intercept a stream if it can control the entirety of the randomly built circuit.

How Tor Works (Electronic Frontier Foundation)

Why is anonymizing our connection good if we’re not doing anything illegal? Because it makes it harder for various central trackers to follow us around and build a detailed profile that might contain unexpected sensitive data (e.g. our cryptocurrency holdings).

It’s important to emphasize though, that whilst Tor is a beautiful technology, the web is a leaky beast! An anonymous TCP connection does not mean that it is devoid of identity: Accessing https://www.linkedin.com/in/karalabe/edit gives a fairly good idea as to who is at the other end of that URL.

Even more alarmingly, browsers leak enough information into every HTTP connection to uniquely identify the same user across arbitrarily many (even geographically disparate) sessions. Tor is a valuable tool, but it’s a building block, not an end user product. More on this later…
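A deliberately crude illustration of why an anonymous connection is not enough: the sketch below hashes a handful of request headers into an identifier and shows that it stays the same when the very same browser shows up from a different address (say, a different Tor exit). The header list, the helper names and the sample values are assumptions; real fingerprinting combines many more signals, which is exactly why Tor Browser tries to make every user look alike.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"strings"
)

// fingerprintHeaders is an illustrative subset; real trackers also use fonts,
// canvas rendering, screen size, TLS parameters and much more.
var fingerprintHeaders = []string{"User-Agent", "Accept", "Accept-Language", "Accept-Encoding"}

// fingerprint hashes the selected headers into a short identifier. The
// client's address (r.RemoteAddr) deliberately plays no part in it.
func fingerprint(r *http.Request) string {
	parts := make([]string, 0, len(fingerprintHeaders))
	for _, name := range fingerprintHeaders {
		parts = append(parts, name+"="+r.Header.Get(name))
	}
	sum := sha256.Sum256([]byte(strings.Join(parts, "\n")))
	return hex.EncodeToString(sum[:8])
}

// newRequest fakes a request from the same browser arriving via remoteAddr.
func newRequest(remoteAddr, lang string) *http.Request {
	r, err := http.NewRequest("GET", "http://example.com/", nil)
	if err != nil {
		panic(err)
	}
	r.RemoteAddr = remoteAddr
	r.Header.Set("User-Agent", "Mozilla/5.0 (X11; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0")
	r.Header.Set("Accept", "text/html,application/xhtml+xml")
	r.Header.Set("Accept-Language", lang)
	r.Header.Set("Accept-Encoding", "gzip, deflate, br")
	return r
}

func main() {
	// Two visits via different exit addresses (RFC 5737 documentation ranges).
	a := newRequest("198.51.100.7:40312", "de-CH,de;q=0.8,en;q=0.5")
	b := newRequest("203.0.113.9:51877", "de-CH,de;q=0.8,en;q=0.5")
	fmt.Println(fingerprint(a) == fingerprint(b)) // true: the new address changes nothing
}
```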

Anonymizing your services

Tor’s secondary goal is to allow its users to run hidden services that are only reachable by entities knowing about their existence (and access credentials), but are otherwise completely secret to all network participants. This is Tor’s most powerful and most controversial feature at the same time, as it permits both novel applications as well as novel illegalities.

The reason hidden services are such a powerful feature is that they allow running an arbitrary TCP server behind a firewall without any ports open or forwarded; from the outside, the machine looks fully locked down. This is achieved by using the Tor network as a rendezvous point for both the client and the service, with each endpoint only dialing out.

Technically, the hidden service is set up by pre-building a few Tor circuits into the network to random introduction points. These introduction machines are advertised (encrypted) into a public DHT for anyone (knowing the public key) to look up and decrypt. Wannabe clients can then connect (via Tor circuits) to one of the intro nodes and request a connection. The request is forwarded to the hidden server, and if it agrees, a new circuit is built from both server and client to a common rendezvous point to transfer the data through.

Bird’s-eye view of hidden services (see details)

Defending anonymous browsing was fairly easy… but why is anonymizing our services good if we’re not doing anything illegal? Because it makes it infinitely harder for a remote attacker to find our service (which to technical people flat out looks like an open door waiting to be entered) and wreak havoc on it (e.g. logging into our home SSH server becomes practically impossible for an attacker, simply because they can’t find it in the first place).

Whilst putting a hidden website on top of Tor is one of the most common use cases for anonymous services, the beauty of this feature lies in its capability to act as a building block for developers. It opens up a slew of new possibilities, like interconnecting mobile devices in a peer-to-peer fashion without either of them having to expose themselves to the world. More on this later…

Getting Tor into Go

With the philosophical and theoretical background behind us, let’s dive into actually building something on top of Tor. I’ll be using Go from this point onward, simply because it’s my favorite language. You can probably achieve the same things with any other language, but that’s your cross to bear. 😋

Although Tor provides all these insanely useful building blocks for distributed computing, it’s genuinely hard to build anything using them. The main reason is that Tor itself is a huge C project, which was meant to be run as a stand-alone process acting as your gateway into the Tor network. Even though there is a really nice Go library to bridge your software and the Tor gateway, anyone wanting to use Tor has to ask their users to install some weird program that has a notoriously bad reputation. It just doesn’t fly…

The Tor project acknowledged this significant hurdle and modified their code to permit linking the gateway into your own binary, but this plays very nastily with Go. Firstly, Tor has multiple external dependencies that you also need to link in. Secondly, whilst Go does support linking C/C++ libraries, it does not support building them! The “correct” way to link Tor into your Go application is to build each dependency manually via autoconf/configure/make, then copy the resulting .a files to the correct location in your GOPATH… and load them manually via CGO 😱. Although there is a Go package that automates some of these steps, you can say goodbye to all the Go tools like go get or vendoring.

Hello go-libtor

There must be an easier way. Can’t we use some CGO black magic to build all these dependencies in a cross-platform way? Unfortunately Go does not, and never will, support generically building C/C++ projects, simply because Go is not make. That said, nobody ever said we have to run configure during compilation… 😈

Thus the go-libtor project was born: a self-contained, fully statically linked Tor library for Go! The package kind of “hacks the system”. The rather perverted trick it uses is to execute all the required autoconf/configure/make commands in nightly Travis cron jobs, hook into the build system to see which source files are needed, and create a Go mega-wrapper consisting of hundreds of Go files that simply include C sources from all over the dependency graph. Insane? Yes! Does it work? Yes!! 😵

Why the heck don’t we just reimplement Tor in Go if it’s such a pain in the butt to use? Because Tor is a 15-year-old project, during which it has stood the test of time and countless researchers’ attempts to break it (occasionally succeeding, resulting in fixes and more resilient code). Stand on the shoulders of giants!

Using Tor from within Go

Demo time…

First things first, we need to go get the Tor library. Usually Go packages are fast to build, so there’s no need for fancy verbosity settings. The go-libtor project, however, is very heavy on CGO (~1000 files), so let’s pull it with high verbosity to see what’s happening. Building the library on a capable system will easily take a couple of minutes.

$ go get -u -a -v -x github.com/ipsn/go-libtor

You might have noticed an unusual -a flag in there. Sadly, the Go compiler currently cannot detect changes in CGO included files. Due to this, when the go-libtor package is updated, you need to force a rebuild. Sorry.

Since there is already a solid Tor bridge implemented from Go, the go-libtor package doesn’t duplicate the effort (or copy paste it), rather the two projects are to be used hand-in-hand. For our first demo, let’s create a super secret web server serving up “Hello, Tor!” over the onion network!

The demo code will:

Start up a new Tor process from within your statically linked binary
Register a new anonymous onion TCP endpoint for remote clients
Start an HTTP server using the Tor network as its transport layer

$ go run main.go
Starting and registering onion service, please wait a bit...
[...]
Enabling network before waiting for publication
[...]
Waiting for publication
[...]
Please open a Tor capable browser and navigate to http://s7t3iy76h54cjacg.onion

Boom! 💥 We’ve just created our first Tor service!

Well, that was easy. With a few lines of Go code we’ve created a hidden TCP service inside one of the world’s most infamous networks… the “dark web”. Btw, the browser I used to test the server with is Brave, which among others has built-in experimental support for Tor. Give it a spin, good stuff!

Using Tor from Go from Android

Ok, I’ll be honest. If the above demo was all that the go-libtor project delivered, I myself would question its reason for existence compared to cretz/tor-static, which does the library shuffling manually.

Let’s take it a step further, however, and see how much effort it would take to deploy our Go code to an Android device. If you’re not too familiar with cross-building Go for mobile devices, it entails generating C/CGO wrappers for every public Go function and type, compiling the resulting code into shared libraries for Android on amd64, 386, arm and arm64, and generating Java/JNI loaders for everything. This insane complexity is handled automatically behind the scenes by gomobile.

Unfortunately, if your project has a custom step to build libraries, it’s up to you to compile those for all the different architectures on which Android can run. You’ll also need to convince gomobile to link your libraries with its own into a single .aar; and lastly, it’s up to you to maintain a build environment (Android SDK/NDK, cross compilers, etc). Doable? Absolutely! Enjoyable?

The advantage of go-libtor starts to show here, since it’s composed of simple CGO Go files. As it doesn't require custom build steps or tooling, it plays nice with the Go ecosystem, gomobile included. In theory, I should be able to take the above code and build it for Android… let’s try.

The Android version of the demo does approximately the same thing as the one before, just in its own package with a trivial API, since we want to make an Android archive, not an entire .apk. We can invoke gomobile to bind it:

$ gomobile bind -v -x .
[...many logs, much wow...]
$ ls -al demo*
-rw-r--r-- 1 karalabe 38976071 Jul 19 18:46 demo.aar
-rw-r--r-- 1 karalabe     6162 Jul 19 18:46 demo-sources.jar
$ unzip -l demo.aar
Archive:  demo.aar
  Length      Date    Time    Name
---------  ---------- -----   ----
      143  1980-00-00 00:00   AndroidManifest.xml
       25  1980-00-00 00:00   proguard.txt
    11044  1980-00-00 00:00   classes.jar
 26102356  1980-00-00 00:00   jni/armeabi-v7a/libgojni.so
 27085856  1980-00-00 00:00   jni/arm64-v8a/libgojni.so
 26327236  1980-00-00 00:00   jni/x86/libgojni.so
 27757968  1980-00-00 00:00   jni/x86_64/libgojni.so
        0  1980-00-00 00:00   R.txt
        0  1980-00-00 00:00   res/
---------                     -------
107284628                     9 files

Oh, nice… we’ve just built an Android archive containing an embedded Tor node for 4 different architectures! But does it blen… I mean build? Let’s try! Explaining how to load an .aar into an Android project is beyond the scope of this article, but you can load the archive with Android Studio as a module and edit your Gradle build config to add it as a dependency. Finally, an uber-crude app would just start the server and drop the onion URL into a label.

Mega-boom 💥💥💥 It works from Android too!

That’s actually it! We’ve managed to get a Tor hidden service running from an Android phone and access it from another device through the Tor network, all through 40 lines of Go and 3 lines of Java code. Of course, getting it working in a real-world mobile application will probably bring plenty of gotchas and headaches, but Tor seems to be an extremely powerful building block that could in theory open up amazing new possibilities.

Stay calm and #buidl on

Apparently Tor is this amazing network technology that can solve some really hard problems around distributed privacy. But what can you use it for beyond anonymizing (some portion of) HTTP traffic?

One very nice example is private, decentralized, real-time messaging. There have been multiple implementations, but the current champion is Ricochet. Since connecting to the Tor network can be done from anywhere and anyone can easily run hidden services, each of the Ricochet users could run an entire server from their local machine (or mobile device). Essentially, the messenger is a federation of servers exchanging data directly without a central operator.

A second popular example is private, decentralized file sharing, championed at the moment by OnionShare. It works by starting up a one-shot Tor service on the sender’s computer, which serves a particular piece of content once, to the first client that has the access credentials. After that the system is torn down. This permits people to share files without having to worry about their data being stolen due to hackers breaching central systems (looking at you, Dropbox).

The above examples highlight that Tor can be used for a lot more than meets the eye. Given Tor’s steady increases in performance and reliability, and the proliferation of ever more powerful mobile devices, I think these use cases are just scratching the surface of what can be done.

Epilogue

I hope this article managed to whet your appetite a bit for looking into the Tor network. It’s a brilliant piece of technology that has many use cases that have nothing to do with the “dark web”. With the advent of decentralized peer-to-peer computing, I’m certain Tor has yet to play a very important role in ensuring privacy for distributed global communities.

If, however, you do choose to use Tor in one of your projects, please take the time to fully understand how the system works, what the implications for you and your users are, and what Tor cannot protect you from.

Most importantly, unless you are intimately familiar with the Tor network, have good lawyers, and have the permission of your ISP, don’t dream about running a Tor relay! Using Tor is fine; running Tor is a legal gray zone.

Until next time…