Different Ways To Add Tor Onion Addresses To Your Website

Alec Muffett
Sep 22, 2018


Version & Errata

  • v0.1 draft; 22 September 2018
  • v0.2 amended cons & summary on section 4, re: “leaks” and re: the Alt-Svc first-hit / bootstrap problem.
  • v0.3 added FAQ section
  • v0.4 nits
  • v0.5 clarified that Alt-Svc cannot loadbalance non-HTTP-based protocols
  • v0.6 JS-composition example
  • v0.7 nits
  • v0.8 +FAQ on Alt Svc RFC
  • v0.9 +FAQ EOTK
  • v0.10 added links to example websites

Author’s Note

All of these techniques are awesome and you should probably play with all of them. This document is an attempt to provide a not-neutral but frank and objective-as-possible “pros-and-cons” list for all techniques of onionification of a public website. If there is a deficiency or unfairness in the document below, please contact me and I shall attempt to remedy it.

Differentiation

All of these solutions address the challenge of adding Onion addresses to your website, by moving the “point of onionification” to different locations up-or-down the stack of client-server communication.

The “Las Vegas Rule”

For clarity in the discussion below, I will define the Las Vegas Rule as:

The Las Vegas Rule: requests which have arrived over an onion circuit should continue to use onion circuits, and to the maximum extent possible any consequent requests which are caused by resources that have been previously loaded over an onion circuit, should further be loaded over onion circuits.

What happens in onionspace, stays in onionspace.

Optional Codicil: this goes for cookies, client-side storage, etc, too.

The Las Vegas Rule essentially defines a “viral uplift” of onion communication and (to be frank) requires administrators who really care about onion communication to somewhat “break” their isolation of the ISO 7-Layer model. This is because Onion Addresses are an (effective) Layer-3 network, but most websites and CMSes are implemented in terms of Layer-7 thinking; put differently: most people who build web servers do not greatly worry about distinguishing which network interface the traffic arrived from…
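
As a rough illustration, the Las Vegas Rule can be checked mechanically. The sketch below assumes that you have captured the absolute URLs that a page fetched after being loaded from your onion site (for instance from a browser's network log); the function name `find_leaks` and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def find_leaks(resource_urls):
    """Return the URLs whose host is not under the .onion top-level domain.

    Assumes absolute URLs. Under the Las Vegas Rule, every fetch caused by
    an onion-loaded page should itself go to an onion address; anything
    else is a candidate leak.
    """
    leaks = []
    for url in resource_urls:
        host = (urlsplit(url).hostname or "").lower()
        if not host.endswith(".onion"):
            leaks.append(url)
    return leaks


# Hypothetical fetches caused by loading https://www.examplexamplexam.onion/
fetched = [
    "https://www.examplexamplexam.onion/style.css",
    "https://www.example.com/logo.png",       # domain-absolute link back to clearnet
    "https://tracking.com/APIKEY/pixel.gif",  # third-party callback
]
print(find_leaks(fetched))
```

Whether a third-party fetch counts as a leak for your site is a policy decision; the sketch merely flags everything that left onionspace.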

Glossary

  • clearnet site: e.g. www.example.com which is owned by you
  • onion site: e.g. www.examplexamplexam.onion which is owned by you
  • domain-absolute link: e.g.: …href=“https://www.example.com/”…
  • site-owned resources: the set of all content and resources which are under your administrative (and probably physical) control; being a site-owned resource generally implies that you control (and can tweak) the software stack which actually serves those resources to the internet.
  • CMS: content management system; software which marshals and creates and serves HTML for a website, e.g.: WordPress
  • CDN: content delivery network; fast webservers, out on the internet, that host content on your behalf, often owned by a third party whose software stack is not under your physical / administrative control, and which can complicate the Las Vegas Rule by pushing your site-owned resources effectively beyond your physical control to serve to the internet.
  • Leak: an effective failure of the Las Vegas Rule where a request to your onion site eventually causes an attempted resource fetch from your clearnet site rather than returning to your onion site. There are many possible causes of leaks, most commonly (1) served static resources that cite a domain-absolute link to example.com (2) that a shim somehow missed rewriting a piece of content on the fly, or (3) a third-party-site callback using an API-Key as an identifier (example.onion > tracking.com/APIKEY > example.com)
  • OnionBalance: Load balancing software for onion networking, essentially an analogue of DNS Round Robin load-balancing techniques, with a sprinkle of Direct Server Return too. As-of Sep’18, OnionBalance is not yet functional for v3 onion addresses (i.e.: “the new ones”). OnionBalance can balance protocols other than HTTPS because it operates at the “Layer 3” of onion networking.
  • FailoverReplica: Load balancing technique for onion networking, essentially involves running 2 onion servers with the same private onion key, and letting them fight with each other for dominance in the HSDir space. In certain respects FailoverReplica is an ugly hack and does not scale well, but it has been used for facebookcorewwwi.
    [author note: I believe that FailoverReplica will not work for v3 onion addresses, because a sequence number has been added to HSDir descriptors to inhibit this sort of thing?]

Alt-Svc — extended glossary entry

The Alt-Svc header is a feature of the HTTP protocol, support for which has recently (Sep’18) landed in TorBrowser. The Alt-Svc header informs the client that “…you accessed me via www.example.com but you may also transparently access me at www.examplexamplexam.onion…”, and then expects TorBrowser to act accordingly.
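
To make this concrete, here is a minimal sketch of what such a header value might look like, following the general `protocol="host:port"; ma=seconds` shape described in RFC 7838. The helper name `alt_svc_value`, the `h2` protocol identifier and the one-day lifetime are illustrative assumptions, not a recommendation.

```python
def alt_svc_value(onion_host, port=443, protocol="h2", max_age=86400):
    """Format an Alt-Svc header value: protocol-id="host:port"; ma=seconds."""
    return f'{protocol}="{onion_host}:{port}"; ma={max_age}'


# A response from www.example.com advertising the glossary's onion address.
headers = {"Alt-Svc": alt_svc_value("www.examplexamplexam.onion")}
print(headers["Alt-Svc"])
```

A server would add this header to its ordinary HTTPS responses; it is then up to the client whether to use the advertised alternative.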

Alt-Svc Pros

  1. Has very positive benefits for load-balancing in architectures where all resources are effectively site-owned resources by virtue of being under your physical control, or else under the physical control of a third party who implements Onions on your behalf, e.g. Cloudflare.
  2. Excellent solution for infrastructure providers (Hosting Providers, Web-Accelerators, CDNs) to offer their customers, if and where it has no consequential impact upon the customer’s extant security threat model; e.g.: “Cloudflare already sees all my traffic content in cleartext, irrespective of whether or not it arrives over an onion circuit, so it’s totally okay to use them to provide me with onion addresses…”
  3. People who worry about Usability are happy that all those messy, ugly onion addresses are hidden away where the user cannot see them.

Alt-Svc Cons

  1. Absolutely requires HTTPS (Pro: does not need an EV HTTPS certificate,
    Con: cannot load-balance non-HTTP-based protocols) and…
  2. …the first connection will likely not be via an Onion, but instead via a Tor exit-node, yielding possible congestion / slow-start / poor user experience.
    [author note: someone will surely say “…but requiring HTTPS is a ‘Pro’…” and I mostly won’t argue with that; but these two issues go together…]
  3. Problematic in multi-cloud deployments or where CDNs are heavily-used but are not site-owned resources; an Alt-Svc header can only be issued by the specific origin which serves the data (no wildcards) so if you use CDN.COM and cite links like href=“https://example.cdn.com/foo.js” to serve resources, the only way to onionify your CDN traffic is to convince CDN.COM to issue Alt-Svc headers from example.cdn.com which will point at your self-owned CDN onion addresses on your behalf; and CDN.COM will either have to issue Alt-Svc headers for all requests to example.cdn.com (consuming extra bandwidth & costing you money) or else they will need to track Tor exit nodes in order to selectively issue the headers.
  4. People who care about Trust will be concerned that it’s not clear to the user if, whether, or how much of, the website has been loaded over onion network interfaces; not least for lack of a visible, ugly, onion address.
  5. The Las Vegas Rule “cookies” codicil will likely not be satisfied; however this is possibly a matter of taste.
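
The “selectively issue the headers” option in item 3 can be sketched as follows. This assumes that the serving party maintains a set of known Tor exit-node addresses; how that set is obtained and refreshed is out of scope here. The function name, the header value and the documentation-range IP addresses are all hypothetical.

```python
# Hypothetical header value pointing at the site's onion address.
ONION_ALT_SVC = 'h2="examplexamplexam.onion:443"; ma=86400'


def extra_headers(client_ip, exit_node_ips):
    """Return the Alt-Svc header only for clients arriving via a known exit node."""
    if client_ip in exit_node_ips:
        return {"Alt-Svc": ONION_ALT_SVC}
    return {}


known_exits = {"192.0.2.10", "192.0.2.11"}       # placeholder addresses
print(extra_headers("192.0.2.10", known_exits))   # Tor user: header is issued
print(extra_headers("198.51.100.7", known_exits)) # everyone else: nothing extra
```

The trade-off is the one described above: either spend bandwidth sending the header to everybody, or spend effort keeping the exit-node set current.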

1. Dedicated Machine (e.g. ICIJ Securedrop)

Example

Method

  • A standalone machine instance + CMS, talking directly to an “onion” network interface and not serving traffic to the clearnet
  • CMS configured in terms of that onion address in place of a hostname
  • Onion address implemented directly upon the webserver instance

Pros

  • Clean, simple, hard to mess up
  • HTTP/2-friendly

Cons

  • Requires EV Certificate for HTTPS
  • Content must be synchronised between clearnet and onion websites, if that is relevant to the service’s purpose.
  • Source content must be kept “clean” of links which might cause leaks.

Summary

  1. Las Vegas Rule? Can be satisfied strongly; types 1 & 3 leaks are a risk.
  2. Load Balancing? FailoverReplica, OnionBalance, or Alt-Svc.
  3. Visible Onion Address? Yes.
  4. First connection over onion circuit? Yes.
  5. “Point of Onionification”: in-server.

2. Onion-Aware CMS (e.g. Facebook)

Example

Method

  • Adapt CMS to be aware of “.onion” as a top-level domain that is equivalent to “.com” or “.co.jp”
  • Adapt CMS also to treat “.onion” orthogonally, i.e.: “any request which arrives from onionspace stays in onionspace, including resultant CDN fetches” / Las Vegas Rule; this is generally what happens for localised host/domain names (www.example.co.jp) anyway.
  • Configure Onion Service to point at CMS/CDN as a reverse-proxy; if necessary use “fake onion interfaces” or a VIP-based load balancer to hint to the webserver regarding from which network the traffic arrived.
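
A minimal sketch of the “stays in onionspace” behaviour, assuming that the CMS can see the request's Host header and builds its absolute URLs through a single helper. The names `site_domain_for` and `absolute_url`, and the `cdn` subdomain, are hypothetical.

```python
CLEARNET_DOMAIN = "example.com"
ONION_DOMAIN = "examplexamplexam.onion"  # placeholder onion address from the glossary


def site_domain_for(host_header):
    """Pick the base domain to use for links, based on how the request arrived."""
    host = host_header.split(":")[0].strip().lower().rstrip(".")
    if host == ONION_DOMAIN or host.endswith("." + ONION_DOMAIN):
        return ONION_DOMAIN
    return CLEARNET_DOMAIN


def absolute_url(host_header, subdomain, path):
    """Build a domain-absolute URL that stays on the network the request came from."""
    return f"https://{subdomain}.{site_domain_for(host_header)}{path}"


print(absolute_url("www.examplexamplexam.onion", "cdn", "/thing.js"))
print(absolute_url("www.example.com:443", "cdn", "/thing.js"))
```

The same decision would also need to drive cookie domains and any other place where the hostname is emitted.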

Pros

  • If your website is dynamically generated and cites resources using mostly relative paths (href=“/thing.js”) this can be a fairly easy deployment, the most complex issues to fix will pertain to cookies, etc.
  • HTTP/2-friendly

Cons

  • Requires EV Certificate for HTTPS.
  • Challenging if multiple CMSes are in scope at the same time (e.g.: wanting to onionify all of WordPress + MediaWiki + Joomla + Django …)
  • If your website CMS is large and has evolved with the assumption that it exclusively and permanently lives under the example.com domain, then there may be hardcoded instances of that hostname that will require dynamic adaptation, e.g.: by picking up on “onionification” hints provided by the webserver in the request’s Host: (or some other) header.
  • If your website contains much static content that cites domain-absolute links, then you will need to convert those to relative paths (risky edits of a corpus of old, static content?) — or else leverage a rewriter.
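
For the static-content case in the last bullet, converting domain-absolute links to relative paths could look something like the sketch below. It assumes that links appear as quoted `href` or `src` attributes pointing at www.example.com; the function name `relativise` is hypothetical, and a real migration would more likely use a proper HTML parser than a regular expression.

```python
import re

# Matches href="https://www.example.com/..." or src='http://www.example.com/...'
_ABSOLUTE = re.compile(
    r'(?P<attr>\b(?:href|src)\s*=\s*)'
    r'(?P<q>["\'])'
    r'https?://www\.example\.com'
    r'(?P<path>/[^"\']*)?'
    r'(?P=q)',
    re.IGNORECASE,
)


def relativise(html):
    """Rewrite domain-absolute links to www.example.com as relative paths."""
    def repl(match):
        path = match.group("path") or "/"
        quote = match.group("q")
        return f'{match.group("attr")}{quote}{path}{quote}'
    return _ABSOLUTE.sub(repl, html)


print(relativise('<a href="https://www.example.com/about">About</a>'))
```

Links to other hosts are left alone, which is why third-party CDN references still need separate treatment.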

Summary

  1. Las Vegas Rule? Can be satisfied strongly; types 1 & 3 leaks are a risk.
  2. Load Balancing? FailoverReplica, OnionBalance, or Alt-Svc.
  3. Visible Onion Address? Yes.
  4. First connection over onion circuit? Yes.
  5. “Point of Onionification”: in-server or in-webtier.

3. Rewriter-Shim (e.g. ProPublica, New York Times)

Example

Method

  • Onion address implemented via a Layer 7 request-level reverse proxy
  • Rewrites examplexamplexam.onion to example.com for inbound requests and request content
  • Rewrites example.com to examplexamplexam.onion for outbound responses and response content
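
The two rewriting directions can be sketched as plain string substitution. This is a deliberately naive illustration with hypothetical function names, not how any particular rewriter is implemented; a real shim also has to deal with headers, cookies, compression, and hostnames that merely contain the clearnet name as a substring.

```python
CLEARNET = "example.com"
ONION = "examplexamplexam.onion"  # placeholder onion address from the glossary


def rewrite_inbound(text):
    """Request side: onion name -> clearnet name, before forwarding upstream."""
    return text.replace(ONION, CLEARNET)


def rewrite_outbound(text):
    """Response side: clearnet name -> onion name, before replying to the client."""
    return text.replace(CLEARNET, ONION)


print(rewrite_inbound("Host: www.examplexamplexam.onion"))
print(rewrite_outbound('<a href="https://www.example.com/news">News</a>'))
```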

Pros

  • Fast deployment, one proxy can serve many different CMSes + domains.
  • Operates at a high level of abstraction, mapping a cleartext domain back-and-forth to an equivalent onion domain.
  • Zero / nearly-zero changes required to upstream websites.
  • Zero / nearly-zero changes required to pre-existing static content.
  • You can “grandfather-in” one/more/any third-party CDNs to your onion address space by setting-up rewriters for them, too.
  • Rewriter can be scaled linearly to most conceivable site loads.

Cons

  • Requires EV Certificate for HTTPS
  • Rewriter must downgrade inbound HTTP/2 connections to HTTP/1.x because blindly rewriting HTTP/2 requests will break frame-encoding; this may also impact future protocols that might mandate HTTP/2. Situation may improve in future, or be sidestepped with Alt-Svc.
  • Requires ability to retrieve (and will transmit onward) uncompressed HTML/JS/XML (etc) content (i.e.: Accept-Encoding: identity) because of the need to surface and dynamically substitute hostnames “on the fly”; this increases bandwidth usage and may reduce performance.
  • Rewriter may occasionally be fooled and thereby miss an opportunity to rewrite content, but this is usually a matter of appropriate configuration; sometimes this is caused by a Javascript function manually composing a URI from an array of strings ([“www”, “example”, “com”].join(“.”)) which will not be matched by a rewriter, requiring an upstream fix.
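
The Javascript-composition problem in the last bullet is easy to reproduce. The sketch below assumes a rewriter that substitutes hostnames by simple substring matching (the function name and the sample script lines are hypothetical): the literal hostname is rewritten, but the composed one passes through untouched.

```python
def rewrite_outbound(text, clearnet="example.com", onion="examplexamplexam.onion"):
    """Naive response-side rewrite: replace the clearnet name with the onion name."""
    return text.replace(clearnet, onion)


literal = 'var u = "https://www.example.com/api";'
composed = 'var u = "https://" + ["www", "example", "com"].join(".") + "/api";'

print(rewrite_outbound(literal))               # hostname is rewritten
print(rewrite_outbound(composed) == composed)  # True: nothing matched, so this would leak
```

Because the hostname only exists once the script runs in the browser, no amount of server-side pattern matching will find it; hence the need for an upstream fix.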

Summary

  1. Las Vegas Rule? Can be satisfied strongly; types 2 & 3 leaks are a risk.
  2. Load Balancing? OnionBalance or Alt-Svc; FailoverReplica is possible but not wholly reasonable when you are abstracted from the underlying content to this extent.
  3. Visible Onion Address? Yes.
  4. First connection over onion circuit? Yes.
  5. “Point of Onionification”: dedicated front-end web-tier or proxy.

4. Transparent Alt-Svc Uplift (e.g. via Cloudflare)

Example

Method

  • People access your website as-normal over Tor using Exit Nodes
  • First (possibly: each?) HTTP request to each separate origin (a.k.a.: “site”) receives a response that contains an Alt-Svc header, which essentially says “…you can also reach me via <this ephemeral onion address>”
  • Tor Browser chooses to use the given onion address by preference, and thereby both receives a speed boost and reduces its usage of exit nodes.
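
From the client's point of view, the last step amounts to spotting an onion alternative in the header. The sketch below is a simplified illustration and not Tor Browser's actual logic: it handles only the simple `protocol="host:port"` form, ignores parameters such as `ma`, and the function name `onion_alternatives` is hypothetical.

```python
import re

# One alternative looks like: h2="host:port" optionally followed by ;parameters
_ALTERNATIVE = re.compile(r'\s*([^=\s;,]+)="([^"]*):(\d+)"')


def onion_alternatives(alt_svc_value):
    """Return (protocol, host, port) tuples whose host is an onion address."""
    found = []
    for part in alt_svc_value.split(","):
        match = _ALTERNATIVE.match(part)
        if match and match.group(2).lower().endswith(".onion"):
            found.append((match.group(1), match.group(2), int(match.group(3))))
    return found


print(onion_alternatives('h2="examplexamplexam.onion:443"; ma=86400'))
print(onion_alternatives('h2="alt.example.com:443"; ma=3600'))
```

A client that finds no onion alternative simply carries on using the exit node, which is what makes the mechanism backwards-compatible.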

Pros

  • Does not require EV certificate for HTTPS — unless you choose to have a pure-Onion origin site.
  • Absolutely transparent, hyper-scalable, backwards-compatible
  • HTTP/2-friendly
  • When implemented by large-scale infrastructure providers like Cloudflare, will provide significant “at-scale” reduction of exit-node use via mass migration to onions acting as a client-to-CDN “backhaul” network

Cons

  • Brand new feature (Sep’18) — some rough edges, work in progress especially in the browser, but this can be expected to improve
  • See Usability-vs-Trust discussion in Alt-Svc glossary entry; is the “.onion” top level domain name more than a mere namespace? Is it part of an end-to-end security value proposition that is surfaced to the user? If so, can that proposition survive a transition to long, v3 onion addresses?
    [author note: I feel that the answer to all three of those questions is “yes”, but not everyone will agree]
  • Hard to leverage Alt-Svc for performance boosts where the initial hit on the origin is not on a site-owned resource; see the example.cdn.com explanation under Cons in the Alt-Svc glossary, or consider how tricky it might be to use Alt-Svc to onionify fetches on other sites (e.g.: tracking or third-party-metrics sites) where the software stack is not under your control / not going to issue Alt-Svc headers upon your behalf.

Summary

  1. Las Vegas Rule? Alt-Svc Uplift breaks several underlying presumptions of the Las Vegas Rule, and somewhat calls the entire rule into question; for instance the concept of “leaks” mutates into a question of trust in initial “bootstrap” connections and of ongoing transport preference.
  2. Load Balancing? Alt-Svc and many/more ephemeral-address workers.
  3. Visible Onion Address? No/Maybe; there is no reason not to use Alt-Svc to loadbalance a “pure” onion site, so it’s a matter of choice; if you are not using an onion address for your origin site, then the fact or extent of onion networking being used is not yet well-communicated to the client.
  4. First connection over onion circuit? No/Maybe, matter of your choice.
  5. “Point of Onionification”: web-tier or third-party infrastructure provider.

5. Hybrid

  • Drink lots of coffee or alcohol, and mash-up some/all of the above.

FAQs

Why do “.onion” websites need “EV” SSL Certificates?

Where can I read more about Alt-Svc and/or using it for Onions?

Where can I get a tool to implement a rewriter shim?

Alt-Svc Protocol Specification & Registries?

Hopefully Helpful Videos

NLUUG

This video was presented at NLUUG 2018, and only covers methods 1–3 because method 4 had not been announced at that time; I’ll cover the Alt-Svc mechanism in a future talk.

EMFCamp

This video shares about 60% of (revised) content with the previous video, and was presented at EMFCamp in 2018; it, too, does not cover the Alt-Svc mechanism, but provides context regarding distributed networking.
