TorrentNet

Peer-to-peer sites using BitTorrent and SQLite


P2P (peer-to-peer) sites have gained much attention lately with systems such as ZeroNet and IPFS, which seem to improve on older systems like Freenet.

Building search functionality — where users type words in a text box and results appear — on top of these distributed systems is not quite feasible yet, because users need to download the entire site (often hundreds of gigabytes) before they can run queries against it.

I set out on a quest to solve this issue, and that’s why I created TorrentNet.

The main idea is that a site can be packaged as an SQLite database, shared as a torrent, and can be queried on demand without having to wait for the entire torrent to be downloaded.

How does BitTorrent work?

In BitTorrent, when you download a file, you don’t download it sequentially like you would from, say, an HTTP server. Rather, you “ask for pieces” from different peers that may have that data.

The idea of splitting files into pieces allows for a better distribution of the file across a large number of peers. People download pieces in “rarest first” order so the swarm (all the peers downloading a specific torrent) remains healthy.

The pieces column shows which pieces you have downloaded

If everyone downloaded sequentially, there would be lots of people with the beginning portion of the file and only a few with the end portion, making downloads of the end portion rather slow.
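The “rarest first” policy is easy to sketch: count how many peers hold each piece, then request the least-available pieces first. Here is a minimal sketch in Python (the function name and data layout are my own illustration, not any real client’s API):

```python
from collections import Counter

def rarest_first_order(peer_bitfields, have):
    """Order missing pieces by how few peers hold them (rarest first).

    peer_bitfields: list of sets, each the piece indices one peer holds.
    have: set of piece indices we have already downloaded.
    """
    availability = Counter()
    for bitfield in peer_bitfields:
        availability.update(bitfield)
    missing = {p for bf in peer_bitfields for p in bf} - have
    # Sort by availability; ties broken by piece index for determinism.
    return sorted(missing, key=lambda p: (availability[p], p))

# Three peers: piece 2 is held by only one peer, so it is requested first.
peers = [{0, 1, 2}, {0, 1}, {0}]
print(rarest_first_order(peers, have=set()))  # [2, 1, 0]
```

By always fetching the scarcest pieces first, every downloader tends to hold something other peers still need, which is exactly what keeps the swarm healthy.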

How does SQLite work?

SQLite is a database engine designed around disk-based storage. In fact, a whole SQLite database can be stored in a single database.db file.

When you perform a query, such as SELECT * FROM foo LIMIT 10;, the database engine opens the file and performs the fewest disk seeks needed to fulfil the query. Disks are slow, so the database file is structured in a way that minimizes disk accesses.

Searches are performed as tree lookups: the database file is divided into fixed-size pages, where each page is a node of a B-tree (with binary search used within a page).

Searching for the value “23” using binary search only requires 3 reads
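The figure’s claim can be modeled with a toy tree in which each node stands for one database page; a lookup only touches the pages on the root-to-leaf path. This is a sketch of the idea, not SQLite’s actual B-tree format:

```python
# Each "page" is a node: (key, left, right). Reading a page = visiting a node.
def lookup(page, target, reads=0):
    if page is None:
        return False, reads
    key, left, right = page
    reads += 1                      # one disk read per page touched
    if target == key:
        return True, reads
    child = left if target < key else right
    return lookup(child, target, reads)

# A balanced tree over 7 keys; each node models one database page.
tree = (16,
        (8,  (4, None, None),  (12, None, None)),
        (27, (23, None, None), (31, None, None)))

found, reads = lookup(tree, 23)
print(found, reads)  # True 3
```

Even for a much larger file the read count grows only with the depth of the tree, which is why queries stay cheap.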

BitTorrent + SQLite

The insight here is that if we can teach SQLite to speak in torrent pieces rather than disk pages, we can essentially query torrents (torrents containing an SQLite database) on demand.

We wouldn’t have to wait until the entire torrent is finished downloading to answer queries.

SQLite comes with a really cool abstraction called VFS (Virtual File System).

The idea is that different operating systems implement reading and writing files differently, so SQLite provides a programmer-friendly way to override the default file operations with custom ones.

sqltorrent is a custom VFS which, rather than reading the nodes of this tree as disk pages, reads them as pieces of a torrent file.

By using sqltorrent, when you perform an SQL query, the engine won’t be talking to your disk to find the data, rather, it will talk to the BitTorrent network.
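Conceptually, such a VFS has to translate every byte-range read SQLite issues into torrent piece requests, prioritize those pieces, and block until they arrive. The mapping itself is simple arithmetic. A sketch, assuming a 32 KB piece size; pieces_for_read is an illustrative name, not sqltorrent’s actual API:

```python
PIECE_SIZE = 32 * 1024  # assumed piece size; real torrents vary

def pieces_for_read(offset, length, piece_size=PIECE_SIZE):
    """Which torrent pieces cover the byte range a VFS read asks for."""
    first = offset // piece_size
    last = (offset + length - 1) // piece_size
    return list(range(first, last + 1))

# A 4 KB page read straddling the boundary between pieces 2 and 3:
print(pieces_for_read(offset=3 * 32 * 1024 - 1024, length=4096))  # [2, 3]
```

The client would then bump the priority of exactly these pieces in the torrent session, so the swarm serves the query instead of the file head.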


Distributed sites

Earlier we discussed how we can query torrents on demand — without having to wait for them to finish downloading — as long as the torrent shared contains an SQLite database.

But how is this torrent querying on-demand useful?

One practical application is distributed sites — specifically ones with search-like functionality.

The Pirate Bay, a famous torrent site

Site owners create an SQLite database (.db file) and a torrent from this file. They then proceed to seed this torrent — just like they would seed any other file.

Users then start downloading the torrent, but, rather than downloading pieces of the torrent in “rarest first” order, they download pieces based on the search query they performed (using the sqltorrent technique we talked about earlier).

In other words, given a search query such as “indiana jones”, sqltorrent knows to download the pieces of the torrent where the data for “indiana jones” is likely stored.

Results are given to users in a timely manner, without having to wait for the entire database to download.

Furthermore, since search queries are just regular torrent piece downloads, a search query can be satisfied by many different peers in a swarm, effectively allowing for distributed search engines.

TorrentNet — gluing everything together

A possible TorrentNet site that crawls the DHT and TorrentNet sites

Although most of the ideas are there, gluing all these pieces together into a single “browser” with a simple user experience is complicated. Here’s the idea behind TorrentNet:

  • Sites are accessible via a public key (e.g. 33cwte8iwWn7uhtj9MKCs4q5Ax7B), which is published and kept alive on the BitTorrent network via the Mutable Torrents extension (BEP 46), which uses the DHT.
  • Your public key points to a torrent which shares a single SQLite database file.
  • The database should contain a table site, and a field index, which contains the blob of the HTML you’re sharing — you can bundle all of the CSS/HTML/JS together, or have them as separate fields in the table.
  • The database may also contain other data — which can be queried from your JavaScript page. If you’re writing a search-engine, you should use SQLite’s Full Text Search tables.
  • When users access your site, they download the torrent pointed to by your public key via the DHT, immediately query it using sqltorrent by asking for the site->index field, and render it in a browser.
  • Each subsequent interaction is driven by whatever the site owner decides to write in the JavaScript code that was rendered in the browser.
  • A special sqlTorrentQuery() method can be used by site owners to drive interactions, and query the underlying database.
  • When site owners make modifications to the site, a new torrent of the site is shared using the Mutable Torrent extension. Users in the swarm are promptly notified and only download new data.
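As a concrete sketch of the database layout described above: the article only fixes the names site and index, so the exact schema here (a field/content key-value table) is my assumption, and the HTML is a placeholder.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# One row per field of the site; the 'index' row holds the HTML bundle.
# ('index' is a reserved word in SQL, but using it as a *value* is fine.)
con.execute("CREATE TABLE site (field TEXT PRIMARY KEY, content BLOB)")
html = b"<html><body><h1>Hello from a torrent!</h1></body></html>"
con.execute("INSERT INTO site VALUES ('index', ?)", (html,))
con.commit()

# The first query a TorrentNet client would run in order to render the page:
(page,) = con.execute(
    "SELECT content FROM site WHERE field = 'index'").fetchone()
print(page.decode())
```

Because this first lookup touches only a handful of pages, the client needs just a few torrent pieces before it can render anything.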

Here’s an illustration of the system:

Interfaces via HTML/CSS/JavaScript

When the JavaScript code of a TorrentNet site is executed, TorrentNet exposes an extra API method call: sqlTorrentQuery().

Behind the scenes, TorrentNet knows to perform sqltorrent queries when such JS code gets executed. This is similar to how browsers expose fetch(). In fact, TorrentNet is a browser.

In this manner we provide site owners ways to build complex user interfaces — using tech they already know from the Web.

Rather than talking to HTTP servers (using fetch()), we use sqlTorrentQuery() to talk to peers in the swarm and ask for pieces of the underlying database of the torrent being shared — using the sqltorrent technique, which only downloads the pieces relevant to the query.

Users would then have a seamless experience which goes beyond what is currently available with distributed sites such as ZeroNet and Freenet.

Distributed search engines

Because SQLite locates data through tree searches, the underlying database should be carefully indexed to minimize the number of pieces users have to download.

Fortunately SQLite comes with Full Text Search, allowing site developers to easily build indexes that are tailored to full-text querying.
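A minimal full-text-search example with Python’s built-in sqlite3 module (I use FTS5 here; whether FTS5 or the older FTS4 is available depends on how the bundled SQLite was compiled, and the table and data are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# FTS5 maintains an inverted index over the indexed columns.
con.execute("CREATE VIRTUAL TABLE movies USING fts5(title)")
con.executemany("INSERT INTO movies(title) VALUES (?)",
                [("Indiana Jones and the Last Crusade",),
                 ("Jurassic Park",),
                 ("Indiana Jones and the Temple of Doom",)])
con.commit()

# MATCH with two terms requires both, so only the two Indiana Jones
# titles are returned.
rows = con.execute(
    "SELECT title FROM movies WHERE movies MATCH 'indiana jones'").fetchall()
print(len(rows))  # 2
```

The same MATCH query against a torrent-backed database would only pull down the pieces holding the relevant parts of the inverted index and the matching rows.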

I tested queries against these types of indexes using a 4.6 GB SQLite database file. Before getting any results, users still have to download on average ~20 pieces (32 KB × 20 = 640 KB). With healthy swarms and a 500 KB/s connection, this takes on average about a second.

Watch a longer video of this demo

The above GIF shows an example of a TorrentNet site. It sends raw queries to the underlying database. A more user-friendly site wouldn’t provide users with raw querying functionality. It would only have a text area, and the site (via JavaScript) would transform such inputs into proper sqltorrent queries.

Download speed is capped at 100 KB/s, and the read_piece section shows the index of the piece being read at that moment to fulfil the query.

The interesting part is that, even though the user has only downloaded ~1% of the entire database, queries are still answered in a timely manner thanks to piece prioritization.

Use cases

You may ask: why would someone build this kind of site rather than a normal HTTP site?

First off, they’re harder to censor. Because your site is not using the DNS system, but rather a distributed system — the DHT — it’s much harder for entities to stop the propagation of messages in this network.

Secondly, they’re cheaper. There’s no need to buy a server on Amazon or a DNS name; thanks to the P2P nature of the system, you can even host your site from your home network. The bandwidth load is shared amongst the users, whereas with HTTP servers you pay for all the bandwidth yourself.

Thirdly, they’re more resilient against DDoS attacks. Queries are essentially piece-download requests. DDoS attacks would hence require attacking all the peers in the swarm, which is much more costly than attacking a single HTTP server.

This may be an attractive solution for entities such as the Internet Archive or even Wikipedia, where large sets of data need to be shared and accessed all the time.

More “shady” applications such as The Pirate Bay or KickAssTorrents may decide to use such a system, both to avoid censorship and to make hosting cheaper: with the size of their user base, it would be virtually free for them to host their site.

Another interesting application would be a search engine, which, rather than crawling HTTP sites, would crawl TorrentNet sites, along with other data found in the DHT.

This may bring opportunities for new kinds of businesses to thrive off this data — similarly to how Google thrived off of HTTP data.

But more importantly, this system puts the power back in our hands. The Web will never be truly free as long as certain entities are able to shut down DNS names.

Read-only sites

One major drawback of the TorrentNet system described so far is the fact that sites are read-only. Unlike the Web, users can’t modify the state of a site they’re visiting.

ZeroNet seems to have solved this using central authorities that both users and sites trust. In order for users to “change the state” of a site, they need to get approval from such central authorities.

Currently there’s no facility in the BitTorrent network to achieve this; it would require a separate network to handle the communication logic.

A simpler solution would be to combine HTTP sites with TorrentNet sites. Writes would go through HTTP servers. For instance, imagine having a Disqus comment section (which works through HTTP), in a TorrentNet site.

Since a TorrentNet site will execute JS code, there’s nothing stopping it from also executing HTTP requests (via fetch()) and hence have access to all of the currently used HTTP facilities (Google Ads, Disqus comments, etc.).

The important part is that you still control the site public key. If HTTP services become dishonest with the way they handle users’ writes, for instance by censoring certain comments, site owners are still free to change to other HTTP services, or even simply write their own.

Comparisons with Tor

Tor is different from TorrentNet in the sense that its main focus is anonymity.

Although Tor routes traffic through several computers to achieve anonymity, that doesn’t really make it a distributed system: if the server hosting the site is shut down, the site is no longer accessible.

TorrentNet, on the other hand, is peer-to-peer: even if many different peers are shut down, the site can continue existing, just as many torrents remain alive years after they were first shared.

It’s important to note that TorrentNet doesn’t give you anonymity — just like downloading regular torrents, anybody in the swarm can see your IP address. Still, there’s nothing stopping you from using Tor along with TorrentNet if you want to hide your IP address.

Comparisons with ZeroNet & IPFS

ZeroNet seems to be similar to TorrentNet: sites are accessed using public keys, and the two share similar censorship-resistance properties.

One major difference is that TorrentNet uses the sqltorrent technique to drive interactions on the site. This is currently not possible with ZeroNet as the site is downloaded sequentially, and only the data you already downloaded can be queried.

You cannot tell ZeroNet sites “only download data relating to this query”. ZeroNet downloads everything, and you can only query the data you already have locally.

Sqltorrent-like functionality might be possible to implement in ZeroNet; however, it’s unclear whether sites are structured as torrents, and they would need to be split into pieces and shared that way for sqltorrent to work properly.

IPFS

IPFS, being inspired by BitTorrent, could easily implement the kind of solution sqltorrent provides. However, to my knowledge, nothing working has emerged so far in the IPFS community to allow for the sort of functionality described in this article.

It’s also important to note that IPFS is a completely separate network from BitTorrent. TorrentNet, on the other hand, uses the same DHT network and the same protocols as BitTorrent, and is hence interoperable with existing torrent clients.

Pros & cons

Pros

  • Hosting your site is as simple as seeding a torrent. No need to buy a domain name or a hosting server. You can easily host your torrent site on your home network and let users visiting your site help you with the hosting.
  • Since you control your address (public key), which is broadcast via the DHT, it’s much harder for governments and institutions to block the content you’re sharing.
  • Via sqltorrent you drive your users’ experience by letting them download only the pieces of the torrent that are relevant to their interaction. Richer interfaces, such as search engines, are possible using this technique.
  • TorrentNet uses the well-known libtorrent library and follows BitTorrent standards, so it should play much nicer with other clients.

Cons

  • Read-only sites for now. ZeroNet seems to have a solution to this problem but I’m not convinced yet.
  • Your site doesn’t have a pretty name. Sharing your public key can be much harder than sharing a DNS name.
  • Slow first-time access to sites. Since DHT lookups are required to find the torrent and to find the peers sharing the torrent, first-time loads of sites will be noticeably slower than HTTP sites.

Conclusion

TorrentNet is still under development. I’m currently deciding whether the system should exist as a background-service daemon, with the user interface running on http://localhost:5999 (or some other port).

Another option would be to bundle it with Electron, but that would require users to install “yet another browser” rather than using their default one.

Special thanks to the people in the #bittorrent IRC channel on freenode for the fruitful discussions, and of course to Steven Siloti for creating sqltorrent.