What is BitTorrent, and What are Torrents?
BitTorrent is a communication protocol for peer-to-peer (P2P) file sharing, which allows for decentralized distribution of data over the internet. Put simply, instead of downloading a file from one centralized server, you download it from multiple sources, all acting as mini server nodes. How this is achieved is discussed later in this article. A BitTorrent client (a program that implements the BitTorrent protocol) connected to the internet is required to send or receive files. The BitTorrent specification, initially developed in 2001, is free to use, and many clients are open-source.
A torrent file contains metadata about files (and folders) to be distributed and how to download them. A torrent file (which has the extension “.torrent”) only contains information about those files, not the files themselves, and is thus extremely small in size — typically a few KB at most.
Important Terms:
Tracker: A server that keeps track of all the users connected to a swarm and of which peer machines hold copies of the file. It helps clients find each other so that pieces can be transferred efficiently.
Seed: Seeding refers to the act of uploading the contents of the files of the torrent. A seeder is one who continues to upload the file after having downloaded it.
Leech: Leechers are those who download the file but refuse to seed it or throttle upload speeds. (P.S. Be a seeder, not a leecher; the world has enough of those.)
Peer: A peer is someone downloading the file from a seeder (who has fully downloaded the file) or from other peers (who may have downloaded a required piece) but who doesn’t yet have the full file.
Swarm: A group of people downloading and sharing the same torrent.
Client: A program that implements the BitTorrent protocol.
Developments to BitTorrent:
2001: The BitTorrent protocol was developed by Bram Cohen, a University at Buffalo alumnus.
2005–2006: Distributed tracking using distributed hash tables (DHT) was introduced, allowing clients to exchange data on swarms directly without relying on a central tracker. Peer exchange functionality followed, meaning connected nodes could share information on peers with each other.
2017: The BitTorrent v2 specification is released, switching the hash function from SHA-1 (which was no longer considered safe) to SHA-256. The v2 .torrent file format supports a hybrid mode in which torrents are hashed with both methods so that the files can be shared with peers on both v1 and v2 swarms. Each file is now hashed individually, so if multiple torrents include the same file, seeders of the other torrents can act as seeders for the common file. Another update is the addition of a hash tree, which shortens the time between adding a torrent and downloading files and allows more granular checks for file corruption.
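To give a feel for what a hash tree buys you, here is a minimal sketch of a Merkle root computed with SHA-256. It is a generic illustration and not the exact v2 on-disk layout: the function name merkle_root, the tiny demo blocks and the way I pad the leaf layer with all-zero hashes are my own assumptions for the example.

```python
import hashlib


def merkle_root(blocks: list[bytes]) -> bytes:
    """Return the SHA-256 Merkle root of the given data blocks.

    Simplified sketch: the leaf layer is padded with all-zero hashes
    until its length is a power of two (an assumption for this example).
    """
    if not blocks:
        raise ValueError("need at least one block")
    # Leaf layer: one hash per data block.
    level = [hashlib.sha256(block).digest() for block in blocks]
    # Pad until the number of leaves is a power of two.
    while len(level) & (len(level) - 1):
        level.append(bytes(32))
    # Hash neighbouring pairs until a single root hash remains.
    while len(level) > 1:
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


good_root = merkle_root([b"block-a", b"block-b", b"block-c"])
bad_root = merkle_root([b"block-a", b"block-b", b"block-X"])
print(good_root != bad_root)  # a single changed block changes the root
```

Because every inner node only depends on the two hashes below it, a client that knows the root can check one small block with a handful of sibling hashes instead of re-hashing the whole file, which is where the more granular corruption checks come from.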
How do Torrents Work?
The file being distributed is divided into segments called pieces. The first uploader acts as a seed, and downloaders initially connect as peers. People who want the file download the torrent, and their client uses it to connect to a tracker that holds a list of the IP addresses of other seeds and peers in the swarm. As each peer receives a new piece of the file, it becomes a source (of that piece) for other peers, relieving the original seed from having to send that piece to every computer or user wishing a copy. Once a peer has downloaded the complete file, it functions as a seed.
Every piece is protected by a cryptographic hash in the torrent descriptor, so any modification of a piece can be detected. Pieces are typically downloaded non-sequentially: clients fetch whichever pieces the connected peers can offer, commonly preferring the rarest ones so that every piece stays well distributed across the swarm, and the BitTorrent client rearranges them into the correct order while tracking which pieces it still needs and which it has and can upload to other peers. Pieces are of the same size throughout a single torrent, apart from a possibly shorter final piece (for example, a 100 MB file may be transmitted as 100 pieces of 1 MB, 400 pieces of 256 KB, or even 3200 pieces of 32 KB). Because progress is tracked per piece, a download can be halted at any time and resumed later without losing previously downloaded data, which makes BitTorrent useful for transferring large files.
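The piece-and-hash idea is easy to try out yourself. The sketch below splits some bytes into fixed-size pieces, records a SHA-1 hash per piece (the v1 hash function), and checks a received piece against that list. The helper names, the 32 KB piece length and the demo data are assumptions I picked for illustration; a real client does the same thing against the hashes stored in the torrent.

```python
import hashlib


def split_into_pieces(data: bytes, piece_length: int) -> list[bytes]:
    """Cut data into fixed-size pieces; the final piece may be shorter."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """Return one SHA-1 digest per piece, in piece order."""
    return [hashlib.sha1(piece).digest()
            for piece in split_into_pieces(data, piece_length)]


def verify_piece(index: int, piece: bytes, expected: list[bytes]) -> bool:
    """Check a received piece against the hash recorded for its index."""
    return hashlib.sha1(piece).digest() == expected[index]


PIECE_LENGTH = 32 * 1024            # hypothetical 32 KB pieces
data = bytes(range(256)) * 400      # 102,400 bytes of demo "file" content
pieces = split_into_pieces(data, PIECE_LENGTH)
hashes = piece_hashes(data, PIECE_LENGTH)

# Pieces can arrive in any order: each one is checked on its own.
print(verify_piece(2, pieces[2], hashes))           # untouched piece
tampered = b"x" + pieces[1][1:]                     # flip the first byte
print(verify_piece(1, tampered, hashes))            # modified piece
```

Since each piece is verified independently, a client only has to remember which piece indices have passed the check in order to pause and resume a download.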
Structure of a Torrent File:
A tool like this one (https://torrent-file-editor.github.io/) can be used to view a torrent file.
announce: The URL of the tracker for the file(s) being downloaded.
comment: Comment about the torrent file added by the torrent’s creator. (Optional Field)
info: Contains the length and path of each file, and also holds the name, pieces and piece length fields listed below.
name: Suggested name for the downloaded file or folder.
pieces: The hash of each piece (SHA-1 or SHA-256, depending on version), stored in a hash list.
created by, creation date, piece length: Self-explanatory.
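If you would rather poke at these fields in code, torrent files are stored in a simple encoding called bencode (integers, byte strings, lists and dictionaries). Below is a minimal decoder sketch; the function name bdecode and the hand-made sample torrent (with a made-up tracker URL and an all-zero piece hash) are assumptions for illustration, and it skips the strict validation a real client would do.

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at data[i].

    Returns a (value, next_index) tuple.
    """
    lead = data[i:i + 1]
    if lead == b"i":                        # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":                        # list: l<items>e
        items, i = [], i + 1
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":                        # dictionary: d<key><value>...e
        result, i = {}, i + 1
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    if lead.isdigit():                      # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte at offset {i}")


# Hand-made single-file torrent with one (all-zero) 20-byte piece hash.
sample = (
    b"d8:announce18:http://t.example/a"
    b"4:infod6:lengthi100e4:name8:demo.bin"
    b"12:piece lengthi32768e6:pieces20:" + bytes(20) + b"ee"
)
torrent, end = bdecode(sample)
print(torrent[b"announce"])
print(torrent[b"info"][b"name"], torrent[b"info"][b"piece length"])
```

To inspect a real file, you could read it with open(path, "rb").read() and pass the bytes to the same function. Keys and strings come back as bytes because bencode strings are raw byte sequences (the pieces field is not text at all).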
Direct Download vs. P2P Networking:
Due to its centralized nature, direct download suffers under heavy load (i.e., many people attempting to download a file simultaneously). This is the single most significant advantage of torrents: all the users act as peers, and ultimately the download is much faster for everyone. This inherent ability to become more efficient under heavy load is remarkable in real-world use cases.
However, this very ability comes at a cost. If a file is in minimal demand, the chances of it having a seeder are next to none. This means it may be impossible to download the file at all, unlike a direct download, which always serves the file from the centralized server. While it would technically be possible to always keep a seeder up and running for these files, it would be rather expensive and would defeat the whole point. Perhaps a middle ground can be reached for files that are in high demand initially and then decline, with a switch from a P2P network to a centralized one, but the feasibility of this is yet to be explored.
Several torrent clients offer a helpful feature — sequential download. This enables a file to be downloaded in order (earlier pieces are assigned a higher priority, so they tend to be downloaded first). A practical use case would be to watch a video as it’s being downloaded once about 10–20% of it is downloaded and allow the download to finish in the background.
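The difference between sequential download and the usual piece order can be sketched as two tiny piece pickers. This is only an illustration under my own assumptions: the function names are made up, availability is a hypothetical count of how many connected peers hold each piece, and real clients use more elaborate priority rules than a plain minimum.

```python
def next_piece_sequential(missing: set[int]) -> int:
    """Sequential mode: always fetch the earliest piece still missing."""
    return min(missing)


def next_piece_rarest_first(missing: set[int], availability: dict[int, int]) -> int:
    """Rarest-first mode: fetch the missing piece held by the fewest peers.

    Ties are broken by the lower piece index.
    """
    return min(missing, key=lambda piece: (availability[piece], piece))


missing = {0, 1, 2, 3}
availability = {0: 5, 1: 2, 2: 1, 3: 2}   # piece index -> peers that have it

print(next_piece_sequential(missing))                   # 0
print(next_piece_rarest_first(missing, availability))   # 2
```

Sequential mode gets the start of a video onto your disk first, at the price of no longer helping scarce pieces spread through the swarm, which is presumably why it tends to be an opt-in feature.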
Downloading files through torrents typically results in higher data usage as pieces are downloaded and uploaded. The difference depends on your upload ratio, but I’ve personally observed it to be 1.2x the actual size (i.e., you end up uploading 20% of the file size while you are a peer). Of course, if you continue to seed (which you should unless you have data caps), your usage would be higher.
When downloading files from torrents, most clients allow you to assign priorities for each torrent and each file within a torrent, as well as skip a few files if needed. This is something lacking in direct download unless each file is available for download separately.
One of the primary use cases of torrents is in the field of digital piracy, as it allows for the sharing of files without having to pay for a centralized source hosting the files and incurring high costs due to increased network usage.
Some games also use torrents to distribute updates, as do certain Linux distros, once again to cut down on costs.
When downloading using torrents, all other peers can see your IP address and notice that you are attempting to download that file, too. If this is a concern, a simple fix would be using a VPN to mask your IP address. It would also help to prevent your ISP from being able to track what you are torrenting (or that you are even torrenting).
Extra Reading Material (rather interesting stuff, I promise):
Torrent Poisoning: Torrent poisoning is the act of intentionally sabotaging files being shared with the BitTorrent Protocol. It is typically employed in anti-infringement efforts. It involves methods like directing users to fake/invalid sources, trackers, etc., modifying the chunk being transferred (which can be detected due to the hash mismatch), being a bad peer dragging down the swarm performance, etc.
How to create your own torrent: https://youtu.be/wVpnh2EkNhY?t=208. This video explains it better than I could over text.
Magnet links are an alternative to sharing torrent files. They are simple links containing only the essential information: a cryptographic hash identifying the torrent (computed from the info section of the torrent file) and, optionally, a display name and tracker addresses. The client uses the hash in the magnet link to find peers sharing a torrent with a matching hash, fetches the remaining metadata from them, and quickly reconstructs the swarm of peers on the network. This usually relies on the Distributed Hash Table used by “trackerless” torrents, which don’t use a central server to coordinate and keep track of peers.
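A magnet link is just a URL with a few query parameters, so it can be pulled apart with Python's standard library. In this sketch, xt carries the torrent's hash (prefixed with urn:btih: for v1 torrents), dn is an optional display name and tr lists optional trackers. The function name parse_magnet and the example link (with a made-up hash and tracker) are assumptions for illustration, and other xt formats are simply rejected.

```python
from urllib.parse import parse_qs, urlparse


def parse_magnet(link: str) -> dict:
    """Extract the info hash, display name and trackers from a magnet link."""
    parsed = urlparse(link)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet link")
    params = parse_qs(parsed.query)          # also undoes URL escaping
    xt = params.get("xt", [""])[0]
    prefix = "urn:btih:"
    if not xt.startswith(prefix):
        raise ValueError("no BitTorrent info hash found in 'xt'")
    return {
        "info_hash": xt[len(prefix):].lower(),
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }


# Hypothetical link: the hash and tracker below are invented for the demo.
link = (
    "magnet:?xt=urn:btih:0123456789ABCDEF0123456789ABCDEF01234567"
    "&dn=demo+file&tr=http%3A%2F%2Ft.example%2Fa"
)
info = parse_magnet(link)
print(info["info_hash"])
print(info["name"], info["trackers"])
```

Everything else a torrent file would normally carry (file names, piece hashes and so on) is missing from the link, which is why the client has to fetch it from peers once it has found some.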
Circumventing Torrent Blocks: Several ISPs across the globe block torrent index sites like thepiratebay.org, 1337x.to, kickass.to, etc. VPNs might be necessary to encrypt your traffic to avoid fines or threats from your ISPs. Another alternative is using a seedbox, which is a remote server on which you can download your torrented files and then transfer those files to your computer using FTP, HTTP, etc.
Economics of Torrents:
Torrents themselves don’t inherently come associated with any financial aspect of any sort, apart from potential savings on costs of centralized servers, etc. Torrent index sites, seeders, and groups that develop BitTorrent clients don’t earn much either (and tend to rely on donations). The primary concern here is more in relation to how torrents help facilitate digital piracy, so why don’t we discuss that, even if it is a detour?
The estimated annual loss because of digital piracy is hard to assign a value to, but the number seems to lie within 3 to 30 billion USD, according to several studies. Here’s why that number is heavily inflated: This number is a sum of revenues, and economists tend to pull sneaky tricks on us. A $100 item might be split as $50 cost to manufacture, $20 to distribute, and the remaining $30 to the store. When doing the math, it’s not uncommon for the value to be calculated as the sum of revenues = $100+$50+$20 = $170, which is clearly inaccurate. Thus, the number is already inflated and disingenuous. In the case of digital piracy, from a logical standpoint, the only acceptable costs to add are the cost to “manufacture” or “produce” as well as missed-out profits, and not costs like distribution or server costs. This would once again cause the number to drop. And last, quite possibly the most significant factor: most of those who pirate are young adults or people from poorer nations, who share one thing in common. They couldn’t afford to pay for what they pirated even if they wanted to. So, even if piracy were eradicated, the revenues wouldn’t be affected as much as companies seem to think. Ironically, piracy has certain benefits associated with it as well: it helps boost word of mouth for products and helps cultivate customers for the product in the future when they are in a position to legally obtain it.
And thus, we’ve come to the end of this article (and my tirade against economists and corporations that twist the truth to gain your favour). I hope you enjoyed the read and learnt a thing or two :)