Comparing IPFS and Dat
Distributed file systems
A core component of decentralizing the web is how you store and share data. IPFS and Dat are distributed file systems that can provide data storage for the decentralized web. They can be thought of as next-generation versions of bittorrent. This post compares their high-level and technical details.
The vision for IPFS is to upgrade the internet by changing how data is addressed. It aspires to be infrastructure for the distributed web and to help ensure the permanence of data, since data in IPFS persists as long as at least one peer keeps a copy. This expansive vision means that its collection of libraries and protocols is designed to be extensible for many use cases, but is not necessarily optimized for any particular case at the base layer.
Dat was originally designed for sharing large scientific datasets from the desktop, but its scope has broadened over time to include the distributed web as well. It remains focused on a narrower set of use cases and is optimized for mutable data (data that changes often). The two projects share similar motivations and starting points but have diverged in implementation: the creator of IPFS collaborated with Dat early on, and both were inspired by bittorrent and git.
Estimating the size of p2p networks is difficult, since there is no central platform that can provide a global view of how many people are using them. Crawling the network and counting every new node discovered can provide an estimate, but these protocols can also be used within an organization or private network, and those deployments are not visible from the outside. As of December 2019, there are about 300,000 nodes on the main IPFS network, and ~100,000 on the OpenBazaar IPFS network. No comparable estimate exists for the Dat network, but its number of nodes and contributors is significantly smaller.
Notable applications built on IPFS include OpenBazaar, DTube, Everipedia, and Textile. The modular design of IPFS has led some of its component libraries to be adopted by projects that don’t need the whole protocol. Libp2p, the peer-to-peer networking library, has been adopted by many blockchain projects including Ethereum 2.0. Dat is supported by the Beaker browser, which aims to make it easier for people to publish to the p2p web. Websites built on Dat can be browsed and created within Beaker.
An ecosystem of companies and organizations contributes to the development of both protocols. IPFS is primarily maintained by Protocol Labs, a company that was part of Y Combinator’s S14 (Summer 2014) batch. The project has benefited from a $250 million ICO investment in Filecoin, a cryptocurrency intended to provide a natively monetized pinning service for persisting data on IPFS. Dat’s development is coordinated through the Dat Foundation nonprofit and led by two companies: Blue Link Labs, which develops the Beaker browser, and the consultancy Hyperdivision.
IPFS leverages content-addressing to its fullest potential using Merkle DAGs, a highly flexible global data structure. Dat uses pubkey-addressing to create a more familiar file system that’s compatible with p2p networking.
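To make content addressing over a Merkle DAG concrete, here is a minimal Python sketch. It assumes a toy encoding (sorted-key JSON hashed with SHA-256) in place of IPLD’s real codecs and CID format, and the helpers `put`, `get`, and `read_file` are hypothetical names for illustration only. A directory node links to file nodes by their hashes, so editing one file changes that file’s hash and the root hash, while unchanged blocks are stored once and shared.

```python
import hashlib
import json

def put(store: dict, node: dict) -> str:
    """Serialize a node deterministically and store it under its SHA-256 digest."""
    data = json.dumps(node, sort_keys=True).encode()
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data  # identical nodes land on the same key: free de-duplication
    return digest

def get(store: dict, digest: str) -> dict:
    """Fetch a node and re-hash it, so any block can be verified against its address."""
    data = store[digest]
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("block does not match its address")
    return json.loads(data)

def read_file(store: dict, root: str, name: str) -> str:
    """Follow a named link from a directory node to a file node."""
    return get(store, get(store, root)["links"][name])["data"]

store = {}
a = put(store, {"type": "file", "data": "hello"})
b = put(store, {"type": "file", "data": "world"})
root = put(store, {"type": "dir", "links": {"a.txt": a, "b.txt": b}})

# Editing one file yields a new leaf hash and therefore a new root hash,
# while the untouched sibling block is shared between both versions.
a2 = put(store, {"type": "file", "data": "hello, again"})
root2 = put(store, {"type": "dir", "links": {"a.txt": a2, "b.txt": b}})
```

Both roots stay valid and readable side by side, which is why an IPFS address always refers to one exact version of the data.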
A quick comparison:
IPFS:
- Content-addressing: data is stored and referenced by hash
- Has a global namespace: data can be accessed from any context, and is de-duplicated to prevent it from being stored twice
- Defines a generalized way of referencing Merkle data structures
Dat:
- Pubkey-addressing: content is addressed as file drives under a public key
- Focused on mutable data. You pull whatever files are under the requested key, so those files can easily change.
- Keeps a version log of changes to a dataset over time
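The difference between the two addressing schemes can be sketched in a few lines of Python. This is a toy model under stated assumptions: `Drive` and `content_address` are hypothetical names, the drive key is random bytes standing in for a real public key, and signing, networking, and Hyperdrive’s actual storage layout are left out. A content address changes whenever the bytes change; the drive key stays the same while its version log grows, so earlier versions stay readable.

```python
import hashlib
import secrets
from typing import List, Optional, Tuple

def content_address(content: bytes) -> str:
    """IPFS-style idea: the address is derived from the bytes themselves."""
    return hashlib.sha256(content).hexdigest()

class Drive:
    """Toy Dat-style drive: the address is a fixed key while the files under it change."""

    def __init__(self) -> None:
        # Hypothetical stand-in for an Ed25519 public key; real Dat also signs each
        # change with the matching secret key, which this sketch omits.
        self.key = secrets.token_hex(32)
        self.log: List[Tuple[str, bytes]] = []  # append-only version log of writes

    def write(self, path: str, content: bytes) -> int:
        self.log.append((path, content))
        return len(self.log)  # version number = number of writes so far

    def read(self, path: str, version: Optional[int] = None) -> bytes:
        """Return the newest content for `path` as of `version` (default: latest)."""
        entries = self.log if version is None else self.log[:version]
        for p, c in reversed(entries):
            if p == path:
                return c
        raise FileNotFoundError(path)

drive = Drive()
v1 = drive.write("index.html", b"<h1>v1</h1>")
v2 = drive.write("index.html", b"<h1>v2</h1>")
```

Anyone holding `drive.key` always gets the latest `index.html`, whereas a link built from `content_address` keeps pointing at the exact bytes it was made from.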
A key difference is in how data is discovered in the network. In IPFS, you look up content by its hash. In Dat, you look up content by a public key, which could belong to a person or a site. IPFS also defines a pubkey-lookup protocol through IPNS, the “inter-planetary naming system,” but it is used less often than looking up content through DNS, and is significantly slower. Both use a DHT to discover peers, and optionally use DNS to give short names to keys, so that users can have a “foo.com” address instead of a long hex string.
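Peer discovery in a DHT comes down to finding the nodes whose IDs are closest to the ID being looked up. The sketch below assumes a Kademlia-style XOR distance metric (the design libp2p’s DHT is based on) and shows only the final selection step; routing tables, k-buckets, and iterative network queries are omitted. The peer names and lookup target are hypothetical; in practice the target would be derived from a content hash in IPFS or from the public key in Dat.

```python
import hashlib
from typing import List

def to_id(name: str) -> int:
    """Hash a name into a 256-bit ID; peers and lookup targets share one keyspace."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")

def closest_peers(target: int, peers: List[int], k: int = 3) -> List[int]:
    """Return the k known peers nearest to `target` under Kademlia's XOR metric."""
    return sorted(peers, key=lambda p: p ^ target)[:k]

peers = [to_id(f"peer-{i}") for i in range(20)]  # hypothetical peer names
target = to_id("hash-or-key-being-looked-up")    # hypothetical lookup target
nearest = closest_peers(target, peers)
```

Because every node computes the same distances, peers that announce a hash or key and peers that search for it converge on the same small neighborhood of the keyspace.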
In IPFS, once you add data to the network, other people must access and share it for it to persist. If you want to ensure that it stays available, you can “pin” it. Pinning services are essentially nodes that agree to host content for you, sometimes for a fee. You can also add it to a gateway, which is an IPFS node that is accessible from the rest of the internet. Dat uses the same system of gateways and persistence services, except that hosting content for others is called “seeding”.
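Pinning can be modelled as a set of protected blocks on a node. In this minimal sketch, assuming a simplified node that stores blocks under placeholder IDs (not real CIDs), anything a node has fetched but not pinned is only a cache that garbage collection may delete. `BlockStore` and its methods are hypothetical; real IPFS pins are usually recursive, protecting every block a pinned root links to, which this sketch does not model.

```python
from typing import Dict, List, Set

class BlockStore:
    """Toy local node: cached blocks survive garbage collection only if pinned."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.pins: Set[str] = set()

    def add(self, cid: str, data: bytes) -> None:
        self.blocks[cid] = data

    def pin(self, cid: str) -> None:
        if cid not in self.blocks:
            raise KeyError(cid)
        self.pins.add(cid)

    def unpin(self, cid: str) -> None:
        self.pins.discard(cid)

    def gc(self) -> List[str]:
        """Delete every unpinned block and report which ones were removed."""
        removed = [cid for cid in self.blocks if cid not in self.pins]
        for cid in removed:
            del self.blocks[cid]
        return removed

node = BlockStore()
node.add("cid-a", b"pinned page")  # placeholder IDs, not real CIDs
node.add("cid-b", b"cached page")
node.pin("cid-a")
removed = node.gc()
```

A pinning service is, in this model, simply a well-connected node that calls `pin` on your behalf and keeps running.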
IPFS is built on top of IPLD, a generalized way of referencing Merkle data structures. This allows IPFS to handle any hash-linked data structure. Dat is built on hypercore, an append-only log on which other data structures can be built. The primary one is Hyperdrive, a trie-based structure that behaves like a folder of files.
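An append-only log like hypercore lets readers detect rewritten history by checking a Merkle root over the entries. The sketch below is an illustration, not hypercore’s implementation: it uses a plain binary Merkle tree over SHA-256 (odd nodes carried up unchanged), omits the writer’s signature over each root, and `AppendOnlyLog` and `merkle_root` are hypothetical names. A reader that saved the root at some length can later confirm that the first entries are unchanged, even after new ones are appended.

```python
import hashlib
from typing import List

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: List[bytes]) -> bytes:
    """Binary Merkle root; leaf and interior hashes are domain-separated by a prefix byte."""
    if not leaves:
        return _h(b"")
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd node is carried up unchanged, not duplicated
        level = nxt
    return level[0]

class AppendOnlyLog:
    """Toy hypercore-like log: entries can be appended and read, never edited."""

    def __init__(self) -> None:
        self._entries: List[bytes] = []

    def append(self, data: bytes) -> int:
        self._entries.append(data)
        return len(self._entries) - 1  # index of the new entry

    def get(self, index: int) -> bytes:
        return self._entries[index]

    def __len__(self) -> int:
        return len(self._entries)

    def root(self) -> bytes:
        return merkle_root(self._entries)

    def verify_prefix(self, length: int, saved_root: bytes) -> bool:
        """Check that the first `length` entries still hash to a root recorded earlier."""
        return merkle_root(self._entries[:length]) == saved_root

log = AppendOnlyLog()
log.append(b"first")
log.append(b"second")
checkpoint = log.root()  # a reader remembers the root at length 2
log.append(b"third")
```

Structures such as Hyperdrive can then be layered on top by writing their own records into the log as ordinary entries.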
Other protocols in the decentralized web ecosystem, including ssb and Solid, will be addressed in a later post. A broader history of decentralized file storage could include comparisons with bittorrent, Tahoe-LAFS, Freenet, and other predecessors. Leave a comment if you’d like to see more comparisons, or if there’s anything I’ve missed.
More IPFS vs. Dat comparison resources: