3Box Research: Comparing Distributed Databases GUN, OrbitDB, and Scuttlebutt
Similarities and differences between popular distributed database protocols
This post contains a summary of research conducted by the 3Box team. We outline similarities and differences between popular distributed database technologies, and ultimately explain why we chose to build on OrbitDB.
Overview of Popular Distributed Databases
In web3, it seems there is a never-ending supply of new and exciting technologies that claim to solve your problems in novel ways. While designing 3Box, a social database for Ethereum users, we needed a distributed database (DDB) solution for keeping and sharing off-chain content. We began by researching some of the most popular DDB implementations in the market today. We are super excited to open source our findings and share them with the community.
GUN is a decentralized, offline-first, graph database. It provides an easy interface for developers to create apps that work offline, and that are synced automatically to other nodes once the app is connected to a coordinating server. At the time of writing it has around 3K monthly downloads (on npm) and has an active community.
OrbitDB is a peer-to-peer database protocol, as well as an implementation of the protocol. It provides various types of databases on top of the core protocol, and allows users to implement their own types. At the time of writing it has around 3K monthly downloads (on npm) and is developed by Haja Networks.
Secure Scuttlebutt (SSB)
SSB is a peer-to-peer log store used as a database, identity provider, and messaging system. It’s primarily known for its use as a social network; Patchwork is their most popular client. Their client, Scuttlebot, currently has around 3K monthly downloads (on npm). SSB has a pretty active community of users in their social network.
The Layers of a Distributed Database
While investigating GUN, OrbitDB, and Scuttlebutt, we noticed that DDB architectures share a few similarities. The protocols can be thought of in separate layers:
- Networking: communication protocol which keeps nodes in sync
- Event Log: core data model for the database
- User Authentication: access control method to the database
- Interface: the API exposed to developers
The rest of this section will evaluate the three DDBs according to these layers, beginning with networking.
1. Networking
Distributed database systems need to reliably communicate updates between peer nodes. The networking layer defines how DDB nodes communicate and agree on these updates. The networking layer is a very important aspect of any distributed database, since a bad networking layer can lead to missed updates and inconsistent state across nodes.
Let’s examine how the three DDBs implement their p2p networking protocols:
GUN implements its networking stack using state-of-the-art WebRTC and WebSocket technology. This means that browsers can communicate directly with each other without relying on a server, except for a WebRTC signaling server to negotiate the connection, which is true of all browser-based p2p tech.
SSB defines and implements a custom p2p network. It allows you to discover peers in your local network, connect to pubs (public nodes), and discover pubs advertised by other users. This allows SSB to work without relying on a central server to be up and running.
OrbitDB builds its networking on libp2p, a modular p2p library that can communicate over many different transports and works in lots of different environments. js-libp2p allows browser clients to communicate with WebSocket servers, with other browser clients through WebRTC, and more. It is currently used in IPFS and Parity Substrate, and will be used in Eth2.0.
2. State Management
Distributed database systems need to model data in such a way that users can guarantee its integrity. This is achieved in different ways by different systems. One approach is to store the data as a linked list. Different types of conflict-free replicated data types (CRDTs) can also be used.
A linked list is used as the core data model in two of the distributed database systems we investigated. Each entry links to the previous one by including its hash, and each entry is a self-contained update to the database. However, there are differences in how linked lists are implemented in the systems:
SSB uses an append-only log, which they call a feed. From what we can find in their docs, this log assumes that there are no conflicts: a user is assumed to always know the most recent state of their feed. This can become problematic if the user has multiple devices producing conflicting updates, which can easily occur if a user goes offline on one device but not another. We are not aware of a way SSB handles this, so please let us know if you have more insight.
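To make the single-writer feed model concrete, here is a minimal sketch of a hash-linked, append-only log. It is illustrative only: the names `Entry`, `append`, and `verify` are hypothetical, signatures are left out, and this is not SSB's actual message format.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    sequence: int            # position in the feed, starting at 1
    previous: Optional[str]  # digest of the previous entry, None for the first
    content: dict            # the self-contained update

    def digest(self) -> str:
        # Hash a canonical JSON encoding so equal entries get equal digests.
        encoded = json.dumps(
            {"sequence": self.sequence, "previous": self.previous, "content": self.content},
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()


def append(feed: List[Entry], content: dict) -> List[Entry]:
    """Return a new feed with `content` linked to the current head."""
    previous = feed[-1].digest() if feed else None
    return feed + [Entry(len(feed) + 1, previous, content)]


def verify(feed: List[Entry]) -> bool:
    """Check that every entry points at the digest of the entry before it."""
    expected = None
    for position, entry in enumerate(feed, start=1):
        if entry.sequence != position or entry.previous != expected:
            return False
        expected = entry.digest()
    return True


feed = append([], {"type": "post", "text": "hello"})
feed = append(feed, {"type": "post", "text": "world"})

# Two devices that both believe `feed` is the latest state each append locally.
fork_phone = append(feed, {"type": "post", "text": "from phone"})
fork_laptop = append(feed, {"type": "post", "text": "from laptop"})
```

Both forks verify on their own, yet their last entries share the same sequence number and the same `previous` digest. A strictly linear feed has no rule for deciding between them, which is the multi-device concern raised above.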
OrbitDB uses CRDTs built on an append-only log that can fork and merge, providing eventual consistency. This means that clients can go offline and create conflicting updates, but when they come back online they will sync and end up with the same state.
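The fork-and-merge behaviour can be sketched with a small log whose merge is a set union and whose read order is deterministic. This is a simplified illustration, not OrbitDB's implementation: `MergeableLog` is a hypothetical name, entries are ordered by a Lamport clock with the writer id as a tie-breaker, and each device is assumed to have its own unique writer id.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LogEntry:
    clock: int    # Lamport clock: one more than the highest clock the writer has seen
    writer: str   # hypothetical writer id, used only to break ties
    payload: str


class MergeableLog:
    def __init__(self, writer: str):
        self.writer = writer
        # Keyed by (clock, writer), which is unique per entry under the
        # assumption that every device uses a distinct writer id.
        self.entries: Dict[Tuple[int, str], LogEntry] = {}

    def append(self, payload: str) -> None:
        clock = max((c for c, _ in self.entries), default=0) + 1
        self.entries[(clock, self.writer)] = LogEntry(clock, self.writer, payload)

    def merge(self, other: "MergeableLog") -> None:
        # Set union: commutative, associative, and idempotent.
        self.entries.update(other.entries)

    def values(self) -> List[str]:
        # Every replica sorts the same way, so equal sets give equal order.
        return [self.entries[key].payload for key in sorted(self.entries)]


alice = MergeableLog("alice")
bob = MergeableLog("bob")

alice.append("a1")
bob.merge(alice)      # both replicas now share the same history

alice.append("a2")    # written while the two are disconnected
bob.append("b2")      # concurrent with "a2"

alice.merge(bob)      # reconnect and sync in both directions
bob.merge(alice)
```

After the two merges, `alice.values()` and `bob.values()` return the same list even though the concurrent entries were created independently.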
GUN uses a state-based CRDT, which means that it doesn’t use an append-only log. Instead, it communicates the state of the system at any given time. The GUN documentation has a short description of this.
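For contrast with the log-based approach, a state-based CRDT exchanges whole states and merges them. The sketch below is a generic last-writer-wins map, not GUN's actual conflict-resolution algorithm; the `State` type and the `merge` function are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Each field maps to (timestamp, value). The timestamp decides the winner;
# the value itself breaks ties so that every replica picks the same one.
State = Dict[str, Tuple[int, str]]


def merge(local: State, remote: State) -> State:
    """Merge two replica states field by field, keeping the newest write."""
    merged = dict(local)
    for key, candidate in remote.items():
        current = merged.get(key)
        if current is None or candidate > current:
            merged[key] = candidate
    return merged


phone: State = {"name": (1, "Alice"), "city": (3, "Berlin")}
laptop: State = {"name": (2, "Alice B."), "city": (1, "Paris")}

synced_on_phone = merge(phone, laptop)
synced_on_laptop = merge(laptop, phone)
```

No history is kept: only the current state travels between peers, and merging in either direction yields the same result.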
3. User Authentication
Databases need a way to manage permissions over who can perform various actions. Since distributed databases cannot rely on a central server to govern access control permissions, authentication instead can rely on public-key cryptography.
SSB has the concept of identity where a feed can be owned by only one identity. An identity in the case of SSB is simply an asymmetric key pair. This is great for use cases like Twitter-like social networks where users make global posts, but might be less ideal for comment sections, etc.
OrbitDB has an access control system where you define a set of public keys on DB creation. This allows you to have a DB that multiple users can update at once. Currently there is no way to change the initial set of public keys, although a more advanced authentication system is being worked on.
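The idea of a write set fixed at creation can be sketched as follows. This is not OrbitDB's API: `WriteControlledLog` is a hypothetical class, and `toy_signature_check` is a deliberately fake stand-in for real public-key signature verification, which would come from a cryptography library.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List

# (public_key, payload, signature) -> True if the signature is valid.
SignatureCheck = Callable[[str, bytes, bytes], bool]


@dataclass
class WriteControlledLog:
    writers: FrozenSet[str]           # fixed at creation; a frozenset cannot be mutated
    verify_signature: SignatureCheck  # injected so real crypto can be swapped in
    entries: List[bytes] = field(default_factory=list)

    def append(self, public_key: str, payload: bytes, signature: bytes) -> bool:
        """Accept the entry only if the key is allowed and the signature checks out."""
        if public_key not in self.writers:
            return False
        if not self.verify_signature(public_key, payload, signature):
            return False
        self.entries.append(payload)
        return True


def toy_signature_check(public_key: str, payload: bytes, signature: bytes) -> bool:
    # NOT cryptography: a placeholder so the example runs without dependencies.
    return signature == f"signed:{public_key}:".encode() + payload


db = WriteControlledLog(frozenset({"key-alice", "key-bob"}), toy_signature_check)

accepted = db.append("key-alice", b"hi", b"signed:key-alice:hi")
rejected_stranger = db.append("key-eve", b"spam", b"signed:key-eve:spam")
rejected_forgery = db.append("key-bob", b"hi", b"signed:key-eve:hi")
```

Several keys can write to the same log, but nothing in this sketch can add a key after creation, mirroring the limitation described above.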
GUN has an authentication system based on an alias and password. From looking at their examples, it seems possible to dynamically grant people access, but their documentation on this is quite lacking.
4. Interface
Usability and flexibility are concerns for developers when choosing a database. So we asked ourselves: how easy to use is the interface, or API, that developers rely on to interact with the database and build applications on top of it?
SSB has the concept of plugins which provide different “views” on the SSB log. We couldn’t really find much documentation on how to write these plugins but it seems like you basically have to write a map and reduce function.
OrbitDB offers different options for various types of data stores that have a pretty clear API. Log stores are quite similar to SSB feeds, while key-value stores have a similar interface to localStorage, which is available in all regular web browsers.
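One way to picture a key-value store layered on a log is as a fold over put and delete operations. The sketch below assumes that design for illustration; `KeyValueStore` and `build_index` are hypothetical names, not OrbitDB's API.

```python
from typing import Any, Dict, List, Optional, Tuple

# An operation is (kind, key, value), where kind is "PUT" or "DEL".
Operation = Tuple[str, str, Any]


def build_index(log: List[Operation]) -> Dict[str, Any]:
    """Replay the log in order to compute the current key-value state."""
    index: Dict[str, Any] = {}
    for kind, key, value in log:
        if kind == "PUT":
            index[key] = value
        elif kind == "DEL":
            index.pop(key, None)
    return index


class KeyValueStore:
    """localStorage-like interface backed by an append-only operation log."""

    def __init__(self) -> None:
        self.log: List[Operation] = []

    def put(self, key: str, value: Any) -> None:
        self.log.append(("PUT", key, value))

    def delete(self, key: str) -> None:
        self.log.append(("DEL", key, None))

    def get(self, key: str) -> Optional[Any]:
        # Rebuilt on every read for clarity; a real store would cache the index.
        return build_index(self.log).get(key)


store = KeyValueStore()
store.put("name", "Alice")
store.put("name", "Bob")
store.put("emoji", "wave")
store.delete("emoji")
```

The log keeps all four operations, while reads only see the folded result. Swapping `build_index` for a different fold gives a different store type over the same log.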
GUN uses the concept of a graph as an interface to manipulate the DB. Basically, you get a node and can then put data to it or listen to updates on it. Their hello world example will help you understand how it works.
Data Backup and Hosting
We haven’t yet discussed where the user’s data is backed up on these various systems. That’s because distributed databases do a great job of abstracting the storage concern away from developers. However we definitely want to evaluate the underlying data storage and hosting networks of these three systems to check for availability, resiliency, and usability. It would be a terrible experience for your users to lose access to their data, even if for a short while.
Some questions you might consider: What if your user loses their phone or computer where all data is stored? Or what if userA wants to get information about userB when userA is offline?
GUN allows users to connect to an HTTP server which hosts all data in the user’s DB. The HTTP server runs an instance of the GUN DB and replicates all of the changes from its clients.
SSB has the concept of pubs, which are simply public SSB nodes that follow many users. If users lose their data, they can get a copy back from the pub — if it’s still available. Note: users have to explicitly request to be followed by a pub.
OrbitDB, on the other hand, has no native concept of a node that backs up data. Instead it uses the libp2p pubsub protocol to discover peers that are replicating the given DB instance. This allows a user's database to be actively backed up on multiple peer instances without the user needing to explicitly connect to any of them. We think that's pretty neat.
Overall, the three databases share a few similarities, and all of them are still early in development. In the future, we imagine that some of these projects might begin to switch out some of their layers. For example, we think it would be really cool if SSB worked on top of
At this time, it seems like OrbitDB provides the most flexible distributed database system of the three options considered.
- The Orbit interface offers the widest range of potential use cases. The variety of data stores offered by Orbit provides the flexibility to build many different types of applications and tools.
- The Orbit network is built on top of familiar, well-maintained technologies. libp2p and ipfs provide a solid foundation to the Orbit system that many members of the Ethereum community are likely already familiar with. Other notable benefits of libp2p are that it
- Orbit can very easily be run in the browser. This is again because of
- The Orbit network allows many peers to host and share data. libp2p allows OrbitDB to easily sync database updates from multiple peers, which allows many peers to host data. Because of this, Orbit allows for the creation of a network which anyone can join to help keep data available, making the entire network more robust.
The 3Box team found GUN to be interesting, and we will further explore the graph concept. However, one drawback to the project is that the documentation is quite messy, which makes it difficult to get a good understanding of how the DB can be used.
Secure Scuttlebutt seems more like a specific distributed social network application than a database to build apps on. And indeed, a social network is what the team says that they are building. SSB is a cool system but seems to be limited and inflexible in functionality. This will likely leave developers trying to build anything outside the standard single-feed-based social network with very limited options. It will be challenging to build a diverse set of apps on top of it.
3Box is built on OrbitDB
3Box is building social profiles for web3, and we chose to build on top of OrbitDB for all of the reasons mentioned above.
The 3Box application allows users to create a social profile for their Ethereum address, upload their information, and log into dapps.
Our Profiles API makes it simple to get and set information about Ethereum accounts, which improves onboarding, makes data sharing painless, and helps developers give users control over data that matters.
Continuing the discussion
3Box is an active community interested in all things distributed databases. If you have thoughts, feedback, experience, want to contribute, or want to integrate:
We would truly appreciate and encourage dialog around this research. Our goal is to make this research valuable to members of the community. Please don’t hesitate to leave a comment, especially if we missed anything or are incorrect. Thanks for your support, and happy #buidling!