Update: Suspending Development of DeltaDB
I have decided to suspend development of DeltaDB for the following reasons:
- Recent enhancements to both PouchDB and CouchDB have made PouchDB’s initial replication much faster.
- After more analysis, I am left feeling that the last-write-wins resolution policy is mostly good for real-time systems that are always online. In DeltaDB, this policy results in a Reasonable Ordering. Moreover, last-write-wins is nice when starting a new project because it is automatic, but conflict resolution policies that force the user to manually resolve the conflict, like CouchDB’s revision protocol, have become more of the standard in the offline-first world.
- Building a DB that scales and is distributed over many nodes takes a lot of work. I considered some of the necessary details when initially designing DeltaDB, but have only scratched the surface of what needs to be done. There are other DBs, like CouchDB 2.0, that have nearly solved these problems, and CouchDB has been in development since 2005.
My name is Geoff Cox and I am the architect of the DeltaDB open source database described below. I want to start off by saying that I have great respect for all the projects mentioned in this post and for their brilliant contributors! I’ve spent many hours working with these projects and here is what I’ve learned.
Since the days of the first smartphones, there have been people describing a future where we can write a single app in HTML5 that runs on all devices. Cordova allows us to package our HTML5 and build a native app that can be submitted to the different app stores. Ionic allows us to implement native-like UIs. And, the plethora of JS frameworks like AngularJS, EmberJS, React, etc… allow us to structure our apps using the latest MVC constructs. But… the data sync portion of this vision is missing.
Sure, if you have plenty of time on your hands, you can use your favorite backend language to create a RESTful API and then use either IndexedDB, WebSQL or LocalStorage in your app to implement the data layer. But, you’ll quickly find yourself spending a lot of time modifying your whole stack when all you really want to do is modify some data in your app’s local database and then have it automatically sync with the cloud. Moreover, it is utterly embarrassing how difficult it is to work with persistent browser storage, especially IndexedDB. On the surface, IndexedDB seems fine, but as you start to dig deeper you’ll find that you’ll spend days hacking through issues such as synchronization. Then, you’ll find that some IndexedDB implementations such as Safari’s are so broken that you’ll have to fall back to WebSQL. The bottom line is that persistent browser storage just isn’t where it should be and you’ll have to work hard to get it working the way you want it to work. OK, a few years ago when browser-side storage was new, we understood, but today, this is just unacceptable. The good news is that there are projects like Dexie that help you avoid these persistent storage issues. Unfortunately, this just isn’t enough.
Enter the new breed of browser-side databases that sync, like Firebase, Meteor and PouchDB. With these technologies, we can change data locally and then our backend database is updated automatically! Unfortunately, they all have some significant limitations, which I will elaborate on below.
Firebase is incredibly popular and is great for real-time data exchanges, but your app needs to be online when it starts or else Firebase won’t work. You may be thinking, who cares? The reality is that many people do. Could you imagine if Evernote didn’t work offline? What about WhatsApp? The Google Maps and Facebook teams just added a ton of offline capabilities so they obviously care as well. Google even stresses that Chrome apps should be offline-first. Apps that work offline can be used in more scenarios and tend to be more fault tolerant so it just makes sense to design your apps this way. Besides the design benefits of these thick clients, it’s easy to imagine why being offline-first matters when you consider how flaky internet connections can be, especially in remote areas of the world or when you are in transit.
Firebase does have an offline option available for native apps, but not for JS. The Firebase team has been promising to deliver one since 2013 and we are still waiting. You may be thinking, why don’t you just use persistent browser storage with Firebase to make it work offline? Good luck… if it were really that easy, Firebase would have added this option a long time ago. The reality is that implementing an offline-first design takes a lot of work and it shouldn’t have to be the app developer’s responsibility to handle this at the database layer. Moreover, Firebase is proprietary. That’s cool, but what happens if you want to move your data to another set of servers? What about the collaboration that comes with open source projects?
Enter PouchDB, an open source database that syncs. The PouchDB team should be commended for creating a great tool that goes a long way in the right direction. Unfortunately, as you dig deeper you’ll find that PouchDB is limited by the fact that it uses CouchDB as a backend. This means that initial replication can be very slow, e.g. it takes 60 seconds to load 1,000 docs. Could you imagine it taking this long for your app to start after the user has downloaded it? What if your app has 10,000 docs? The CouchDB team has recently introduced a bulk get function that helps, but the issue goes deeper: the CouchDB replication design requires the client to download a complete history of changes to a doc. This means that if you modify a single doc 1,000 times then you need to download 1,000 items, even though most of the time you only care about the latest change.
Another issue with CouchDB replication is that its conflict resolution policy is too darn strict. The result is that when two offline clients modify the same document and then go online, you’ll have a conflict that will either end up with one of the clients having all their changes ignored or you’ll have to force a change via an upsert. This puts a lot of responsibility on the app developer and makes it difficult to implement apps where the same doc is being modified simultaneously by two different users. For example, in a “to do” app, one user may be modifying the descriptions of the items while another user sorts these items.
Once you get past these PouchDB/CouchDB difficulties you’ll then realize that the permissions system doesn’t go far enough. A lot of modern apps implement a db-per-user design where a user registration generates a new DB. To implement this with CouchDB you’d need to create a backend routine. And once you dig even deeper and start considering that certain users should only have access to certain attributes in certain docs, you’ll have to do even more backend coding.
Another cool open source technology is Meteor and it does real-time syncing well, but it doesn’t work offline. Like the Firebase team, the Meteor team has been promising that they too will add offline-first capabilities to Meteor, but we’re still waiting. There are some workarounds like GroundDB, but they don’t go far enough. The bottom line is that to implement offline-first data syncing properly you need to build it into the core database layer and it cannot be an afterthought — it needs to be something that is considered at every layer of the database stack.
A Proposed Solution: DeltaDB
DeltaDB uses a simple last-write-wins conflict resolution policy. This results in conflict-free collaborative editing where the last user’s changes are the changes that are saved. Moreover, this resolution occurs at the document attribute layer which means that multiple clients can edit different pieces of the same doc without stale data overwriting fresh data. All writes are atomic at the attribute layer so you can also implement an all-or-nothing conflict resolution policy when needed. There are also constructs such as “auto restore” baked into the core that make it easy to handle situations like when an offline user modifies data that has since been deleted by another user.
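To make the attribute-level policy concrete, here is a minimal TypeScript sketch. It is only an illustration of the idea described above, not DeltaDB's actual API: the `AttrDelta`, `AttrState` and `applyDelta` names are hypothetical, and the choice to keep the existing value when two timestamps are equal is an assumption.

```typescript
// Hypothetical shapes for illustration; not DeltaDB's actual API.
interface AttrDelta {
  attr: string;
  value: unknown;
  updatedAt: number; // timestamp assigned when the change was made
}

interface AttrState {
  value: unknown;
  updatedAt: number;
}

type TodoDoc = Record<string, AttrState>;

// Apply one delta to a doc: per attribute, the newer timestamp wins.
// Assumption: on a timestamp tie, the value already stored is kept.
function applyDelta(doc: TodoDoc, delta: AttrDelta): TodoDoc {
  const current = doc[delta.attr];
  if (current !== undefined && current.updatedAt >= delta.updatedAt) {
    return doc; // stale delta: keep the fresher value
  }
  return { ...doc, [delta.attr]: { value: delta.value, updatedAt: delta.updatedAt } };
}

// Two offline users edit different attributes of the same to-do item.
const baseDoc: TodoDoc = {
  description: { value: "Buy milk", updatedAt: 100 },
  position: { value: 3, updatedAt: 100 },
};
const fromAlice: AttrDelta = { attr: "description", value: "Buy oat milk", updatedAt: 200 };
const fromBob: AttrDelta = { attr: "position", value: 1, updatedAt: 150 };
const staleEdit: AttrDelta = { attr: "description", value: "Buy soy milk", updatedAt: 120 };

// The order in which deltas arrive does not matter; only timestamps do.
const mergedDoc = [fromBob, fromAlice, staleEdit].reduce(applyDelta, baseDoc);
```

Alice's description and Bob's sort position both survive because they touch different attributes, while the stale description edit is ignored because it is older than the value already stored.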
DeltaDB is optimized for syncing. Clients push their deltas onto the server’s queue. The server processes the queue separately and partitions the data so that clients can retrieve all the recent changes very quickly. DeltaDB also archives the complete history of changes in case you need them. Due to the last-write-wins conflict resolution policy, the client only needs to download the latest set of changes during an initial sync, e.g. if a single doc is modified 1,000 times it is only downloaded once during the initial sync.
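The initial-sync saving can be sketched as a compaction step over the recorded deltas. This is a simplified illustration under assumed names (`RecordedDelta`, `latestDeltas`), not DeltaDB's real server code or wire format: it keeps only the newest delta for each doc attribute.

```typescript
// Hypothetical delta shape for illustration; not DeltaDB's actual wire format.
interface RecordedDelta {
  docId: string;
  attr: string;
  value: unknown;
  updatedAt: number;
}

// Keep only the newest delta for each (docId, attr) pair.
function latestDeltas(changeLog: RecordedDelta[]): RecordedDelta[] {
  const newest: Record<string, RecordedDelta> = {};
  for (const delta of changeLog) {
    const key = JSON.stringify([delta.docId, delta.attr]);
    const seen = newest[key];
    if (seen === undefined || delta.updatedAt > seen.updatedAt) {
      newest[key] = delta;
    }
  }
  return Object.keys(newest).map((key) => newest[key]);
}

// One doc whose "title" attribute was modified 1,000 times.
const changeLog: RecordedDelta[] = [];
for (let i = 1; i <= 1000; i++) {
  changeLog.push({ docId: "doc-1", attr: "title", value: `Draft ${i}`, updatedAt: i });
}

// A new client would only need this compacted set during its initial sync.
const initialSync = latestDeltas(changeLog);
```

Here the 1,000 recorded changes collapse to a single delta for the client, while the full `changeLog` remains available as the archived history.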
DeltaDB is framework agnostic and can be plugged into any JS app. It uses a pub/sub model that you can use to listen to real-time changes. These real-time changes are communicated via a web socket layer and to minimize network traffic, data is only sent when there is a change.
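As a rough sketch of the "only send when there is a change" idea, the following TypeScript models a tiny in-memory pub/sub feed. The `ChangeFeed` class and its methods are hypothetical names invented for this example, and the web socket transport is left out entirely; it is not DeltaDB's client API.

```typescript
type ChangeListener = (attr: string, value: unknown) => void;

// Hypothetical in-memory feed; a real client would sit on top of a web socket.
class ChangeFeed {
  private listeners: ChangeListener[] = [];
  private lastSent: Record<string, string> = {};

  // Register a listener and get back a function that removes it.
  subscribe(listener: ChangeListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Notify listeners only if the value differs from the last one published.
  // Returns true if the change was published, false if it was skipped.
  publish(attr: string, value: unknown): boolean {
    const encoded = JSON.stringify(value);
    const known = Object.prototype.hasOwnProperty.call(this.lastSent, attr);
    if (known && this.lastSent[attr] === encoded) {
      return false; // nothing changed, so nothing is sent
    }
    this.lastSent[attr] = encoded;
    for (const listener of this.listeners) {
      listener(attr, value);
    }
    return true;
  }
}

const feed = new ChangeFeed();
const received: string[] = [];
const unsubscribe = feed.subscribe((attr, value) => {
  received.push(`${attr}=${String(value)}`);
});

const first = feed.publish("title", "Groceries");  // delivered
const repeat = feed.publish("title", "Groceries"); // unchanged, skipped
const second = feed.publish("title", "Errands");   // delivered
unsubscribe();
feed.publish("title", "Chores");                   // no listeners left
```

The repeated publish is dropped before it reaches any listener, which is the same reasoning that lets a sync layer stay quiet on the network when nothing has changed.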
DeltaDB is incredibly scalable. Deltas can be segmented by UUID, and adding new nodes has a negligible impact on the cluster as handshaking between servers can be done as frequently as desired. Replication is master-to-master, which makes DB clusters highly available as any node can go down without causing any downtime. Due to the last-write-wins policy, clients can switch to talk to any node, even if that node hasn’t yet received the latest deltas from another node. And, fault tolerance is implemented by requiring a quorum of servers to acknowledge a change before it is considered recorded.
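The quorum rule can be illustrated with a few lines of TypeScript. This sketch assumes that a quorum means a simple majority of the servers in the cluster, which the description above does not specify; `quorumSize` and `isRecorded` are hypothetical helper names.

```typescript
// Assumption: a quorum is a simple majority of the cluster.
function quorumSize(nodeCount: number): number {
  return Math.floor(nodeCount / 2) + 1;
}

// acks[i] is true if server i acknowledged the change.
function isRecorded(acks: boolean[]): boolean {
  const acknowledged = acks.filter((ack) => ack).length;
  return acknowledged >= quorumSize(acks.length);
}

// Three of five servers acknowledge: the change counts as recorded.
const majorityReached = isRecorded([true, true, true, false, false]);

// Only one of four servers acknowledges: the change is not yet recorded.
const majorityMissed = isRecorded([true, false, false, false]);
```

With a majority rule, a five-node cluster can lose two nodes and still record changes, which is one way to get the fault tolerance described above.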
DeltaDB is optimized for multiple CPU cores and is thread-safe so you can add multiple processes to speed up DB reads and writes. Moreover, it uses timestamps to update records so that transactions and their overhead can be avoided.
If you’re like me, you’ve been dreaming of the day when you can write an app in HTML5 that runs everywhere. Now there is hope.