CouchDB, PouchDB and Hoodie as a Stack for Progressive Web Apps

I recently had the privilege of attending Offline Camp, a four-day retreat that brings together a select group of people to discuss offline-first development and design. One of the unconference sessions I participated in was CouchDB, PouchDB and Hoodie as a Stack for Progressive Web Apps. In this session, a small but passionate group of us took turns sharing our views on this offline stack and asking questions such as: why isn’t this stack more popular? The following is a write-up of what we discussed, with my two cents added.

Let’s start with some background. An offline-first design produces an app that is mostly, if not completely, usable when offline and that syncs with the cloud when online. Most popular mobile apps (e.g. Google Maps, Facebook, and email clients) implement at least a basic offline mode, so it is easy to understand why some offline support matters. Going a step further and implementing an offline-first design takes work, but most people agree that it greatly improves the user experience: keeping the majority of data changes local reduces latency for users. And in the many cases where an Internet connection is unavailable or unreliable, being offline-first is an absolute necessity. The next question is, what stack can you use to actually create an offline-first app?

The CouchDB, PouchDB and Hoodie stack is one of the most battle-tested open-source stacks for developing offline-first apps. Apache CouchDB™, which was first released in 2005 and was one of the early databases of what became the NoSQL movement, supports multi-master syncing and can scale to support hundreds of nodes. CouchDB’s superpower is sync: it has a reliable way of handling data in both offline and online environments. Essentially, when conflicts occur, nothing is deleted; the conflicting revisions are kept and it is up to the developer or user to resolve them. PouchDB is a JavaScript database modeled on CouchDB that runs in the browser and can sync with CouchDB. In essence, your app writes data to its local PouchDB instance and PouchDB then syncs with your CouchDB cluster when you go online. Hoodie, the last piece in this stack, adds a useful authentication and connection API.
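To make the "write locally, sync later" flow concrete, here is a toy model in TypeScript. It is only a sketch of the idea and not PouchDB's API: the `LocalStore` and `RemoteStore` classes, their methods and the `online` flag are all hypothetical names I made up for illustration.

```typescript
// Toy model of an offline-first write path. NOT PouchDB's API:
// every class and method name here is hypothetical.

type Doc = { _id: string; [field: string]: unknown };

// Stands in for the server-side database (e.g. a CouchDB cluster).
class RemoteStore {
  docs: Record<string, Doc> = {};

  receive(batch: Doc[]): void {
    for (const doc of batch) this.docs[doc._id] = doc;
  }
}

// Stands in for the database that lives inside the app.
class LocalStore {
  docs: Record<string, Doc> = {};
  private pending: Record<string, true> = {}; // ids changed since the last sync

  // Writes always land locally and never wait on the network.
  put(doc: Doc): void {
    this.docs[doc._id] = doc;
    this.pending[doc._id] = true;
  }

  pendingCount(): number {
    return Object.keys(this.pending).length;
  }

  // Push pending docs when a connection exists; returns how many were sent.
  sync(remote: RemoteStore, online: boolean): number {
    if (!online) return 0; // nothing is lost, the changes simply stay queued
    const batch = Object.keys(this.pending).map((id) => this.docs[id]);
    remote.receive(batch);
    this.pending = {};
    return batch.length;
  }
}

const local = new LocalStore();
const remote = new RemoteStore();

local.put({ _id: "note:1", text: "written on a plane" });
local.put({ _id: "note:2", text: "still offline" });

const sentOffline = local.sync(remote, false); // no connection yet
const sentOnline = local.sync(remote, true); // connection is back
```

The real stack does far more than this (two-way replication, revision tracking, retries), but the shape is the same: the app only ever talks to the local store, and syncing is a separate step.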

As for companies offering CouchDB as a service, there is really only one option and it is a good one. Cloudant, which was acquired by IBM in 2014, has both free and paid tiers and will take care of scaling your cluster as your demand grows. Cloudant has a long history of CouchDB expertise. In fact, the majority of the CouchDB clustering code came from Cloudant and Cloudant continues to make contributions to the CouchDB codebase.

As CouchDB is open source, you can also run it yourself. I actually wrote an article a while back on Running a CouchDB 2 Cluster in Production on AWS with Docker.

After we finished introducing this great offline stack, we started pondering: why isn’t this stack more popular? One of the first issues we mentioned was the lack of any big fish using this stack. The Wikipedia page for CouchDB mentions several well-known entities using CouchDB, including npm and the BBC, but it appears that these entities are just using the CouchDB piece of the stack. PouchDB has a list of companies using PouchDB on its website, but this list doesn’t really contain any big players. Some of us feel that this is probably because the stack is still relatively unknown, even though CouchDB was initially created in 2005 and PouchDB in 2012.

We also feel that the lack of popularity could be due to the lack of good documentation, specifically documentation that takes you step by step through the process of creating an app with this stack. The Cloudant representatives at Offline Camp mentioned that they have been working on a series of new tutorials and we all agreed that more documentation is needed, including introductory-level material.

In my opinion, another significant reason for the lack of widespread use is that CouchDB only recently became a database with native support for clustering. Before this, you could cluster CouchDB nodes, but you had to do a lot of the work yourself. Now that CouchDB 2 natively supports clustering, you can easily scale your data layer to meet the demand of many users. I believe that it is only a matter of time before more companies start using CouchDB in their systems and this should create a positive reinforcement loop, which will lead to more success stories.

Another challenge with the stack is that it requires developers to learn different database paradigms. For example, in a typical NoSQL design, data is denormalized and therefore duplicated throughout the database. This denormalization is a benefit because it allows for faster querying, even when the dataset is very large. However, this design can feel uncomfortable to many developers who would rather just join their datasets, as is done with a relational (aka SQL) database. The problem with relational databases is that they are generally harder to scale horizontally and they don’t support a reliable offline-first design the way CouchDB does.
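Here is a small TypeScript sketch of that trade-off. The document shapes (`PostDoc`, `Author` and so on) and the helper functions are made up for illustration and are not taken from any real schema. It shows the same data stored both ways, and what the duplication costs you when a copied value changes.

```typescript
// Relational style: a post points at its author by id,
// so showing a byline needs a second lookup (the "join").
interface Author { id: string; name: string }
interface NormalizedPost { id: string; authorId: string; title: string }

// Document style: the author's name is copied into every post document.
interface PostDoc {
  _id: string;
  title: string;
  author: { id: string; name: string };
}

function bylineNormalized(post: NormalizedPost, authors: Record<string, Author>): string {
  return post.title + " by " + authors[post.authorId].name;
}

// No lookup needed: everything required to render is in the one document.
function bylineDenormalized(post: PostDoc): string {
  return post.title + " by " + post.author.name;
}

// The price of duplication: a rename has to touch every copy.
function renameAuthor(posts: PostDoc[], authorId: string, newName: string): number {
  let updated = 0;
  for (const post of posts) {
    if (post.author.id === authorId) {
      post.author.name = newName;
      updated++;
    }
  }
  return updated;
}

const authors: Record<string, Author> = { a1: { id: "a1", name: "Ada" } };
const normalized: NormalizedPost = { id: "p1", authorId: "a1", title: "Sync 101" };

const posts: PostDoc[] = [
  { _id: "p1", title: "Sync 101", author: { id: "a1", name: "Ada" } },
  { _id: "p2", title: "Conflicts", author: { id: "a1", name: "Ada" } },
];

const joined = bylineNormalized(normalized, authors);
const direct = bylineDenormalized(posts[0]);
const touched = renameAuthor(posts, "a1", "Ada L.");
```

Reads get simpler and faster because each document is self-contained, while writes to shared values become the part you have to plan for.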

CouchDB’s conflict resolution strategy is another paradigm that sets it apart, as it effectively pushes the task of resolving conflicts to the developer. This is actually a good thing, but it requires the developer to think carefully about how to resolve these conflicts. Other distributed databases, like Firebase, use a last-write-wins strategy that is easy to use because conflicts are resolved automatically. The trade-off is that last-write-wins can lead to data loss when the same record is modified concurrently, especially when it is modified offline. Most of the time this data loss is not acceptable.
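To see why this matters, consider two devices that edit the same contact while offline. The TypeScript sketch below compares a whole-record last-write-wins rule with one possible app-level merge that becomes available when both revisions are kept. The `Contact` type, the `mergeFields` strategy and the sample data are my own assumptions for illustration; this is neither CouchDB's internal algorithm nor Firebase's.

```typescript
type Contact = { name: string; phone: string; email: string };
type Edit = { doc: Contact; timestamp: number };

// Last-write-wins: the later edit replaces the whole record.
function lastWriteWins(a: Edit, b: Edit): Contact {
  return a.timestamp >= b.timestamp ? a.doc : b.doc;
}

// One possible developer-written resolution when both revisions survive:
// a per-field three-way merge against the common ancestor.
// If both sides changed the same field, this sketch arbitrarily keeps `mine`;
// a real app might ask the user instead.
function mergeFields(base: Contact, mine: Contact, theirs: Contact): Contact {
  const pick = (key: keyof Contact): string =>
    mine[key] !== base[key] ? mine[key] : theirs[key];
  return { name: pick("name"), phone: pick("phone"), email: pick("email") };
}

const base: Contact = { name: "Ada", phone: "555-0100", email: "ada@example.com" };

// The phone app changes the number, the laptop changes the email, both offline.
const phoneEdit: Edit = {
  doc: { name: "Ada", phone: "555-0199", email: "ada@example.com" },
  timestamp: 1,
};
const laptopEdit: Edit = {
  doc: { name: "Ada", phone: "555-0100", email: "ada@newmail.example" },
  timestamp: 2,
};

const lww = lastWriteWins(phoneEdit, laptopEdit);
const merged = mergeFields(base, phoneEdit.doc, laptopEdit.doc);
```

With last-write-wins the new phone number silently disappears because the laptop's edit was later. With both revisions available, the app can keep the new phone number and the new email.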

Another challenge with storing data in CouchDB, and something you’ll encounter with most other NoSQL databases, is managing the overhead it takes to support a database-per-user structure. Using db-per-user, you can concentrate all the docs that a given user needs in a single database, which then allows you to optimize the syncing with your app. Moreover, it allows you to define specific roles so that only the owner of the data has access to their database, since CouchDB’s access control system only limits access at the database level. The difficulty with this db-per-user approach is that it often requires you to replicate shared data between different databases and CouchDB’s replicator construct is usually too resource intensive for this task. Instead, there is a newer tool called Spiegel, which allows you to implement scalable replication so that you can efficiently sync just the databases that are changing. (I happen to be the author of Spiegel :D) And, on the access control front, the CouchDB team has begun work on adding doc-level access, which will effectively reduce the number of databases you need when using CouchDB.
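As a rough illustration of the db-per-user setup, the TypeScript sketch below derives a database name for a user and builds a security object that admits only that user. The name format follows the convention used by CouchDB’s couch_peruser feature (`userdb-` plus the hex-encoded username) and the object has the shape of CouchDB’s per-database `_security` document. The helper names (`utf8Hex`, `userDbName`, `ownerOnlySecurity`) are hypothetical, and treating the owner as a plain member is my assumption about how you might configure it.

```typescript
// Hex-encode a string's UTF-8 bytes using only built-in functions.
// encodeURIComponent yields %XX for non-ASCII and reserved bytes,
// and leaves a small set of ASCII characters as-is.
function utf8Hex(text: string): string {
  const encoded = encodeURIComponent(text);
  let hex = "";
  for (let i = 0; i < encoded.length; i++) {
    if (encoded.charAt(i) === "%") {
      hex += encoded.slice(i + 1, i + 3).toLowerCase();
      i += 2; // skip the two hex digits just consumed
    } else {
      const code = encoded.charCodeAt(i).toString(16);
      hex += code.length < 2 ? "0" + code : code;
    }
  }
  return hex;
}

// Database name in the couch_peruser style: "userdb-" + hex(username).
function userDbName(username: string): string {
  return "userdb-" + utf8Hex(username);
}

// Shape of a CouchDB database security object.
interface SecurityDoc {
  admins: { names: string[]; roles: string[] };
  members: { names: string[]; roles: string[] };
}

// Assumption: the owner is the only member and there are no db-level admins.
function ownerOnlySecurity(username: string): SecurityDoc {
  return {
    admins: { names: [], roles: [] },
    members: { names: [username], roles: [] },
  };
}

const bobDb = userDbName("bob");
const bobSecurity = ownerOnlySecurity("bob");
```

Hex-encoding keeps the name valid no matter which characters the username contains. The cost is that every new user means one more database to create, secure and replicate, which is exactly the overhead described above.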

In the end, every database technology has its trade-offs and there can be a steep learning curve in properly supporting horizontal scaling and an offline-first design, yet these capabilities are necessary for most modern apps. The good news is that with the release of CouchDB 2, large-scale clustering is now easier to implement. Moreover, the open source community around this stack is growing quickly and tools are being created every day to address some of the biggest pain points.

The latest CouchDB, PouchDB and Hoodie stack is ready for prime time and is one of the best choices for implementing an offline-first design. This offline-first design has been baked into every layer of this stack and the simplicity of the conflict resolution policy is really quite remarkable.

If you are itching to take a closer look at this stack, I recommend that you visit PouchDB’s Getting Started Tutorial. Happy offline-first coding!

About the Author

Geoff Cox is the Co-Founder of Quizster, a photo-based submission and feedback system. Quizster uses a full stack of JS and runs CouchDB and PouchDB at the data layer.