Making Your App Awesome When the Network Isn’t (Part 1)

A beginner’s guide to offline data storage and sync with PouchDB & Apache CouchDB™

--

Let’s get real. Networks are flaky, and that awesome web app you just built likely isn’t so impressive when you lose your connection. With an Offline First mindset and some simple client-side code, you can upgrade your masterpiece and deliver amazing user experiences in all network conditions. It’s easier than you might expect.

About this tutorial series

In this 2-part, beginner-friendly tutorial series, I’ll walk you through the steps I took to build my first offline-capable Progressive Web App using only client-side code:

  1. Enabling offline data storage, sync, and sharing with PouchDB and Apache CouchDB™
  2. Ensuring quick page loads — offline or not — with a service worker

You can explore the code on GitHub, where you’ll find instructions for running the app yourself if you’d like, or feel free to stick to the basics right here.

My introduction to Offline First

As a meeting-planner-turned-web-developer, one of my favorite roles here on the Developer Advocacy team at IBM Watson and Cloud Platform is my organizing work with the Offline First community. Even before I took up front-end coding, I had the privilege of co-organizing Offline Camp, a retreat-style unconference where I’ve participated in some fascinating discussions about how web developers, UX professionals, and business leaders can democratize access to all the riches of the internet.

An Offline Camp discussion about — you guessed it! — CouchDB & PouchDB for PWAs (Image credit: Teri Chadbourne)

From healthcare solutions in the developing world to entertainment for the daily commute, the Offline First approach to web development is transforming user experience by planning for the most constrained network environment first, keeping apps running even while offline and providing progressive enhancement as network conditions improve. Conversations at Offline Camp run the gamut from inspirational use cases to the technical nitty-gritty of particular solutions, but there’s a common thread: it’s time to stop treating shoddy connections as an error condition and start building with real-world network constraints in mind.

Building my own Offline First project management tool

After each edition of Offline Camp, we encourage campers to write up summaries of our unconference sessions for our Medium publication. Originally, our editorial team used Trello to track the status of blog posts in the works, but there was a lot of manual work involved to fill the gaps where the tool didn’t quite meet our needs. As part of a class assignment for a JavaScript course last fall, I decided to take a stab at building my own offline-capable project management tool that would be better customized for the needs of our editorial process. As someone who’s only familiar with front-end development, I of course built this in the form of a web app.

The homepage of my project management tool, showing all current entries.

Using a simple web form, which uses logic to hide and reveal certain questions depending on the data entered, my app stores a record for each article in progress and lets me come back and edit that record later. It also creates a second webpage I can share with an author to provide resources they need and request resources I need in return.

The data entered by the editor in the web form affects what’s displayed on the author’s page.

Because I need to start this process while I’m on site at Offline Camp with limited internet access, the app needs to load while offline and allow me to edit and save data without an internet connection as I assign authors to posts. Since I collaborate with other editors and own many gadgets, the data ultimately needs to sync across multiple devices, browsers, and users. It requires an Offline First design.

In this article, I’ll walk you through how I implemented offline data storage, syncing, and sharing using PouchDB and CouchDB. In the next post, I’ll address the challenge of keeping the page loaded with a service worker.

I’ll include the most relevant bits of code here on Medium, but you can also explore my code, as well as clone and run the app yourself from my GitHub repo.

Enabling offline data storage, sync, and sharing

Like every useful web form, our project management tool needs a place to store the data it collects. If we store that data exclusively in a remote cloud database, it will only be accessible when we have a network connection. If we store it only locally on the device, we can’t have multiple users or access our data across multiple devices. In order to create a shared tool that works offline, we need to combine both a local and a remote database and keep them synced to whatever extent our connection allows. This process becomes pretty easy when using tools built for the job: PouchDB and Apache CouchDB™.

Creating a local PouchDB database

For client-side storage, I’ve chosen to use PouchDB, an in-browser database inspired by Apache CouchDB™. PouchDB works across multiple browsers: it checks what kind of local storage each browser supports and adapts accordingly, using IndexedDB where it’s supported, or an older solution such as WebSQL where needed. PouchDB is open source, and JavaScript is the only language you need to know to make it work. It’s important to note that PouchDB’s API is asynchronous, so you’ll use promises (or callbacks) to handle results and catch any potential errors without blocking the user interface.

Since my browser supports it, PouchDB has automatically selected the IndexedDB adapter.

We add PouchDB to the project by including its provided JavaScript file and referencing that file in a script tag in our HTML, just as we would with a library like jQuery.

Creating a local database is easy. In our main JavaScript file, we simply call new PouchDB and give the new database a name. This database lives in the browser on our local device.
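As a minimal sketch of that step, assuming the PouchDB script has already been loaded: the helper name openLocalDatabase and the database name 'articles' are illustrative choices of mine, not names taken from the app. The constructor is passed in as an argument only so the sketch does not depend on a browser global.

```javascript
// Hypothetical helper: opens (or creates, on first use) a named local database.
// PouchDBConstructor is whatever the PouchDB script exposes, e.g. window.PouchDB.
function openLocalDatabase(PouchDBConstructor, name) {
  return new PouchDBConstructor(name);
}

// In the page itself this would typically be a single line:
//   var db = new PouchDB('articles');
```

Calling the constructor again with the same name in the same browser opens the existing database rather than creating a second one.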

If I were going to use my app from the same browser at all times and no one else needed to access my data, I could probably skate by with only using PouchDB for storage. But I’d be missing out on the real superpower of PouchDB and CouchDB: sync.

Creating a remote CouchDB database

Because I work with other editors who also need to know the status of the articles in progress, it’s important that we share data about our project in a place where it’s accessible to everyone. Enter CouchDB.

Apache CouchDB™ is an open source NoSQL database that stores documents in JSON format. Its unique replication protocol, which is shared by PouchDB, is particularly well suited to creating offline-capable mobile applications. In fact, its approach to offline sync is one of the key features that sets it apart from fellow NoSQL document stores such as MongoDB.

In my case, since I don’t yet know how to host a CouchDB database by myself, I chose to use IBM Cloudant, which is a fully-managed database-as-a-service based on CouchDB. (Disclaimer: I work for IBM.) If you have the dev ops skills, you could just as easily use the open source CouchDB directly. (For more on how to set up your remote database — including enabling CORS — see the setup instructions in my project repo.)

In a credentials.js file I wrote a single line of code defining the URL used to access my remote CouchDB database:

Note that when loading JavaScript files from our HTML, it’s important to load the credentials.js file, where this variable is created, before the project-manager.js file, where it’s used.
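For illustration, a credentials.js file along these lines might hold nothing more than the following. Every part of the URL is a placeholder I have made up; the real value comes from your own CouchDB or Cloudant account.

```javascript
// credentials.js (sketch). Every part of this URL is a placeholder:
// replace USERNAME, PASSWORD, HOST and DATABASE with your own values.
var remoteCouch = 'https://USERNAME:PASSWORD@HOST/DATABASE';
```

Because this file is delivered to the browser, anyone who can load the page can read whatever credentials it contains.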

Syncing data between local and remote databases

With both the local PouchDB database (db) and the remote CouchDB database (remoteCouch) defined, it’s time to make them talk to each other.

When the page is first loaded, we check whether the remoteCouch variable is defined.

If so, we run the sync() function to initialize a continuous sync between PouchDB and CouchDB.

The end of this function is where the syncing magic happens. Calling the sync function on my PouchDB database (db), we pass in three parameters:

  • remoteCouch is the remote CouchDB database I want to sync to, established earlier.
  • opts represents the options set in the previous line, where I used {live: true} to tell PouchDB to sync continuously rather than on demand. Best practice would be to also use {retry: true}, which will force the function to run again in case of a connection error that stops replication.
  • syncError() is a function that will run if there is an error in the sync process (more on this shortly).

This code also sets a change listener, which will be triggered when any data changes in our local PouchDB database. (Because we set up continuous syncing, listening for changes in PouchDB should pick up all changes, whether they’re made locally or replicated in from CouchDB.) When the data changes, the function updateArticles() is run to refresh the user interface, displaying an updated list of articles. (More on this to come.)
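The pieces described above could be wired together roughly as follows. This is a sketch under assumptions, not the app's actual function: the name startSync and its parameters are mine, db and the handlers are passed in as arguments, and errors are handled through the sync object's error event rather than a third callback argument.

```javascript
// Hypothetical wiring of the steps above. db is a PouchDB database and
// remoteCouch is the URL string from credentials.js.
function startSync(db, remoteCouch, onChange, onError) {
  // Redraw the UI whenever local data changes, whether the change was made
  // here or replicated in from the remote database.
  db.changes({ since: 'now', live: true }).on('change', onChange);

  // Without a remote URL there is nothing to sync with; stay local-only.
  if (!remoteCouch) {
    return null;
  }

  // live: keep syncing continuously; retry: reconnect after a dropped connection.
  var opts = { live: true, retry: true };
  return db.sync(remoteCouch, opts).on('error', onError);
}
```

In the app described here, onChange would be updateArticles and onError would be syncError.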

As far as syncing goes, that’s the extent of our code. Easy peasy!

If you’ve used CouchDB or Cloudant before but not PouchDB, it may appear that we’ve missed an important step in the process: making AJAX calls to the remote database to get, put, and all that jazz. One of the many joys of PouchDB is that it keeps us from having to learn anything about API calls. That PouchDB script I included is handling all of that in the background.

Alerting the user to sync status

A key tenet of the Offline First approach is that we never treat a lack of network connection as an error condition. Instead, it’s an expected state that we may or may not need to flag to our users, depending on the circumstances.

In the case of this tool, the user is able to keep working — including writing and editing data — while offline. What they lose the ability to do is push their edits to the shared CouchDB database or see edits made there by others since they went offline. So a simple heads up should suffice, letting the user know that their data is safe on the device but not currently syncing to the cloud.

To accomplish this, I built a div into my HTML whose color and message change based on whether PouchDB is successfully syncing to CouchDB.

It turns green when sync is successful, powered by the sync() function we’ve already seen. The syncError() function, which is called by PouchDB when the sync function fails, uses jQuery to make this indicator turn red:
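One way to sketch that indicator logic, using plain DOM properties instead of jQuery so it runs anywhere. The function names, class names, and messages are hypothetical, and the real app's markup may differ.

```javascript
// Hypothetical class names and messages; the real app's markup may differ.
function syncStatusView(isSyncing) {
  if (isSyncing) {
    return { className: 'sync-status sync-ok', message: 'Syncing with the cloud' };
  }
  return {
    className: 'sync-status sync-error',
    message: 'Saved on this device; not currently syncing'
  };
}

// indicator is the status div, e.g. document.getElementById('sync-status').
function showSyncStatus(indicator, isSyncing) {
  var view = syncStatusView(isSyncing);
  indicator.className = view.className;
  indicator.textContent = view.message;
}
```

A sync error handler would then call showSyncStatus(indicator, false), with CSS rules for the two classes supplying the red and green colors.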

Writing data to PouchDB

All of that syncing magic would admittedly be more impressive if we had some data to sync. The PouchDB website offers a nice introductory guide to getting data into and out of the local database, but I’ll walk you through the code in my own application to introduce you to the concepts. (If the extra code in my application confuses you, go check out that guide for a simpler example.)

Whenever we use a web form, we need a way to save the data entered there for future use. Unlike a model where we might need to send data away to a remote server for processing, here we simply need to save all of the data as a document in PouchDB. (If you’ve ever used IndexedDB or localStorage, you’ve done something similar.)

Creating a JSON document

As a NoSQL database, PouchDB (like CouchDB) stores unstructured documents as opposed to specifying a schema with rows, tables, and such. You’ll often hear the phrase document store used to describe an unstructured NoSQL database.

Since data transmitted between a browser and a server has to take the form of text, the document is stored as JSON (JavaScript Object Notation). JSON is a text format that uses the syntax of a JavaScript object, although it can actually be read by any programming language. Like a JavaScript object, a JSON document consists of a collection of name/value pairs, although it notably can’t contain methods. It can contain other objects or arrays if needed, or can even itself be an array of other objects. So these “unstructured” records do have structure — it’s just allowed to vary from document to document.

An adorable JSON sample from the PouchDB guide
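To make the object-versus-JSON distinction concrete, here is a small round trip of my own (not the sample from the PouchDB guide). The field names and values are invented for illustration.

```javascript
// A hypothetical article record written as a plain JavaScript object.
var article = {
  title: 'Offline maps for commuters',
  author: 'A. Writer',
  completed: false,
  editors: ['editor-a', 'editor-b'],  // arrays are allowed
  resources: { photos: true },        // so are nested objects
  describe: function () { return this.title; } // methods are not
};

var asText = JSON.stringify(article); // object -> JSON text
var roundTrip = JSON.parse(asText);   // JSON text -> object

// roundTrip keeps title, author, completed, editors and resources,
// but has no describe property: functions are dropped during serialization.
```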

As mentioned earlier, PouchDB will typically use IndexedDB (if the browser supports it) to store data. However, as with CouchDB, we don’t need to learn any special syntax for talking to IndexedDB. PouchDB is taking care of all of that for us in the background so we can use a single set of functions to deal with both our local and remote storage. (Since building this app, I’ve used IndexedDB directly for another project, and I missed the simplicity of PouchDB’s API.)

The web form used by the editor, from which we save values to PouchDB

When a user clicks save from an editing screen, I do some necessary form validation and then create a new article object in which I can store the current form values. (I’ve chosen to do this with jQuery, but it could also be done with plain old JavaScript.) Because it makes later steps much easier, I ensure that the names of fields in my object exactly match the names of fields in my form. (My form has a rather overwhelming number of fields, so I’ve simplified the object in this snippet.)

Most of the name/value pairs in the object are what you might expect: author name, title, true or false status of a completed checkbox, etc. However, there are two special properties that are essential to the PouchDB and CouchDB model: _id and _rev. These two values are critical for helping the database identify the records you’re looking to match when you attempt to access, update, or delete a record.

Assigning unique IDs (_id)

Every document in your database must have an _id property which is both unique and unchanging. That means assigning a unique ID when the record is new and keeping that ID consistent when the record is edited.

There are two options for writing new documents to PouchDB: db.put() and db.post(). With db.put(), you create your own ID and pass it in. With db.post(), you do not pass in an ID, and one is auto-generated on your behalf instead. In his blog post on best practices, Nolan Lawson (a former PouchDB maintainer) recommends always using db.put() for three reasons:

  • As we’ll later see, when you retrieve all documents using allDocs(), they come back sorted by ID, so your choice here determines your primary index. It would be a waste to use random values instead of passing in datestamps as IDs so the records can easily be sorted chronologically without necessitating a secondary index.
  • You’re going to have to learn the db.put() API to edit or delete records, where you have to pass in the ID to ensure the correct document gets updated. You may as well just learn one API.
  • You’re wasting space storing gobbledygook IDs with db.post().

I’ve followed the advice to use db.put() with a datestamp as my ID, and the navigation scheme of my application makes it so I can use the location.hash in the URL to determine whether I’m dealing with a new or existing record.

Each time I write the list of existing articles to the page, I include in each a clickable link which contains the existing _id embedded in the URL as a hash. (The screen for editing a new record contains the hash #new instead.)

The document ID becomes a hash of the URL used to access the editing screen.

The form used to edit an existing record also includes an info section which stores the _id and _rev for safekeeping in uneditable form fields, along with a custom link to a page for the author of an article to view, again using the article ID as a hash in its URL.

When I write the article’s data to the page, I protect the _id and _rev fields from editing.

This system allows me to use location.hash to determine whether the record is old or new before saving the current status of the form to PouchDB. In the case of an existing record, the pre-existing ID is used. For a new record, a unique ID is created for the first time using the browser’s ability to check the current date and time and save it as a string using: var ID = new Date().toISOString();
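The new-versus-existing decision can be sketched as a small function. The name resolveArticleId is hypothetical, and the optional now parameter exists only to make the function easy to test.

```javascript
// hash is location.hash, e.g. '#new' or '#2018-01-15T12:00:00.000Z'.
// now is optional and defaults to the current date and time.
function resolveArticleId(hash, now) {
  if (hash === '#new') {
    // New record: mint an ID from the current timestamp.
    return (now || new Date()).toISOString();
  }
  // Existing record: reuse the ID carried in the URL, minus the leading '#'.
  return hash.replace(/^#/, '');
}
```

In the browser this would be called as resolveArticleId(location.hash) just before saving.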

Managing revision markers (_rev)

Unlike the unique identifier _id, which must be present and consistent every time a document is written to PouchDB, the revision identifier _rev exists only once a document has been saved, and it is created by PouchDB itself. On first write, your document must have no _rev field. When you retrieve the document to edit it, you’ll find a _rev field created by PouchDB, which you must hold on to and submit with your revision. The next time you access that record, you’ll see that PouchDB has created a new value for _rev.

Although this process is a bit confusing for a beginner, it’s the key to making sync work. As explained on the PouchDB website, the document revision structure used by CouchDB and PouchDB is actually quite similar to Git’s. The revision history, stored as a tree, allows management of conflicts when databases get out of sync. For our purposes, though, all we need to do is capture the new _rev given to us each time we read from PouchDB and write it back in identical format when we update the record.

Before actually saving the article to PouchDB, our saveArticle() function does extra work to ensure that the _id and _rev fields are present in (or absent from) the article object in exactly the right format:
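A simplified sketch of that preparation step, assuming the ID and revision have already been read from the form. The helper name withIdAndRev is mine, and the real saveArticle() function in the repo does more than this.

```javascript
// Returns a copy of article with _id set and _rev present only when a
// revision already exists. id and rev would come from the protected form fields.
function withIdAndRev(article, id, rev) {
  var doc = Object.assign({}, article, { _id: id });
  if (typeof rev === 'string' && rev !== '') {
    doc._rev = rev;  // existing record: send back the revision we last read
  } else {
    delete doc._rev; // new record: the first write must not carry a _rev
  }
  return doc;
}
```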

Once we have those special fields figured out, writing a record to PouchDB is quite easy. After building the article object that contains all the values collected in the form, as well as the proper _id and _rev values, we simply use db.put(article) to save the record. The continuous syncing that we’ve enabled means PouchDB will immediately sync with CouchDB if a network connection is currently established, making the new data immediately available to other users or devices. The change listener will also initiate the updateArticles() function to redraw the article list on the app’s homepage, even if there’s no connection to the remote database.
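The write itself might look something like this. It is a sketch rather than the repo's saveArticle() function: putArticle is a hypothetical name, db is passed in as an argument, and the conflict message is my own wording.

```javascript
// db is a PouchDB database; article already carries _id (and _rev if it is an edit).
function putArticle(db, article) {
  return db.put(article).then(function (response) {
    // PouchDB answers with { ok, id, rev }; rev is the revision to send next time.
    return Object.assign({}, article, { _rev: response.rev });
  }).catch(function (err) {
    if (err.status === 409) {
      // Conflict: the _rev we sent is stale because the document changed elsewhere.
      throw new Error('Article was updated elsewhere; reload it before saving.');
    }
    throw err;
  });
}
```

The promise resolves with a copy of the article carrying its new _rev, which is the value to submit with the next edit.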

For a closer look at how these pieces of code all work together, check out the full saveArticle() function in my repo. There you’ll see that I’ve also used an unobtrusive “toast” notification to let the user know the record was saved before taking them out of the editing screen and back to the article list, which will include the new or updated record that was just submitted.

Retrieving data from PouchDB

When it’s time to retrieve data from PouchDB, the way we plan to use the data affects how we can best access it. To display a list of articles, for example, I need to do a batch fetch and loop through the entire database to find the title and author of each article and write them to the DOM, which I’ll do using db.allDocs(). In order to pull up the details of a specific article for editing, however, I need a way to fetch a specific document by referencing its _id. This is accomplished using db.get().

Fetching all documents with db.allDocs()

The command for batch fetching documents from PouchDB looks like this:

Batch fetch with db.allDocs() (Source: PouchDB API — Batch Fetch)

By default, multiple documents returned will be indexed and sorted by their _id, deleted documents will be excluded, and the valuable content of the documents themselves (all those details created in our web form) won’t be provided.

By setting the option include_docs: true, I can ensure that all those details will be delivered as part of a field called doc in the results. Otherwise by default I would get only the _id and _rev properties.

To sort the documents by most recent first, instead of oldest first, I set descending: true in the options when calling the function.
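Putting those two options together, a batch fetch could be sketched like this. The wrapper name fetchAllArticles is hypothetical, and db is passed in as an argument.

```javascript
// db is a PouchDB database. Resolves with an object holding total_rows,
// offset and rows; each row carries id, key, value and (because of
// include_docs) the full document in doc.
function fetchAllArticles(db) {
  return db.allDocs({
    include_docs: true, // deliver each full document in row.doc
    descending: true    // reverse the default _id order: newest datestamp first
  });
}

// Usage:
//   fetchAllArticles(db).then(function (doc) { console.log(doc.rows); });
```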

The object returned by the allDocs() function, seen as doc in the snippet above, has a structure that’s a bit confusing to parse. Alongside properties which contain metadata we don’t need (such as the number of documents returned), it includes the property rows, whose value is an array of objects. Each of these objects (rows[i]) includes more extra metadata, plus a property doc that’s what we’re actually looking for. So doc.rows[i].doc represents one of the articles we saved to the database.

The data structure returned with a batch fetch (redacted)

As mentioned earlier, every time there’s a change in the PouchDB database, the updateArticles() function is called to redraw the article list in the user interface. It passes only the array of actual documents (doc.rows), not the top layer of extra metadata, into our redrawArticleList() function as the array articles. We loop through this array of objects, referring to each object as currentArticle as it passes through the loop and is written to the page.

In this function, currentArticle.doc (equivalent to doc.rows[i].doc described above) refers to the object we wrote to PouchDB when saving the form. Each li, created by looping through the array, includes the article’s title (currentArticle.doc.title), author if it's known (currentArticle.doc.author), a link directing to a special URL that includes the article’s _id, and classes created based on the editors and completion status.
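A stripped-down version of that loop, building markup strings instead of touching the DOM, might look like the following. The function names, the completed field, and the class names are assumptions for illustration; the real redrawArticleList() handles more fields and classes.

```javascript
// Minimal escaping so titles and names cannot inject markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// articles is doc.rows from allDocs(); each entry keeps the saved record in .doc.
function articleListItems(articles) {
  return articles.map(function (currentArticle) {
    var record = currentArticle.doc;
    var label = escapeHtml(record.title);
    if (record.author) {
      label += ' (' + escapeHtml(record.author) + ')';
    }
    var status = record.completed ? 'completed' : 'in-progress';
    return '<li class="' + status + '"><a href="#' + escapeHtml(record._id) + '">' +
      label + '</a></li>';
  });
}
```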

For our needs here, a full list of articles sorted by ID is sufficient, so we’re not doing additional filtering via PouchDB, though that would be possible. Through the Mango query API (also known as pouchdb-find), we could create secondary indexes beyond the built-in allDocs() and changes() indexes. (If you’re familiar with MongoDB but not CouchDB or PouchDB, I’m told you’ll find Mango very easy to pick up.) Map/reduce queries are also possible, but they’re harder to implement and you’re quite unlikely to need them, since Mango meets about 99% of users’ querying needs.
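The app does not use Mango, but for the curious, a query by author might be sketched as follows. This assumes the pouchdb-find plugin has been loaded alongside PouchDB; the function name is hypothetical, and author is the field name used in the article records above.

```javascript
// Requires the pouchdb-find plugin, which adds createIndex() and find() to db.
function findArticlesByAuthor(db, author) {
  // Create a secondary index on the author field, then query against it.
  return db.createIndex({ index: { fields: ['author'] } })
    .then(function () {
      return db.find({ selector: { author: author } });
    })
    .then(function (result) {
      return result.docs; // find() resolves with an object whose docs array holds the matches
    });
}
```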

If you run the app, you’ll see that I do offer some limited filtering of the article list. However, I’ve done it by adding classes for editors and completion status to each li as I build the DOM in my redrawArticleList() function. A separate click handler monitors what filtering option the user clicks on and triggers jQuery's show() or hide() functions accordingly on lis of the appropriate class to hide and reveal articles as needed. All of the data retrieved from PouchDB is still there, just made invisible to the user.

The article list filtered to show only completed blog posts.

Fetching individual documents with db.get()

When we want to view or edit the details of a specific article, we need a way to fetch just that single record from PouchDB. (Remember, since we have continual sync turned on, this is the functional equivalent of accessing the most recent possible CouchDB record, as of the last time we had internet access.) This is where db.get() comes in.

To fetch a specific document, you need to know the value of its _id, which is a required argument of the db.get() function.

Fetching a specific document with db.get() (Source: PouchDB API — Fetch)

Earlier we saw that when I write my list of articles to the DOM, I include a link to a URL containing the document ID. When a user clicks on one of these links to open a record, my populateForm() function first looks for that location.hash value, removes the preceding #, and stores the result as the variable ID. I can then pass that variable into the db.get() function to access the record the user wants to see and write its details to the page.
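The lookup half of that process can be sketched as below. This is not the app's populateForm() function: loadArticleFromHash is a hypothetical name, db is passed in, and returning null for a missing document is my own choice.

```javascript
// hash is location.hash, e.g. '#2018-01-15T12:00:00.000Z'.
function loadArticleFromHash(db, hash) {
  var ID = hash.replace(/^#/, ''); // drop the leading '#'
  return db.get(ID).catch(function (err) {
    if (err.status === 404) {
      return null; // no document with that _id (or it has been deleted)
    }
    throw err;
  });
}
```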

When we use db.get(), we don’t end up with all the extra metadata we saw with db.allDocs(). Instead, the results are identical to that article variable we once stored in Pouch, but with a new _rev value. Remember how I carefully set the names in the object to match the names of the form fields in my HTML? Because of that, I can now loop through name/value pairs in the object, using the name to identify what form field it aligns with and then setting the value in the object as the form value. You’ll note that I have to use some logic to determine whether the form field is a checkbox or a text field before I can appropriately set the value.
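That loop might be sketched as follows, with the form fields handed in as a simple name-to-element map so the example does not depend on jQuery or a real page. The function name populateFields is hypothetical.

```javascript
// fields maps each form field name to its element, e.g. form.elements in the browser.
function populateFields(fields, article) {
  Object.keys(article).forEach(function (name) {
    var field = fields[name];
    if (!field) {
      return; // no form field with this name
    }
    if (field.type === 'checkbox') {
      field.checked = Boolean(article[name]); // checkboxes are ticked, not filled in
    } else {
      field.value = article[name];            // text inputs and similar take the value
    }
  });
}
```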

Since I need to ensure I don’t lose access to the _id and _rev fields, I have places to store those in the DOM as well, although they are in a portion of the page that’s not editable by the user.

I also use the _id retrieved from PouchDB to create a customized, clickable link to the second webpage of my app, writer.html, which shows and hides certain sections or details based on the information in that article record. The details on this page will change as the record is updated with new resources and status markers, so the author can always use the same URL to access the most current information. (You can explore my writer.html file or the relevant sections of my project-manager.js file to see how this works.)

The page shared with the author

Deleting a record in PouchDB

Occasionally, an editor will need to delete a record from our database. There are a few different ways to delete a document from PouchDB, but I chose to use db.remove():

Deleting records with db.remove() (Source: PouchDB API Documentation — Delete a Document)

Since my delete button is on my form page, I can use the form values to get the _id and _rev values that are needed here. (Both fields are required for deletions, as for edits, so that records can be properly synced.) I also make the user confirm they really want to delete the record before allowing the transaction to continue, and provide toast notifications to let them know the procedure has succeeded or failed. When it’s succeeded, they’re returned to the home screen, where the article just deleted will no longer be visible in the UI.
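Leaving out the toast notifications, the confirm-then-delete flow could be sketched like this. The name deleteArticle and the confirmFn parameter are hypothetical; in the browser, confirmFn could simply be window.confirm.

```javascript
// id and rev come from the protected form fields; confirmFn asks the user to confirm.
function deleteArticle(db, id, rev, confirmFn) {
  if (!confirmFn('Delete this article?')) {
    return Promise.resolve(false); // user backed out; nothing was deleted
  }
  // db.remove() needs both the _id and the current _rev of the document.
  return db.remove(id, rev).then(function () {
    return true;
  });
}
```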

The other way to delete a document in PouchDB is to use db.put() and add a _deleted: true value to the existing fields. (Compare this to db.remove(), which adds the same field behind the scenes, but deletes all the other fields.) That method is required if you use filtered replication, so that you don’t delete the fields you’re using to run your filter and thereby lose your ability to replicate the deletion to CouchDB.

Deleting records with db.put() (Source: PouchDB Guide — Deleting Documents)

With either method, the database saves a tombstone at the end of the revision tree, and you shouldn’t count on being able to access the deleted document.

Adding a tombstone to the revision tree. (Source: PouchDB Guide — Deleting Documents)
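The db.put() route to deletion described above can be sketched as a small helper of my own (not code from the PouchDB guide). The name asDeleted is hypothetical.

```javascript
// Keeps every existing field (so replication filters can still match)
// and adds the deletion marker. The input document is left untouched.
function asDeleted(doc) {
  return Object.assign({}, doc, { _deleted: true });
}

// Usage with a document fetched earlier via db.get():
//   db.put(asDeleted(doc));
```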

Speedy sync across multiple users and devices

In order to implement security measures to make my index.html page available only to editors, make each writer.html#articleID page available only to the appropriate authors, and fully protect my CouchDB credentials from all users, I would need to implement some server-side code. (I’m curious to check out Hoodie, which lets you build a backend for a web app with front-end code.)

However, we can see that by using the built-in syncing superpowers of PouchDB and CouchDB, we’ve already made it possible to:

  • Store and edit data locally while offline
  • Sync data to the cloud when a network connection is available
  • Share data from CouchDB across multiple users or devices, each with their own local PouchDB database

Remember, because PouchDB lives in the browser:

  • Two tabs in the same browser (not in incognito mode) share a single PouchDB instance
  • Different browsers on the same device have separate PouchDB instances
  • A browser tab in incognito mode has its own PouchDB instance separate from the same browser running in standard mode
  • Different users on different devices have separate PouchDB instances

All of this makes it easy to test syncing functionality from a single device. You can load up a tab in Chrome to represent one user (or device) and one in Firefox to represent another, then toggle internet access on and off for one “user” but not the other.

The developer tools in both Chrome and Firefox offer ways to simulate being offline:

You can simulate offline status from the Network panel in Chrome dev tools…
…or from the Developer panel in Firefox

As you experiment with this, you’ll see that one of the incredible powers of this syncing process is its speed. When both devices are connected, data is transferred from one to another in the blink of an eye. In fact, many people choose an Offline First architecture not just to deal with lack of connectivity, but also to improve performance on even the strongest networks. We’ve seen the speed of data sync here, and in my next post we’ll see how a service worker can be used to increase the speed of page load.

Summary & resources

Remarkably, handling data storage and sync is one of the topics I see least commonly addressed in tutorials on building Progressive Web Apps. PouchDB and CouchDB certainly aren’t the only solution to the problem of offline data sync, but both tools were built with an Offline First use case in mind, making them surprisingly easy to implement.

Many of the folks I’ve met at Offline Camp are big proponents of PouchDB and CouchDB as part of a stack for Progressive Web Apps, and have shared their stories in the Offline Camp Medium publication. Check out this article collection for more coverage.

My team at IBM has also put together a collection of sample implementations using PouchDB and CouchDB along with a variety of popular libraries and frameworks to create offline-capable shopping list apps that range from PWAs to hybrid mobile apps, native mobile apps, and even desktop apps.

For a broader perspective on Offline First, I recommend this list of resources as a great starting point to finding educational materials or joining the community.

Up next: caching

Offline data storage is awesome, but it’s no good if you can’t load the page that lets you view and edit your data. In my next post, I’ll walk you through using a service worker to cache resources on first page load, enabling speedy loading on subsequent visits, offline or on.

--


Teri Chadbourne
Center for Open Source Data and AI Technologies

Web developer | Building the dweb community as lead maintainer of @ProtoSchool at @ProtocolLabs | @OfflineCamp co-organizer & #OfflineFirst advocate | she/her