Writing a Firebase Clone with PostgreSQL and Node.js (Part I)

I have been working on my own Firebase-like framework for Node.js called wrest for a little under a month, and felt I should share the techniques and processes I used along the way.

Patrick Ackerman
7 min read · May 26, 2017

(Part II Here!)

The first time I encountered Firebase was in a YouTube presentation, given by a “developer advocate” at some conference (possibly an ngConf from a few years ago). He showed how quickly he could sync applications in two different browser contexts by simply hooking Firebase entities into his Angular app, with the interfaces updating as the data changed without any developer intervention. Very powerful, very interesting, very cool. I tried it out myself, and continued to be impressed with its capabilities. However, I have always had one major issue with Firebase: it’s a backend-as-a-service product. At the end of the day, Firebase has much more control over you than you have over it (this is not the kind of inversion of control you’re striving for). Why not have something that you can use and deploy yourself, and extend as you see fit?

What Firebase Does

How Firebase works, more or less

Firebase primarily operates over WebSocket connections (though it does have HTTP long-polling fallbacks for when WebSockets are unavailable). WebSockets are long-lived TCP connections between the browser or native application and the server (if you are unfamiliar with WebSockets, I will refer you to MDN). When you send data to Firebase (or delete data or update data), it emits those changes to other instances of the application that are listening for updates via their own WebSocket connections, which keeps all the applications in sync without the need to constantly poll REST endpoints. As a consumer of Firebase, all you really do is use their client-side data structure library, which takes care of connecting to Firebase and maintaining the synchronized data itself.

What We Need

Firebase is a backend-as-a-service that stores data and synchronizes client side data structures via WebSockets. So… what components do we need to assemble our own?

Front End and Back End Components

Front End

Firebase relies heavily on its own client-side libraries, which abstract away much of the complexity of dealing with Firebase directly (in fact, I have no idea how to set up a plain WebSocket connection to Firebase and no real desire to look at the inner workings of their client-side libraries right now). In order to get the same kind of developer experience as Firebase, we will need to implement our own versions of these data structures at some point, most likely in a follow-up article (LINK WILL GO HERE ONCE IT’S WRITTEN). I actually have a partial implementation in place for a self-updating array via a WebSocket connection, but let’s focus on the more important part right now: the back end.

Back End

There are two important aspects of our back end to consider: a WebSocket server for the client to communicate with, and a data store. The WebSocket layer will expose channels for the client to listen and publish to, and will also coordinate data flow back to the client as our data store is updated. But we want our data store to handle the task of notifying the WebSocket server when it’s updated, rather than abstracting that job into its own server component. This is a design decision that will help with scalability (if it ever becomes necessary).

Pub-Sub All The Way Down (..or up?)

In the frontend development world, unidirectional data flow has emerged as the winning state architecture for rich single-page applications. This is illustrated by the popularity of Flux architecture and its popular Redux implementation, as depicted below.

Flux Architecture, Source: Kapost Engineering

We are attempting to implement a similar architecture regarding the flow of data in the back end, but instead of being contained within a single, client-side application, one stream of data will potentially affect the state of all running instances of the application:

An illustration of how our implementation will operate

As updated data flows in over our WebSocket connections, we will insert it into our database, and then send the updates out to all of the WebSocket connections that are subscribed to the affected resources. This implies that we need a database that can notify our WebSocket server when it has been updated. Are there any databases on the market that satisfy this condition?

Redis

Redis has pub/sub facilities; it’s often even used as a message queue. However, it’s not really a fully featured database. I’ve seen people use it as their primary store, but that’s not really what it was designed for. If worse comes to worst, we could possibly use it as an intermediary message queue between our primary store and the WebSocket server, but I think we have more suitable options.

MongoDB

MongoDB can be jury-rigged to provide pub/sub capabilities with capped collections and tailable cursors (an approach I only half understand right now).

Look. I love MongoDB; it used to be my “go-to” data store. But NoSQL is suitable for only certain use-cases. Most consumer-facing applications will have data models that are inherently relational, so we should look for something that provides that capability.

MySQL

MySQL is one of the most robust open-source RDBMS products available and is used by many high-profile applications. That said, it offers no native publish/subscribe functionality. We could probably build something that watches for changes in MySQL’s binary log (binlog), but that would be a bit tedious.

SQL Server or Oracle

….No.

Actually, you can implement this functionality in either of these options, though it can get very complicated (see this StackOverflow question about implementing in Oracle). Also, we don’t want to have to worry about 💸💸💸 when it’s so simple with our actual choice. Which will be everybody’s favorite database…

PostgreSQL

PostgreSQL has NOTIFY and LISTEN commands, which publish and subscribe to channels, respectively. We can use a combination of triggers and plpgsql procedures to accomplish what we’ve set out to do.
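
As a quick illustration of the primitive itself (the channel name and payload here are arbitrary examples, not anything wired up yet), two psql sessions can communicate like this:

```sql
-- Session A: subscribe to a channel (the name is arbitrary)
LISTEN watchers;

-- Session B: publish a string payload on that channel;
-- session A will receive it as an asynchronous notification
NOTIFY watchers, 'users,42,INSERT';
```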

Triggers

Triggers are evil and should be avoided at all costs, at least according to popular opinion. But triggers can give us exactly the functionality that we’re looking for if we use them correctly.

Triggers are a feature of most modern RDBMSs, and PostgreSQL is no exception. They are a form of stored procedure whose execution is triggered by a specific event. There are a few different types of events that can trigger these procedures, but we will simplify our discussion so as not to get bogged down in details. I like to think of triggers as having four aspects:

  • Temporality. Do we want the trigger to fire before or after the intended event?
  • Activity. What event are we targeting (insertion, deletion, update)?
  • Procedure. What kind of action do we want to happen when we trigger the trigger?
  • Specificity. Should the trigger act on the statement (once) or on the affected rows (once per affected row)?

Triggers are attached to the database via standard SQL, but first we need to write our procedures. These are written in plpgsql, or Procedural Language for PostgreSQL. In fact, there are various procedural languages that can be used with PostgreSQL, but plpgsql is the only one you don’t need to install. (For an interesting alternative, check out the JavaScript implementation PL/v8).

Here’s an example of what our procedure will look like:

The double dollar sign essentially acts as single quotes, treating the enclosed text as a string, with the language tag at the end indicating that this is a plpgsql procedure. The procedure itself is really this:
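
That inner block, roughly (again, a sketch that matches the description below):

```sql
BEGIN
  PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || NEW.id || ',INSERT');
  RETURN NEW;
END;
```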

It follows standard transactional syntax, but we are performing a special, built-in procedure called pg_notify(). This is the bread and butter of our project. By invoking our formulation of pg_notify, we are publishing a message on the channel watchers. This message is simply a string: the comma-delimited concatenation of the table name (TG_TABLE_NAME), the id of the newly inserted record (NEW.id), and the command that was performed (INSERT). We will need three variations of this procedure, one each for INSERT, UPDATE, and DELETE. (*TG_TABLE_NAME and NEW are special variables derived from the function’s execution context.)
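
For example, the DELETE variation can’t reference NEW, because there is no new row on a delete; it reads the id from the OLD record instead (the function name here is just a placeholder):

```sql
-- Sketch of the DELETE variation; notify_delete_trigger is a placeholder name
CREATE OR REPLACE FUNCTION notify_delete_trigger() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || OLD.id || ',DELETE');
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;
```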

Now that we have our procedure (specified as notify_trigger()), we can create the actual trigger:
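
A sketch of that SQL (the users table is just a placeholder; the trigger name and timing match the description below):

```sql
-- Remove any existing trigger, then recreate it on the watched table
DROP TRIGGER IF EXISTS watched_table_trigger ON users;

CREATE TRIGGER watched_table_trigger
  AFTER INSERT ON users
  FOR EACH ROW
  EXECUTE PROCEDURE notify_trigger();
```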

First we get rid of the trigger if it already exists. Then we create a trigger called watched_table_trigger that executes AFTER the INSERT command on the specified table (temporality, activity). FOR EACH ROW that was affected (specificity), our previously defined notify_trigger() is executed (procedure).

What Next?

We don’t want to have to make these triggers manually. That would be a terrible tragedy, and an unlikely path for most developers. In the next part of this series, we will step back and look at using an ORM to help with defining and generating entities, tables, and procedures for our project. Soon, we will have our own backend framework that gives us much the same functionality as Firebase itself.
