Writing a Firebase Clone with PostgreSQL and Node.js (Part II)
In which we move closer to a useable Firebase clone
When we left off, we had just created some initial scripts for database triggers to get PostgreSQL to
NOTIFY a channel whenever a table is written to or deleted from. But we need a way to automate both the creation of these tables and to apply these triggers to all of them. To help with this, we will use an Object Relational Mapper to create these tables, and wrap the model creation in a way to apply our triggers.
Sequelize is a mature ORM for Node.js that supports many different relational database dialects, including PostgreSQL. We can use it to define models, which map to database schemas and constraints. For instance, these two examples, taken from the Sequelize documentation, create a `table` named project, and a table named `task` with columns mapping to the keys of the object passed as the second argument (and data types mapping to the values).
We can create our own entity that wraps this definition, and also give it metadata to add our triggers. Here, we pass in the name of our model, and the already created Sequelize
This class will become more vital later as we build out the rest of the project, and add methods and more properties.
We also will create a
Triggers class that will allow us to generate trigger command strings for each model, which we can then send directly to PostgreSQL:
We use ES6 template strings to populate these trigger creation commands with the names of our model, which we pass in through the constructor.
Eventually we will build a class that encapsulates the following behavior:
This is essentially a series of steps that we must take to create our models, sync them to tables in the database, and apply our triggers. If we have a collection of models, first we register those models with Sequelize through
sequelize.define. This actually creates the necessary tables, or associates our model with existing tables that reflect those models, once we
sync Sequelize. Then, once we are synced, we generate the trigger commands, and send them to PostgreSQL to be executed (we have to sync first because we cannot add the triggers until the tables actually exist). Once we have all of our models and triggers, we setup a listener that listens to the channel (“watchers”) that our triggers are publishing to (
LISTEN watchers). There is a decent amount missing from the code above, but lets diagram this out, as its only really necessary that we understand the process rather than the code itself.
We should note that the
LISTEN connection should exist separately from Sequelize, so that it is not part of the connection pool.
LISTEN will have its own dedicated client with a long-lived connection. We will also only have one
LISTEN connection, as opposed to a connection for each model. Remember, from the previous article, that the information emitted from the “watchers” channel will be a comma-delimited list of values, which will be enough to allow us to identify the relevant table and row. Once we parse that information, we can identify the appropriate model to query using Sequelize’s connection pool and ORM capabilities.
So we now have our database selected, an ORM, and a communication channel over which we can listen for updates. We have two more pieces of the puzzle to complete the backend of our Firebase clone: our WebSockets server, and our filtering mechanism. The filtering mechanism will be what decides whether or not to send newly created data to our WebSockets as it is emitted from the
LISTEN connection we have to PostgreSQL . This will depend heavily on our WebSockets implementation, so it makes the most sense for us to focus on WebSockets first and then work back.
Socket.io or WS?
When it comes to WebSockets in the Node.js ecosystem, we have two major mature choices:
socket.io . (I’m sure there are other variations, but these two are clearly the most popular. We could even write our own implementation from scratch on top of the
net module, but let’s save reinventing the wheel for a rainy day. Here is a link to the WebSocket protocol RFC if you’re so inclined.)
Many people like
socket.io (including me) but it’s client-side library and server-side library are too closely coupled for what we want to do. If we want to implement our own client-side data structures that behave like those of Firebase, we will want to build off of the standard browser WebSocket objects so that we have better control.
ws is much better suited for this, as it is un-opinionated about how we connect to it through the browser. We’ll have maximum flexibility.
Above is a very rudimentary overview of how
ws works. It’s very simple. Create the server,
wss, which creates a new WebSocket connection
ws whenever a browser connects. To send a message to that WebSocket, we call the
send method, and pass it a string.
It’s important to note at this point that unlike HTTP servers, WebSocket servers are inherently stateful. There is no way (at least that I have found so far) to serialize a WebSocket and manifest it on some other server instance. It is a long-term, two-way connection, and we must store it in memory. This means we will be limited by how many WebSockets we can maintain with only one server. However, WebSockets can be load-balanced with tools like Nginx quite easily. If we have multiple servers, the initial connection can be sent to one specific instance, and then that connection is maintained between that server instance and the browser in perpetuity (or until the connection is broken). In fact, because our
LISTEN/NOTIFY channel in Postgres will broadcast to all the server instances that are listening, we will be able to scale horizontally fairly easily, within reason. But, first we have to actually build our Firebase clone so that we can build a product that everyone wants to use, and then we can worry about scaling.
Back to the task at hand. We don’t want our WebSockets server to simply have one domain-based URI. We want to have “channels” like
socket.io. (I think that’s what they’re called, it’s been a while). To compare to HTTP, we essentially want path-based routing. That way, we can represent entity resources as endpoints, like we would in a REST API.
ws doesn’t have a router, but we can easily implement something like that with a simple dictionary.
Building incrementally, we now have a socket server with a collection of WebSocket connections mapped by url. We split out any query strings (so that this is not taken into account when later referencing our connections, and also so we can have some functionality related to querying later).
This installment’s getting a little long. In the next installment, we’ll start making our WebSocket server more robust, use it to hook up and encapsulate our ORM models, and (hopefully) tie PostgreSQL and our WebSocket server together through our filtering mechanism. By the end, we should have a class that we can simply pass Sequelize models to, which will set up all of our models, tables and endpoints, and can listen and interact with PostgreSQL on our behalf.