“production-only”/“mono-system” development: my Recurse Center project.
I’ve had a lot of trouble describing what I’m working on at RC. Usually, when someone asks “What are you working on?”, I’ve mumbled something incoherent like “weeell… it’s about deployment from git”, “weeell… it’s about immutable data and code” or “weeell… it’s about versioning data”. The truth is that it’s an attempt to integrate many of these things (and more!).
For a while now I’ve been thinking about how to simplify the development and deployment process to lower the barriers to making changes to software. The software I’ve written, which I’ve called lazy cloud, is an attempt to rethink this process.
The best description of what I’ve been working on so far is “production-only development”. By this I mean that all developers working on a project make changes to a single system or environment, and changes become available as soon as they are committed. This has several implications. Changes need to be deployed quickly as they are made. Many parts of the typical dev/test/staging/CI cycle might need to be rethought. Any external system that interacts with the code would need to be modified to work with it. In particular, any system that deals with application state, like a database or file system, would need to segregate data using versioning or some other method.
Fully testing the feasibility of this idea would be impossible in the short time I have at RC. Instead I’ve built a demo that shows some of what might be possible. It focuses on building web sites/web services and on versioning data in a database.
At a high level the demo is composed of two components: a web server that plays the role of both a proxy server and a deployment server, and a database that versions data. Git plays a big part in both. Most obviously it versions your code, but it is also used to deploy a specific version of the code in response to a request and to version the data in the database. The demo was built with Node, TypeScript and Express, using pm2 to manage processes.
The rest of this post is a basic description of the design of lazy cloud. To get a better idea of how it works in practice, I’ve created a video that shows a change being made and deployed, and how data is segregated:
The web server performs two roles. The first is to proxy each request to a process running the requested git version of the code. If no version is requested, the request goes to the production commit. A specific commit is requested through the subdomain in the Host header of the HTTP request. For example, in response to a request to http://3a78c353003070286fd8c217d40df1e2b9a9eecd.example.com, the proxy server strips the commit id 3a78c353003070286fd8c217d40df1e2b9a9eecd from the subdomain and checks whether a process is running that commit. If it is, the proxy simply forwards the request; otherwise it starts a deploy of that commit.
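The routing decision above can be sketched in a few lines. This is a minimal sketch, not lazy cloud’s code. It assumes full 40-character commit ids. It also uses a plain map from commit id to port, where lazy cloud consults pm2’s process list. The names `parseCommitFromHost` and `routeRequest` are hypothetical.

```typescript
// Hypothetical sketch of the proxy's routing decision (not lazy cloud's code).
// Assumes full 40-character lowercase hex commit ids in the subdomain.
const COMMIT_ID = /^[0-9a-f]{40}$/;

// Extract the commit id from a Host header such as
// "3a78c353003070286fd8c217d40df1e2b9a9eecd.example.com:8080".
// Returns undefined when the subdomain is missing or is not a commit id.
function parseCommitFromHost(host: string, baseDomain: string): string | undefined {
  const [hostname = ""] = host.toLowerCase().split(":"); // drop any port
  const suffix = "." + baseDomain.toLowerCase();
  if (!hostname.endsWith(suffix)) return undefined;
  const subdomain = hostname.slice(0, -suffix.length);
  return COMMIT_ID.test(subdomain) ? subdomain : undefined;
}

type Route =
  | { kind: "forward"; commit: string; port: number } // process already running
  | { kind: "deploy"; commit: string }; // start a deploy first

// commit id -> port of the process running that commit (stand-in for pm2's list)
type RunningProcesses = ReadonlyMap<string, number>;

function routeRequest(
  host: string,
  baseDomain: string,
  productionCommit: string,
  running: RunningProcesses,
): Route {
  // No (valid) commit in the subdomain means "serve production".
  const commit = parseCommitFromHost(host, baseDomain) ?? productionCommit;
  const port = running.get(commit);
  return port !== undefined ? { kind: "forward", commit, port } : { kind: "deploy", commit };
}
```

Requests to the bare domain, or to a subdomain that isn’t a commit id, fall through to the production commit.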
The second role is deployment. To deploy a commit, the proxy server first responds with a landing page that opens a web socket connection back to the proxy server to start the deploy. The deploy itself involves several steps: cloning or updating the code, installing dependencies, running pre- and post-deploy scripts and finally starting the process. During the deploy the web socket connection reports progress and error messages. At the end it reloads the page, which is then served by the newly started process.
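One way to structure the deploy is as an ordered list of steps that report progress to a callback. In lazy cloud’s case that callback would write to the web socket. The `DeployStep` shape and `runDeploy` are hypothetical. Real steps would run git, the package manager, the deploy scripts and pm2.

```typescript
// Hypothetical sketch: a deploy as ordered async steps reporting progress.
// `report` stands in for sending messages over the landing page's web socket.
type Progress =
  | { type: "progress"; step: string }
  | { type: "error"; step: string; message: string }
  | { type: "done"; commit: string };

interface DeployStep {
  name: string; // e.g. "clone", "install", "pre-deploy", "start"
  run: (commit: string) => Promise<void>;
}

async function runDeploy(
  commit: string,
  steps: DeployStep[],
  report: (p: Progress) => void,
): Promise<boolean> {
  for (const step of steps) {
    report({ type: "progress", step: step.name });
    try {
      await step.run(commit);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      report({ type: "error", step: step.name, message });
      return false; // stop at the first failing step
    }
  }
  // On "done" the landing page reloads, and the new request is proxied
  // to the freshly started process.
  report({ type: "done", commit });
  return true;
}
```

Stopping at the first failing step keeps each error message next to the step that caused it, which is what the landing page needs to show.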
When the proxy server starts a process it sets two environment variables. The first is the port number the process should listen on. This is recorded in the list of processes maintained by the process manager, pm2. The proxy server can then look up the port when a request for a specific commit comes in.
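Here is a sketch of the port bookkeeping. It assumes the variable names `PORT` and `COMMIT_ID`, since the post doesn’t name them. An in-memory map stands in for the process list pm2 maintains. `ProcessRegistry` is hypothetical.

```typescript
// Hypothetical sketch: PORT and COMMIT_ID are assumed variable names, and this
// in-memory map stands in for the process list that pm2 maintains.
class ProcessRegistry {
  private readonly ports = new Map<string, number>(); // commit id -> port
  private readonly basePort: number;
  private readonly maxProcesses: number;

  constructor(basePort: number, maxProcesses: number) {
    this.basePort = basePort;
    this.maxProcesses = maxProcesses;
  }

  // Lowest port in [basePort, basePort + maxProcesses) that no process is using.
  private allocatePort(): number {
    const used = new Set(this.ports.values());
    for (let port = this.basePort; port < this.basePort + this.maxProcesses; port++) {
      if (!used.has(port)) return port;
    }
    throw new Error("no free ports");
  }

  // Environment for the process that will run `commit`; reuses the commit's
  // port if it is already registered.
  envFor(commit: string): Record<string, string> {
    const port = this.ports.get(commit) ?? this.allocatePort();
    this.ports.set(commit, port);
    return { PORT: String(port), COMMIT_ID: commit };
  }

  // What the proxy consults when a request for `commit` arrives.
  portFor(commit: string): number | undefined {
    return this.ports.get(commit);
  }

  // Forget a stopped process so its port can be reused.
  release(commit: string): void {
    this.ports.delete(commit);
  }
}
```

The returned object would be passed as the environment of the new process, which reads `PORT` to know where to listen.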
The second environment variable is the commit id. The database driver picks it up and implicitly relays it to the database server, so data can be versioned for the specific version of the code that is running.
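The driver side can be sketched as a thin client that reads the commit id once and stamps it on every query, so application code never mentions versions. The `COMMIT_ID` variable name, the JSON message shape and `VersionedClient` are assumptions. `send` stands in for the web socket connection, and `env` would be `process.env` in Node.

```typescript
// Hypothetical sketch of implicit version relaying in a database driver.
interface Query {
  op: "get" | "put" | "delete";
  key: string;
  value?: unknown;
}

interface VersionedQuery extends Query {
  version: string; // commit id of the running code
}

class VersionedClient {
  private readonly version: string;
  private readonly send: (msg: string) => void;

  // `env` would normally be process.env; COMMIT_ID is an assumed name.
  constructor(env: Record<string, string | undefined>, send: (msg: string) => void) {
    const version = env["COMMIT_ID"];
    if (!version) throw new Error("COMMIT_ID is not set");
    this.version = version;
    this.send = send;
  }

  // Callers pass plain queries; the driver adds the version to each one.
  query(q: Query): void {
    const msg: VersionedQuery = { ...q, version: this.version };
    this.send(JSON.stringify(msg));
  }
}
```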
There are many improvements and additional features that I would like to add to the proxy server:
- Better tools to provide feedback when errors occur e.g. better logging.
- Proper security.
- Garbage collecting processes that haven’t been accessed in a while.
- Using the Yarn package manager rather than NPM to address some of the performance and other issues with NPM.
- Add support for Nix to handle system level dependencies.
- Use containers to handle resource isolation and security of running processes.
- Make the proxy server multi-tenant; currently it only runs a single application.
- Use the proxy server to gather application metrics and other API gateway style features.
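The garbage-collection item could start as simple idle tracking. Record when each commit’s process last served a request, and periodically collect the ones idle for longer than a TTL. This is a hypothetical sketch and `IdleTracker` is an illustrative name. Times are passed in as milliseconds to keep it deterministic. Stopping the collected processes is left to the process manager.

```typescript
// Hypothetical sketch of idle-process garbage collection.
class IdleTracker {
  private readonly lastSeen = new Map<string, number>(); // commit id -> last access (ms)

  // Call whenever the proxy forwards a request to `commit`.
  touch(commit: string, now: number): void {
    this.lastSeen.set(commit, now);
  }

  // Return (and forget) commits idle for longer than ttlMs.
  // `keep` (e.g. the production commit) is never collected.
  collect(now: number, ttlMs: number, keep: string): string[] {
    const idle: string[] = [];
    for (const [commit, seen] of this.lastSeen) {
      if (commit !== keep && now - seen > ttlMs) idle.push(commit);
    }
    for (const commit of idle) this.lastSeen.delete(commit);
    return idle;
  }
}
```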
There are a number of existing databases that support versioning, however they’re mostly concerned with syncing decentralised data or big data/data science. I’m more interested in something like the traditional centralised SQL or NoSQL databases, but with versioning of the data. I’m also less interested in features like merging data, since I don’t think it makes a lot of sense to merge data from a test version into production data or to merge data from two separate test commits.
My first approach to versioning data was to take an existing database and write or modify a driver on top of it to version the data, with minimal changes to the existing database API. I decided to try monkey patching the driver for RethinkDB, a schemaless database, rather than use a typical SQL database like Postgres. I did this mainly because I wasn’t sure how best to handle different schema versions, though it’s something I would like to work on in the future.
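Here is an illustration of the monkey-patching approach. It uses a made-up in-memory driver object, not RethinkDB’s real API. `insert` is wrapped to stamp each document with a version field, and `getAll` is wrapped to return only that version’s documents, so callers keep using the unchanged API. The `_version` field name and `patchForVersion` are hypothetical.

```typescript
// Hypothetical in-memory "driver" standing in for a real database driver.
interface Doc { [field: string]: unknown }

const driver = {
  rows: [] as Doc[],
  insert(doc: Doc): void {
    this.rows.push(doc);
  },
  getAll(): Doc[] {
    return [...this.rows];
  },
};

// Monkey patch the driver in place so every write is tagged with `version`
// and every read sees only that version's documents.
function patchForVersion(d: typeof driver, version: string): void {
  const originalInsert = d.insert.bind(d);
  const originalGetAll = d.getAll.bind(d);
  d.insert = (doc: Doc) => originalInsert({ ...doc, _version: version });
  d.getAll = () => originalGetAll().filter((doc) => doc["_version"] === version);
}
```

The appeal is that application code needs no changes. The catch is that every query path in the real driver has to be found and wrapped correctly.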
After working on the RethinkDB driver for a while, I hit a roadblock trying to implement a snapshotting feature, which would quickly save the entire state of the database. I needed snapshotting for an automatic/regression testing feature I wanted to build.
The idea for this feature is to record the state of the database at the start of a request, along with details of the request such as the path and parameters. The proxy server takes the snapshot and gives it an id. It then adds the id as a header on the request it proxies to one of the running processes. The process uses the snapshot id as the basis of a transaction for its modifications to the database. Once the request has succeeded or failed, the result is recorded in the proxy server along with the snapshot id and request details. With these saved details we can start up a process for a new commit. We can then replay the production request against it, with the same database state, to check whether we get the same result.
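Here is a sketch of record and replay under simplifying assumptions. The database state is an immutable `ReadonlyMap`, so taking a snapshot is just keeping a reference to the current state. A handler function stands in for a process running a particular commit. In the design described above the snapshot id would travel in a request header; here it is passed directly. All names are hypothetical.

```typescript
// Hypothetical sketch of snapshot-based record and replay.
type State = ReadonlyMap<string, string>;

interface RequestDetails { path: string; params: Record<string, string> }
interface Outcome { status: number; body: string }

// Stand-in for a process running some commit: handles a request on top of a
// starting state and returns the result plus a new state (never mutating the old one).
type Handler = (req: RequestDetails, state: State) => { outcome: Outcome; state: State };

interface Recording { snapshotId: number; req: RequestDetails; outcome: Outcome }

class Recorder {
  private readonly snapshots = new Map<number, State>();
  private readonly recordings: Recording[] = [];
  private nextId = 1;
  state: State;

  constructor(initial: State) {
    this.state = initial;
  }

  // Production path: snapshot, run, keep the new state, record the outcome.
  handle(req: RequestDetails, handler: Handler): Outcome {
    const snapshotId = this.nextId++;
    this.snapshots.set(snapshotId, this.state); // cheap: just a reference
    const { outcome, state } = handler(req, this.state);
    this.state = state;
    this.recordings.push({ snapshotId, req, outcome });
    return outcome;
  }

  // Replay every recorded request against a candidate commit from the same
  // snapshot; return the requests whose outcome differs from production.
  replay(candidate: Handler): RequestDetails[] {
    const mismatches: RequestDetails[] = [];
    for (const rec of this.recordings) {
      const snapshot = this.snapshots.get(rec.snapshotId);
      if (snapshot === undefined) continue;
      const { outcome } = candidate(rec.req, snapshot); // candidate's new state is discarded
      if (outcome.status !== rec.outcome.status || outcome.body !== rec.outcome.body) {
        mismatches.push(rec.req);
      }
    }
    return mismatches;
  }
}
```

Handlers return a new state rather than mutating the old one. That makes snapshots free to take, and it means replaying a candidate commit never touches the production state.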
In practice this would probably involve sampling incoming requests over a moving window of time, or recording requests to specific paths or with specific arguments. Another possibility is a kind of regression testing where production errors are recorded. New commits are then tested to make sure they either fix the error or cause the same error.
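Both policies are small enough to sketch. `SlidingWindowSampler` keeps at most `limit` requests seen within the last `windowMs` milliseconds, as a deterministic stand-in for random sampling. `passesRegression` encodes the rule for recorded production errors: a new commit passes if it fixes the error or fails with the same error. The names and the `Result` shape are hypothetical.

```typescript
// Hypothetical sketch of request sampling and the regression rule.
interface Sample<T> { at: number; item: T }

class SlidingWindowSampler<T> {
  private samples: Sample<T>[] = [];
  private readonly windowMs: number;
  private readonly limit: number;

  constructor(windowMs: number, limit: number) {
    this.windowMs = windowMs;
    this.limit = limit;
  }

  // Returns true if the item was kept for later replay.
  offer(item: T, now: number): boolean {
    // Drop samples that have slid out of the window.
    this.samples = this.samples.filter((s) => now - s.at < this.windowMs);
    if (this.samples.length >= this.limit) return false;
    this.samples.push({ at: now, item });
    return true;
  }

  current(): T[] {
    return this.samples.map((s) => s.item);
  }
}

type Result = { ok: true } | { ok: false; error: string };

// A new commit passes if it succeeds, or if it fails with the same
// error that production recorded for this request.
function passesRegression(recorded: Result, candidate: Result): boolean {
  if (candidate.ok) return true;
  return !recorded.ok && recorded.error === candidate.error;
}
```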
So, to get the snapshotting I needed, I decided to implement my own toy database.
I gave the database the deeply imaginative name “versionbase”. It’s implemented in Node as a simple key-value store. It uses web sockets for the connection, Immutable.js to structure the transactions, snapshots, versions and data in memory, and Jexl expressions for mapping and filtering.
Each CRUD operation takes the current database structure and returns a new, updated one using Immutable.js update operations. Persistence is extremely basic: the Immutable.js data structure is written out synchronously after each request. Concurrency is handled by the fact that Node.js is cooperatively multitasked. As long as each request is processed without yielding to the event loop, this acts like a global database lock and prevents race conditions.
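The core of a store like versionbase can be modeled without Immutable.js. This sketch is an assumption rather than versionbase’s actual structure. Each version (commit id) gets its own key-value namespace, and every operation returns a new `Db` value while leaving the old one untouched. Plain `ReadonlyMap`s copied on write stand in for Immutable.js, and a predicate stands in for Jexl filter expressions.

```typescript
// Hypothetical model of a versioned key-value store with immutable updates.
type Value = string | number | boolean | null;
type Namespace = ReadonlyMap<string, Value>;
type Db = ReadonlyMap<string, Namespace>; // version (commit id) -> namespace

const EMPTY: Namespace = new Map();

// Return a new Db with `key` set in `version`'s namespace.
function put(db: Db, version: string, key: string, value: Value): Db {
  const ns = new Map(db.get(version) ?? EMPTY);
  ns.set(key, value);
  const next = new Map(db);
  next.set(version, ns);
  return next;
}

function get(db: Db, version: string, key: string): Value | undefined {
  return db.get(version)?.get(key);
}

// Return a new Db without `key`; returns the same Db if there is nothing to delete.
function remove(db: Db, version: string, key: string): Db {
  const current = db.get(version);
  if (!current?.has(key)) return db;
  const ns = new Map(current);
  ns.delete(key);
  const next = new Map(db);
  next.set(version, ns);
  return next;
}

// Filter one version's entries (a predicate instead of a Jexl expression).
function query(db: Db, version: string, pred: (key: string, value: Value) => boolean): [string, Value][] {
  return [...(db.get(version) ?? EMPTY)].filter(([k, v]) => pred(k, v));
}

// Persistence would write this string out synchronously after each request.
function serialize(db: Db): string {
  return JSON.stringify([...db].map(([version, ns]) => [version, [...ns]]));
}
```

Copying a whole namespace on every write is O(n). Immutable.js avoids that with structural sharing, which is what makes keeping every old version and snapshot around affordable.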
As mentioned above, when the client queries the database it passes along the version id it received through an environment variable. Git versions can be loaded into the database by scripts triggered from a post-commit hook.
There are many improvements that could be made:
- Hopefully an existing database could be modified to support the versioning features I need, since one of my broad goals is to minimise the number of new dependencies in the system.
- Versioning schemas and representing multiple versions in a typesafe way is also an interesting problem, but I felt it would be too hard to do in the short time I had.
- Tools for managing the various versions and snapshots in the database, including things like naming snapshots.
Modifying an existing database to support these features might be difficult, since they would have to fit in with the existing database code. It might therefore be necessary to write a new database to handle them. This could be a layer on top of an existing key-value store or on top of a storage engine like Berkeley DB or SQLite.
Finally, a full versioning database might not be necessary at all. Versioning could instead be handled by connecting to one of several separate database instances depending on which commit is running, possibly with some kind of system to replicate changes from production to the separate instances.
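The separate-instances alternative could look like this sketch. It picks a database per commit, with the production commit using the main database. Each instance then catches up on production changes from an append-only log. The URL scheme, the `commit_` naming rule and `ReplicatedInstance` are hypothetical. This version lets production changes overwrite the instance’s own writes to the same key; a real system would need a deliberate conflict policy.

```typescript
// Hypothetical sketch: one database instance per commit, fed from production.
interface DbInstance { name: string; url: string }

// Choose the database for the running commit; production uses the main one.
function databaseFor(commit: string, productionCommit: string, baseUrl: string): DbInstance {
  const name = commit === productionCommit ? "production" : `commit_${commit}`;
  return { name, url: `${baseUrl}/${name}` };
}

interface Change { key: string; value: string }

class ReplicatedInstance {
  readonly data = new Map<string, string>();
  private applied = 0; // number of production changes already copied in

  // Apply production changes this instance hasn't seen yet (idempotent).
  catchUp(log: readonly Change[]): void {
    for (const change of log.slice(this.applied)) {
      this.data.set(change.key, change.value); // production overwrites local writes
    }
    this.applied = log.length;
  }
}
```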
There’s still a lot to do to make the idea of “production-only development” a reality and I’d like to carry on with these ideas after RC either as a startup or as an open source project if anyone is interested.