The O’Reilly Microservice Chassis

A journey from disparate microservices to a shared architecture.

Christopher Pickett
O'Reilly Media Engineering
Jun 24, 2019


A mess of microservices

Like many organizations, we’ve spent the past couple of years actively breaking up our main monolith web application into microservices. The way we approached it was fairly organic and changed with each new service we created:

  • Early services were deployed to a VM but ran in a Docker container. Later services were deployed to an internal Kubernetes cluster, but the configuration and deployment method changed several times.
  • The way secrets and settings were managed was different in each service.
  • There was no standard way of creating a new service; the recommended method involved copying and pasting an existing service and then modifying it.

We were iterating and learning the shape of our microservice architecture, which is great, but the services we created along the way were left in the lurch and never updated to take advantage of the latest thinking.

On top of the standardization issues, we had a list of nice-to-haves that seemed impossible in our current situation. We wanted:

  • to be able to push package and language updates without manually bumping every service
  • a shared deployment / notification pipeline
  • better insight into how many deployments and changesets we were releasing in a given time period
  • a standard way of creating new services that didn’t rely on copying boilerplate code
  • deployment of Golang or other non-Python/JS based services
  • to standardize our process for creating libraries and base Docker images
  • an obvious migration path to the cloud

Enter the chassis

In the automotive industry, the chassis is a frame that has the basic components that every car needs—exhaust, steering, etc.—and then the different variations of car models are built on top of it. This allows them to make a lot of different vehicles, but save cost and engineering time by having them share this basic framework.

A microservice chassis is a similar idea; it provides the basic components that every service needs to function. It provides local development support, deployment tooling, secrets management, etc. so that developers can focus on creating a service that provides a unique function rather than rebuilding or copying boilerplate code and configuration.

The chassis we’ve built is something of an odd beast:

  • It’s a Python Docker image that has our most common Python packages installed.
  • It’s the base image for all of our Python services.
  • It has an embedded Django project that knows how to generate new microservices.
  • It’s responsible for all deployments, secrets management, and the Docker configuration used for local development.

Working with the chassis

From a developer’s point of view, working with the chassis is straightforward. To start a new microservice, we use an internal CLI tool, named orm, and run a command that pulls the latest chassis and uses it to generate the new project.
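
Something along these lines, where the create subcommand is a hypothetical stand-in for the real command name and flags:

    # Hypothetical invocation; the actual orm subcommand and options may differ
    orm create my-new-service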

The resulting project should be familiar to Django developers: there’s a folder named after the project, and that folder has a settings.py. The settings.py is a pretty normal Django settings file, except that it also carries chassis configuration. A common setup looks something like this:
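
What follows is a sketch rather than the chassis’s exact API; the import path, setting name, and feature-class names are assumptions for illustration.

    # settings.py (excerpt) -- feature-class and setting names are hypothetical
    from chassis.features import Django, Celery, Redis, Postgres, PyTest  # assumed import path

    # ...normal Django settings (INSTALLED_APPS, DATABASES, etc.) live here as usual...

    # Chassis configuration: each feature class bundles the local Docker setup
    # and the Kubernetes deployment configuration for that dependency.
    CHASSIS_FEATURES = [
        Django(),
        Celery(),
        Redis(),
        Postgres(),
        PyTest(),
    ]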

In this example, we have a Django web service that uses Celery, Redis, and Postgres and is tested with pytest. The “feature classes” here are a way of standardizing the Docker configuration needed to run these features locally and of providing the deployment configuration needed to run them in Kubernetes.

One of the biggest benefits of these classes is that they’re not limited to Python: the chassis has feature classes that provide support for Node, Jest, NPM, Golang, etc., developers are encouraged to build their own feature classes as needed, and every feature class carries its own deployment and infrastructure tooling.
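
As a sketch of the idea (the base class and method names here are assumptions, not the chassis’s real interface), a custom feature class might bundle the local Docker service definition and the deployment resources for a single dependency:

    # Hypothetical shape of a custom feature class; the chassis's real interface may differ
    from chassis.features import Feature  # assumed base class

    class Memcached(Feature):
        """Adds a memcached container locally plus the matching deployment resources."""

        def docker_services(self):
            # Contribution to the local docker-compose configuration
            return {"memcached": {"image": "memcached:1.6", "ports": ["11211:11211"]}}

        def kubernetes_resources(self):
            # Manifests (as dicts) added to the service's Kubernetes release
            return [{"kind": "Deployment", "metadata": {"name": "memcached"}}]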

The manage feature is special: it ties directly to the manage.py of the embedded chassis Django project. It allows developers to run common management commands such as migrate, but it also acts as an interface to the chassis itself.
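
For example, running a migration through the chassis might look like this (hypothetical invocation; the actual syntax may differ):

    # Hypothetical invocation -- the real orm/manage syntax may differ
    orm manage migrate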

Results

Boilerplate

Because of the way our services were built, we had roughly 2,000 lines of boilerplate code and configuration in each repository. As we transitioned to the chassis across the organization, we were able to delete nearly 100k lines of code and config from our repositories.

Unit tests

We’re big proponents of unit testing, but given the organic nature of how our services were built, not all of them had 100% test coverage, and as a rule the coverage they did have was never enforced at the CI level.

Now that the chassis is responsible for running tests, the default threshold for all services is set to 100% line and branch coverage, and anything below that requires explicitly changing a setting. As of now, we have 85 projects, and 54 of those have 100% coverage.
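
For instance (the setting name below is an assumption for illustration), a project that can’t yet hit the default has to opt out explicitly in its settings.py, which keeps the decision visible in code review:

    # Hypothetical setting name, shown only to illustrate the explicit opt-out
    CHASSIS_COVERAGE_THRESHOLD = 90  # default is 100 for both line and branch coverage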

Interfacing with external systems

One big point of confusion and frustration that the chassis has done away with is manually wiring services up to external systems such as Jenkins, Sentry, and New Relic. The chassis has the concept of a preflight: a single chassis command that creates Jenkins builds and Sentry projects, sets up New Relic, and so on.

Depending on which features you’ve enabled, the chassis performs different preflight steps. The steps themselves are idempotent, which allows project owners to re-run the preflight command whenever a feature is added that the project can take advantage of.
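
A minimal sketch of what makes a step idempotent, using a hypothetical Sentry step (the client and helper names are assumptions): each step checks whether its resource already exists before creating it, so re-running preflight is always safe.

    # Hypothetical preflight step -- illustrates idempotency, not the chassis's real API
    def preflight_sentry(sentry_client, project_name):
        """Create a Sentry project for this service only if one doesn't already exist."""
        if sentry_client.get_project(project_name) is not None:  # assumed helper
            return  # already set up; re-running preflight is a no-op
        sentry_client.create_project(project_name)                # assumed helper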

Deployment

While we run all of our deployments through Jenkins, the chassis is what actually controls the builds. The chassis has a single Jenkinsfile that is used by all of the builds; the Jenkinsfile itself is a wrapper around commands in the chassis, which allows all of our build and deployment tooling to be unit tested.
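
A sketch of why this makes the tooling testable (the function and command names here are assumptions): the Jenkinsfile does little more than call into the chassis, so the real logic lives in ordinary Python that pytest can exercise.

    # Hypothetical chassis deploy helper -- real names and structure may differ
    def deployment_commands(service, environment, image_tag):
        """Return the commands the Jenkins build would run for this release."""
        namespace = f"{service}-{environment}"
        return [
            f"kubectl -n {namespace} set image deployment/{service} {service}={image_tag}",
            f"kubectl -n {namespace} rollout status deployment/{service}",
        ]

    def test_deployment_targets_the_right_namespace():
        commands = deployment_commands("catalog", "production", "registry/catalog:1.2.3")
        assert all("-n catalog-production" in cmd for cmd in commands)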

Building this tooling into the chassis and then updating all of our services to use it enabled us to migrate from our internal Kubernetes cluster to the cloud with a minimal amount of hassle and developer time. What likely would have taken months to complete was done in a single two-week sprint.

Analytics

An interesting side effect of the shared build system is that we’re also able to capture build and deployment metrics for every project that uses the chassis.

Prior to the chassis, knowing which versions of Django or Python were used in production, what test coverage we had across all of our projects, how recently a service had been deployed, or how many deployments we were doing across the organization in a given week required a lot of manual labor, SSH sessions, and spreadsheets.

Now we have a dashboard where we can ask questions like “Which services have been deployed to production?”, “When were they deployed?”, “What version of the chassis (and, as a by-product, what Django version) are they using?”, or “How many deployments to production did we have in the last week?”

Here are a few sample dashboards showing some analytics we are able to display to give us an easy view into the system:

Number of QA / Production deployments by each team.
A team’s dashboard showing deployment data about their services.

Time to market

The chassis has had a big impact on the amount of time it takes to have an idea and get that idea implemented in production. An early adopter of the chassis internally was able to move from idea to a fully-implemented service running in production in less than a month, a feat we’d never managed before!

Conclusion

We now have over 80 different services/libraries using the chassis and it’s been a huge success internally:

  • It gives us a standardized platform for building microservices.
  • It reduced the knowledge overhead for any given developer.
  • It reduced the time-to-market.
  • It removed thousands of lines of boilerplate from our repositories—2,000+ lines per repository on average!
  • It allows us to deliver and deploy critical security fixes with a single commit to the base image.
  • It facilitated our migration to Google Kubernetes Engine from an internal system.
  • It enabled us to have analytics about our services and deployments.

How have you tackled standardizing your microservices? Do you use something like our chassis? What’s something your organization has done to address any of the issues we had? We’re always interested in learning from others and helping spread great ideas. Reach out on Twitter or write a response and let us know!
