Conduit : Services Archetype

Published in Blue Sky Tech Blog, Aug 13, 2018

Prev: Conduit : Pipeline Resource Identifiers

In March of 2017, we started development of our web services stack to support production. Instead of building each service from the ground up, it was more efficient to design a common archetype from which unique services could be derived. This core archetype would provide much of the common functionality that would be tedious to design, deploy and maintain on a per-service basis. Leading the development of this archetype were our services lead, Pranay Patel, and our principal engineer, Oliver Staeubli. To guide the design of the archetype, we developed a set of user stories.

User Stories

We brainstormed dozens of user stories with categories that included application shells, logging, authentication, access control, metrics, configuration, messaging, deployment, testing, monitoring, and documentation. Examples of our user stories included:

As a developer, I need my environment (frameworks, build system, etc.) set up, so I can start developing the actual archetype.

As a developer, I can easily log debug, info and error messages in a uniform way, so that I don’t have to make something up myself.

As a developer, I can authenticate users against my service, so my service can identify the source of a request.
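To make the logging story concrete, here is a minimal sketch of what "uniform" could mean in practice. It is written in Python with the standard library only and is not the archetype's actual code; the helper name `get_service_logger` and the format string are assumptions made for illustration.

```python
import logging

# Hypothetical shared format; the post does not state the archetype's real one.
UNIFORM_FORMAT = "%(asctime)s %(levelname)s [%(name)s] %(message)s"


def get_service_logger(service_name, stream=None, level=logging.DEBUG):
    """Return a logger that every service configures in the same way."""
    logger = logging.getLogger(service_name)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines through the root logger
    if not logger.handlers:  # attach the handler once, even on repeated calls
        handler = logging.StreamHandler(stream)  # stream=None writes to stderr
        handler.setFormatter(logging.Formatter(UNIFORM_FORMAT))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = get_service_logger("media-service")
    log.debug("loading configuration")
    log.info("service started")
    log.error("could not reach data store")
```

The point of the sketch is that a service developer only asks for a logger by name; the format and handler wiring live in one shared place.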

Clearly outlining user stories at the outset guided the development of the archetype and ensured there was a clear definition for “what success looks like” as we deployed the service(s).

Our first use case was the Media Service. The Media Service would become an elaborate media storage and versioning system. We developed the archetype that would become the core foundation of our services based on this use case and started with these broad assumptions:

We will have a global Docker registry set up
We will be writing our services in Java (Spring Framework)
The services need to horizontally scale and should therefore be stateless

Developer Workflow

A fundamental component of designing the archetype was to clearly define how developers of these services would create, test, and deploy them to production.

An early proposal for how services would be deployed at Blue Sky.

While this workflow has evolved over the past 16 months, many of the concepts in this original design laid the groundwork for future services and software development at Blue Sky.

Technologies

One advantage to the timing of our Conduit initiative was the availability of mature open source technologies to build upon. We settled on a few key required technologies based on broad industry adoption as well as their ability to fulfill our user story requirements:

Docker — standard for deploying services
Cassandra — scalable data store
Nginx — route and load-balance requests to services
Jenkins — automated build and deployment

We also decided to develop our initial services (and the archetype) in Java based on … well … it’s Java, and is arguably the most battle-tested. Recently, we’ve been moving our services to Django/Python, but Java made the most sense during initial development. So that meant:

Spring Framework — for providing core Java features to the services

To round out the basic functionality of the archetype, we had to meet the needs of this user story:

As a developer, I can emit and listen for messages (events) to/from a messaging queue, so my service can respond to production events.

There are some amazing messaging technologies to choose from, but we decided to start out with what seemed like a good, safe choice:

RabbitMQ — for providing interprocess communication
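The emit/listen shape described in that user story can be sketched without any broker at all. The following is an in-process stand-in, not RabbitMQ and not the archetype's code; the class name `InMemoryBroker` and the event name `media.version.created` are hypothetical.

```python
from collections import defaultdict


class InMemoryBroker:
    """In-process stand-in for a message broker (assumption: NOT RabbitMQ)."""

    def __init__(self):
        self._listeners = defaultdict(list)  # event type -> callbacks

    def listen(self, event_type, callback):
        """Register a callback to be invoked for every event of this type."""
        self._listeners[event_type].append(callback)

    def emit(self, event_type, payload):
        """Deliver the payload to each listener; return how many received it."""
        callbacks = list(self._listeners.get(event_type, []))
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


if __name__ == "__main__":
    broker = InMemoryBroker()
    # Hypothetical production event: a new media version was stored.
    broker.listen("media.version.created", lambda event: print("got", event))
    broker.emit("media.version.created", {"version": 3})
```

A real broker adds what this sketch leaves out: delivery across processes and machines, queues that persist while a listener is down, and acknowledgements.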

Search

While Cassandra met the needs for scalability and data integrity through robust replication, by itself it’s not so great for searching.

As a developer, my service should provide the ability to quickly return and aggregate search query results based on various criteria.

The Media Service in particular would require robust searching across shots, sequences, shows, assets, etc. While Cassandra provides the canonical data store, we would look to the gold standard for providing indexed search results:

ElasticSearch — aggregated full-text search

One key difference between the archetype’s usage of ElasticSearch within Conduit and a more standard implementation is that we ultimately return a set of Pipeline Resource Identifiers, or PRIs (which we will get into in a future post).
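The split between a search index that returns identifiers and a canonical store that holds the records can be illustrated with a toy example. This is a sketch under assumptions: a small in-memory inverted index stands in for the search engine, a dictionary stands in for the data store, and the identifier strings are placeholders, since the PRI format is not described in this post.

```python
from collections import defaultdict


class TinySearchIndex:
    """Toy inverted index standing in for a search engine; stores identifiers only."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> identifiers containing it

    def index(self, identifier, text):
        for token in text.lower().split():
            self._postings[token].add(identifier)

    def search(self, query):
        """Return sorted identifiers whose indexed text contains every query token."""
        tokens = query.lower().split()
        if not tokens:
            return []
        matches = [self._postings.get(token, set()) for token in tokens]
        return sorted(set.intersection(*matches))


def resolve(identifiers, canonical_store):
    """Look up full records in the canonical store, skipping unknown identifiers."""
    return [canonical_store[i] for i in identifiers if i in canonical_store]


if __name__ == "__main__":
    # Placeholder identifiers and records, invented for this example.
    store = {
        "example-id-1": {"show": "demo", "shot": "010", "kind": "plate"},
        "example-id-2": {"show": "demo", "shot": "020", "kind": "render"},
    }
    index = TinySearchIndex()
    for identifier, record in store.items():
        index.index(identifier, " ".join(record.values()))
    hits = index.search("demo render")
    print(hits, resolve(hits, store))
```

Because the index hands back identifiers rather than documents, the canonical store remains the single source of the record contents.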

Performance Monitoring

From the beginning, we prioritized performance monitoring and metrics as detailed by this user story:

As a developer, my service should be registered with a monitoring system, so that Production Engineering can be notified of a service outage (or degradation).

We ultimately found two open source technologies that have been critical for ensuring the services remain performant:

InfluxDB — for storing time series performance metrics
Grafana — web interface for visualizing performance
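As a rough illustration of what a time series metric looks like on its way into InfluxDB, the sketch below formats one point in InfluxDB's line protocol (measurement, tags, fields, nanosecond timestamp). The post does not say how the archetype actually writes metrics, so the function name `to_line_protocol` and the metric, tag, and field names are assumptions, and the escaping covers only the common cases.

```python
def _escape(value):
    """Backslash-escape commas, equals signs and spaces in tag/field keys and tag values."""
    text = str(value)
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, quoted strings."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line-protocol line; at least one field is required."""
    if not fields:
        raise ValueError("a point needs at least one field")
    name = str(measurement).replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape(k)}={_format_field(v)}" for k, v in fields.items())
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    # Hypothetical metric: latency of one request handled by a service.
    print(to_line_protocol(
        "request_latency",
        {"service": "media", "host": "node 1"},
        {"ms": 12.5, "status": 200},
        1534118400000000000,
    ))
```

Tags such as the service name are what a dashboard in Grafana would typically group and filter on, while fields carry the measured values.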

After four months of development, we had a v1.0 of our archetype and accompanying media services based on that archetype.