Using Containers to Build a Microservices Architecture
In a previous post, I talked about how Linux container technology, such as Docker, can be used to streamline the development and testing experience. Because containers are portable across different types of infrastructure (they can run in AWS just as easily as on bare-metal servers), they make deploying code extremely convenient. For development and test workloads, this eliminates much of the guesswork and finger-pointing that tend to occur when slight differences between the development and test environments cause a deployment to fail.
In this post, we’ll explore how many of the characteristics that make containers a great choice for dev/test workloads also make them an excellent choice for building a microservices-based architecture in AWS. A microservices architecture is an approach that makes web-based development more agile and code bases easier to maintain. We’ll discuss how this architecture enables developers to be highly productive and to quickly iterate on and evolve a code base. For fast-moving start-ups, a microservices architecture can really help development teams stay nimble.
A Brief History of Web Development
But first, let’s take a quick walk through the history of web development over the last 20 years or so. This provides important context for why microservices architectures have become so popular and which problems they solve.
In the earliest days of web application development, applications were built using the Common Gateway Interface (CGI). This interface provided a way for a web server to execute a script, usually written in Perl, when handling an HTTP request from a browser. The CGI architecture didn’t scale well, because a new Perl process was launched for each incoming request for a script resource. To solve this, the popular web servers of the time added support for modules. Apache, one of the most popular web servers to this day, added “mod_perl”, which embedded a Perl interpreter in the server itself. This allowed CGI scripts to execute much faster.
Although technologies like mod_perl were a big improvement over traditional CGI, problems remained. Namely, the code responsible for building the views (i.e., emitting the dynamic portions of HTML on the page) was intermingled with the application’s business logic. This meant that completing a simple task, like adding a column to an HTML table or a new element to a form, often required changing low-level application code. So the next round of evolution in web programming produced “server pages,” templating frameworks that allowed executable code to be embedded alongside HTML. This allowed for a much cleaner separation of application logic from view logic. In the Java world, a design pattern called “Model 2” rapidly emerged, which put application code into Java servlets, data into JavaBeans, and view logic into JavaServer Pages (JSP), as shown in Figure 1.
The “Model 2” design quickly evolved into the web-oriented Model-View-Controller (MVC) frameworks that are widely used today. Many of the early MVC frameworks were Java-based (like Apache Struts), but others, like Ruby on Rails, grew rapidly in popularity. In the MVC design pattern, “controller” classes define methods that are mapped to URL patterns by “routes.” The controller methods use “model” classes that encapsulate the business logic and data of the core application entities. Finally, each controller method renders a “view” used to display and edit the data in the corresponding model class. This pattern imposes a clean separation of concerns between business, application, and view logic, as shown in Figure 2.
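To make the per-request model concrete, here is a minimal CGI-style handler. It is a sketch in Python rather than the Perl of the era, and the `render_response` function and the `name` query parameter are hypothetical. Under CGI, the web server starts a new process for each request, passes request metadata in environment variables such as `REQUEST_METHOD` and `QUERY_STRING`, and relays whatever the script writes to standard output (headers, a blank line, then the body) back to the browser.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def render_response(environ: dict) -> str:
    """Build a CGI response from the request metadata the server provides."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]  # hypothetical query parameter
    # CGI output: header lines, a blank line, then the body.
    headers = "Content-Type: text/html\r\n\r\n"
    body = f"<p>Hello, {html.escape(name)}! ({method})</p>\n"
    return headers + body


if __name__ == "__main__":
    # The server runs this script once per request; the process then exits.
    sys.stdout.write(render_response(dict(os.environ)))
```

Every request pays the cost of starting a fresh interpreter, which is the overhead mod_perl removed by keeping the interpreter resident inside the server.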
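The following is a minimal sketch of the MVC flow just described, written in plain Python rather than any particular framework. The `Product` model, `ProductsController`, and the `/products/<id>` route are hypothetical; a real framework such as Rails or Struts supplies the routing and rendering machinery.

```python
import re
from dataclasses import dataclass
from typing import Callable


# Model: encapsulates the data and business rules of a core entity.
@dataclass
class Product:
    id: int
    name: str
    price_cents: int

    def display_price(self) -> str:
        return f"${self.price_cents / 100:.2f}"


# Hypothetical in-memory data store standing in for a database.
PRODUCTS = {1: Product(1, "Widget", 1999)}


# View: renders a model as HTML and contains no business logic.
def product_view(product: Product) -> str:
    return f"<h1>{product.name}</h1><p>{product.display_price()}</p>"


# Controller: handles a request, consults the model, and renders a view.
class ProductsController:
    def show(self, product_id: str) -> str:
        product = PRODUCTS[int(product_id)]
        return product_view(product)


# Routes: map URL patterns to controller methods.
ROUTES: list[tuple[re.Pattern, Callable[..., str]]] = [
    (re.compile(r"^/products/(\d+)$"), ProductsController().show),
]


def dispatch(path: str) -> str:
    """Find the first route matching the path and call its controller method."""
    for pattern, action in ROUTES:
        match = pattern.match(path)
        if match:
            return action(*match.groups())
    return "<h1>404 Not Found</h1>"
```

Calling `dispatch("/products/1")` walks the route table, invokes `ProductsController.show`, and returns the HTML rendered by the view; each layer can change without touching the others.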
The Rise of REST
As it turns out, MVC frameworks are also well suited to developing REST endpoints. The resource-oriented nature of REST maps nicely to the concepts of controllers and models, as shown in Figure 3.
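As a rough illustration, the sketch below maps HTTP verbs on a hypothetical `orders` resource to controller actions, in the style of Rails resource routing: the collection URL handles listing and creation, and the member URL handles reading and deleting a single order. Update actions (PUT/PATCH) are omitted for brevity, and the controller keeps its data in memory instead of a database.

```python
import json
import re
from typing import Optional


class OrdersController:
    """Controller for a hypothetical 'orders' resource; responds with JSON."""

    def __init__(self) -> None:
        self.orders: dict[int, dict] = {}  # in-memory stand-in for a database
        self.next_id = 1

    def index(self) -> str:
        return json.dumps(list(self.orders.values()))

    def create(self, body: str) -> str:
        # The server assigns the id, so a client cannot override it.
        order = {**json.loads(body), "id": self.next_id}
        self.orders[order["id"]] = order
        self.next_id += 1
        return json.dumps(order)

    def show(self, order_id: str) -> str:
        return json.dumps(self.orders[int(order_id)])

    def destroy(self, order_id: str) -> str:
        del self.orders[int(order_id)]
        return json.dumps({"deleted": int(order_id)})


def resource_routes(name: str) -> list[tuple[str, re.Pattern, str]]:
    """REST-style routes: verbs on the collection and member URLs map to actions."""
    collection = re.compile(rf"^/{name}$")
    member = re.compile(rf"^/{name}/(\d+)$")
    return [
        ("GET", collection, "index"),
        ("POST", collection, "create"),
        ("GET", member, "show"),
        ("DELETE", member, "destroy"),
    ]


ROUTES = resource_routes("orders")
controller = OrdersController()


def dispatch(method: str, path: str, body: Optional[str] = None) -> Optional[str]:
    """Call the controller action for (method, path); None means no route matched."""
    for verb, pattern, action in ROUTES:
        match = pattern.match(path)
        if verb == method and match:
            args = list(match.groups())
            if body is not None:
                args.append(body)
            return getattr(controller, action)(*args)
    return None
```

The same controller-and-model structure that once rendered HTML views now returns JSON, which is why MVC frameworks adapted so readily to REST.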
The Monolithic Architecture
So MVC applications that had once consisted of models, views, and controllers serving primarily HTML content morphed into applications that served up not only traditional HTML but also JSON via REST endpoints. Many of these applications used a monolithic architecture: the application is deployed as a single file (e.g., a Java WAR file) or as a collection of files rooted in the same directory (e.g., a Rails application). All the application code runs in the same process, and scaling requires deploying multiple copies of the exact same application code to multiple servers. A monolithic architecture is depicted in Figure 4.
There are a number of problems with the monolithic architecture. First, as features and services are added to the application, the code base grows substantially more complex, which can be daunting for new developers. Modern IDEs may even struggle to load the entire code base, and compile and build times grow long. Second, because all the application code runs in the same process, it is difficult (if not impossible) to scale individual portions of the application. If one service is memory-intensive and another CPU-intensive, every server must be provisioned with enough memory and CPU to handle the baseline load of both services. This gets expensive when each server needs large amounts of CPU and RAM, and the cost is multiplied when load balancing is used to scale the application horizontally.
Finally, and more subtly, the engineering team’s structure will often start to mirror the application architecture over time. UX engineers will be tasked with building the UI components, middle-tier developers will build the service endpoints, and database engineers and DBAs will handle the data access components and the database. If a UX engineer wants to add some data to a screen, this will require coordination with the middle-tier and database engineers. Like water, humans tend to take the path of least resistance, so each engineering group will try to embed as much logic as possible into the portion of the application it controls. Over time, this is a recipe for unmaintainable code.
The Microservices Architecture
The microservices architecture is designed to address these issues. The services that were bundled together in the monolithic application are decomposed into individual services, which are deployed separately from one another on separate hosts.
Each microservice is aligned with a specific business function and defines only the operations necessary for that business function. This may sound exactly like service-oriented architecture (SOA), and indeed, microservices and SOA share some common characteristics. Both architectures organize code into services, and both define clear boundaries representing the points at which one service should be decoupled from another. However, SOA arose from the need to integrate monolithic applications that exposed APIs (usually SOAP-based) with one another. In SOA, integration relies heavily on middleware, in particular an enterprise service bus (ESB). A microservices architecture often makes use of a message bus, but the messaging layer contains no logic whatsoever; it is used purely to transport messages from one service to another. This differs dramatically from an ESB, which contains substantial logic for message routing, schema validation, message translation, and business rules. As a result, microservices architectures are substantially less cumbersome than traditional SOA and don’t require the same level of governance and canonical data modeling to define the interfaces between services. With microservices, development is rapid, and services evolve alongside the needs of the business.
Another key advantage of the microservices architecture is that each service can be scaled individually based on its resource requirements. Rather than running large servers with lots of CPU and RAM, microservices can be deployed on smaller hosts that provide only the resources each service requires. In addition, each service can be implemented in the language most suitable for the operations it performs. An image processing service can be implemented in a high-performance language like C++. A service performing math or statistical operations may be implemented in Python. Services performing basic CRUD operations on resources might be best implemented in Ruby. The microservices architecture doesn’t require the “one size fits all” model of the monolithic architecture, which generally uses a single MVC framework and a single programming language.
But microservices have some disadvantages as well. Because services are spread across multiple hosts, it can be difficult to keep track of which hosts are running which services. Also, even though each host may be less powerful than the host running the monolithic application, as the microservices architecture scales out, the number of hosts grows faster than it would with a monolithic architecture. In AWS, some microservices may not even need all the resources of the smallest EC2 instance type, which results in over-provisioning and increased costs. And if services are implemented in different programming languages, each service requires a completely different set of libraries and frameworks, which makes deploying several services to one server complex.
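To illustrate the “dumb pipe” idea, here is a minimal in-process message bus. It only delivers messages to the subscribers of a topic; validation and business rules live in the consuming service. The `MessageBus` and `BillingService` classes and the `order.placed` topic are hypothetical stand-ins for a real broker and real services, not any product’s API.

```python
from collections import defaultdict
from typing import Any, Callable

Message = dict[str, Any]
Handler = Callable[[Message], None]


class MessageBus:
    """A pure transport: no routing rules, schema validation, or translation."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> None:
        # Deliver the message unchanged to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(message)


class BillingService:
    """Hypothetical consumer: the business logic lives here, not in the bus."""

    def __init__(self, bus: MessageBus) -> None:
        self.invoices: list[Message] = []
        bus.subscribe("order.placed", self.on_order_placed)

    def on_order_placed(self, message: Message) -> None:
        # The service interprets and validates the payload itself.
        if message.get("total_cents", 0) > 0:
            self.invoices.append(
                {"order_id": message["order_id"], "amount_cents": message["total_cents"]}
            )
```

With an ESB, a rule like “only invoice orders with a positive total” might live in the bus; here it belongs to the billing service, which can change it without touching the transport or any other service.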
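A back-of-the-envelope calculation shows why independent scaling matters. The sketch below assumes, as described for the monolith above, that every copy of the monolith must carry enough memory and CPU for every service it contains, and that the number of copies is driven by the busiest service. The two services and their resource figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    memory_gib: int  # memory needed by one instance of the service
    vcpus: int       # vCPUs needed by one instance of the service
    replicas: int    # instances needed to handle the current load


def monolith_totals(services: list[Service]) -> tuple[int, int]:
    """Each monolith copy bundles every service, so the copy count follows the
    busiest service and each copy needs the sum of all services' resources."""
    copies = max(s.replicas for s in services)
    per_copy_mem = sum(s.memory_gib for s in services)
    per_copy_cpu = sum(s.vcpus for s in services)
    return copies * per_copy_mem, copies * per_copy_cpu


def microservice_totals(services: list[Service]) -> tuple[int, int]:
    """Each service scales on its own, so only its own replicas are paid for."""
    mem = sum(s.memory_gib * s.replicas for s in services)
    cpu = sum(s.vcpus * s.replicas for s in services)
    return mem, cpu


# Hypothetical workload: one memory-heavy service and one CPU-heavy service.
SERVICES = [
    Service("catalog-cache", memory_gib=8, vcpus=1, replicas=4),
    Service("image-resize", memory_gib=1, vcpus=4, replicas=1),
]
```

With these figures the monolith needs 36 GiB and 20 vCPUs across four copies, while the independently scaled services need 33 GiB and 8 vCPUs: the CPU-heavy image service no longer rides along in every copy.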
Containers to the Rescue
Linux containers can help mitigate many of these challenges. Linux containers make use of kernel features such as control groups (cgroups) and namespaces, which allow multiple containers to share the same kernel while running in isolation from one another. The Docker execution environment uses a library called libcontainer, which standardizes access to these interfaces. Docker also provides Docker Hub, a GitHub-like repository for container images, which makes it easy to share and distribute containers.
It is this isolation between containers running on the same host that makes it easy to deploy microservices developed with different languages and frameworks. With Docker, we can create a Dockerfile describing all the language, framework, and library dependencies for a service. For example, the following Dockerfile could be used to define a Docker image for a microservice that uses Ruby and the Sinatra framework:
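# Base image (assumed): Ubuntu 14.04, which provides apt-get and gpg
FROM ubuntu:14.04
LABEL maintainer="John Doe <firstname.lastname@example.org>"
RUN apt-get update && apt-get install -y curl wget default-jre git
RUN adduser --home /home/sinatra --disabled-password --gecos '' sinatra
RUN adduser sinatra sudo
RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
# Import RVM's signing keys so the installer's signature check succeeds
RUN gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 7D2BAF1CF37B13E2069D6956105BD0E739499BDB
# Run as root, RVM installs system-wide to /usr/local/rvm; login shells (bash -l)
# load it via /etc/profile.d/rvm.sh, so no separate "source" step is needed
RUN curl -sSL https://get.rvm.io | bash -s stable
RUN /bin/bash -l -c "rvm install 2.1.2"
RUN /bin/bash -l -c "gem install sinatra thin"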
A container created from this image could easily run on the same host as a container created from a Docker image that uses Java and the Dropwizard framework. The container execution environment isolates the containers on a host from one another, so there is no risk that the language, library, or framework dependencies used by one container will collide with those of another.
The portability of containers also makes deployment of microservices a breeze. To push out a new version of a service running on a given host, the running container can simply be stopped and a new container started that is based on a Docker image using the latest version of the service code. All the other containers running on the host will be unaffected by this change.
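The stop-and-replace step can be scripted. The sketch below builds the Docker CLI commands for swapping a single service’s container and, unless `dry_run` is set, runs them with `subprocess`. The `redeploy` helpers are hypothetical, as are the container name, image tag, and port mapping in the usage, and the sketch assumes the new image has been pushed to a registry the host can pull from. The service is briefly unavailable between `docker stop` and `docker run`; one way to avoid that gap is to run the old and new containers side by side behind a load balancer.

```python
import subprocess


def redeploy_commands(name: str, image: str, port: str) -> list[list[str]]:
    """Docker CLI steps that replace one service's container with a new image."""
    return [
        ["docker", "pull", image],  # fetch the new version of the service
        ["docker", "stop", name],   # stop the running container
        ["docker", "rm", name],     # remove it so the name can be reused
        # Start the replacement detached, publishing the host:container port.
        ["docker", "run", "-d", "--name", name, "-p", port, image],
    ]


def redeploy(name: str, image: str, port: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in redeploy_commands(name, image, port):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical service name, image tag, and port mapping.
    redeploy("orders", "example/orders:1.1", "8080:8080")
```

Only the container named `orders` is touched; every other container on the host keeps running.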
Containers also help with the efficient utilization of resources on a host. If a given service isn’t using all the resources on an Amazon EC2 instance, additional services can be launched in containers on that instance that make use of the idle resources. Of course, deploying services in containers, managing which services are running on which hosts, and tracking the capacity utilization of all hosts that are running containers will quickly become unmanageable if done manually.
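Packing containers onto instances is essentially a bin-packing problem. The sketch below uses first-fit decreasing, a simple textbook heuristic, to place containers on instances with spare CPU and memory. It is not the placement algorithm Amazon ECS uses; the instance and container names and sizes are hypothetical, and CPU is expressed in the 1,024-units-per-vCPU convention that ECS task definitions use.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    cpu_units: int   # spare CPU, 1024 units per vCPU
    memory_mib: int  # spare memory in MiB
    containers: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class Container:
    name: str
    cpu_units: int
    memory_mib: int


def place_first_fit_decreasing(containers: list[Container],
                               instances: list[Instance]) -> list[str]:
    """Assign each container to the first instance with enough spare CPU and
    memory, largest containers first. Mutates the instances' spare capacity and
    returns the names of containers that fit nowhere."""
    unplaced: list[str] = []
    by_size = sorted(containers, key=lambda x: (x.memory_mib, x.cpu_units), reverse=True)
    for c in by_size:
        for inst in instances:
            if inst.cpu_units >= c.cpu_units and inst.memory_mib >= c.memory_mib:
                inst.cpu_units -= c.cpu_units
                inst.memory_mib -= c.memory_mib
                inst.containers.append(c.name)
                break
        else:
            # No instance had room; a real scheduler might launch a new one.
            unplaced.append(c.name)
    return unplaced
```

Placing the largest containers first leaves the small ones to fill the gaps, which is how idle capacity on an already-busy instance gets used.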
The recently announced Amazon EC2 Container Service (Amazon ECS) takes care of all of this for you. With Amazon ECS, you define a pool of compute resources called a “cluster.” A cluster consists of one or more Amazon EC2 instances. Amazon ECS manages the state of all container-based applications running in your cluster, provides telemetry and logging, and manages capacity utilization of the cluster, allowing for efficient scheduling of work. Amazon ECS provides a construct called a “task definition,” which is used to define a group of containers that make up an application. Each container in the task definition specifies the resources it requires, and Amazon ECS schedules the task for execution based on the available resources in the cluster.
A microservice can easily be defined as a task consisting of, for example, two containers: one running the service endpoint code and another running a database. Amazon ECS manages the dependencies between these containers, as well as the balancing of resources across the cluster. Amazon ECS also provides seamless access to important AWS services like Elastic Load Balancing, Amazon EBS, Elastic Network Interfaces, and Auto Scaling. With Amazon ECS, all these essential features for deploying applications on Amazon EC2 become available to container-based applications.
Container management solutions like Amazon ECS also simplify the implementation of “service discovery.” Because microservices are often deployed across multiple hosts and scale up and down based on load, service discovery is needed so that one service can locate the others. In the simplest case, a load balancer can serve this purpose, but in many cases a true distributed configuration service, such as Apache ZooKeeper, is necessary. The Amazon ECS API makes it possible to integrate with third-party tools like ZooKeeper. It is also possible to use Amazon ECS to manage a ZooKeeper cluster: the containers making up the ZooKeeper cluster can be grouped together in a task definition and scheduled for execution on the Amazon EC2 hosts in the cluster by Amazon ECS.
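For illustration, the task definition below groups a service container with a database container, expressed as a Python dictionary that mirrors the JSON Amazon ECS accepts. The family and container names, images, and sizes are hypothetical. The `links` entry lets the API container reach the database at the hostname `db` in bridge network mode, and `dependsOn` asks ECS to start the database container first.

```python
import json


def microservice_task_definition(api_image: str, db_image: str) -> dict:
    """Group a hypothetical 'orders' API container and its database into one task."""
    return {
        "family": "orders-service",
        "containerDefinitions": [
            {
                "name": "orders-db",
                "image": db_image,
                "cpu": 256,     # CPU units (1024 = 1 vCPU)
                "memory": 512,  # hard memory limit in MiB
                "essential": True,
            },
            {
                "name": "orders-api",
                "image": api_image,
                "cpu": 256,
                "memory": 256,
                "essential": True,
                "portMappings": [{"containerPort": 8080, "hostPort": 80}],
                # Bridge-mode link: the API reaches the database at hostname "db".
                "links": ["orders-db:db"],
                # Start the database container before the API container.
                "dependsOn": [{"containerName": "orders-db", "condition": "START"}],
            },
        ],
    }


if __name__ == "__main__":
    task_def = microservice_task_definition("example/orders-api:1.0", "postgres:9.4")
    print(json.dumps(task_def, indent=2))
```

The printed JSON could be saved to a file and registered with `aws ecs register-task-definition --cli-input-json file://task.json`, or the dictionary could be passed as keyword arguments to the boto3 ECS client’s `register_task_definition` method. The per-container `cpu` and `memory` values are what ECS uses when deciding where in the cluster the task fits.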
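To show what a discovery service provides, here is a minimal in-memory registry. Instances register with a time-to-live and must re-register (heartbeat) to stay listed, loosely mirroring ZooKeeper’s ephemeral nodes, which disappear when the owning session ends; callers look up live instances and pick one round-robin. The `ServiceRegistry` class is hypothetical and is not ZooKeeper’s API; a production system would rely on ZooKeeper, a load balancer, or a similar service.

```python
import time
from typing import Optional


class ServiceRegistry:
    """In-memory stand-in for a discovery service (not ZooKeeper's API).

    Instances register with a TTL and re-register to heartbeat; an instance
    that stops heartbeating drops out of lookups.
    """

    def __init__(self, ttl_seconds: float = 10.0) -> None:
        self.ttl = ttl_seconds
        self._expiry: dict[str, dict[str, float]] = {}  # service -> address -> expiry
        self._next: dict[str, int] = {}                 # round-robin counters

    def register(self, service: str, address: str, now: Optional[float] = None) -> None:
        """Register an instance, or heartbeat one that is already registered."""
        now = time.monotonic() if now is None else now
        self._expiry.setdefault(service, {})[address] = now + self.ttl

    def deregister(self, service: str, address: str) -> None:
        self._expiry.get(service, {}).pop(address, None)

    def lookup(self, service: str, now: Optional[float] = None) -> list[str]:
        """Return live addresses, pruning any whose TTL has lapsed."""
        now = time.monotonic() if now is None else now
        live = {a: t for a, t in self._expiry.get(service, {}).items() if t > now}
        self._expiry[service] = live
        return sorted(live)

    def pick(self, service: str, now: Optional[float] = None) -> Optional[str]:
        """Choose one live instance, round-robin; None if none are live."""
        addresses = self.lookup(service, now)
        if not addresses:
            return None
        i = self._next.get(service, 0)
        self._next[service] = i + 1
        return addresses[i % len(addresses)]
```

The optional `now` parameter exists only to make the behavior easy to demonstrate deterministically; in normal use the registry reads the monotonic clock.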
In many ways, the use of containers to implement a microservices architecture is an evolution not unlike those observed over the past 20 years in web development. Much of this evolution has been driven by the need to make better use of compute resources and the need to maintain increasingly complex web-based applications. As we’ve seen, the use of a microservices architecture with Linux containers addresses both these needs. We touched briefly on how a microservice can be defined as a task in Amazon ECS, but the use of containers in distributed systems goes far beyond microservices. Increasingly, containers are becoming “first class citizens” in all distributed systems, and in a future post we’ll discuss how tools like Amazon ECS are essential for managing container-based computing.
Check out the next post in this series: Cluster-Based Architectures Using Docker and Amazon EC2 Container Service.