AMI Creation with Aminator
by Michael Tripoli & Karate Vick
Aminator is a tool for creating custom Amazon Machine Images (AMIs). It is the latest implementation of a series of AMI creation tools that we have developed over the past three years. A little retrospective on AMI creation at Netflix will help you better understand Aminator.
Building on the Basics
Very early in our migration to EC2, we knew that we would leverage some form of auto-scaling in the operation of our services. We also knew that application startup latency would be very important, especially during scale-up operations. We concluded that application instances should have no dependency on external services, be they package repositories or configuration services. The AMI would have to be discrete and hermetic. After hand-rolling AMIs for the first couple of apps, it was immediately clear that a tool for creating custom AMIs was needed. There are generally two strategies for creating Linux AMIs:
- Create from loopback.
- Customize an existing AMI.
The loopback method has its place for creating a foundation AMI and is analogous to a bare-metal OS installation. This method is too complex and time consuming to automate at the scale we need. Our tools follow the latter strategy, customizing an existing AMI. This strategy requires a source, or base, AMI against which customizations can be applied.
The initial component of our AMI construction pipeline is the foundation AMI. These AMIs will generally be pristine Linux distribution images, but in an AMI form that we can work with. Starting with a standard Linux distribution such as CentOS or Ubuntu, we mount an empty EBS volume, create a file system, install the minimal OS, snapshot and register an AMI based on the snapshot. That AMI and EBS snapshot are ready for the next step.
Most of our applications are Java / Tomcat based. To simplify development and deployment, we provide a common base platform that includes a stable JDK, a recent Tomcat release, and Apache, along with Python, standard configuration, monitoring, and utility packages. The base AMI is constructed by mounting an EBS volume created from the foundation AMI snapshot, then customizing it with a meta package (RPM or DEB) that, through dependencies, pulls in the other packages that comprise the Netflix base AMI. This volume is unmounted, snapshotted, and then registered as a candidate base AMI, which makes it available for building application AMIs.
This base AMI goes through a candidate test and release process every week or two, which yields common stable machine configurations marching through time. Developers can choose to “aminate” against the current release base AMI, or elect to use a candidate base AMI that may have improvements that will benefit their application or contain a critical update that they can help verify.
“Get busy baking or get busy writing configs” ~ mt
Customize Existing AMIs
Phase 1: Launch and Bake
Our first approach to making application AMIs was the simplest way: customize an existing AMI by first running an instance of it, modifying that, and then snapshotting the result. There are roughly five steps in this launch / bake process.
- Launch an instance of a base AMI.
- Provision an application package on the instance.
- Clean up the instance to remove state established by running the instance.
- Run the ec2-ami-tools on the instance to create and upload an image bundle.
- Register the bundle manifest to make it an AMI.
This creates an instance-store or S3 backed AMI.
While functional, this process is slow and became an impediment in the development lifecycle as our cloud footprint grew. To give an idea of how slow, an S3 bake often takes between 15 and 20 minutes. Creating an S3 backed AMI is slow because it is so I/O intensive. The I/O involved in the launch / bake process includes these operations:
- Download an S3 image bundle.
- Unpack bundle into the root file system.
- Provision the application package.
- Copy root file system to local image file.
- Bundle the local image file.
- Upload the image bundle to S3.
Wouldn’t it be great if this I/O could somehow be reduced?
The advent of EBS backed AMIs was a boon to the AMI creation process. This is in large part due to the incremental nature of EBS snapshots. The launch / bake process significantly improved when we converted to EBS backed AMIs. Notice that there are fewer I/O operations (not to be confused with IOPS):
- Provision EBS volume.
- Load enough blocks from EBS to get a running OS.
- Provision the application package.
- Snapshot the root volume.
The big win here is in the amount of data being moved. First, no on-instance copying is involved. Second, considering a 100 MB application package, the amount of data copied to S3 in the incremental snapshot of the root volume is roughly 7–8% of that of a typical image bundle. The resulting bake time for EBS backed AMIs is typically in the range of 8–10 minutes.
Phase 2: Bakery
The AMI Bakery was the next step in the evolution of our AMI creation tools. The Bakery was a big improvement over the launch/bake strategy as it does not customize a running instance of the base AMI. Rather, it customizes an EBS volume created from the base AMI snapshot. The time to obtain a serviceable EC2 instance is replaced by the time to create and attach an EBS volume.
The Bakery consists of a collection of bash command-line utilities installed on long-running bakery instances in multiple regions. Each bakery instance maintains a pool of base AMI EBS volumes which are asynchronously attached and mounted. Bake requests are dispatched to bakery instances from a central bastion host over SSH. Here is an outline of the bake process:
- Obtain a volume from the pool.
- Provision the application package on the volume.
- Snapshot the volume.
- Register the snapshot.
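The pool idea behind these steps can be sketched in a few lines. This is only an illustration in Python (the Bakery itself is a set of bash utilities), and every name here is hypothetical: `VolumePool`, `prepare_volume`, and the `provision`, `snapshot`, and `register` callables stand in for whatever creates, attaches, and mounts a base AMI volume and then processes it.

```python
import queue
import threading


class VolumePool:
    """Keeps up to `size` pre-attached, pre-mounted volumes ready for baking."""

    def __init__(self, prepare_volume, size):
        # prepare_volume is a hypothetical callable that creates a volume from
        # the base AMI snapshot, attaches it, mounts it, and returns a handle.
        self._prepare = prepare_volume
        self._ready = queue.Queue(maxsize=size)
        worker = threading.Thread(target=self._refill, daemon=True)
        worker.start()

    def _refill(self):
        # Runs in the background: put() blocks while the pool is full, so one
        # extra volume may be prepared and held until a slot frees up.
        while True:
            self._ready.put(self._prepare())

    def acquire(self, timeout=None):
        # Blocks only if the pool is momentarily empty.
        return self._ready.get(timeout=timeout)


def bake(pool, provision, snapshot, register):
    """Obtain a volume, provision it, snapshot it, and register the snapshot."""
    volume = pool.acquire()
    provision(volume)
    return register(snapshot(volume))
```

Because volumes are prepared ahead of time, a bake request pays only for provisioning, snapshotting, and registration, which is presumably where much of the Bakery's speedup over launch / bake comes from.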
The Bakery reduced AMI creation time to under 5 minutes. This improvement led to further automation by engineers around Netflix, who began scripting Bakery calls in their Jenkins builds. Coupled with Asgard deployment scripts, this means that by committing code to SCM, developers can have the latest build of their application running on an EC2 instance in as little as 15 minutes.
The Bakery has been the de facto tool for AMI creation at Netflix for nearly two years, but we are nearing the end of its usefulness. The Bakery is customized for our CentOS base AMI and does not lend itself to experimenting with other Linux distributions such as Ubuntu, at least not without major refactoring of both the Bakery and the base AMI. There has also been a swell of interest in our Bakery from external users of our other open source projects, but it is not suitable for open sourcing as it is replete with assumptions about our operating environment.
Aminator is a complete rewrite of the Bakery but utilizes the same operations to create an AMI:
- Create volume.
- Attach volume.
- Provision package.
- Snapshot volume.
- Register snapshot.
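As a rough illustration of how these five operations chain together, here is a minimal sketch. It is not Aminator's code: it assumes an `ec2` object that behaves like a boto3 EC2 client (Aminator itself is built on boto), and the function name `aminate`, its parameters, and the `provision` callable are hypothetical. The architecture, virtualization type, and root device name are placeholder values, and error handling is omitted.

```python
def aminate(ec2, *, base_snapshot_id, instance_id, zone, device, name, provision):
    """Build an AMI by customizing a volume created from a base AMI snapshot.

    `ec2` is assumed to behave like a boto3 EC2 client. `provision` is a
    hypothetical callable that mounts the device, installs the application
    package, and unmounts the device again.
    """
    # 1. Create a volume from the base AMI snapshot.
    volume_id = ec2.create_volume(
        SnapshotId=base_snapshot_id, AvailabilityZone=zone
    )["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # 2. Attach the volume to the instance running this code.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])

    # 3. Provision the application package onto the volume.
    provision(device)

    # 4. Snapshot the customized volume.
    snapshot_id = ec2.create_snapshot(VolumeId=volume_id, Description=name)[
        "SnapshotId"
    ]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # 5. Register the snapshot as an EBS backed AMI (placeholder settings).
    image_id = ec2.register_image(
        Name=name,
        Architecture="x86_64",
        VirtualizationType="hvm",
        RootDeviceName="/dev/sda1",
        BlockDeviceMappings=[
            {"DeviceName": "/dev/sda1", "Ebs": {"SnapshotId": snapshot_id}}
        ],
    )["ImageId"]

    # Cleanup: the working volume is no longer needed once it is snapshotted.
    ec2.detach_volume(VolumeId=volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.delete_volume(VolumeId=volume_id)
    return image_id
```

With boto3 installed, `ec2` could be obtained from `boto3.client("ec2")`. Passing the client in as a parameter also makes the sequence easy to exercise against a stub.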
Aminator is written in Python and uses several open source Python libraries such as boto, PyYAML, envoy and others. As released, Aminator supports EBS backed AMIs for Red Hat and Debian based Linux distributions in Amazon’s EC2. It is in use within Netflix for creating CentOS-5 AMIs and has been tested against Ubuntu 12.04, but this is not the extent of its possibilities. The Aminator project is structured using a plugin architecture leveraging Doug Hellmann’s stevedore library. Plugins can be written for other cloud providers, operating systems, or packaging formats.
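To make the plugin idea concrete, here is a small sketch of how packaging formats could be swapped behind one interface. It does not use stevedore or reproduce Aminator's actual plugin classes: stevedore discovers plugins through setuptools entry points, whereas this example substitutes a plain dictionary registry so that it stays self-contained. All class and function names are hypothetical.

```python
import subprocess
from abc import ABC, abstractmethod

# Hypothetical in-process registry standing in for entry-point discovery.
PROVISIONERS = {}


def register(name):
    """Class decorator that records a provisioner plugin under `name`."""
    def decorator(cls):
        PROVISIONERS[name] = cls
        return cls
    return decorator


class Provisioner(ABC):
    """Installs a package into a volume mounted at `mountpoint`."""

    def __init__(self, mountpoint):
        self.mountpoint = mountpoint

    @abstractmethod
    def install_command(self, package):
        """Return the package manager command line as a list of arguments."""

    def provision(self, package, run=subprocess.check_call):
        # Run the package manager inside the mounted volume via chroot.
        run(["chroot", self.mountpoint, *self.install_command(package)])


@register("yum")
class YumProvisioner(Provisioner):
    def install_command(self, package):
        return ["yum", "-y", "install", package]


@register("apt")
class AptProvisioner(Provisioner):
    def install_command(self, package):
        return ["apt-get", "-y", "install", package]


def get_provisioner(name, mountpoint):
    """Look up a registered provisioner plugin by name."""
    try:
        cls = PROVISIONERS[name]
    except KeyError:
        raise ValueError("unknown provisioner: %r" % name) from None
    return cls(mountpoint)
```

Supporting another packaging format then amounts to registering one more subclass, with no changes to the code that drives the bake.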
Aminator has fewer features than the Bakery. First, Aminator does not utilize a volume pool. Pool management is an optimization that we sacrificed for agility and manageability. Second, unlike the Bakery, Aminator does not create S3 backed AMIs. Since we have a handful of applications that deploy S3 backed AMIs, we continue to operate Bakery instances. In the future, we intend to eliminate the Bakery instances and run Aminator on our Jenkins build slaves. We also plan to integrate Amazon’s cross-region AMI copy into Aminator.
Aminator offers plenty of opportunity for prospective Netflix Cloud Prize entrants. We’ll welcome and consider contributions related to plugins, enhancements, or bug fixes. For more information on Aminator, see the project on GitHub.
Originally published at techblog.netflix.com on March 22, 2013.