We’re using Microscaling to scale our own production MicroBadger Kubernetes deployment in real time. Here’s the story of what we did, and the benefits we’ve seen already.
MicroBadger lets you view the metadata for any image on Docker Hub. To do this it queries data from the Docker Registry API and the Docker Hub API. In addition, MicroBadger provides a public RESTful API for querying metadata, including richer data we parse from labels that follow the label-schema.org convention.
The site is deployed to a Kubernetes cluster hosted on AWS. We’ve also developed our Microscaling Engine that does queue-based autoscaling for containers. We recently shipped a Kubernetes integration so we can microscale MicroBadger.
The MicroBadger website gets its data from the MicroBadger API. For getting data from Docker Hub and sending notifications we use three SQS queues, each processed by a separate service.
The inspector service gets most of the metadata by accessing both the Registry and Hub APIs. This includes the standard metadata present for all images, like the download size and the number of layers. It also gets any labels the maintainer has added to the image.
While the inspection process is happening we show a loading screen to the user. We inspect the latest version first and poll until it's ready and we can show the metadata. Since the user is waiting, we need this process to complete as quickly as possible.
Once the inspector has finished it sends the image to the size queue. The size service does a deeper inspection of the image, including reverse engineering the Dockerfile command used to generate each layer.
The size inspection is an intensive process, but we take advantage of Docker's design to cache the results. Since each image version is identified by a cryptographic hash, we use that SHA as our identifier, meaning we only process each version once.
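The caching idea is simply content addressing: because two tags with the same digest point at identical image content, the expensive inspection only needs to run once per digest. A minimal in-memory sketch (the real service would persist results; the names here are illustrative):

```go
package main

import "fmt"

// sizeCache records which image digests have already had the deep size
// inspection, so each version is only processed once. A simplified,
// in-memory sketch, not MicroBadger's actual implementation.
type sizeCache struct {
	done map[string]bool
	runs int
}

func newSizeCache() *sizeCache { return &sizeCache{done: map[string]bool{}} }

// inspect runs the expensive size inspection only for unseen digests.
func (c *sizeCache) inspect(digest string) {
	if c.done[digest] {
		return // identical content already analysed
	}
	c.runs++ // stands in for the expensive layer analysis
	c.done[digest] = true
}

func main() {
	c := newSizeCache()
	// Two tags pointing at the same SHA trigger only one inspection.
	c.inspect("sha256:abc")
	c.inspect("sha256:abc")
	c.inspect("sha256:def")
	fmt.Println(c.runs) // 2
}
```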
On the front end, for new images we render the page without the layers as soon as the inspector is done. We then poll until the size inspection completes and we can display the layers.
MicroBadger also provides image change notifications. We call your webhook whenever an image you’re interested in has changed. This can be used to trigger a new build if the base image has changed, or trigger your security scanner, or notify your team in Slack.
When the inspection process runs it checks whether there are any new, updated or deleted tags for the image. If there are, a message is posted to the notifications queue. The notifier service is responsible for calling your webhook: it posts a JSON message that includes the changed tags.
Throwing containers at the problem
We launched the MVP for MicroBadger in June 2016 during Anne's keynote at HashiConf EU in Amsterdam. For the launch we took a classic approach and over-provisioned heavily! We did this by running 10 inspector and 10 size containers across a 2-node cluster.
This gave us the capacity we needed during the overnight refresh and when the site got busy. The other time we need extra capacity is when someone submits an image with a lot of versions (like weaveworks/weave!). If we ran fewer containers the queue could get blocked meaning other users were stuck on the loading page.
Over-provisioning was the pragmatic thing to do, but it went against what we were trying to achieve with microscaling. So integrating our Microscaling Engine with MicroBadger was always a long-term goal.
Moving to Kubernetes
Before implementing microscaling we also changed orchestrator, moving from Docker Cloud to Kubernetes. We use Docker Cloud for our Microscaling-in-a-Box site, so it was also the obvious choice for launching the MicroBadger MVP, and we liked the simple setup.
However, as we started doing more deploys we wanted a more powerful orchestrator with better rolling-deploy support. Kubernetes has this and more, with the tradeoff being it's a lot more complex to set up.
Once we were on Kubernetes we needed to make two changes to our Microscaling Engine. The first was to add SQS queues as a metric. This is a simple integration that gets the current queue length so the engine can see whether it's meeting the target.
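The scaling decision driven by that metric can be sketched as a pure function: step up while the backlog exceeds the target, step back down when the queue drains, always staying between the configured minimum and maximum. The thresholds and single-step behaviour here are illustrative assumptions, not the engine's actual algorithm.

```go
package main

import "fmt"

// desiredReplicas sketches a queue-based scaling rule: grow toward max while
// the backlog exceeds the target length, shrink toward min when the queue is
// empty. An illustrative simplification of queue-based autoscaling.
func desiredReplicas(queueLen, target, current, min, max int) int {
	switch {
	case queueLen > target && current < max:
		return current + 1 // backlog building: add a container
	case queueLen == 0 && current > min:
		return current - 1 // queue drained: release a container
	default:
		return current
	}
}

func main() {
	// 50 messages against a target of 10: scale the service up a step.
	fmt.Println(desiredReplicas(50, 10, 4, 2, 8)) // 5
	// Empty queue at the minimum of 2: stay put.
	fmt.Println(desiredReplicas(0, 10, 2, 2, 8)) // 2
}
```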
The second change was to add Kubernetes support alongside the engine's existing Docker Remote API and Marathon/Mesos support. We did this by using the client-go library to scale the cluster via the Deployments API.
We now run between 2 and 8 containers each for the inspector and size services. The inspector service is set as the highest priority, since a user may be waiting for it to complete. The size service is less time-critical and is set as a lower priority. Having multiple services with different priorities is why microscaling works well with microservices architectures.
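Priority-based sharing of a fixed cluster can be sketched as a simple allocation: every service gets its minimum, then spare capacity goes to higher-priority services first, up to their maximum. This is a simplified illustration of the idea, not the Microscaling Engine's actual code, and the type and function names are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// service holds hypothetical microscaling settings; priority 1 is highest.
type service struct {
	name     string
	priority int
	min, max int
}

// allocate shares a fixed pool of container slots: minimums first, then
// spare capacity to higher-priority services up to their maximum.
func allocate(pool int, services []service) map[string]int {
	out := map[string]int{}
	sort.Slice(services, func(i, j int) bool {
		return services[i].priority < services[j].priority
	})
	// Every service is guaranteed its minimum.
	for _, s := range services {
		out[s.name] = s.min
		pool -= s.min
	}
	// Spare capacity goes to higher priorities first.
	for _, s := range services {
		extra := s.max - s.min
		if extra > pool {
			extra = pool
		}
		out[s.name] += extra
		pool -= extra
	}
	return out
}

func main() {
	// With 10 slots, the inspector (priority 1) fills to its max of 8 first,
	// leaving the size service at its minimum of 2.
	got := allocate(10, []service{
		{"size", 2, 2, 8},
		{"inspector", 1, 2, 8},
	})
	fmt.Println(got["inspector"], got["size"]) // 8 2
}
```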
How is microscaling helping?
In our case, the effect of over-provisioning was that around 20% of our AWS bill was just for SQS API calls. We had a lot of containers polling queues that were usually empty.
We're fortunate to be part of the AWS Activate program, which gives us AWS credits, but this was still far from ideal. By using microscaling we've reduced the number of empty receives in SQS by around 70%. A nice side benefit is that deploys are now faster, as there are usually far fewer containers running.
Our cluster size is static, but microscaling can also be combined with traditional autoscaling of virtual machines. Microscaling can respond in close to real time to smooth demand while more VM capacity is added.
We'd love to see more organisations implementing microscaling or similar autoscaling approaches. We think a key benefit of containers is their faster startup and teardown speeds, and cloud-native infrastructure should take advantage of this.
We also think container metadata has an important part to play in scaling. You can watch Anne and the fantastic Kelsey Hightower discuss this and data-driven deployments in their Holiday webinar. Finally, with data centres using 2% of global energy and growing fast, we think as an industry we should care about server utilisation as well as developer productivity.
Please hit the Recommend button below if you found this article interesting or helpful, so that others might be more likely to find it.