Production grade Kubernetes on AWS: 4 tools that made our lives easier

Guy Maliar
Sep 10, 2017 · 6 min read

Articles about our lessons learned, tips and tricks running Kubernetes in production

Kubernetes has many moving parts, and it’s hard to do it all by yourself. We knew that we needed to step up our setup and our mindset. We’re big fans of automation, infrastructure as code and immutable infrastructure.

We found that the following tools reduced the amount of manual work needed to set up our Kubernetes cluster and to support the ongoing development and deployment of our infrastructure.

In this article we’ll talk about infrastructure and Kubernetes deployment using Terraform and kops, service deployments using Helm, logging using LogDNA, and application and infrastructure monitoring using New Relic.

This article is part of our Production grade Kubernetes on AWS series. The parts of the series are:

  1. Production grade Kubernetes on AWS: Primer (Part 1)
  2. Production grade Kubernetes on AWS: 4 tools that made our lives easier (Part 2)
  3. Production grade Kubernetes on AWS: 3 tips for networking, ingress and micro services (Part 3)
  4. Production grade Kubernetes on AWS: 3 lessons learned scaling a cluster (Part 4)

1. Infrastructure topologies and infrastructure as code (kops and terraform)

We knew we were going to run a highly available cluster on AWS, and kops helped us achieve that quickly. Throwing Terraform into the mix gave us a single, consistent way to manage all of our infrastructure, be it environment specific (production, qa, development) or global (S3 buckets, IAM roles, IAM users, queues, etc.). We highly recommend adopting Terraform despite some caveats.

Downloading kops is easy enough, as shown in its repository, and setting up the required permissions is also well documented.

We purchased a separate domain name for our clusters and manage it through Route 53, which made spinning up a working, highly available cluster as easy as running a few kops commands.
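A minimal sketch of such a kops invocation, assuming a hypothetical cluster name (k8s.example.com), state bucket (s3://example-kops-state) and zone list. The `run` helper is our own addition: it only prints the command unless DRY_RUN=0 is set, so the sketch can be tried without AWS credentials.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = cluster name (a subdomain of the Route 53 zone), $2 = S3 state store.
# The zone list is a placeholder; spreading masters over three zones is
# what makes the control plane highly available.
create_cluster() {
  zones="us-east-1a,us-east-1b,us-east-1c"
  run kops create cluster \
    --name="$1" \
    --state="$2" \
    --zones="$zones" \
    --master-zones="$zones" \
    --target=terraform \
    --out=.
}

# Hypothetical names: replace with your own domain and bucket.
create_cluster "k8s.example.com" "s3://example-kops-state"
```

With `--target=terraform`, kops writes Terraform configuration into the `--out` directory instead of creating the AWS resources itself, so the cluster can be managed alongside the rest of the Terraform code.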

We found it easy to keep template files for our additional per-environment needs, such as RDS instances, S3 buckets, SQS queues and ElastiCache servers, and to use envsubst to configure them for a specific environment before spinning it up. An SQS queue is a small example of such a template.
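A minimal sketch of that templating step, assuming the templates are Terraform files. The resource label (events), the queue naming scheme and the ENVIRONMENT variable name are hypothetical. The sed branch is only a fallback for machines where gettext’s envsubst is not installed.

```shell
#!/bin/sh
# Write a hypothetical Terraform template; the resource label and queue
# name are illustrative. The quoted 'EOF' keeps ${ENVIRONMENT} literal.
write_template() {
  cat > "$1" <<'EOF'
resource "aws_sqs_queue" "events" {
  name = "${ENVIRONMENT}-events"
}
EOF
}

# $1 = template path, $2 = environment name (production, qa, development).
render_template() {
  if command -v envsubst >/dev/null 2>&1; then
    # Restrict substitution to ${ENVIRONMENT} so any other ${...}
    # expressions in the template are left untouched for Terraform.
    ENVIRONMENT="$2" envsubst '${ENVIRONMENT}' < "$1"
  else
    # Fallback for machines without gettext's envsubst.
    sed "s|[$]{ENVIRONMENT}|$2|g" "$1"
  fi
}

template="$(mktemp)"
write_template "$template"
render_template "$template" qa
rm -f "$template"
```

Passing the variable list to envsubst matters here because Terraform uses the same `${...}` syntax for its own interpolation; an unrestricted envsubst would blank out every expression that is not set in the shell environment.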

All that is left then is to apply and wait for the cluster to come up.

terraform init
terraform plan
terraform apply

2. Service deployments using Helm

Helm is our go-to tool for deploying services onto Kubernetes. Setting up Helm is easy: all you have to do is run helm init, and Helm will install itself into your Kubernetes cluster.

From that point forward, managing service deployments is quite easy. Because we wanted to automate deployment after a successful run of our CI process, I will share some lessons we learned using Helm to build an automated deployment process.

Ruby on Rails isn’t particularly complex to deploy to production, but Rails’ “deployment playbook” requires us to pre-build our assets, deploy the code, run migrations and perform any other post-release steps we might have, e.g. clearing our cache or restarting additional servers.

Using helm chart hooks, we are able to fully automate the process of deploying our Rails application to our cluster.

While it is possible to include everything within the helm chart, we found that excluding HPAs, ConfigMaps and Secrets from helm charts makes it easier to reconfigure and change our scale and configurations on the fly. For the sake of completeness I will include a sample configuration that I believe will make it easy to create a functioning Ruby on Rails service deployment.

Our chart consists of an Ingress resource, a Service, a Secret, a ConfigMap, a Horizontal Pod Autoscaler, a Deployment for our web service, a Deployment for our background workers, a migration Job and a sample post-deploy Job that clears the cache.

We’ll use Helm’s templating to add placeholders for the versioned container image names that we pull from ECR, which lets us deploy new releases of our project with ease.

Our directory structure will look like this:

├── Chart.yml
├── production.yml
├── templates
│ ├── _helpers.tpl
│ ├── cache-job.yml
│ ├── migrate-job.yml
│ ├── ingress.yml
│ ├── web-hpa.yml
│ ├── bg-hpa.yml
│ ├── config-map.yml
│ ├── secrets.yml
│ ├── web-deployment.yml
│ └── bg-deployment.yml
└── values.yml

Our chart file is pretty basic and there’s nothing special about it.

I’ve included two extra files called values.yml and production.yml to let us differentiate between values that are shared across environments and values that are environment specific.

We’ll talk about requests and limits later. Some of these values are specific to Tailor Brands: we have some heavy, cache-intensive workloads, so we request more memory.

We create the ConfigMaps, Secrets, Services, HPAs and Ingress resource with Helm hook annotations so that they are only created once, when we deploy the service for the first time.

Just as a side note, I am aware that committing secrets into the chart (and repository) is not the best way, but I’m showing it for the sake of completeness.

Our Ruby on Rails deployment and Sidekiq deployment are pretty straightforward, and by using a _helpers.tpl file we can easily share configuration between the two deployment YAMLs.

With this helper, our Deployments look like this:

And our Jobs look like this:

The real magic here is how we inject .Values.release.version into Helm at deploy time.

Our convention is to tag our container images with a v prefix followed by the first 6 characters of the current commit. After we build our Rails Docker image and push it to ECR, a simple bash script deploys the chart with that version.
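A minimal sketch of such a deploy script, assuming a hypothetical release name (rails-app) and chart directory (./chart). Only the release.version value and the two values files come from the chart described above. The `run` helper is our own addition: it prints the command unless DRY_RUN=0 is set, so the sketch runs without a cluster.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the image tag: "v" followed by the first 6 characters of the commit.
release_tag() {
  printf 'v%s\n' "$(printf '%s' "$1" | cut -c1-6)"
}

# $1 = release name, $2 = chart directory, $3 = full commit SHA.
deploy() {
  run helm upgrade --install "$1" "$2" \
    -f "$2/values.yml" \
    -f "$2/production.yml" \
    --set "release.version=$(release_tag "$3")"
}

# Hypothetical release name, chart path and commit; in CI the SHA would
# typically come from `git rev-parse HEAD`.
deploy "rails-app" "./chart" "3f9c2ab7d1e4c0ffee1234567890abcdef123456"
```

Both values files are passed explicitly with `-f`; when the same key appears in both, the later file wins, so production.yml overrides the shared defaults. `helm upgrade --install` covers both the first deploy and every later release with one command.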

There’s a small trick I haven’t included here, which is how to make your Kubernetes cluster configuration available to your CI. We solved it by running a small Golang service inside our cluster that listens to a specific SQS queue. It makes everything simple and fast, but we will keep this for another blog post about our ChatOps.

3. Logging with LogDNA

Logging production data is of utmost importance. We’re big believers in the buy vs. build tradeoff, and as such we bought both our logging and our monitoring solutions for our production and qa clusters.

We use the LogDNA DaemonSet to ship all our logs to LogDNA.

We can’t stress enough how much easier LogDNA has made our lives, our site reliability efforts and our disaster management.

Setting it up was easy and involved deploying a simple DaemonSet.

All of our containers log to STDOUT by default as JSON lines, so LogDNA’s DaemonSet picks up those logs, ships them to the platform and parses them with all the relevant metadata.

One small quirk we found with LogDNA is that there’s no easy way to report which environment the errors come from, so by default we add a tag to all of our services’ logs with their respective environment (production, qa, development, etc.).

A sample log line carries the environment tag next to the usual fields.
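To make the shape concrete, here is a rough sketch of a JSON log line with an environment tag. The field names (level, environment, message) and the ENVIRONMENT variable are hypothetical, and the shell function is only a stand-in for whatever JSON log formatter the application itself uses.

```shell
#!/bin/sh
# Emit one JSON log line on STDOUT. Field names are illustrative, and
# values are not JSON-escaped, so keep them free of quotes and backslashes.
log_json() {
  # $1 = level, $2 = message; ENVIRONMENT falls back to "development".
  printf '{"level":"%s","environment":"%s","message":"%s"}\n' \
    "$1" "${ENVIRONMENT:-development}" "$2"
}

ENVIRONMENT=qa
log_json info "worker started"
```

Because every line is a single JSON object on STDOUT, a log shipper running as a DaemonSet can collect it from the container runtime without any per-service configuration, and the environment field can then be used for filtering.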


4. Monitoring with New Relic

As stated, even though we know it’s possible to have Heapster ship data to Prometheus using CoreOS’s exceptional prometheus-operator, we decided to work with New Relic for both application performance monitoring and infrastructure monitoring.

The APM setup varies from application to application and from language to language, so there is no single snippet that shows how to set it up. New Relic recently released a Docker-based infrastructure agent, though, and we run it as a DaemonSet, which works quite well.

In the next part we’ll explore 3 tips we learned working with container networking interfaces, controlling internal and external traffic with ingress controllers, and implementing micro services based architectures on a Kubernetes cluster.

If you’d like to learn more about Tailor Brands, you are more than welcome to also try out our state-of-the-art branding services.

You can follow us here, on Twitter, Facebook and GitHub to see more exciting things that Tailor Tech is doing.

If you find this interesting and you like writing exciting features, creating beautiful interfaces and doing performance optimizations, we’re hiring!

I’ve started a newsletter to share my stories and interesting posts I find. Don’t worry, I won’t send unwanted or promotional emails.
