Production grade Kubernetes on AWS

Guy Maliar
Sep 10, 2017 · 4 min read

Articles about our lessons learned, tips and tricks running Kubernetes in production

We’ve learned a lot over the past six months during our transition to Kubernetes, and we’re seeing amazing results. We would like to share the tools that made it possible, such as Terraform, kops and Helm; our struggles and successes with the different CNIs available, ingress controllers and internal microservice communication; and how we scaled our DNS, services and pods to support our increasing load.



Primer

In March 2017, Nadav Shatz and I decided to migrate our infrastructure from our ECS-based stack to a more modern one. As a colleague of mine likes to point out, before you transition to a new technology stack your motivation should be clear and should have a direct, positive impact on your business.

Addressing the pain points we had mapped out would push our business forward: we wanted faster deployments, pieces of code extracted out of our monolith into more scalable, well-maintained microservices, easier horizontal and vertical scaling of our infrastructure, and better utilization of our machines.

At that time, we were running two Ruby on Rails clusters on AWS ECS, one for our web service and one for our background workers. Our web service ran as a single container per instance connected to an Elastic Load Balancer, and our background workers cluster was a much beefier set of instances running Sidekiq and connected to an ElastiCache Redis server.

Our ECS based setup

Deployments were a major pain point. We had a mix of bash scripts running as a deployment step on CircleCI that built our images, pushed them to ECR, and posted a message to an AWS Lambda endpoint. The Lambda in turn pushed a message to an SQS queue, which was popped by a tiny Python script that initiated a rolling deployment on ECS: spinning up instances with the new image, decreasing the count of instances running the old image, attaching the new instances to the ELB, running database migrations and clearing our cache. It was quite complicated, with a lot of home-brewed moving parts, and it was maintained by an outsourced “DevOps” company using a proprietary infrastructure-as-code solution that we had no access to.
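To make the moving parts concrete, here is a minimal sketch of the consumer end of such a pipeline. It is not the original script: the `Cluster` class, the `{"image": ...}` message shape and the in-memory queue are all assumptions standing in for the ECS service, the SQS payload and SQS itself, and the steps only record what a real deployment would do.

```python
import json
import queue
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for an ECS service behind a load balancer."""
    image: str
    instances: int
    log: list = field(default_factory=list)

    def rolling_update(self, new_image: str) -> None:
        # Replace instances one at a time so the service keeps serving traffic.
        for i in range(self.instances):
            self.log.append(f"start instance {i} with {new_image}")
            self.log.append(f"stop old instance {i} running {self.image}")
            self.log.append(f"attach instance {i} to load balancer")
        self.image = new_image


def handle_message(body: str, cluster: Cluster) -> None:
    """Process one deployment message in the order described above."""
    message = json.loads(body)  # assumed shape: {"image": "<tag>"}
    cluster.rolling_update(message["image"])
    cluster.log.append("run database migrations")
    cluster.log.append("clear cache")


def drain(deploy_queue: queue.Queue, cluster: Cluster) -> int:
    """Pop messages until the queue is empty; return how many were handled."""
    handled = 0
    while True:
        try:
            body = deploy_queue.get_nowait()
        except queue.Empty:
            return handled
        handle_message(body, cluster)
        handled += 1


if __name__ == "__main__":
    q = queue.Queue()
    q.put(json.dumps({"image": "web:v2"}))
    web = Cluster(image="web:v1", instances=2)
    drain(q, web)
    print("\n".join(web.log))
```

Even in this toy form, every step is hand-rolled, which hints at why the real thing, spread across CircleCI, Lambda, SQS and ECS, was hard to maintain.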

We had known for some time that this would hold us back, given our ever-increasing traffic, our growing engineering team and our desire to stay flexible in spinning up new features with different technology needs.

We spent a few weeks researching the ecosystem and running tests with a newer ECS stack, Docker Swarm, Nomad, Flynn, Deis and Kubernetes. We weighed the pros and cons of each system against our constraints (our team, our know-how, time-to-production considerations, etc.) and eventually decided to use Kubernetes.

We even consulted with the Deis guys and they helped us decide on running Kubernetes instead of Deis and pointed out that Helm might be a good tool for us (we will talk more about Helm later).

We started by spinning up a small development cluster on AWS using kops, which proved easy to set up and easy to work with. It took us one month to get from development to QA and another month from QA to production, and by May 2017 we were running Kubernetes in production.

Our Kubernetes based setup

We’ve learned some valuable lessons and always knew we’d like to share them with the world, so without further ado, I present to you our 10 lessons learned from running Kubernetes in production.

Our first part will cover the four tools we’ve used to set up our Kubernetes cluster and make it production-ready: bootstrapping, service deployments, logging and monitoring.


The next parts are available in the links above. In the meantime, if you’d like to learn more about Tailor Brands, you are more than welcome to try out our state-of-the-art branding services.

You can follow us here, on Twitter, Facebook and GitHub to see more exciting things that Tailor Tech is doing.

If you find this of any interest and you like writing exciting features, creating beautiful interfaces and doing performance optimizations, we’re hiring!

I’ve started a newsletter to share my stories and interesting posts I find: http://eepurl.com/gcld-T. Don’t worry, I won’t send unwanted or promotional emails.

Tailor Tech

Tailor Brands Engineering Blog

Thanks to Yoav Franco and Nitai Perez
