Production grade Kubernetes on AWS: 3 tips for networking, ingress and microservices

Guy Maliar · Published in Tailor Tech · Sep 10, 2017 · 4 min read

Articles about our lessons learned, tips, and tricks from running Kubernetes in production

There are many decisions to make when setting up a production cluster, and we struggled for quite a while to understand the available options. Networking in the container world is hard: there are many competing technologies, load balancing and controlling external traffic can get complicated, and internal service communication can be approached in several ways. In this article we'll show you the decisions we made and found to work well for us.

This is an article in our Production grade Kubernetes on AWS series, other parts are available here:

  1. Production grade Kubernetes on AWS: Primer (Part 1)
  2. Production grade Kubernetes on AWS: 4 tools that made our lives easier (Part 2)
  3. Production grade Kubernetes on AWS: 3 tips for networking, ingress and microservices (Part 3)
  4. Production grade Kubernetes on AWS: 3 lessons learned scaling a cluster (Part 4)

5. Container Network Interfaces (CNI)

We’ve had a lot of trouble getting Kubernetes networking working for us. We tried Weave and recently experimented with Flannel. Going forward I would suggest using Flannel, as there is a known issue with running Weave on Kubernetes with auto-scaling groups, described here: https://github.com/weaveworks/weave/issues/2797.

Weave is a great piece of technology, and we’ve circumvented those issues using a slightly different configuration that is 99% based on this gist by @mikebryant, https://gist.github.com/mikebryant/f5b25f9b14e5d6275ff0d3e934f73f12; it seems to be working pretty well for us.

We’ve opted for the CNI approach when spinning up our cluster, and since we use RBAC for authorization, we set up Weave as a DaemonSet with the appropriate roles, as shown below.
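The manifest below is a condensed sketch modeled on the upstream weave-kube manifest, not our exact configuration; the image tag and volume mounts are illustrative, and the ClusterRole definition plus the weave-npc network-policy sidecar that the full upstream version ships are omitted for brevity:

```yaml
# Condensed sketch of a Weave Net DaemonSet with RBAC wiring.
# Omitted for brevity: the ClusterRole itself and the weave-npc sidecar.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: weave-net
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: weave-net
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: weave-net          # ClusterRole definition omitted here
subjects:
- kind: ServiceAccount
  name: weave-net
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: weave-net
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: weave-net
  template:
    metadata:
      labels:
        name: weave-net
    spec:
      serviceAccountName: weave-net
      hostNetwork: true    # Weave manages networking on the node itself
      hostPID: true
      tolerations:
      - effect: NoSchedule
        operator: Exists   # run on masters as well
      containers:
      - name: weave
        image: weaveworks/weave-kube:2.0.5   # pin to the release you tested
        env:
        - name: WEAVE_MTU
          value: "8950"    # jumbo frames; see the note below
        securityContext:
          privileged: true
        volumeMounts:
        - name: cni-bin
          mountPath: /host/opt   # where the CNI plugin binaries are installed
        - name: cni-conf
          mountPath: /host/etc   # where the CNI config is installed
      volumes:
      - name: cni-bin
        hostPath:
          path: /opt
      - name: cni-conf
        hostPath:
          path: /etc
```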

Notice we’re setting WEAVE_MTU to 8950: AWS supports jumbo frames (9001 MTU) on most of their instance types, as documented here, http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html, and the slightly lower value leaves room for Weave’s encapsulation overhead.

Should you choose to use Flannel, I’ve included the following configuration:
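As a sketch, a Flannel setup boils down to a ConfigMap holding the network configuration and a DaemonSet running flanneld on each node. The pod CIDR (100.96.0.0/11 is the kops default) and the image tag below are assumptions to adjust for your cluster, and the full upstream kube-flannel.yml additionally defines the RBAC objects and installs the CNI config:

```yaml
# Sketch of a Flannel deployment: network config + per-node flanneld.
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
data:
  net-conf.json: |
    {
      "Network": "100.96.0.0/11",
      "Backend": { "Type": "vxlan" }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        app: flannel
    spec:
      hostNetwork: true
      serviceAccountName: flannel    # RBAC objects omitted for brevity
      tolerations:
      - effect: NoSchedule
        operator: Exists
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.9.0   # pin to a tested release
        command:
        - /opt/bin/flanneld
        - --ip-masq
        - --kube-subnet-mgr          # read per-node subnets from the API server
        securityContext:
          privileged: true
        volumeMounts:
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
```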

6. Ingress Controller

Ingress and Ingress Controllers are two more important Kubernetes primitives. We’ll talk about scaling them for production later, but for now I’d like to show a simple deployment of the nginx-based ingress controller that we use for our services.

We’ve also experimented with Istio and Linkerd, and in my experience a service mesh is probably the way forward; Buoyant recently even published a piece about mixing the two together. It looks good on paper, but it is a bit more difficult in production. We do want to use these technologies to enable better deployment strategies, shadow traffic, and more, but for the time being we’re sticking with the nginx ingress.

An ingress controller is a combination of a few resources: a Deployment, a Service, and a Horizontal Pod Autoscaler. I’ll show an example of how to set one up, very close to how we did it at Tailor Brands, even though in hindsight I would have just used a Helm chart.
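Here’s a trimmed-down sketch along those lines. The image tag, namespace, replica counts, and HPA thresholds are illustrative rather than our exact values, and the RBAC objects plus the default backend that older nginx-ingress releases require are omitted:

```yaml
# Sketch: nginx ingress controller = Deployment + Service + HPA.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-ingress
  template:
    metadata:
      labels:
        app: nginx-ingress
    spec:
      containers:
      - name: nginx-ingress-controller
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
        args:
        - /nginx-ingress-controller
        - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        ports:
        - containerPort: 80
        - containerPort: 443
        resources:
          requests:             # the HPA scales against these requests
            cpu: 100m
            memory: 128Mi
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
  namespace: kube-system
spec:
  type: LoadBalancer            # provisions an ELB on AWS
  selector:
    app: nginx-ingress
  ports:
  - name: http
    port: 80
    targetPort: 80
  - name: https
    port: 443
    targetPort: 443
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-ingress-controller
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-ingress-controller
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
```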

7. Microservice Architecture

We know that inter-service communication inside the cluster would improve significantly if we used a binary protocol instead of a set of REST-based APIs. We are experimenting with Thrift and gRPC, but we have yet to make a decision; for now we use HTTP with JSONAPI as our inter-service API.

kops sets up kube-dns for us, and through kube-dns every Service gets a single stable domain name for its underlying pods, so two services can communicate easily via plain HTTP requests using the Kubernetes naming convention.

All exposed services are available through URLs such as this:

http://svc1.svcnamespace.svc.cluster.local

All that’s left is to create a Service and a Deployment similar to the configurations you’ve seen above, and you’re good to go.
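For illustration, here is a minimal, hypothetical svc1 in the svcnamespace namespace, matching the DNS name above; the image and container port are placeholders:

```yaml
# Hypothetical service reachable at http://svc1.svcnamespace.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: svc1
  namespace: svcnamespace
spec:
  selector:
    app: svc1
  ports:
  - port: 80          # port other services talk to
    targetPort: 3000  # port the container listens on (placeholder)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: svc1
  namespace: svcnamespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: svc1
  template:
    metadata:
      labels:
        app: svc1
    spec:
      containers:
      - name: svc1
        image: example/svc1:latest   # placeholder image
        ports:
        - containerPort: 3000
```

Any pod in the cluster can then reach it with a plain HTTP call, e.g. curl http://svc1.svcnamespace.svc.cluster.local.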

The final part is dedicated to 3 lessons we learned while scaling our cluster beyond a few machines.

We’ll cover DNS and kube-dns issues, cluster autoscaling combined with pod disruption budgets and horizontal pod scheduling, and how to configure requests and limits in Kubernetes.

If you’d like to learn more about Tailor Brands, you are more than welcome to also try out our state-of-the-art branding services.

You can follow us here, on Twitter, Facebook, and GitHub to see more exciting things that Tailor Tech is doing.

If you found this interesting and you like writing exciting features, creating beautiful interfaces, and doing performance optimizations, we’re hiring!
