Implementing Kubernetes: The Hidden Part of the Iceberg — Part 1

Comic about how much is involved with Kubernetes. A dinosaur expresses excitement at starting Kubernetes and plunges into the deep end of a lake as you see all the elements involved with Kubernetes.
Figure 1: Kubernetes comic

The main goal of writing this article was to gather personal and team experiences and challenges from implementing a production-grade fleet of Kubernetes clusters at GumGum. I hope you enjoy reading it!

The Tip of the Iceberg

Everyone is in love with Kubernetes. It’s the most popular kid in high school, and no one can escape its charms. It seems to be perfect, as it aims to solve most of the problems modern companies face when dealing with complex microservice architectures and cloud-native applications. Most people are convinced of that, and whenever the question of how to handle these problems comes up, a single word pops into mind: Kubernetes.

Kubernetes will provide your applications with service discovery, health checks, autoscaling, scheduling, enhanced security, and much more. Wow, that sounds amazing! You must be a stone-age DevOps engineer or developer if you don’t want to follow the Kubernetes path. Well, that is just the tip of the iceberg.

That is what most people navigating the vast IT landscape can see when looking at Kubernetes. Let me confess something about myself: I’ve been navigating those IT seas for quite some time. I don’t consider myself to be a captain, maybe a helmsman, but I am a very seasoned sailor, one that has seen a lot of beautiful things and treasures, but one that has also witnessed ships (teams and projects) sink after colliding with gigantic icebergs, like SOA, virtualization, cloud computing, etc. Don’t get me wrong. I am not an IT caveman; I love new technologies. I am passionate about reading, learning, and implementing new tech, tools, and frameworks. My key point is that most of these technology trends and innovations have been tremendously overhyped, and the same is true of Kubernetes. As I mature as a DevOps engineer, I try to be more watchful and thoughtful about new technologies, like Kubernetes, to make sure I understand what I’m getting my company into.

GumGum’s Kubernetes Voyage

Our Kubernetes voyage has an exciting background. Back in 2017, we were sweating and battling with our container orchestration solution, AWS ECS. We implemented ECS mostly because GumGum was born as a 100% cloud-native company, working with AWS since its beginnings in 2008, so adopting ECS was a natural move for us after Dockerizing our apps. When I joined GumGum in 2018, I had previous experience with Docker and related orchestration tools, like Docker Swarm, Apache Mesos, and Marathon, but ECS was entirely new to me. To be honest, ECS lacked a lot of the features required of a mature, production-ready container orchestration engine, including proper service discovery and volume management, and its scheduling and autoscaling features were rudimentary. Long story short, we decided that it was time to move away from ECS to a more mature, scalable, and feature-rich container orchestration system. We raised our heads, looked to the horizon, and there it was: Kubernetes!

As a DevOps team, we decided that it was time to migrate all our container/cloud-native applications to Kubernetes; in the end, it was a no-brainer. Kubernetes provided all the features that ECS was lacking and more. Kubernetes had become the industry-standard container orchestration engine, and it would have been misguided not to accept that reality. The Kubernetes ecosystem was vast and we loved what we saw, but with so many building blocks to pick from, we needed a map. We will talk more about that later on.

How we Implemented Kubernetes at GumGum

OK, we decided to migrate from ECS to Kubernetes. Now what? Whenever you embark on a project with an enormous scope, it’s critical to spend time understanding the trade-offs of each solution for the business. To do this, you need to ask hard questions and research all available options. Right away, we started having some heated, but professional, internal discussions. Everyone was proposing ideas, tools, strategies, and frameworks to properly deploy Kubernetes at GumGum. Some of the questions and concerns that we discussed, and that you should also consider, were:

  • Are we going to deploy, configure and manage Kubernetes ourselves?
  • How are we going to build, package, test, and deploy the applications to Kubernetes?
  • How are we going to monitor the applications deployed to Kubernetes and the Kubernetes control plane itself?
  • How are we going to collect and aggregate logs for the apps running in the clusters? Are any application changes needed?
  • As we run applications on AWS, how will we grant and isolate IAM permissions for every application?
  • When deploying/running stateful applications, how will we provision and manage the required persistent storage for the apps (PV/PVC)?
  • How are we going to handle application secrets securely?
  • How will developers and administrators access the Kubernetes clusters and resources?

These are just a sample of the questions that developers and DevOps engineers have to answer when deploying and managing Kubernetes for an enterprise. I am not talking about playing around with minikube or k3s, which require only a couple of CLI commands to get a bare Kubernetes cluster up and running so you can deploy and test your newly developed microservice. I am talking about deploying, configuring, and maintaining a Kubernetes cluster capable of scheduling, running, and monitoring applications handling 100,000 requests per second, all without disrupting the services’ availability or the DevOps team’s sweet dreams (hopefully).

Those questions are the Hidden Part of the Iceberg, the often-overlooked aspects of Kubernetes that most people don’t consider when migrating their stack to Kubernetes. Unfortunately, these gaps are often discovered much later, when design changes are much more expensive. I include myself and part of my team in that majority. To be honest, we overlooked many of the questions listed above. We only started taking them into account once our ship (Kubernetes) had set sail and its containers started scraping against an enormous iceberg hidden beneath the surface.

Kubernetes Deployment Strategy

“All you need to do is run minikube, right?”

Deploying a Kubernetes cluster can be as simple as running:

brew install minikube && minikube start

Code sample of what happens when you start minikube
Fig 2 — Spinning up a cluster with minikube

As I mentioned earlier, deploying and running a Kubernetes cluster like this should be enough for local development purposes and for learning and exploring the Kubernetes universe, but not enough when you need to run real-world applications serving users and systems that need to be online 24/7.

Deploying and configuring a production-grade Kubernetes cluster requires repeatability, predictability, consistency, and safety when installing, configuring, upgrading, and managing any Kubernetes cluster.

Before choosing the strategy and tool to deploy your fleet of Kubernetes clusters, you need to consider one key aspect of the Kubernetes strategy, probably the most crucial question: should you use a managed Kubernetes solution (AKS, EKS, GKE, Rancher, OpenShift)? Or is it going to be a vanilla Kubernetes distribution that you will deploy and manage by yourself? This is a profound question that should be answered after thoughtful consideration. If you consider deploying and managing a fleet of Kubernetes clusters by yourself, be prepared to invest a lot of your and your team’s time and energy in maintaining those clusters. Some large companies have the budget for a dedicated Kubernetes team, which makes sense given their scale and regulatory needs. They gain more control and will probably save money at the end of the day. But if you are a startup or a small/medium company, I would say that the most reasonable deployment strategy is to use a cloud/managed Kubernetes distribution, and in that land, there are a few leading players to consider, such as those mentioned above.

I am not going to do a deep dive into each of the KaaS (Kubernetes as a Service) offerings, as that would be a huge blog post on its own, but for GumGum, the decision was relatively easy to make. As I mentioned earlier, GumGum is a 100% cloud-native company, with all its cloud eggs in one basket: AWS. Picking a different vendor for our managed Kubernetes solution would have led to additional management overhead (managing two cloud vendors instead of one), hefty networking fees for all the traffic between the fleet of Kubernetes clusters and our AWS VPCs, and other unexpected consequences of a multi-cloud strategy (expect the unexpected!). That said, I must recognize that Google GKE is the most advanced managed Kubernetes platform out there, which makes sense given that Google created Kubernetes, building on its experience with its internal Borg system!

Kubernetes joke: “Running a bare metal Kubernetes cluster in production isn’t stressing me out anymore” — Mark, 22 years old, with an image of a much older man
Fig 3 — Running a bare metal Kubernetes Cluster

Kubernetes Deployment Tool

Now you have chosen a pattern or platform to deploy Kubernetes for your company. That’s amazing! But wait, we are just starting to pave the road to a successful and mature fleet of Kubernetes clusters. You need some sort of tool or platform that will help you with the installation, configuration, management, and upgrade of your clusters. The most popular tools for the job are kops, kubeadm, kubespray (formerly known as kargo), and Terraform. While I’m not going to compare those tools in this article, here are a couple of interesting links that will shed some light on the matter:

  1. Kubernetes Deployment: The Ultimate Guide
  2. A Multitude of Kubernetes Deployment Tools

Terraform is not a tool designed specifically for deploying Kubernetes clusters, but its powerful and flexible capabilities make it an ideal tool for the job. We selected Terraform because we have been growing and maturing our expertise with the tool and its ecosystem, which we use for provisioning and maintaining all kinds of AWS infrastructure. The next step with Terraform and EKS was to create a flexible but robust reusable module to deploy, update, and maintain Kubernetes clusters across all AWS regions and accounts. A well-established Terraform module for EKS already exists; using it as a base, we created our own module with additional features and perks, such as IAM-based authentication, Spot.io integration (for provisioning spot-based worker nodes), and AWS Secrets Manager integration.

With our custom Terraform module in place, we now needed a mechanism for safely creating, modifying, and upgrading our small fleet of Kubernetes clusters. At the time of writing, GumGum operated 20 EKS clusters across 4 AWS regions. While not Google-scale, this fleet is large enough (and will keep growing) to require automation for its management. For this, we built a CI/CD pipeline with Drone and Terragrunt (a thin Terraform wrapper), which helps us keep our infrastructure code DRY. So far, Drone, Terraform, and Terragrunt have been the right fit for GumGum. That stack worked well originally, but it showed limitations as we scaled, such as how to handle breaking changes or how to test that an upgrade went through okay. For that reason, we are considering moving to a more robust and fully managed CI/CD solution, like Harness.io, which we would use to fully automate upgrades with canary deployments and automatic rollbacks.

Image of sample GumGum drone.io pipeline
Fig 4 — Drone.io pipelines for provisioning and updating EKS clusters.

Application Deployment

We have just started to dive below the surface, trying to understand the largest hidden parts of the Iceberg. With the great help of EKS, Terraform, and Drone, we now have a shiny, brand-new fleet of Kubernetes clusters, ready to be loaded with the most complicated and heavy containers/applications. Hey, we just discovered a new piece of the Iceberg! How will we deploy, configure, and update all the applications and microservices we want to run on our Kubernetes clusters? Running kubectl apply -f my-application.yaml is enough to get your application up and running in a matter of seconds, sure, but if you have tens or hundreds of applications running on several Kubernetes clusters across different environments (dev, staging, prod, etc.), then your old and reliable friend kubectl is not enough. As with the cluster deployment problem, we needed to find a solution to this challenge. When we started this Kubernetes endeavor at GumGum, there were tens, if not hundreds, of different tools and methodologies for deploying applications to Kubernetes.
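To make the scale problem concrete, here is a minimal sketch that only enumerates the kubectl invocations a hand-rolled rollout would need. The environments, kube context names, application names, and manifest paths are all hypothetical; the script prints commands rather than running them.

```python
from itertools import product

# Hypothetical inventory: kube contexts per environment (all names are made up).
CONTEXTS = {
    "dev": ["dev-us-east-1"],
    "staging": ["staging-us-east-1"],
    "prod": ["prod-us-east-1", "prod-eu-west-1"],
}
APPS = ["ad-server", "reporting-api"]  # hypothetical application names


def kubectl_commands(apps, contexts):
    """Enumerate the `kubectl apply` calls a hand-rolled rollout would need."""
    commands = []
    for app, (env, env_contexts) in product(apps, contexts.items()):
        for context in env_contexts:
            commands.append([
                "kubectl", "apply",
                "-f", f"manifests/{env}/{app}.yaml",  # one manifest per app and environment
                "--namespace", app,
                "--context", context,  # selects which cluster kubectl talks to
            ])
    return commands


if __name__ == "__main__":
    for cmd in kubectl_commands(APPS, CONTEXTS):
        print(" ".join(cmd))
```

Two applications and four clusters already produce eight invocations, each tied to a per-environment manifest that someone has to keep in sync by hand. That templating and release bookkeeping is what tools like Helm and the CD platforms discussed in this post take over.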

When picking the right tools to help you with Kubernetes, you can be, and you will be, overwhelmed by the vast landscape of open source and commercial applications, frameworks, tools, methodologies, and approaches available to help you solve your challenges. To give you an idea of the size of this space, the following screenshot shows the most up-to-date “CNCF Cloud Native Interactive Landscape”, which lists 919 OSS projects and is maintained by the CNCF (Cloud Native Computing Foundation). Although those projects are not strictly Kubernetes projects, all of them can be used in conjunction with Kubernetes, and I consider this landscape an indispensable resource when diving into a new Kubernetes-related technology. Beyond that landscape, there are thousands of open-source projects somehow related to Kubernetes; just searching for the term “Kubernetes” on GitHub returned 80K results! Don’t feel intimidated by this vast array of tools and frameworks; use your judgment and instincts when picking a specific tool. If you are considering a particular tool for your Kubernetes stack, choose one that is well supported and maintained by the community, with enough adoption and well-written documentation.

Image of all the players in the cloud native landscape. The image contains many many many logos.
Fig 5: CNCF Landscape.

Putting that aside and going back to the main idea of application management and deployment, it is crucial to consider a tool or toolchain to help you with the building, packaging, distribution, deployment, and maintenance of your Kubernetes applications. Let’s break this down a little so it is easier to digest. As with any other mature software build and release process, most modern enterprises have adopted CI/CD practices and tools that significantly improve the speed and quality of the software delivered to customers; there is no novelty here. What we need to understand is how to translate those well-established patterns and methodologies to the Kubernetes land.

To tackle the challenge of packaging, deploying, and updating applications in a Kubernetes cluster, we decided to go with what was then the de facto industry-standard tool: Helm. As with any other tool and technology, there will always be fans and detractors of Helm. It is not about taking positions; it is about understanding your needs and the pros and cons before deciding what tool you’re going to implement for your Kubernetes stack. When we started implementing Kubernetes at GumGum, Helm v3 had just been released with all the bells and whistles, and because it was the industry-standard tool, it was a no-brainer decision. I am not going to deep dive into Helm, as it deserves several blog posts of its own. Still, I need to emphasize that a huge community has grown around Helm, with many actively developed charts, and some great tooling has emerged, like helmfile and helmsman, which will significantly help your team with Helm adoption and maintenance.
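One Helm feature that directly addresses the per-environment problem is layering values files: you can pass several files (for example `-f values.yaml -f values-prod.yaml`), and later files take precedence. The sketch below approximates that layering in plain Python, assuming Helm's behavior of merging nested maps key by key while replacing non-map values, such as lists, wholesale. The chart values are hypothetical, and this is an illustration, not Helm's own code.

```python
import copy


def merge_values(base, override):
    """Recursively merge `override` into a copy of `base`.

    Nested dicts are merged key by key; any other value (including lists)
    in `override` replaces the one in `base`, so the later file wins.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical chart defaults plus a production override file.
defaults = {
    "replicaCount": 1,
    "image": {"repository": "example/app", "tag": "latest"},
    "env": ["LOG_LEVEL=debug"],
}
prod = {"replicaCount": 6, "image": {"tag": "1.4.2"}, "env": ["LOG_LEVEL=info"]}

values = merge_values(defaults, prod)
print(values)
```

The production file only states what differs (replica count, image tag, log level), which keeps one chart reusable across dev, staging, and prod instead of maintaining a full manifest per environment.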

The remaining part of the software build and delivery process is the actual orchestration required to properly perform CI/CD with the tool(s) of your choice, in our case, Helm. As with the other challenges, there are dozens of tools, frameworks, and methodologies that can fill this void and then some; as I recommended earlier, try not to be overwhelmed. Carefully assess your needs, what you want to provide to your engineering teams, and what you can support and maintain over time. Let me just list some of the tools that are worth mentioning:

  • ArgoCD
  • Flux
  • Octopus Deploy
  • Harness
  • CircleCI
  • Codefresh
  • GitLab
  • Jenkins X

Monitoring & Logging

You are now becoming a Kubernetes ninja; you’ve fully automated the deployment, management, and upgrading of your Kubernetes clusters and applications. A myriad of Kubernetes applications are running in tens or hundreds of Kubernetes clusters; that’s huge! But something is missing here: your fleet of Kubernetes clusters is navigating blind. There’s a thick fog made of thousands of metrics and logs that prevents you from steering and maneuvering your applications and clusters in a timely and proactive manner. You are the captain, and you feel like the guy in the picture below, staring at a massive panel of instruments, dials, levers, and counters that you kind of understand, but because of the number and complexity of those metrics, you are not able to assess the actual state of the fleet. Your team asks many questions, but mainly they want to know what is going on with their precious applications running there. You want to provide the correct answers, so you need a comprehensive, integrated, easy-to-deploy, and easy-to-manage monitoring and logging solution. I decided to merge these two critical categories into one: observability, which is precisely what we want to achieve for the fleet of Kubernetes clusters and the multitude of applications running in it.

Man staring at a panel of many levers and switches.
Fig 6: Monitoring & Logging a complex system

Logging

Let’s start with the most often requested and used feature of the observability paradigm: logging. Logging encompasses producing, collecting, filtering, aggregating, and searching logs across many Kubernetes clusters and applications. Piece of cake, right? Well, it’s not as simple as it seems. Certain challenges and features need to be considered when choosing, designing, and implementing your logging strategy.

First of all, how are your applications producing logs? You need to understand this properly to know how you are going to capture/collect them. There are mainly two ways to emit/produce logs: writing to log files and writing to standard output and standard error. Most modern logging frameworks, such as log4j, SLF4J, logback, and others, provide rich configurations that let you decide the appropriate logging levels (INFO, ERROR, etc.) for the messages your applications produce, as well as the destination for those messages: a log file, standard output/error, or both. Sending messages to standard output/error should be the preferred approach for containerized applications, as most logging solutions compatible with Kubernetes plug into the container’s standard output/error, looking for the logs to collect and process.
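As an illustration of "log to standard output", here is a minimal sketch using Python's standard `logging` module as a stand-in for the Java frameworks named above (the same idea applies to a log4j or logback console appender). The logger name and format string are arbitrary choices for the example.

```python
import logging
import sys


def configure_logging(level=logging.INFO):
    """Send application logs to stdout, where the container runtime captures them."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s %(message)s"))
    root = logging.getLogger()
    root.handlers.clear()  # drop any file handlers configured elsewhere
    root.addHandler(handler)
    root.setLevel(level)
    return root


if __name__ == "__main__":
    configure_logging()
    log = logging.getLogger("checkout")  # hypothetical component name
    log.info("order processed")
    log.debug("suppressed: below the INFO threshold")
```

Because nothing is written to files inside the container, there is no log rotation or volume to manage in the application; the node-level collector picks the lines up from the runtime instead.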

Second, your applications produce messages/logs, but here comes a new challenge: how is your logging solution going to decide what to collect and what to ignore or drop? You probably don’t want to collect every single log line emitted by all the pods running in your Kubernetes clusters. Remember that in addition to your application Pods, there is a considerable number of Kubernetes cluster Pods (e.g., kube-proxy) that you may or may not want to collect logs from, as the high verbosity of some of those components could quickly eat up a large chunk of your logging quota (especially when paying for a SaaS logging solution). In addition, if you are running a semi-managed or fully managed Kubernetes cluster as we do, the cloud provider will be “responsible” for the health of those pods/components, so don’t waste precious logging resources on them. For discovering, collecting, and sending logs to your preferred logging provider, we are leveraging an excellent Kubernetes operator created and maintained by VMware: kube-fluentd-operator. Based on the robust and reliable fluentd project, this operator installs and auto-configures fluentd agents that run as a DaemonSet, based on custom ConfigMaps, one per namespace. You can easily customize your fluentd config, from the most elementary case of just collecting and sending container logs to more advanced use cases where you want to collect multi-line messages, filter out or drop unwanted records, etc. The operator supports multiple input and output plugins that can ship logs to most logging solutions out there, including the one we use and love at GumGum: Sumo Logic. We have just scratched the surface with kube-fluentd-operator, meaning that we are fairly basic users and have not yet leveraged its full potential as an advanced-yet-easy-to-use Kubernetes logging operator.
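The "collect or drop" decision can be sketched independently of any particular tool. On a node, the kubelet exposes container logs as symlinks named `/var/log/containers/<pod>_<namespace>_<container>-<container_id>.log`, so a collector can tell which namespace a log belongs to before shipping anything. The snippet below is a minimal illustration of namespace-based filtering, assuming that file-name convention; the excluded-namespace policy is hypothetical, and this is not kube-fluentd-operator's configuration format.

```python
import re

# Kubelet-managed symlinks look like:
# /var/log/containers/<pod>_<namespace>_<container>-<64-hex container id>.log
LOG_PATH = re.compile(
    r"^/var/log/containers/(?P<pod>[^_]+)_(?P<namespace>[^_]+)_"
    r"(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$")

# Hypothetical policy: skip namespaces whose components the cloud provider manages.
EXCLUDED_NAMESPACES = {"kube-system"}


def should_ship(path, excluded=EXCLUDED_NAMESPACES):
    """Return True if the container log at `path` should be forwarded."""
    match = LOG_PATH.match(path)
    if match is None:
        return False  # not a container log we recognize
    return match.group("namespace") not in excluded
```

Filtering this early, on the node, is what protects a SaaS logging quota: dropped lines are never shipped or billed.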

Fig 7 — Docker/container logging

Monitoring

The other side of the observability coin: monitoring. This powerful yet overlooked concept keeps developers and DevOps engineers wandering through many methodologies, best practices, tools, frameworks, and platforms that promise monitoring nirvana; yet no such thing exists. It is impossible to have a perfect monitoring strategy; monitoring must be a continuous process of looking for valuable insights and metrics for both business and IT stakeholders in a practical and cost-effective manner. Wow, I deserve a Nobel prize for that sentence. To be honest, the reality of monitoring is far from that paragraph; most of the time, IT engineering teams stumble with monitoring. Large companies spend millions of dollars implementing huge, fancy monitoring platforms that may fail to deliver on their promise to end users. Small and medium companies try to invest the least time and money required to achieve a decent monitoring platform, with mixed results. Theory aside, monitoring applications running in a Kubernetes cluster, and Kubernetes itself, is usually more challenging than monitoring legacy/static infrastructure and applications. If you want to learn more about this, please look at this article that I wrote a while ago discussing the matter.

Out of the box, a Kubernetes cluster comes with a minimal set of monitoring features, meaning that, as with the other challenges we’ve been discovering along the way, you need to provide the improved monitoring batteries yourself. The standard resource-metrics component, “metrics-server”, collects resource metrics from Kubelets and exposes them in the Kubernetes API server through the Metrics API. Metrics-server is meant mainly for autoscaling (HPA and VPA) and for commands like kubectl top; its own documentation advises against using it as a source of metrics for full monitoring platforms such as Prometheus.
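To show what the HPA actually does with those Metrics API numbers, here is a sketch of the core scaling rule documented for the Horizontal Pod Autoscaler: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). The real controller also accounts for missing metrics, not-yet-ready pods, and stabilization windows, which this sketch omits; the min/max bounds here are hypothetical defaults standing in for an HPA's minReplicas and maxReplicas.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Core HPA rule: ceil(current * current_value / target_value), clamped.

    Scaling is skipped when the usage ratio is within `tolerance` of 1.0.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU against a 60% target -> ceil(4 * 1.5) = 6 pods.
print(desired_replicas(4, 90, 60))
```

This is also why metrics-server alone is not a monitoring platform: it only needs to supply current averages for this calculation, not history, alerting, or arbitrary application metrics.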

Monitoring an entire fleet of Kubernetes clusters plus all the containers running on top of them requires a robust but flexible monitoring solution, which should tackle the following challenges:

  1. Service Discovery. The solution must be compatible with third-party service discovery mechanisms (e.g., Consul) or tightly integrated with the Kubernetes API, to discover all the Kubernetes resources that need to be monitored: Pods, Deployments, Services, etc. This is crucial for a microservices architecture, where each Service can be distributed across multiple instances and containers move across your infrastructure as needed.
  2. Cloud-native metrics explosion and scale requirements. Metrics cardinality has exploded: Kubernetes adds several dimensions, like cluster, node, namespace, or Service, so the number of possible aggregations or perspectives can explode.
  3. It should be capable of monitoring containers and getting as many insights from them as possible. Containers are ephemeral in the Kubernetes world, so we need a tool capable of capturing all critical information and traces about them.
  4. We need a monitoring system that allows us to alert for high-level service objectives but retains the granularity to inspect individual components as required.
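The cardinality point in item 2 is easy to underestimate, so here is a back-of-the-envelope sketch. The number of time series for a single metric name is roughly the product of the fan-out at each label level. All counts below are hypothetical except the cluster count, which reuses the fleet size mentioned earlier in this post, and the estimate assumes every combination actually occurs.

```python
from math import prod


def estimated_series(per_level_counts):
    """Estimate time series for one metric by multiplying the fan-out at
    each label level (e.g. namespaces per cluster, pods per namespace)."""
    return prod(per_level_counts.values())


# Hypothetical fan-out, assuming every combination actually occurs.
fanout = {
    "clusters": 20,                 # fleet size mentioned earlier in this post
    "namespaces_per_cluster": 30,
    "pods_per_namespace": 200,
    "containers_per_pod": 3,
}
print(estimated_series(fanout))  # 360000 series for a single metric name
```

Adding just one more label with 10 possible values (an HTTP status code, say) multiplies that figure by 10, and pod churn keeps minting new series as ephemeral pods come and go.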

With these considerations in mind, now we can better understand why monitoring Kubernetes is very different from monitoring VMs and cloud instances.

Considering those requirements for properly monitoring a Kubernetes cluster and its valuable load of containerized applications, and after evaluating some alternatives, we concluded that Prometheus was the right fit for the job. Why is Prometheus the most common tool for containerized environments? These four characteristics made Prometheus the de facto standard for Kubernetes monitoring:

  1. Multidimensional data model: the model is based on key-value pairs, similar to how Kubernetes organizes infrastructure metadata using labels. It allows for flexible and accurate time-series data, powering PromQL, the Prometheus query language.
  2. Accessible format and protocols: exposing Prometheus metrics is a straightforward task. Metrics are human-readable, in a self-explanatory format, and published using a standard HTTP protocol. You can check that the metrics are correctly exposed by using your web browser.
  3. Service discovery: the Prometheus server is in charge of periodically scraping the targets so that applications and services don’t need to worry about emitting data (metrics are pulled, not pushed). These Prometheus servers have several methods to auto-discover scrape targets. You can configure some of them to filter and match container metadata, making it an excellent fit for ephemeral Kubernetes workloads.
  4. Modular and highly available components: metric collection, alerting, graphical visualization, etc., are performed by different composable services. All these services are designed to support redundancy and sharding.
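Points 2 and 3 above (human-readable format, pull-based scraping) can be illustrated with the Python standard library alone. The sketch below renders a counter in the Prometheus text exposition format and serves it on a `/metrics` path for Prometheus to scrape. The metric name, labels, and values are hypothetical, and a real service would normally use an official Prometheus client library rather than hand-rolling the format.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape(value):
    """Escape a label value as the text exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the counter's value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{escape(v)}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical in-process counter values.
REQUESTS = {(("method", "GET"), ("code", "200")): 1027,
            (("method", "POST"), ("code", "500")): 3}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_counter("http_requests_total",
                              "Total HTTP requests handled.", REQUESTS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever; Prometheus pulls from this endpoint, the app never pushes."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` and opening http://localhost:8000/metrics in a browser shows exactly the plain text a Prometheus server would scrape, which is the "check it with your web browser" property described above.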
Graph of how you use Kubernetes with Prometheus
Fig 8 — Kubernetes monitoring with Prometheus

We originally designed and implemented a self-hosted Prometheus and Grafana architecture that worked well when we started to ingest metrics from ECS and some other legacy EC2 platforms, but a few billion metrics later, it began to show signs of instability. The challenges and drawbacks of this implementation are described in the article mentioned above, “Prometheus Monitoring at Scale: War Stories from the GumGum Trenches”. To overcome the challenge of hosting and scaling a highly available Prometheus and Grafana platform, and since we are 100% AWS, we decided to try one of Amazon’s newest fully managed services: Amazon Managed Service for Prometheus. The service is still in preview, but because we are Prime customers of AWS, we got a chance to do an early PoC of it. In addition, AWS offers a fully managed service for Prometheus’s perfect monitoring companion: Amazon Managed Service for Grafana. As I write this article, we are still implementing both Amazon Managed Prometheus and Grafana, assessing the financial impact of a fully managed Prometheus stack and other technical considerations worth a future blog post. The only insight we can share right now is that we are leveraging the Prometheus Operator Helm Chart, a simple yet powerful way to install and configure a per-cluster Prometheus stack that discovers, scrapes, and filters all the Kubernetes and application metrics and then forwards them to the AWS-managed Prometheus stack. In other words, we are using the Prometheus Operator as a sort of proxy or federated Prometheus, for the sole purpose of scraping, buffering, and forwarding metrics to the AWS-managed monitoring stack, which is more reliable and scalable.

We are far from reaching monitoring nirvana, but we are continuously improving our monitoring platform and strategy, making it more stable and affordable and empowering developers to create and deploy their own metrics, giving the business the most granular insight.

HTTP Traffic Management

Fig 9 — Kubernetes Traffic with an Ingress

You are about to become a respectable Kubernetes captain. Your applications are up and running with proper monitoring and logging. There are CI/CD pipelines in place for deploying and updating your applications to your fleet of clusters. Yet one question arises: how are end users going to securely access the microservices and applications deployed to your Kubernetes clusters? To answer that question, we need to remember that a barebones Kubernetes cluster, whatever distribution or managed service you choose, does not include all the batteries required for exposing, securing, and routing end users’ HTTP requests to the right services and pods running in the cluster. A vanilla Kubernetes cluster offers Services, a high-level abstraction to expose an application/microservice running as a set of pods and load-balance across them. I am not going to detail all the types of Kubernetes Services and their configurations, but one thing you need to know: Kubernetes Services on their own are not enough to securely expose and load-balance your applications and Pods.

SSL/TLS encryption is taken for granted and should be standard for most HTTP traffic, especially when facing external customers. This is another battery that you need to choose and provide to Kubernetes: a reliable mechanism for delivering SSL/TLS certificates to the Kubernetes services and applications you expose to your internal and external users. Some cloud providers, like AWS, offer automated and straightforward ways to request and issue SSL/TLS certificates, so you can quickly secure your applications and endpoints. As with every other topic we have discussed so far, SSL/TLS management for Kubernetes deserves many blog posts of its own, so we are not going to go into further detail here. I just want to mention that there are tools that will help you with certificate management in cloud-native environments; one worth mentioning is cert-manager. For Kubernetes certificate management at GumGum, we leverage AWS Certificate Manager (ACM), which we can use at no cost to provision SSL/TLS certificates for our ALB/ELB load balancers. So far, we have been doing this somewhat manually: we know beforehand which domain names a given ELB/ALB will serve, so we request a wildcard ACM certificate for those domains (*.appA.gumgum.com, *.appB.gumgum.com, etc.). We know it is not the optimal solution, but please bear in mind that we are still growing and maturing our Kubernetes platform.
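One detail worth checking when requesting wildcard certificates like the ones above: a leading `*` matches exactly one DNS label. The sketch below is a simplified matcher illustrating that rule (it ignores partial-label wildcards such as `api*.example.com` and internationalized names); it is an illustration, not how ACM or any TLS library implements verification.

```python
def wildcard_covers(pattern, hostname):
    """Return True if a certificate name such as '*.appA.gumgum.com' covers `hostname`.

    A leading '*' matches exactly one DNS label, so '*.appA.gumgum.com'
    covers 'api.appA.gumgum.com' but neither 'appA.gumgum.com' itself
    nor 'v1.api.appA.gumgum.com'. Comparison is case-insensitive.
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    first, *rest = pattern_labels
    if first == "*":
        return host_labels[0] != "" and rest == host_labels[1:]
    return pattern_labels == host_labels
```

In practice, this means that if the bare domain (appA.gumgum.com) or a deeper subdomain must also be served by the same load balancer, it has to be added to the certificate request as an additional name.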

One additional piece of the Iceberg that becomes visible once you start deploying many applications and microservices to a Kubernetes cluster is DNS management. Imagine this scenario: you enable your development teams to have ephemeral environments that are built and deployed to Kubernetes every time a developer pushes to a branch. You will have many ephemeral applications that need to be exposed to and accessed by the developers, and you don’t want to create a DNS record for every one of them manually. You need some sort of tool or automation that creates the proper DNS records for you. Enter ExternalDNS, an amazing OSS project that you can easily install and configure with a simple Helm Chart. ExternalDNS retrieves a list of resources (Services, Ingresses, etc.) from the Kubernetes API to determine the desired list of DNS records, and then creates those records in third-party DNS providers like AWS Route53 or Google Cloud DNS.
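The idea behind ExternalDNS is a reconciliation loop: compute the desired records from cluster resources, compare them with what the DNS provider currently holds, and apply the difference. The sketch below shows that idea with hypothetical, simplified data structures (plain dicts standing in for Ingresses, made-up hostnames); it is not ExternalDNS's code or API. The ownership check reflects the fact that ExternalDNS only modifies records it owns, which by default it tracks with TXT records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str    # DNS name, e.g. "feature-123.dev.example.com"
    target: str  # what it points at, e.g. a load balancer hostname


def desired_records(ingresses):
    """Derive the records the cluster wants from simplified Ingress-like dicts."""
    return {Record(host, ing["lb_hostname"])
            for ing in ingresses for host in ing["hosts"]}


def plan_changes(desired, existing, owned_names):
    """Return (to_create, to_delete); only records we own are ever deleted."""
    to_create = desired - existing
    to_delete = {r for r in existing - desired if r.name in owned_names}
    return to_create, to_delete


# Hypothetical state: a new feature branch is deployed, an old one was torn down.
LB = "my-alb-123.us-east-1.elb.amazonaws.com"
ingresses = [{"hosts": ["feature-123.dev.example.com"], "lb_hostname": LB}]
existing = {
    Record("feature-99.dev.example.com", LB),
    Record("legacy.example.com", "203.0.113.10"),  # managed by hand, not by us
}
owned = {"feature-99.dev.example.com"}

create, delete = plan_changes(desired_records(ingresses), existing, owned)
print("create:", create)
print("delete:", delete)
```

The stale branch record is cleaned up automatically, while the hand-managed legacy record is left alone because the loop does not own it.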

Last but not least comes the final piece of the HTTP traffic management puzzle: the big dilemma of how you are going to expose your Pods, Deployments, ReplicaSets, or whatever Kubernetes resources you have created for your applications. We briefly mentioned that Kubernetes provides a high-level, load-balancer-like resource, the Service, which can be used to expose applications with basic HTTP traffic management requirements, like a standalone microservice or a legacy web application. We have a couple of applications deployed like that, so we leverage Kubernetes Services for them. Today, complex microservice architectures and deployments bring many challenges and complex HTTP traffic management requirements, such as HTTP path-based routing and load balancing, canary release support, traffic security policies, authentication and authorization, API gateway and service mesh capabilities, traffic flow control, etc. Whenever an application has any of those requirements, the batteries included with Kubernetes won’t be sufficient, and you will have to install and configure an Ingress Controller (IC) or a Service Mesh. Deciding on the best IC or Service Mesh for your functional and non-functional requirements deserves a long conversation, an assessment, and even some load testing. We went through all that at GumGum when facing a dilemma with a microservice that serves around 50K requests per second and requires response latencies under 5 ms. At that point, no Ingress Controller or Service Mesh was able to keep up with those non-functional requirements, so we discarded the idea of an IC for that application and decided to use an AWS NLB, a high-performance, low-latency TCP load balancer that can easily be created from a Kubernetes Service resource.
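
On the NLB route, a single Service of type LoadBalancer is the only Kubernetes object involved. The sketch below builds such a manifest as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. It assumes the in-tree AWS cloud provider, which honors the `service.beta.kubernetes.io/aws-load-balancer-type: "nlb"` annotation; clusters running the AWS Load Balancer Controller use a different set of annotations, so check which one your cluster runs. The `nlb_service` helper, the service name, label, and ports are hypothetical, and this is not necessarily how GumGum defines the Service described above.

```python
import json


def nlb_service(name: str, app_label: str, port: int, target_port: int) -> dict:
    """Build a Service of type LoadBalancer annotated to request an AWS NLB."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {
                # Assumes the in-tree AWS cloud provider: requests a layer-4
                # NLB instead of the default Classic ELB.
                "service.beta.kubernetes.io/aws-load-balancer-type": "nlb",
            },
        },
        "spec": {
            "type": "LoadBalancer",
            # Only nodes running a matching Pod receive traffic: avoids an
            # extra node-to-node hop and preserves the client source IP.
            "externalTrafficPolicy": "Local",
            "selector": {"app": app_label},
            "ports": [
                {"name": "tcp", "protocol": "TCP", "port": port, "targetPort": target_port}
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical values; pipe the output to `kubectl apply -f -`.
    print(json.dumps(nlb_service("low-latency-svc", "low-latency-svc", 80, 8080), indent=2))
```

Setting `externalTrafficPolicy: Local` is a latency-oriented choice with a trade-off: the NLB balances across nodes rather than Pods, so load can become uneven if Pods are unevenly spread across nodes.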

Don’t get me wrong: Ingress Controllers are amazing! We have other applications and microservices leveraging some of their features. My point here is that you shouldn’t force an Ingress Controller or Service Mesh pattern onto every application or deployment just because it is the hype. Every application has its own functional and non-functional requirements. For the application mentioned above, forcing an IC pattern would have been over-engineering and could have caused unexpected problems, like extra compute costs for processing requests.

More ice to discover

There are other remarkable chunks of this iceberg to be discovered, but my coffee cup is empty and my cigar is a stub. Keep an eye on this blog; in an upcoming post I will discuss other important matters like IAM (Identity and Access Management), stateful applications, secrets management, resource allocation, autoscaling, Kubernetes version upgrades, etc.

Conclusions

  1. Kubernetes is the industry-standard container orchestration engine; it should be a no-brainer for every company wanting to embark on a complex microservices architecture.
  2. Kubernetes provides a lot of great features out of the box, like service discovery, health checks, autoscaling, scheduling, enhanced security, etc., but they are not enough when dealing with applications that have advanced requirements and need additional features.
  3. Keep in mind that the Kubernetes ecosystem can be a double-edged sword: the CNCF landscape alone is large enough to overwhelm any Developer or DevOps engineer wanting to adopt Kubernetes and build a robust fleet of clusters. Don’t be scared into inaction, though: try to pick tools with wide community adoption.
  4. Defining the appropriate Kubernetes deployment strategy for your company should be the first decision you make; many of the remaining aspects, like monitoring, logging, upgrading, etc., will build on this decision.
  5. Features like monitoring, logging, and application deployment should be designed and developed closely with the development teams, as they are the stakeholders who get the most value from those features and can help you make the necessary changes for your migration to Kubernetes.
  6. Kubernetes is not a silver bullet that will magically solve the design and code issues your applications bring with them when migrated to K8S! But it might help expose them, and dealing with them properly will improve the reliability of your applications.

