This is part of a multi-part write-up about our journey into Kubernetes.
Most of you working in the technical landscape will have heard of Kubernetes, unless you have been meditating in a deep forest. As per the official documentation, Kubernetes is "a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation." In layman's terms, Kubernetes is a container orchestrator that takes care of container life-cycle operations.
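To make "container life-cycle operations" concrete, here is a minimal sketch using kubectl; the nginx image and the deployment name web are just illustrative values:

```sh
# Run a container image as a managed Deployment, then scale it to three replicas.
kubectl create deployment web --image=nginx
kubectl scale deployment web --replicas=3

# Kubernetes now keeps three pods running, rescheduling them if a node fails.
kubectl get pods
```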
Kubernetes is very popular right now and offers a lot of exciting prospects. To name a few, it lets engineers deploy their software without needing to know the underlying infrastructure paradigms, and it supports scaling in and out in an automated manner. As with any distributed system, however, Kubernetes comes with its fair share of challenges. It is hard to set up and manage: doing so successfully requires a solid understanding of how Kubernetes operates, and configuring it to one's own requirements is a tedious task.
In this post, we will talk about why we chose to work with Kubernetes at omni:us. We will go through the technical aspects of the design behind our deep-learning training, annotation and prediction services on top of Kubernetes.
Every infrastructure project starts with a business need. When we started, our goal was to provide ML/AI services to our customers, and we knew there would be engineering services around them. From the beginning, we wanted to support multi-cloud and on-premise infrastructure, since we knew our customers would be spread across the spectrum.
We chose Kubernetes for the reasons below.
- Our customers' infrastructure can be in their own data centers or with public cloud providers. We should be ready to deploy our services in either.
- It's a no-brainer that the best way to ship software in this day and age is containers. Hence, our chosen platform should run containers.
- Engineers and data scientists should be able to work without worrying about the infrastructure behind their deployments.
- The chosen platform should support scale-out (adding more nodes on the fly) and scale-in (removing nodes on the fly) as and when required.
- It should be open-source with a good community around it.
- There should be enough engineering resources available to manage the chosen product. This is most important given that customers may manage the services themselves.
Kubernetes ticked all of the boxes above, and with the ever-growing Kubernetes community we never had any doubt about its survival a few years down the line. Keep in mind that the above points helped us shortlist suitable alternatives.
And just like any other infrastructure project, we did a thorough check on how our services operate and how we ship them before finalising Kubernetes. If you're considering Kubernetes, keep this in mind: don't use Kubernetes just because other companies are using it. Do a thorough proof of concept and see if it is useful for you.
Kubernetes components are already explained in detail in the Kubernetes documentation and many other posts, so we are not going to cover them again here. Below is a short overview of the relevant components and services, with links for further reading.
Kubernetes components (running on master and nodes): https://kubernetes.io/docs/concepts/overview/components/
Network policies: https://kubernetes.io/docs/concepts/services-networking/network-policies/
Provisioning a Kubernetes Cluster
Kubernetes, being a distributed system, is not easy to provision or set up. There are many configuration parameters one needs to understand in order to set it up according to a company's own requirements.
One can set up Kubernetes "the hard way" (for example, following Kelsey Hightower's well-known Kubernetes The Hard Way guide). This helps one understand the components and the various services the Kubernetes master and worker nodes use, so they can be customised to one's requirements (e.g. the SSL certificate authority). It also helps in troubleshooting any issues that arise in the cluster.
Fortunately, the Kubernetes community has developed many tools that make installation easier, such as kubeadm (works across all infrastructures, whether on-premise or on cloud providers), kops (AWS, with GCP in beta) and kubespray (all cloud providers and on-premise infrastructure).
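For illustration, a minimal kubeadm bootstrap looks like the sketch below; the pod CIDR is an example value, and the real join token and CA hash are printed by kubeadm init:

```sh
# On the master node: initialise the control plane.
# The pod network CIDR must match the CNI plugin installed afterwards.
kubeadm init --pod-network-cidr=10.244.0.0/16

# On each worker node: join the cluster.
# kubeadm init prints the actual token and CA cert hash to use here.
kubeadm join <master-ip>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
```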
In addition to setting up Kubernetes yourself, many cloud providers offer a managed version of it. GCP has Google Kubernetes Engine (GKE), AWS has Elastic Kubernetes Service (EKS) and Azure has Azure Kubernetes Service (AKS), to name the major cloud providers. All the managed versions have very good integrations with the respective provider's own services. In our view, GKE is leading the pack, with the others not too far behind.
GKE can be provisioned using gcloud (the Google Cloud SDK) or Terraform.
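A minimal gcloud sketch; the cluster name, zone and node count are illustrative values:

```sh
# Create a three-node GKE cluster.
gcloud container clusters create demo-cluster \
    --zone europe-west1-b \
    --num-nodes 3

# Fetch credentials so kubectl can talk to the new cluster.
gcloud container clusters get-credentials demo-cluster --zone europe-west1-b
```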
(In case someone does not know Terraform: it is an open-source infrastructure-as-code tool that increases productivity and reduces configuration drift.)
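Whichever provider is targeted, the Terraform workflow itself is the same three commands; the provider-specific .tf configuration files are not shown here:

```sh
# Download provider plugins and initialise the working directory.
terraform init

# Preview the changes Terraform would make.
terraform plan

# Apply the changes to create or update the cluster.
terraform apply
```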
AKS can be provisioned using az (the Azure CLI) or Terraform.
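A minimal az sketch; resource group, cluster name and location are placeholder values:

```sh
# Create a resource group to hold the cluster.
az group create --name demo-rg --location westeurope

# Create a three-node AKS cluster.
az aks create --resource-group demo-rg --name demo-aks \
    --node-count 3 --generate-ssh-keys

# Fetch credentials so kubectl can talk to the new cluster.
az aks get-credentials --resource-group demo-rg --name demo-aks
```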
EKS is designed by AWS in a unique way wherein the master (control plane) and the worker autoscaling groups are provisioned independently. If the aws CLI is used, these operations must all be performed separately. Weaveworks has developed a very good tool, eksctl, that does all of this with one command; Terraform can also be used.
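A minimal eksctl sketch; cluster name, region and node count are illustrative values:

```sh
# Create an EKS control plane plus a worker node group in one command.
eksctl create cluster --name demo-cluster \
    --region eu-central-1 \
    --nodes 3
```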
In the next post, we will discuss exactly how we are using Kubernetes for various ML tasks like deep-learning training, annotation and prediction.