An introduction to Machine Learning Operations (MLOps) using Kubernetes
Why build machine learning models if they are just going to sit in your Jupyter notebook without ever getting into production? I mean, why train athletes if they are just going to sit in their hostel rooms? Most ML models never make it to production, mostly because of operational difficulties or because the models do not comply with applicable regulations and laws.
ML Operations, or MLOps, is the process of operationalizing data science by getting your ML models into production. Operationalizing models means not only reducing friction in the deployment pipeline but also embedding the models in a business system. Frequently, the environment in which models are developed is quite different from the environment in which they are ultimately deployed. The integration of predictive models into external systems is an area that is not only complex but also less standardized than other aspects of the ML life cycle. MLOps enables you to track, version, audit, certify, and re-use every asset in your ML life cycle.
As datasets continue to expand and models become more complex, distributing machine learning (ML) workloads across multiple nodes is becoming more attractive. Unfortunately, breaking up and distributing a workload can add both computational overhead and a great deal more complexity to the system. Data scientists should be able to focus on ML problems, not DevOps. Fortunately, distributed workloads are becoming easier to manage, thanks to Kubernetes.
Kubernetes is a mature, production-ready platform that gives developers a simple API to deploy programs to a cluster of machines as if they were a single piece of hardware. Using Kubernetes, computational resources can be added or removed as desired, and the same cluster can be used to both train and serve ML models.
This may sound like gibberish if you are not familiar with Kubernetes. The aim of this article is to help you understand the basics of Kubernetes.
Say we have a set of blocks of different sizes and shapes; some melt at a particular temperature, while some dissolve in water. These blocks need to be transported to a different location, but unfortunately, we can only transport all of them in one van.
This is what deployment feels like in a traditional setup: you have different technologies, such as a web server using NodeJS, a database such as MongoDB/CouchDB, a messaging system like Redis, and an orchestration tool like Ansible, each with its own dependencies and libraries. You also have to ensure that all these different services are compatible with the version of the OS you plan to use. Sometimes a certain version of one of these services is not compatible with your OS, and you have to go back and look for another OS that is compatible with all of them. That is when you start hearing the very popular phrase:
But it worked on my own machine!
Containers solve these dependency and compatibility issues. In our analogy, we simply put each block in its own container, so that every block is completely isolated from the rest; that way, the temperature of the van does not affect the blocks.
So what are containers? A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and deploy it as one package.
A container runtime is the software that runs containers. The most popular container runtime is Docker.
Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries, and settings.
Container images become containers at runtime and in the case of Docker containers — images become containers when they run on Docker Engine. Available for both Linux and Windows-based applications, containerized software will always run the same, regardless of the infrastructure.
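As a sketch, here is what a Dockerfile for a simple Python web service might look like. The file names (`requirements.txt`, `app.py`) are placeholders for illustration, not part of any particular project:

```dockerfile
# Start from a base image that provides the language runtime
FROM python:3.11-slim

WORKDIR /app

# Copy and install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and define the start command
COPY . .
CMD ["python", "app.py"]
```

Building the image with `docker build -t my-app .` and running it with `docker run my-app` produces a container that carries its own runtime, libraries, and code, independent of what is installed on the host.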
So we have learned about containers, and we now have our application packaged into a Docker container. But what next? How do you run it in production? What if your application relies on other containers, such as databases, messaging services, or other backend services? What if the number of users increases and you need to scale your application? You would also like to scale down when the load decreases. In our block analogy, what if the number of blocks increases?
To enable these functionalities you need an underlying platform with a set of resources. The platform needs to orchestrate the connectivity between the containers and automatically scale up or down based on the load. This whole process of automatically deploying and managing containers is known as Container Orchestration.
Kubernetes is thus a container orchestration technology. There are multiple such technologies available today: Docker has its own tool called Docker Swarm, Kubernetes comes from Google, and Mesos comes from Apache. While Docker Swarm is really easy to set up and get started with, it lacks some of the advanced autoscaling features required for complex applications. Mesos, on the other hand, is quite difficult to set up and get started with, but it supports many advanced features.
Kubernetes, arguably the most popular of them all, is a bit difficult to set up and get started with, but it provides a lot of options to customize deployments and supports the deployment of complex architectures. When we say that Kubernetes auto-scales, we simply mean that it automatically creates multiple instances of your application when the number of users increases and reduces the number of instances when the load decreases. It can also increase the number of nodes in the cluster when the existing nodes can no longer accommodate additional containers. If a node in a cluster fails, Kubernetes automatically moves that node's workload to a different node in the cluster. Your application is now highly available: hardware failures do not bring it down, because you have multiple instances of your application running on different nodes.
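Running multiple instances is declared rather than scripted. As a minimal sketch (the name `my-app` and the image are placeholders), a Deployment asking Kubernetes to keep three replicas of an application running might look like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3            # Kubernetes keeps three instances running at all times
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0   # placeholder image name
```

If a node fails, Kubernetes reschedules the lost replicas onto healthy nodes to get back to three; you can change the count at any time, for example with `kubectl scale deployment my-app --replicas=5`.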
Kubernetes is now supported by all major public cloud providers, such as GCP, Azure, and AWS, and the Kubernetes project is one of the top-ranked projects on GitHub.
Kubernetes does not deploy containers directly; instead, containers are encapsulated into a Kubernetes object known as a Pod. A Pod is a single instance of an application and the smallest object that can be created in Kubernetes.
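To make this concrete, here is a minimal Pod manifest; the name and image are placeholders for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-app        # a Pod wraps one or more containers
      image: nginx:1.25   # placeholder container image
      ports:
        - containerPort: 80
```

Applying this file with `kubectl apply -f pod.yaml` creates the Pod on the cluster, and `kubectl get pods` shows its status. In practice you rarely create bare Pods; higher-level objects such as Deployments create and manage them for you.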
A node is simply a machine, physical or virtual, on which Kubernetes and its tools are installed. It is also where the containers are launched. What if the node fails? All the containers on that node become inaccessible. This is why we need more nodes: a cluster.
A cluster is simply a set of nodes grouped together so that if one node fails, its workload can be moved to another node. Thanks to Kubernetes we do not have to do that manually. But what manages the nodes?
The master node is the node with the Kubernetes control plane components installed. It watches over all the other nodes in the cluster and is responsible for the actual orchestration.
Kubeflow is an open-source project that aims to make running ML workloads on Kubernetes simple, portable, and scalable. It adds resources to your cluster to assist with a variety of tasks, including training and serving models and running Jupyter notebooks. It also extends the Kubernetes API with new Custom Resource Definitions (CRDs), so machine learning workloads can be treated as first-class citizens on Kubernetes.
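For example, Kubeflow's training operator provides a `TFJob` CRD for distributed TensorFlow training. A hedged sketch of what such a job can look like (the job name, image, and worker count are placeholders, and the exact fields depend on the operator version you install):

```yaml
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-train
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2                 # two worker Pods for distributed training
      template:
        spec:
          containers:
            - name: tensorflow    # the operator expects this container name
              image: my-registry/mnist-train:latest   # placeholder training image
```

Once applied, the operator creates the worker Pods, wires up the distributed-training environment between them, and restarts them on failure, so a training run is managed like any other Kubernetes workload.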
I hope this will help you get started with MLOps using Kubernetes ✌️.