Building scalable and efficient ML Pipelines

Using Kubernetes to build ML pipelines that scale

Vimarsh Karbhari
Acing AI

--

Kubernetes is the gold standard for managing large fleets of containerized applications, whether they run in the cloud or on your own hardware. Whether you are building pipelines, models, or ML applications, Kubernetes enables containerization, which is a safe way to build and scale any of these workloads.

Kubernetes can host several packaged and pre-integrated data and data science frameworks on the same cluster. These frameworks are usually scalable (or auto-scale), and they are defined and managed with a declarative approach: you specify your requirements, and the service continuously works to satisfy them, which provides resiliency and minimizes manual intervention.
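The declarative approach can be sketched with a minimal Kubernetes Deployment manifest. The names and image below are illustrative, not from any particular project: you declare the desired state (three replicas of a serving container), and Kubernetes continuously reconciles the cluster toward it, restarting or rescheduling pods as needed.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server            # illustrative name
spec:
  replicas: 3                   # desired state: keep 3 pods running at all times
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
      - name: server
        image: registry.example.com/model-server:latest  # hypothetical image
        resources:
          requests:             # scheduling hints so the cluster can place pods sensibly
            cpu: "500m"
            memory: 512Mi
```

If a node fails or a pod crashes, the controller notices the divergence from the declared spec and replaces the missing pod without manual intervention.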

Kubeflow is an open source project that brings together leading Kubernetes-native ML frameworks. Kubeflow components include Jupyter notebooks, Kubeflow Pipelines (workflow and experiment management), scalable training services (for TensorFlow, PyTorch, Horovod, MXNet, and Chainer), and model serving solutions. Kubeflow also offers examples and pre-integrated, tested components.
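As a sketch of what a scalable training service looks like, here is a minimal TFJob manifest of the kind Kubeflow's TensorFlow training operator consumes. The job name, image, and training script are hypothetical; the point is that distributed training is expressed declaratively, and scaling out is a matter of raising the worker replica count.

```yaml
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-train             # illustrative name
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2               # scale training out by increasing this count
      restartPolicy: OnFailure  # failed workers are restarted, not the whole job
      template:
        spec:
          containers:
          - name: tensorflow    # the operator expects this container name
            image: registry.example.com/mnist:latest   # hypothetical training image
            command: ["python", "train.py"]            # hypothetical entry point
```

The operator injects the cluster topology into each pod, so the same container image can be reused at different scales without code changes.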

In addition to typical data science tools, Kubernetes can host data analytics tools such as Spark or Presto, various databases, and monitoring/logging solutions such as Prometheus, Grafana, and Elasticsearch. It also enables the use of serverless functions (i.e., code that is automatically built, deployed, and scaled, as with AWS Lambda) for a variety of data-related tasks or…
