Lyft’s End-to-End ML Platform

Lyft: Flyte

Vimarsh Karbhari
Acing AI


At Lyft, developing large-scale, complex ML and data pipelines grew steadily harder as the ML infrastructure and the data flowing through it became more complex. Data and machine learning were, in essence, converging within Lyft, so the need grew for a single tool that served both. After building such a tool, Lyft open-sourced it as Flyte, a cloud-native platform where data and machine learning operations converge.

Flyte Architecture

Why?

Lyft built Flyte to orchestrate ML and data workflows. It was also important that teams could collaborate, reuse each other's work, and perform ML Ops across team boundaries. With these overarching goals, Flyte aims to be a hosted, scalable, serverless orchestration platform, a fabric that connects disparate compute technologies.

How?

Flyte is built on top of Kubernetes. Used internally at Lyft, it has scaled to at least 7,000 unique workflows with over 100,000 executions every month, 1 million tasks, and 10 million containers.

All entities in Flyte are immutable, which makes it possible to track data lineage, reproduce experiments, and roll back deployments. Repeated tasks can use the task cache to save time and money. Currently supported tasks include Python, Hive, Presto, and Spark as well as…
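To make the link between immutability and task caching concrete, here is a minimal sketch in plain Python. It is not Flyte's implementation or API: the names `TaskVersion` and `TaskCache` are hypothetical, and the cache is a simple in-memory dictionary. The assumption is that a task definition never changes once it has a version, so a result can be looked up safely by task name, version, and a hash of the inputs.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class TaskVersion:
    """Hypothetical immutable task definition: name, version, and function."""
    name: str
    version: str
    fn: Callable[..., Any]


class TaskCache:
    """Toy in-memory cache keyed by task name, task version, and hashed inputs."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(task: TaskVersion, inputs: Dict[str, Any]) -> str:
        # Serialize deterministically so equal inputs always produce the same key.
        payload = json.dumps(
            {"name": task.name, "version": task.version, "inputs": inputs},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, task: TaskVersion, **inputs: Any) -> Any:
        key = self._key(task, inputs)
        if key in self._store:
            # Same task version and same inputs: reuse the stored result.
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = task.fn(**inputs)
        self._store[key] = result
        return result


def square(x: int) -> int:
    return x * x


cache = TaskCache()
square_v1 = TaskVersion(name="square", version="v1", fn=square)

first = cache.run(square_v1, x=4)   # computed on the first call
second = cache.run(square_v1, x=4)  # served from the cache

# A new version is a new entity, so its results are cached separately.
square_v2 = TaskVersion(name="square", version="v2", fn=square)
third = cache.run(square_v2, x=4)   # computed again under the new version
```

Because a changed task gets a new version rather than being edited in place, older cached results stay valid for the older version, which is the same property that makes lineage tracking and rollbacks practical.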
