Decoding Netflix: Metaflow

Metaflow, Netflix’s human-centric framework for data science.

Vimarsh Karbhari
Acing AI


Data scientists like to retain the freedom to express their business logic in arbitrary, idiomatic Python code, as they would in a Jupyter notebook. They do not want to be caged into software architecture constructs. To solve this problem, Netflix built Metaflow. The infrastructure is meant to let data scientists keep that freedom while providing enough guardrails and scaffolding that they don't have to worry much about software architecture.
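To make the "freedom with guardrails" idea concrete: in Metaflow itself, a flow is a class that subclasses `FlowSpec`, marks each method with the `@step` decorator, chains steps with `self.next(...)`, and is launched from the command line with `python flow.py run`. Anything assigned to `self` becomes an artifact that Metaflow persists to its datastore. The sketch below does not use Metaflow. It is a hypothetical, stdlib-only toy (`ToyFlow`, `step`, and `TrainFlow` are illustrative names, not library APIs) that mimics that shape, showing how a framework can leave each step as plain Python while it handles sequencing and snapshots state between steps.

```python
import copy
from typing import Any, Callable, Dict, List, Optional


def step(func: Callable) -> Callable:
    """Hypothetical marker decorator (not Metaflow's): flags a method as a step."""
    func.is_step = True
    return func


class ToyFlow:
    """Hypothetical stand-in for a flow base class (NOT Metaflow's FlowSpec)."""

    def next(self, target: Callable) -> None:
        # Record which step should run after the current one finishes.
        self._next_step = target.__name__

    def run(self) -> List[Dict[str, Any]]:
        history: List[Dict[str, Any]] = []
        current: Optional[str] = "start"  # every flow begins at `start`
        while current is not None:
            method = getattr(self, current)
            # Guardrail: only methods explicitly marked with @step may run.
            if not getattr(method, "is_step", False):
                raise TypeError(f"{current!r} is not decorated with @step")
            self._next_step = None
            method()  # the step body is ordinary Python
            # Snapshot every public attribute ("artifact") after the step,
            # loosely mimicking Metaflow's artifact persistence; the toy only
            # keeps the snapshots in memory.
            artifacts = {
                name: copy.deepcopy(value)
                for name, value in vars(self).items()
                if not name.startswith("_")
            }
            history.append({"step": current, "artifacts": artifacts})
            current = self._next_step  # None means the flow has ended
        return history


class TrainFlow(ToyFlow):
    @step
    def start(self):
        # Arbitrary, idiomatic Python: no framework-specific data types.
        self.rows = [1.0, 2.0, 3.0, 4.0]
        self.next(self.train)

    @step
    def train(self):
        self.mean = sum(self.rows) / len(self.rows)
        self.next(self.end)

    @step
    def end(self):
        # No call to self.next(), so the runner stops here.
        print(f"mean = {self.mean}")


if __name__ == "__main__":
    for entry in TrainFlow().run():
        print(entry["step"], sorted(entry["artifacts"]))
```

The toy only supports a linear chain and in-memory snapshots. Metaflow additionally supports branches and `foreach` fan-outs, stores artifacts durably, and can dispatch individual steps to remote compute.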

Metaflow <> AWS

Netflix’s data warehouse contains hundreds of petabytes of data. While a typical machine learning workflow running on Metaflow touches only a small shard of this warehouse, it can still process terabytes of data.

Metaflow is a cloud-native framework. By design, it leverages the elasticity of the cloud for both compute and storage. Netflix has been one of the largest users of Amazon Web Services (AWS) for many years and has accumulated plenty of operational experience and expertise with the cloud, AWS in particular. When Netflix open-sourced Metaflow, it partnered with AWS to provide seamless integration between Metaflow and various AWS services.
