Metaflow by Netflix — the good, the bad and the ugly

Syed Sadiq Ali Naqvi
Published in Analytics Vidhya
4 min read · Mar 2, 2020

In December 2019, Netflix open-sourced Metaflow, a framework for building human-centric machine learning infrastructure in Python. The goal is to abstract away the management and engineering side of ML modelling so that data scientists (who are arguably not too good at DevOps and coding best practices) can focus on data science. It is used extensively by the data science team at Netflix.

The adoption rate of Metaflow at Netflix Data Science is rocketing!

But what is Metaflow?! According to the official docs,

Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.

Metaflow provides a unified API to the infrastructure stack that is required to execute data science projects, from prototype to production.

Credits: https://www.slideshare.net/InfoQ/humancentric-machine-learning-infrastructure-netflix

Metaflow introduces an API where each ML pipeline is written as a “Flow”.

A Flow is just like a Directed Acyclic Graph (DAG). It can have steps (nodes), branches and data artefacts like parameters, files, etc.

A typical (meta)flow! Credits: Metaflow Launch
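
To make the DAG idea concrete, here is a minimal sketch in plain Python. It does not use Metaflow's API; it only models the shape of a flow with the standard library's graphlib module (Python 3.9+). The step names (start, fit_model_a, fit_model_b, join, end) are hypothetical, chosen to mimic a flow that branches into two steps and then joins them again.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step names. Each step maps to the set of steps it depends on.
DEPENDENCIES = {
    "start": set(),
    "fit_model_a": {"start"},
    "fit_model_b": {"start"},
    "join": {"fit_model_a", "fit_model_b"},
    "end": {"join"},
}


def execution_stages(dependencies):
    """Group steps into stages; steps within one stage are independent."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        # Every step whose dependencies are all finished is ready now.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


STAGES = execution_stages(DEPENDENCIES)
for number, stage in enumerate(STAGES, start=1):
    print(f"stage {number}: {', '.join(stage)}")
```

Steps that land in the same stage (here, the two fit_model steps) do not depend on each other, which is the property that lets a framework run the branches of a flow in parallel.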

To get a more detailed understanding of these concepts, refer to this article, which does a great job of explaining these components. The official docs are also very helpful.

Recently, I used Metaflow in one of my projects. Here’s an opinionated list of its pros and cons.

The Good

  • DAGs give the right model
    Most (ML) pipelines can (and should) be thought of as DAGs. You have some preprocessing and postprocessing, and a model sits in between. Metaflow reinforces this way of thinking.
  • Similar to Airflow
    Airflow is a popular data pipelining tool used for orchestration of ETL pipelines. Metaflow enables you to think about ML pipelines in a similar fashion.
  • Built-in support for parallelism
    If you have multiple “steps” or functions in your flow that can be run in parallel, Metaflow gives out-of-the-box support for them, utilizing your compute efficiently.
  • Object-Oriented Programming
    Say goodbye to managing scripts with so many functions.
    Welcome a class with steps (as those functions) organized via a logical graph.
  • Extensive logging and versioning
    One of the best things about Metaflow is that it logs every run of each of your Flows. This makes debugging workflows in production environments a breeze. Not only do you get the logs and errors that might have occurred at any step in the flow, you also get latest_run, which you can use to restore the last run’s state.
Logs of each run of a flow
Logs of each step of a run
  • Readability
    Possibly the greatest advantage of using Metaflow — it just makes your ML code so readable. The abstraction that comes with steps is great. Also, you can generate the documentation of a Flow without even running/peeking inside the code. Just run:
python my_flow.py show
Generating docs using Metaflow’s show command

Pretty good, huh?
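
The run history described under “Extensive logging and versioning” can also be read from Python through Metaflow’s Client API. The following is a minimal sketch, assuming the Client API’s Flow object and the latest_run, id, successful and finished_at attributes behave as documented for your Metaflow version. The helper name last_run_summary is hypothetical.

```python
def last_run_summary(flow_name):
    """Return a few facts about the most recent run of the named flow."""
    # Imported inside the function so this module can still be loaded
    # on a machine where Metaflow is not installed.
    from metaflow import Flow

    run = Flow(flow_name).latest_run  # most recent run, successful or not
    if run is None:
        # The flow exists but has never been run.
        return None
    return {
        "run_id": run.id,
        "successful": run.successful,
        "finished_at": run.finished_at,
    }
```

Calling last_run_summary("MyFlow"), where MyFlow stands for the class name of your own flow, should return a small dictionary you can log or show in a dashboard. Anything a step saved on self should also be reachable through the run’s data attribute.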

The Bad (and The Ugly)

  • Limited feature set and documentation
    Having been open-sourced only in December 2019, the library is still in very active development. For now, the documentation is somewhat limited and you have to rely a lot on tutorials or even peek inside the library code. That being said, machine learning infrastructure management is evolving very fast and the team is doing its best to push the needed features.
  • Not supported on Windows
    You cannot currently run Metaflow on Windows. However, if you’re familiar with Docker, this shouldn’t be a problem. Just make sure you set a user name in your Dockerfile, as Metaflow expects a user name for logging purposes:
# User needed for Metaflow logging
ENV USERNAME my_user
  • Tightly integrated with AWS
    Netflix’s partnership with AWS is well known. Some people even say that the content streaming platform had a big hand in making the cloud platform what it is today. So naturally, Metaflow comes with built-in support for AWS S3 buckets, EC2 compute and other AWS features. However, there is talk of Azure support coming soon.
  • Invoke via subprocess
    Possibly the most annoying limitation of Metaflow: you can’t initiate a run by creating an object of the flow class. It has to be run from the terminal. This shouldn’t be a problem if you have a queue that triggers runs. However, if you have a server handling requests, this could get ugly. In Python, running a terminal command from code is usually done with the subprocess module, which is also the approach the Metaflow authors recommend.
import subprocess
subprocess.run(['python', 'my_flow.py', 'run'])
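
If a server has to trigger runs, it helps to keep the subprocess call in one place. Here is a minimal sketch, assuming the flow lives in my_flow.py as above. The helper names build_run_command and run_flow are hypothetical, and --alpha 0.5 stands in for a parameter that your own flow would have to define.

```python
import subprocess
import sys


def build_run_command(flow_file, extra_args=()):
    """Build the argument list for `python <flow_file> run [extra args]`."""
    # sys.executable keeps the child process on the same interpreter
    # (and virtualenv) as the calling server or worker.
    return [sys.executable, flow_file, "run", *extra_args]


def run_flow(flow_file, extra_args=(), timeout=None):
    """Run the flow in a child process and return (succeeded, stdout, stderr)."""
    result = subprocess.run(
        build_run_command(flow_file, extra_args),
        capture_output=True,  # collect output instead of printing it
        text=True,            # decode output to str
        timeout=timeout,      # raises subprocess.TimeoutExpired when exceeded
    )
    return result.returncode == 0, result.stdout, result.stderr


# Building the command has no side effects, so it is easy to log or test.
# "--alpha 0.5" is a placeholder for a parameter defined by your flow.
COMMAND = build_run_command("my_flow.py", ["--alpha", "0.5"])
print(COMMAND[1:])
```

Separating command construction from execution means the request handler only deals with a success flag and captured output, and the exact command can be logged when a run fails.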

And that’s all! While Metaflow seems to be a step in the right direction, there’s still a long way to go until it is ready for universal use. Thank you for reading!

Syed Sadiq Ali Naqvi
Analytics Vidhya

AI enthusiast, FIFA freak, Software Engineer — in that order.