From Research to Production with TFX Pipelines and ML Metadata

May 24, 2019

Posted by Jarek Wilkiewicz on behalf of the TFX team

If your code runs in production, you are probably already familiar with version control / software configuration management (SCM), continuous integration and continuous deployment (CI/CD), and many other software engineering best practices. These took years to develop, and we now often take them for granted. Much like how writing an efficient algorithm implementation is just the beginning of a software engineer’s journey, machine learning (ML) model code typically represents only 5% of the overall system¹ required to deploy it to production. At Google, we’ve also been working on improving the remaining 95% for many years². A fruit of our labour, TensorFlow Extended (TFX³), aims to bring the benefits of software engineering discipline to the fast-growing space of ML. In an upcoming series of blog posts, we’ll highlight what’s new in TFX and show you how TFX can help you build and deploy your ML models to production environments.

Until recently, only the underlying TFX libraries (TensorFlow Data Validation, TensorFlow Transform, TensorFlow Model Analysis, TensorFlow Serving) were available in open source, which meant that developers still had to build their own ML pipeline components using the libraries. You can now create a complete TFX ML pipeline using several ready-made components, configure them for many typical ML use cases with a high-level Python API, and execute them with an orchestration system of your choice, such as Apache Airflow or Kubeflow, as shown in the figure below.

[Figure: a TFX ML pipeline executed by an orchestration system such as Apache Airflow or Kubeflow]
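To make the idea of components wired into a pipeline and handed to an orchestrator more concrete, here is a minimal sketch in plain Python. It is not the TFX API: the component names, `run_pipeline`, and `local_runner` are hypothetical and only loosely echo the libraries named above. It shows just the core idea that each component runs after the components it depends on, and that the runner can be swapped out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical component graph: each component maps to the components it
# depends on. These names are illustrative, not real TFX component names.
PIPELINE = {
    "ingest_examples": set(),
    "validate_data": {"ingest_examples"},
    "transform_features": {"ingest_examples", "validate_data"},
    "train_model": {"transform_features"},
    "analyze_model": {"train_model"},
    "push_to_serving": {"analyze_model"},
}


def run_pipeline(graph, run_component):
    """Run each component after all of its dependencies; return the run order."""
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        run_component(name)
    return order


def local_runner(name):
    # Stand-in for a real orchestrator such as Apache Airflow or Kubeflow.
    print(f"running {name}")


order = run_pipeline(PIPELINE, local_runner)
```

Because the pipeline definition is separate from the function that runs each component, the same graph could in principle be handed to a different runner. This mirrors the separation between pipeline definition and orchestration described above.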

When the TFX pipeline executes, ML Metadata (MLMD, another Google open source project) keeps track of the artifacts that pipeline components depend upon (e.g. training data) and produce (e.g. vocabularies and models). ML Metadata is available as a standalone library and has also been integrated with TFX components for your convenience. MLMD allows you to discover the lineage of an artifact (for example, what data a model was trained on) and to find all artifacts created from an artifact (for example, all models trained on a specific dataset), and it enables many other use cases.
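The two queries mentioned above, lineage and "everything created from this artifact", amount to walking a graph of artifacts in opposite directions. The sketch below illustrates that with a toy in-memory store. It is an assumption-laden illustration, not the MLMD library or its API: `LineageStore`, `record_execution`, `lineage`, and `descendants` are hypothetical names, and the artifact names are made up.

```python
from collections import defaultdict


class LineageStore:
    """Toy in-memory record of which artifacts each execution consumed and produced.

    Hypothetical illustration only; this is not the ML Metadata API.
    """

    def __init__(self):
        self._parents = defaultdict(set)   # artifact -> artifacts it was derived from
        self._children = defaultdict(set)  # artifact -> artifacts derived from it

    def record_execution(self, inputs, outputs):
        """Record that one component run turned `inputs` into `outputs`."""
        for out in outputs:
            for inp in inputs:
                self._parents[out].add(inp)
                self._children[inp].add(out)

    @staticmethod
    def _walk(start, edges):
        # Depth-first traversal collecting everything reachable from `start`.
        seen, stack = set(), [start]
        while stack:
            for nxt in edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def lineage(self, artifact):
        """All artifacts that `artifact` was directly or indirectly derived from."""
        return self._walk(artifact, self._parents)

    def descendants(self, artifact):
        """All artifacts directly or indirectly created from `artifact`."""
        return self._walk(artifact, self._children)


store = LineageStore()
store.record_execution(["training_data"], ["vocabulary"])
store.record_execution(["training_data", "vocabulary"], ["model_v1"])
store.record_execution(["training_data"], ["model_v2"])

print(sorted(store.lineage("model_v1")))
print(sorted(store.descendants("training_data")))
```

Here `lineage("model_v1")` answers "what was this model trained on?", while `descendants("training_data")` answers "which artifacts were created from this dataset?". A real metadata store would also persist this information and record details about each execution.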

To better understand how this all fits together, check out the Google I/O ’19 presentation: “TensorFlow Extended (TFX): Machine Learning Pipelines and Model Understanding”.

In our next TFX blog post, we will describe the TFX pipeline components in more detail. Until then, please try the TFX developer tutorial. You’ll follow a typical ML development process, starting by examining the dataset and ending up with a complete working ML pipeline. If you have TFX questions, please reach us on Stack Overflow. Bug reports and pull requests are always welcome on GitHub, and we invite general discussion at

[1] Sculley, D., Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo and Dan Dennison. “Hidden Technical Debt in Machine Learning Systems.” NIPS (2015).

[2] Tushar Chandra, “Sibyl: A System for Large Scale Machine Learning at Google”, IEEE DSN (Dependable Systems and Networks) conference keynote, Atlanta, GA, June 25th 2014.

[3] Denis Baylor, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo, Zakaria Haque, Salem Haykal, Mustafa Ispir, Vihan Jain, Levent Koc, Chiu Yuen Koo, Lukasz Lew, Clemens Mewald, Akshay Naresh Modi, Neoklis Polyzotis, Sukriti Ramesh, Sudip Roy, Steven Euijong Whang, Martin Wicke, Jarek Wilkiewicz, Xin Zhang, and Martin Zinkevich. 2017. TFX: A TensorFlow-Based Production-Scale Machine Learning Platform. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’17). ACM, New York, NY, USA, 1387–1395. DOI:

