MLOps with TensorFlow Extended (TFX) and TensorFlow Decision Forests (TF-DF) (Part 1)

Yu (Robert) Fu
6 min read · Aug 29, 2021


Motivation

Not long ago, the TensorFlow team open-sourced TensorFlow Decision Forests (https://blog.tensorflow.org/2021/05/introducing-tensorflow-decision-forests.html), a tree-based modelling package that uses the Keras API within TensorFlow. This opened the door to building ML pipelines not just for deep learning models, but also for much more interpretable tree-based models.

On the other hand, as a practising engineer in machine learning and full-stack development, ever since sitting the Google ML Engineer exam I've been keen to learn more about TensorFlow Extended (TFX) for MLOps, given its full spectrum of standardised components, Beam-powered distributed processing and production readiness. The combination of the two makes it even harder to resist!

TFX workflow from https://www.tensorflow.org/tfx

Challenges

  1. As of this writing, TF-DF has not yet released a distribution for any platform other than Linux, which means that to run it locally on Mac or Windows, one has to set up a container or play with it on Google Colaboratory.
  2. Another challenge is that, since TF-DF is not yet a fully canonical Keras model, its integration with TFX may not be as smooth as for other Keras models, and there is not yet much documentation on combining the two.

Goal

The goal was to find a way to test and learn MLOps locally with TFX using TF-DF models. I was hoping to achieve the following:

  1. Use a local environment for running TFX pipelines with TF-DF models;
  2. Debug the code if needed;
  3. Be able to use the TensorFlow Model Analysis package and TensorBoard to visualise and evaluate the results;
  4. Serve as the foundation for deploying on a local Kubeflow cluster for further test and learn.

Environment Setup

Base image

The first step is to build a Linux-based base image with the minimal required packages, as Linux is currently the only platform with a TF-DF distribution. All downstream local development and deployment (i.e. Kubeflow) will be based on it. The pinned package versions are taken from https://pypi.org/project/tfx/. The image is then tagged and pushed to my Docker Hub as a public repo: https://hub.docker.com/r/robertf99/tfx-tfdf.

Development image

I also built a dev image for local development. Since I am using JupyterLab to run model analysis with tensorflow-model-analysis, I added the necessary JupyterLab extension according to the installation guide.

A docker-compose.yaml is then created to spin up the local dev container with volumes mounted for all external pipeline files and data. This allows for easy development, as we don't need to rebuild the image often. The port mapping gives access to the JupyterLab server from localhost:8888 and also lets TensorBoard run within JupyterLab.

Note on Debugging

With the dev image running in a container, it is possible to connect to it via VS Code (https://code.visualstudio.com/docs/remote/containers) and set breakpoints as we usually do.

Now we are all set to move onto the real stuff!

TFX Pipeline(s)

Data and Model of Choice

I wanted to start simple by building a toy pipeline, but run it end-to-end with all the standard components TFX has to offer. For that I used the penguin dataset (https://storage.googleapis.com/download.tensorflow.org/data/palmer_penguins/penguins.csv) and aimed to build a simple Random Forest model.
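Before wiring anything into TFX, a quick standalone sanity check of TF-DF inside the container can be helpful. The sketch below is a minimal example of fitting a Random Forest through the Keras API; the file path and column names are assumptions about the penguin CSV, not the exact code used in the pipeline.

```python
import pandas as pd
import tensorflow_decision_forests as tfdf

# Assumed local copy of the penguins CSV; column names are assumptions.
df = pd.read_csv("penguins.csv").dropna()
df["species"] = df["species"].astype("category").cat.codes  # integer label

# TF-DF consumes a tf.data.Dataset; no one-hot encoding of categoricals needed.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(df, label="species")

model = tfdf.keras.RandomForestModel()
model.fit(train_ds)
model.summary()  # per-feature importances, out-of-bag evaluation, etc.
```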

Pipeline Definition and Run

Following the examples from both the TFX and TF-DF documentation, I was able to run the pipeline inside the development container and see pipeline_output updated in the local file system. During the process I noticed:

  • The label column needs to be an integer for it to be recognised by the Keras model;
  • There's no need for one-hot encoding of categorical features;
  • Although TF-DF allows missing values, for TFX's CsvExampleGen to infer the column types correctly I had to remove all rows with blank values. This still needs a bit more investigation in future use cases.

I used a script to generate the processed data and saved it to the ./data folder.
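The script itself isn't reproduced here; as a rough illustration of the points above, the preprocessing could look something like this (file names and column names are assumptions):

```python
import pandas as pd

# Assumed raw file and column names for the penguins dataset.
df = pd.read_csv("penguins_raw.csv")

# Drop rows with blank values so CsvExampleGen can infer column types cleanly.
df = df.dropna()

# Encode the label as an integer so the Keras/TF-DF model accepts it.
df["species"] = df["species"].astype("category").cat.codes

# Categorical features such as island and sex stay as strings:
# TF-DF does not require one-hot encoding.
df.to_csv("./data/penguins_processed.csv", index=False)
```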

So the complete pipeline looks like the following:

  • CsvExampleGen — reads the processed source data from the file system, makes the train-eval split, and creates compressed TFRecords for training and evaluation;
  • StatisticsGen — reads the training and evaluation data and computes descriptive statistics;
  • schema importer — imports the schema generated from the first pipeline run;
  • ExampleValidator — determines whether the new batch of data shows anomalies compared with the previous batch;
  • Trainer — trains the Random Forest model on the training data and exports both the trained model and model runtime logs, which can be used for TensorBoard evaluation;
  • Evaluator — calculates metrics for model evaluation and compares the new model with the last run to determine whether it should be promoted to BLESSED, based on a user-defined threshold.

The complete definitions of the pipeline and model files can be found at https://github.com/robertf99/tfx-e2e.
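For orientation, here is a condensed sketch of how such a pipeline might be wired together with the TFX v1 API. It is not a copy of the repo code; the paths, step counts, label key and blessing threshold are all assumptions.

```python
import tensorflow_model_analysis as tfma
from tfx import v1 as tfx


def create_pipeline(pipeline_name, pipeline_root, data_root,
                    schema_path, module_file, metadata_path):
    # Ingest the processed CSV and split it into train/eval TFRecords.
    example_gen = tfx.components.CsvExampleGen(input_base=data_root)

    # Descriptive statistics over both splits.
    statistics_gen = tfx.components.StatisticsGen(
        examples=example_gen.outputs["examples"])

    # Re-use the schema generated by a previous run instead of regenerating it.
    schema_importer = tfx.dsl.Importer(
        source_uri=schema_path,
        artifact_type=tfx.types.standard_artifacts.Schema,
    ).with_id("schema_importer")

    # Flag anomalies in the new batch relative to the imported schema.
    example_validator = tfx.components.ExampleValidator(
        statistics=statistics_gen.outputs["statistics"],
        schema=schema_importer.outputs["result"])

    # The module_file contains the run_fn that builds the TF-DF Random Forest.
    trainer = tfx.components.Trainer(
        module_file=module_file,
        examples=example_gen.outputs["examples"],
        schema=schema_importer.outputs["result"],
        train_args=tfx.proto.TrainArgs(num_steps=100),
        eval_args=tfx.proto.EvalArgs(num_steps=5))

    # Resolve the latest blessed model as the baseline for comparison.
    model_resolver = tfx.dsl.Resolver(
        strategy_class=tfx.dsl.experimental.LatestBlessedModelStrategy,
        model=tfx.dsl.Channel(type=tfx.types.standard_artifacts.Model),
        model_blessing=tfx.dsl.Channel(
            type=tfx.types.standard_artifacts.ModelBlessing),
    ).with_id("latest_blessed_model_resolver")

    # Bless the candidate only if accuracy clears an (assumed) threshold.
    eval_config = tfma.EvalConfig(
        model_specs=[tfma.ModelSpec(label_key="species")],
        slicing_specs=[tfma.SlicingSpec()],
        metrics_specs=[tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(
                class_name="SparseCategoricalAccuracy",
                threshold=tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(
                        lower_bound={"value": 0.6})))])])

    evaluator = tfx.components.Evaluator(
        examples=example_gen.outputs["examples"],
        model=trainer.outputs["model"],
        baseline_model=model_resolver.outputs["model"],
        eval_config=eval_config)

    return tfx.dsl.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_importer,
                    example_validator, trainer, model_resolver, evaluator],
        metadata_connection_config=(
            tfx.orchestration.metadata.sqlite_metadata_connection_config(
                metadata_path)))


# Run everything locally; the paths are placeholders.
tfx.orchestration.LocalDagRunner().run(create_pipeline(
    pipeline_name="penguin-tfdf",
    pipeline_root="./pipeline_output",
    data_root="./data",
    schema_path="./schema",
    module_file="penguin_trainer.py",
    metadata_path="./metadata/metadata.db"))
```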

Pipeline Analysis

As any data scientist/ML engineer would be, I was keen to see how this toy pipeline had run. In a real-life scenario, I'd imagine someone looking at the pipeline output and examining the artifacts very closely in their day-to-day work.

The first step is to connect to the ML Metadata (MLMD) database to find information about the latest run. In a local environment the database is SQLite, whereas in a deployment it can be a MySQL server.
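A minimal sketch of that connection, assuming the SQLite path used when the pipeline was defined, might look like this:

```python
from ml_metadata import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Assumed local SQLite path; in deployment this would be a MySQL config.
config = metadata_store_pb2.ConnectionConfig()
config.sqlite.filename_uri = "./metadata/metadata.db"
config.sqlite.connection_mode = (
    metadata_store_pb2.SqliteMetadataSourceConfig.READONLY)
store = metadata_store.MetadataStore(config)

# For example, locate the most recent Evaluator output artifact.
evals = store.get_artifacts_by_type("ModelEvaluation")
latest_eval = max(evals, key=lambda a: a.create_time_since_epoch)
print(latest_eval.uri)
```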

After that I was able to run visualisations to take a closer look at the artifacts (a sketch follows the list below). For example:

  • StatisticsGen output for numeric features on the train split;
  • TensorBoard for the latest trained model;
  • Fairness Indicators for data slices. In this case, prediction accuracy for female penguins is slightly higher than for males. There are other metrics and data slices to compare as well (e.g., do the islands have different recall for penguin prediction?);
  • Some other metrics defined for the pipeline's Evaluator artifact.
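As an illustration of the first and third items, the artifact URIs found via MLMD can be fed into TFDV and TFMA. The paths below are placeholders, and depending on the TFDV version the statistics file may be stored as a TFRecord rather than a binary proto.

```python
import tensorflow_data_validation as tfdv
import tensorflow_model_analysis as tfma

# Placeholder URIs; in practice they come from the MLMD queries above.
stats = tfdv.load_stats_binary(
    "<statistics_gen_uri>/Split-train/FeatureStats.pb")
tfdv.visualize_statistics(stats)

# Render the Evaluator output, sliced by sex, inside JupyterLab.
eval_result = tfma.load_eval_result("<evaluator_uri>")
tfma.view.render_slicing_metrics(eval_result, slicing_column="sex")
```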

An important aspect of MLOps is monitoring model performance over time: when data drift occurs, the previous model may lose its validity and thus may require retraining.

To simulate that, I can't simply reuse the artifact from the latest Evaluator run: rather than evaluating the same model on a new batch of data, the Evaluator evaluates a new model on (possibly) the same batch of data and then determines whether to promote the candidate model to BLESSED.

To mimic incoming new data, I took the train and eval splits from the latest CsvExampleGen output and ran them through TensorFlow Model Analysis (TFMA) with the same model (the latest pushed model). Note that I needed to unzip the .gz file and copy the TFRecord file to a tmp location. It also seems necessary to create a new EvalConfig instead of reusing the eval_config.json generated by the Evaluator artifact (please let me know if there's a way to convert it).
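A hedged sketch of that re-evaluation is shown below; the label key, metric choices and file locations are assumptions rather than the exact configuration used.

```python
import tensorflow_model_analysis as tfma

# Fresh EvalConfig, since reusing the Evaluator's eval_config.json
# did not work directly for me.
eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key="species")],
    slicing_specs=[tfma.SlicingSpec(),
                   tfma.SlicingSpec(feature_keys=["sex"])],
    metrics_specs=[tfma.MetricsSpec(metrics=[
        tfma.MetricConfig(class_name="SparseCategoricalAccuracy"),
        tfma.MetricConfig(class_name="ExampleCount")])])

# Point TFMA at the latest pushed model.
eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path="<pushed_model_dir>",
    eval_config=eval_config)

# Evaluate the "new" batch copied out of the CsvExampleGen output.
result = tfma.run_model_analysis(
    eval_shared_model=eval_shared_model,
    eval_config=eval_config,
    data_location="/tmp/t2.tfrecord",
    output_path="/tmp/tfma_output")

tfma.view.render_slicing_metrics(result)
```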

The results show that the "new" data (t2.tfrecord) has higher overall accuracy, as well as higher recall and precision.

Conclusion and Takeaways

  • It is good to start out in Colab; setting everything up locally afterwards provides more insight into the different components, although version pinning is important to make sure everything works;
  • Some visualisation functionality in TFX and TFMA seems experimental and may introduce breaking changes in the future;
  • Due to time constraints, I didn't use TensorFlow Transform in the end-to-end toy pipeline;
  • The next step is to set up a local orchestrator (i.e., Kubeflow) to see how the toy pipeline runs in a more production-like environment.
