MLOps with TensorFlow Extended (TFX) and TensorFlow Decision Forests (TF-DF) (Part 1)
Motivation
Not long ago, the TensorFlow team open-sourced TensorFlow Decision Forests (https://blog.tensorflow.org/2021/05/introducing-tensorflow-decision-forests.html), a tree-based modelling package that uses the Keras API within TensorFlow. This opened the door to building ML pipelines not just for deep learning models, but also for much more interpretable tree models.
On the other side, as a practicing engineer in machine learning and full-stack development, ever since sitting the Google ML Engineer exam I've been keen to learn more about TensorFlow Extended (TFX) for MLOps, given its full spectrum of standardised components, Beam-powered distributed processing and production readiness. The combination of the two was even harder to resist!
Challenges
- As of writing this summary, TF-DF has not yet released a distribution for any platform other than Linux, which means that to run it locally on Mac or Windows, one has to set up a container or play with it on Google Colaboratory.
- Another challenge is that, since TF-DF is not yet fully canonical, its integration with TFX may not be as smooth as for other Keras models, and there is not yet much documentation on integrating the two.
Goal
The goal was to find a way to locally test and learn MLOps with TFX using TF-DF models. I was hoping to achieve the following:
- Use local environment for running TFX pipelines with TF-DF models.
- Debug the code if needed;
- Be able to use the TensorFlow Model Analysis package and TensorBoard to visualise and evaluate the results;
- Serve as the foundation for deploying to a local Kubeflow cluster for further testing and learning.
Environment Setup
Base image
The first step is to build a Linux-based base image with the minimal required packages, since Linux is currently the only platform with a TF-DF distribution. All downstream local development and deployment (i.e. Kubeflow) will be based on it. The pinned package versions are taken from https://pypi.org/project/tfx/. The image is then tagged and pushed to my Docker Hub as a public repo: https://hub.docker.com/r/robertf99/tfx-tfdf.
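As a rough sketch, the base image can be as simple as the following Dockerfile. The version pins shown here are illustrative, not the exact ones I used; always pick the compatible set listed on https://pypi.org/project/tfx/ for your TFX release.

```dockerfile
# Linux base image with the pinned TFX + TF-DF stack.
# Version numbers are illustrative -- pin them to the compatible
# versions listed on https://pypi.org/project/tfx/.
FROM python:3.8-slim

RUN pip install --no-cache-dir \
        tfx==1.2.0 \
        tensorflow_decision_forests==0.1.9

# Tag and push to Docker Hub (run on the host):
#   docker build -t robertf99/tfx-tfdf:latest .
#   docker push robertf99/tfx-tfdf:latest
```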
Development image
I also built a dev image for local development. Since I am using JupyterLab to run model analysis with tensorflow-model-analysis, I added the necessary extension according to the installation guide (see below #13).
A docker-compose.yaml is then created to spin up the local dev container, with volumes mounted for all external pipeline files and data. This allows for easy development, as we don't need to rebuild the image often. The port mapping allows access to the JupyterLab server from localhost:8888, and also lets TensorBoard run within JupyterLab.
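A minimal sketch of such a docker-compose.yaml is shown below. The service name, mount paths and Jupyter command line are assumptions to illustrate the shape; adjust them to your own project layout.

```yaml
# Illustrative docker-compose.yaml for the local dev container.
version: "3.8"
services:
  dev:
    image: robertf99/tfx-tfdf:latest
    ports:
      - "8888:8888"   # JupyterLab (TensorBoard runs inside it too)
    volumes:
      - ./pipeline:/workspace/pipeline   # pipeline definitions
      - ./data:/workspace/data           # source data
    working_dir: /workspace
    command: jupyter lab --ip=0.0.0.0 --allow-root --no-browser
```

With the volumes mounted, editing pipeline files on the host is reflected in the container immediately, which is what makes frequent image rebuilds unnecessary.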
Note on Debugging
With the dev image running in a container, it is possible to connect to it via VS Code (https://code.visualstudio.com/docs/remote/containers) and set breakpoints as we usually do.
Now we are all set to move onto the real stuff!
TFX Pipeline(s)
Data and Model of Choice
I wanted to start simple by building a toy pipeline that still runs end-to-end with all the standard components TFX has to offer. For that I used the penguin dataset (https://storage.googleapis.com/download.tensorflow.org/data/palmer_penguins/penguins.csv) and aimed to build a simple Random Forest model.
Pipeline Definition and Run
Following the examples from both the TFX and TF-DF documentation, I was able to run the pipeline inside the development container and see pipeline_output updated in the local file system. During the process I noticed:
- The label column needs to be an integer for the model to be recognised as a Keras model;
- There is no need for one-hot encoding of categorical features;
- Although TF-DF allows missing values, for TFX's CsvExampleGen to infer the column types correctly I had to remove all rows with blank values. This still needs a bit more investigation in future use cases.
I used a script to generate the processed data and saved it to the ./data folder.
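The preprocessing script can be sketched as below. It simply applies the two observations above: drop rows with blank values and integer-encode the label. The output filename and the use of pandas are my assumptions, not necessarily what the repo does.

```python
import pandas as pd

LABEL = "species"  # label column of the penguin dataset

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with blanks and encode the label as an integer,
    as required for CsvExampleGen and the TF-DF Keras model."""
    df = df.dropna()
    # Map each species name to a stable integer id (sorted order).
    classes = sorted(df[LABEL].unique())
    return df.assign(**{LABEL: df[LABEL].map(classes.index)})

# Usage (inside the dev container):
#   raw = pd.read_csv("https://storage.googleapis.com/download.tensorflow.org/"
#                     "data/palmer_penguins/penguins.csv")
#   preprocess(raw).to_csv("./data/penguins_processed.csv", index=False)
```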
So the complete pipeline looks like the following:
- CsvExampleGen — reads the processed source data from the file system, makes the train-eval split, and creates compressed tfrecord files for training and evaluation;
- StatisticsGen — reads the training and evaluation data and computes descriptive statistics;
- schema-importer — reads the schema generated from the first pipeline run;
- ExampleValidator — determines whether the new batch of data shows anomalies compared with the previous batch;
- Trainer — trains the Random Forest model on the training data and exports both the trained model and the model runtime statistics, which can be used for TensorBoard evaluation;
- Evaluator — calculates metrics for model evaluation and compares the new model with the one from the last run to determine whether it should be promoted to BLESSED based on a user-defined threshold.
The complete definitions of the pipeline and model files can be found at https://github.com/robertf99/tfx-e2e
Pipeline Analysis
As any data scientist/ML engineer would expect, I was keen to see how this toy pipeline had run. In a real-life scenario, I'd imagine someone would look at the pipeline output and examine the artifacts very closely in their day-to-day work.
The first step is to connect to the ML Metadata (MLMD) database to find information about the latest run. In the local environment the database is SQLite, whereas in deployment it can be a MySQL server.
After that I was able to run visualisations to take a closer look at the artifacts. For example:
- StatisticsGen output for numeric features on the train split
- TensorBoard for the latest trained model
- Fairness Indicators for data slices. In this case, prediction accuracy for female penguins is slightly higher than for males. There are other metrics and data slices to compare as well (e.g., do islands have different recall for penguin prediction?)
- Some other metrics defined for the pipeline's Evaluator artifact
An important aspect of MLOps is monitoring model performance over time: when data drift occurs, the previous model may lose its validity and require re-training.
To simulate that, I can't just use the artifact from the latest Evaluator run: rather than evaluating the same model against a new batch of data, the Evaluator evaluates a new model against (possibly) the same batch of data, and then determines whether to promote the candidate model to BLESSED.
To mimic incoming new data, I took the training and evaluation splits from the latest CsvExampleGen run and ran them through TensorFlow Model Analysis (TFMA) against the same model (the latest pushed model). Note that I needed to unzip the gz file and copy the tfrecord file to a tmp location. It also seems necessary to create a new EvalConfig instead of reusing the eval_config.json generated by the Evaluator artifact (please let me know if there's a way to convert).
The result shows that the "new" data (t2.tfrecord) has higher overall accuracy, as well as higher recall and precision.
Conclusion and Takeaways
- It is good to start out in Colab; setting everything up locally afterwards provides more insight into the different components, although version pinning is important to make sure everything works;
- Some visualisation functionality in TFX and TFMA seems experimental and may introduce breaking changes in the future;
- Due to time constraints, I didn't use TensorFlow Transform in the end-to-end toy pipeline;
- The next step is to set up a local orchestrator (e.g., Kubeflow) to see how the toy pipeline runs in a more production-like environment.
Reference
- TF-DF release discussion for Mac: https://github.com/tensorflow/decision-forests/issues/16
- TF-DF with TFX discussion: https://discuss.tensorflow.org/t/tensorflow-decision-forests-with-tfx-model-serving-and-evaluation/2137/3
- TF-DF tutorial: https://www.tensorflow.org/decision_forests/tutorials/beginner_colab
- TFX tutorial: https://www.tensorflow.org/tfx/tutorials
- Coursera Machine Learning Modeling Pipelines in Production Wk 4: https://www.coursera.org/learn/machine-learning-modeling-pipelines-in-production/home/week/4