TensorFlow model tracking with MLflow

Arpit Kapoor · Analytics Vidhya · Aug 16, 2020

Developing a machine learning model is an iterative process consisting of multiple steps such as model selection, model training, hyperparameter tuning, and deploying the model into production. Tracking the model through these stages in an organized way helps surface issues such as small changes in data, code, or hyperparameters that affect overall model performance. But model tracking can be a non-trivial task that gets messy at times. MLflow is an open-source platform for managing the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

Although TensorFlow has its own tracking tool, TensorBoard, MLflow provides a simpler interface for tracking experiments while also making it easier to push the trained model into production.

Here, I will demonstrate how MLflow can be used to track TensorFlow models using a remote tracking store. MLflow supports various tracking backend stores; I will be using a MySQL database to store the experiments and model artifacts.

Setting Up MySQL server

First, we need to set up a MySQL server. I will be using a Docker container to start the server on my local machine. You can pull the mysql-server Docker image by running the following command:

shell> docker pull mysql/mysql-server:latest

Once the docker image is pulled, we can create a docker container from this image:

shell> docker run --name=mysqldb -p 3306:3306 -p 33060:33060 -d mysql/mysql-server:latest

We map ports 3306 and 33060 to the Docker container so that we can access the database from outside the container. MySQL uses port 3306 by default. We can see the details of the running container with the docker ps command.
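For example, to list just the container we started (the --filter flag narrows the output by container name):

shell> docker ps --filter name=mysqldb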

Now that our MySQL server is running, let's configure access to it from outside the container. First, let's set the password for the root user. To do that, we will need the automatically generated root password:

shell> docker logs mysqldb 2>&1 | grep GENERATED
GENERATED ROOT PASSWORD: Axegh3kAJyDLaRuBemecis&EShOs

We will now run the mysql command from inside the container to access the MySQL command shell as root:

shell> docker exec -it mysqldb mysql -uroot -p

MySQL will ask for the password; enter the generated password we retrieved earlier. After this, we can change the password for the root user. Replace the string 'password' with the actual password that you want to set. To make access available to the root user from outside the container, we also update the host value from 'localhost' to '%'.

mysql> alter user 'root'@'localhost' identified by 'password';
mysql> update mysql.user set host = '%' where user='root';
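Since we updated the mysql.user table directly, it is a good idea to reload the grant tables so the change takes effect immediately:

mysql> flush privileges;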

As one last step, we will create a database called mlflow that MLflow will use to track the experiments and models.

mysql> create database mlflow;

That's it. We should now be able to access the MySQL server we just set up from outside the Docker container. You can test the connection using MySQL Workbench; I am not going to go over that in this post, but it is pretty straightforward.
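If you prefer the command line, a quick way to verify the connection (assuming the mysql client is installed on your host machine) is to connect from outside the container and enter the new root password when prompted:

shell> mysql -h 127.0.0.1 -P 3306 -u root -p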

Writing TensorFlow training code

If you don't have TensorFlow and MLflow installed, these packages (along with tensorflow-datasets and the MySQL client library used here) can be installed with pip:

shell> pip3 install tensorflow
shell> pip3 install tensorflow-datasets
shell> pip3 install mlflow
shell> pip3 install mysqlclient

I will demonstrate model training using TensorFlow's Keras API by training a simple image classification model on the MNIST dataset.

Let’s get to the code.

We will start by importing the required python modules:

import tensorflow as tf
import tensorflow_datasets as tfds
import mlflow

We use the set_tracking_uri() method to tell MLflow where to store the training logs. The URI can be either a '/path/to/local/store' or a SQLAlchemy database URI:

user = 'root'
pwd = 'password'
hostname = 'localhost'
port = 3306
database = 'mlflow'

# Build the SQLAlchemy-style database URI and point MLflow at it
uri = f'mysql://{user}:{pwd}@{hostname}:{port}/{database}'
mlflow.set_tracking_uri(uri)
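As a quick sanity check, you can confirm the tracking URI that MLflow will actually use:

print(mlflow.get_tracking_uri())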

By default, MLflow stores all runs under the 'Default' experiment. We can assign an experiment name using the set_experiment() method and create a run in this experiment with the start_run() method.

mlflow.set_experiment('MNIST')
mlflow.start_run(run_name='Run_1')
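Alternatively, start_run() can be used as a context manager, which ends the run automatically when the block exits; a minimal sketch:

with mlflow.start_run(run_name='Run_1'):
    pass  # training code goes here; the run ends automatically on exit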

Let’s load the data for training and validating the model:

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

# Normalizes images: `uint8` -> `float32`
def normalize_img(image, label):
    return tf.cast(image, tf.float32) / 255., label

# Train Dataset
ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

# Test Dataset
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)
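If you want to double-check the input pipeline, you can peek at a single batch; the shapes in the comment are what we expect for MNIST with a batch size of 128:

# Inspect one batch from the training pipeline
for images, labels in ds_train.take(1):
    print(images.shape, labels.shape)  # expected: (128, 28, 28, 1) (128,)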

Now, let’s define a neural network model using the keras API:

# Define the layers
inputs = tf.keras.Input(shape=(28, 28, 1))
hidden = tf.keras.layers.Flatten()(inputs)
hidden2 = tf.keras.layers.Dense(128, activation='relu')(hidden)
outputs = tf.keras.layers.Dense(10, activation='softmax')(hidden2)

# Optimizer
opt = tf.keras.optimizers.Adam(learning_rate=0.002)

# Create a Model object
model = tf.keras.Model(inputs, outputs)

# Compile the model
model.compile(optimizer=opt,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
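A quick summary call is a handy way to verify the architecture and parameter counts before training:

model.summary()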

MLflow comes with strong bindings for major machine learning frameworks, including but not limited to TensorFlow, PyTorch, Gluon, and XGBoost. These bindings provide an autolog feature which automatically logs model training to the MLflow run. This makes it very convenient to log all the hyperparameters, metrics, and even the trained model during training.

import mlflow.tensorflow
mlflow.tensorflow.autolog(every_n_iter=2)

autolog takes a parameter, every_n_iter, which is the number of training epochs between each log of the training metrics. For example, if the value passed is 2, MLflow will log the training metrics (loss, accuracy, validation loss, etc.) every 2 epochs.
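If you prefer not to rely on autolog, or want to record extra values, MLflow's generic logging API works alongside it; a small sketch (the parameter names and metric value here are purely illustrative):

# Manual logging alternative/complement to autolog
mlflow.log_param('learning_rate', 0.002)
mlflow.log_param('batch_size', 128)
mlflow.log_metric('val_accuracy', 0.97, step=10)  # illustrative value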

Now, we can just use the model.fit() method to train our deep learning model. The datasets are already batched, so there is no need to pass a batch_size:

model.fit(ds_train, epochs=100, validation_data=ds_test)

Once the model training is complete, we can end the run by calling:

mlflow.end_run()

And that's it for the training code. We can now proceed to tracking the model and its metrics in the MLflow UI.

Tracking in MLflow UI

The MLflow web UI can be started using the mlflow ui command. We pass an additional parameter, --backend-store-uri, which is the URI of the database from which we want MLflow to load the experiments. We use the same MySQL database URI as earlier:

shell> mlflow ui --backend-store-uri 'mysql://root:password@localhost:3306/mlflow'

The MLflow UI can be accessed at http://localhost:5000.
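Note that mlflow ui is meant for local viewing. If other machines need to reach the tracking server, mlflow server is the usual route; a sketch, assuming artifacts should go to a local ./mlruns directory:

shell> mlflow server --backend-store-uri 'mysql://root:password@localhost:3306/mlflow' --default-artifact-root ./mlruns --host 0.0.0.0 --port 5000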

In the Experiments tab, you should be able to see our MNIST experiment. Clicking on it shows all the runs for this experiment. MLflow gives a brief summary of each run on the experiment page; more detailed logs can be found on the individual run pages.

MLflow Experiment tracking UI

MLflow also provides automated plot generation for different metrics, with many available customizations. Here, we visualize the model training loss.

Metric Plotting

Finally, MLflow also provides a model registry. This feature, however, only works when we use a database-backed tracking store (like the MySQL store we set up) instead of local file-based tracking. To register a model, all one needs to do is go to the model artifacts on the run page, select the model you want to register, and click on Register Model.

Models registered under the same name are automatically versioned.
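You can also register a model programmatically with mlflow.register_model(); a sketch, assuming autolog stored the model under the default 'model' artifact path and using a hypothetical registry name:

# run_id can be copied from the UI, or captured via mlflow.active_run() before end_run()
run_id = '<your-run-id>'
mlflow.register_model(f'runs:/{run_id}/model', 'mnist-classifier')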

Conclusion

In this post, I have tried to cover the basics of how TensorFlow models can be tracked using MLflow. I believe MLflow is an excellent tool for end-to-end machine learning model lifecycle tracking. The model registry and deployment capability make MLflow a convenient bridge between model development and deployment.
