MLflow logging for TensorFlow

Sumeet Gyanchandani
Published in Analytics Vidhya
3 min read · Nov 4, 2019

This is the third article in my MLflow tutorial series:

  1. Setup MLflow in Production
  2. MLflow: Basic logging functions
  3. MLflow logging for TensorFlow (you are here!)
  4. MLflow Projects
  5. Retrieving the best model using Python API for MLflow
  6. Serving a model using MLflow

While the basic MLflow logging functions are all you need to get started with MLflow, this guide will help with the initial issues you might face when using MLflow with TensorFlow.

Let us start with some basic MLflow functions that will help you log various values and artifacts.

import mlflow

Logging functions need to be associated with a particular run. The best way to get everything into a single run is to start the run at the beginning of the main function (or some other calling function) and end it at the end of the main function.

if __name__ == '__main__':
    mlflow.start_run()
    # the model code
    mlflow.end_run()
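The same pattern can also be written with a with block, because mlflow.start_run() returns an object that works as a context manager: the run is ended when the block exits, even if the model code raises. Below is a minimal sketch of that idea. The names run_tracked and train_fn are hypothetical, and the tracker argument is meant to be the mlflow module; it is passed in only so the sketch can be tried without a tracking server.

```python
def run_tracked(tracker, train_fn):
    """Call train_fn() inside exactly one tracking run.

    tracker  : the `mlflow` module (or anything exposing a compatible start_run()).
    train_fn : hypothetical zero-argument function holding the model code.
    """
    # The run ends when the with block exits, including on an exception,
    # so no explicit end_run() call is needed.
    with tracker.start_run():
        return train_fn()
```

With MLflow installed, this would be called as run_tracked(mlflow, train) from the main function, where train stands in for your own model code.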

mlflow.log_param() logs a single key-value param in the currently active run. The key and value are both stored as strings; a value that is not already a string is converted to one.
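Since params are stored as strings, it can help to convert a nested configuration into flat string pairs yourself before logging, so that what ends up in the run is predictable. The sketch below is one way to do that; flatten_params and the config values are hypothetical and exist only for illustration.

```python
def flatten_params(config, prefix=""):
    """Flatten a nested dict into {"a.b": "value"} with string values."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse, joining nested keys with a dot.
            flat.update(flatten_params(value, prefix=f"{name}."))
        else:
            # Stringify explicitly so the stored text is predictable.
            flat[name] = str(value)
    return flat


# Hypothetical hyperparameters, for illustration only.
config = {"optimizer": {"name": "adam", "learning_rate": 0.001}, "batch_size": 32}
params = flatten_params(config)
print(params)
```

Each resulting pair can then be passed to mlflow.log_param(key, value), or the whole dictionary to mlflow.log_params(params).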

In TensorFlow, if you log the Tensor directly, only a description of the Tensor (its string representation, including its name, shape and dtype) is logged rather than its value. For instance:

[Figure: Logging Tensor Type]

Instead, we need to log its value:

[Figure: Logging Tensor Value]

To achieve this, call .eval() on your Tensor. To be able to use .eval(), you need to be within a session and the Tensor should be initialized. If you are not within a session, or you do not want to use the default session for logging, you can always start a new (nested) session anywhere in the code with the following statements.

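with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)) as newsess:
    # initialize the variables in this new session before reading them
    newsess.run(tf.global_variables_initializer())
    # inside the with block, newsess is the default session used by .eval()
    mlflow.log_param('learning_rate', learning_rate.eval())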

mlflow.log_metric() logs a single key-value metric. The value must always be a number. As with mlflow.log_param(), you need to call .eval() to get the value of the Tensor:

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)) as newsess:
    newsess.run(tf.global_variables_initializer())
    mlflow.log_metric('Losses/%s' % loss.op.name, loss.eval())
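When a metric such as a loss is evaluated repeatedly during training, mlflow.log_metric() also accepts an optional step argument, so each value can be recorded against its training step. The sketch below assumes the values have already been evaluated to plain numbers; log_series is a hypothetical helper, and the logging function is passed in so the sketch can be tried without a tracking server.

```python
def log_series(log_metric, name, values):
    """Log each value under `name`, using its position as the step."""
    for step, value in enumerate(values):
        # float() rejects non-numeric input early; metrics must be numbers.
        log_metric(name, float(value), step=step)
```

With MLflow installed and a run active, this would be called as log_series(mlflow.log_metric, 'Losses/total_loss', losses), where losses is your own list of evaluated loss values.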

Note: From TensorFlow 2.0 onwards, where eager execution is enabled by default, you can call .numpy() on the Tensor instead of .eval().
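If the same logging code has to handle eager Tensors, NumPy values and plain Python numbers, a small conversion helper can sit in front of the logging calls. This is only a sketch: to_python_scalar is a hypothetical name, and it relies on nothing more than the presence of .numpy() and .item() methods on the value it receives.

```python
def to_python_scalar(value):
    """Return a plain Python number from a tensor-like or NumPy-like value."""
    # TensorFlow 2.x eager tensors expose .numpy().
    if hasattr(value, "numpy"):
        value = value.numpy()
    # NumPy scalars and one-element arrays expose .item(); it raises
    # ValueError for arrays holding more than one element.
    if hasattr(value, "item"):
        value = value.item()
    return value
```

A call such as mlflow.log_metric('loss', to_python_scalar(loss)) would then log the value rather than a description of the Tensor.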

mlflow.log_artifacts() logs all the files in a given directory as artifacts, taking an optional artifact_path. Artifacts can be any files, such as images, models and checkpoints. MLflow also has an mlflow.tensorflow module for things like logging the model. However, as most models have their own way of saving models and checkpoints, we suggest letting the model code save the model or checkpoint and letting MLflow simply log the generated files using mlflow.log_artifacts(). This process is relatively error-free.

If you would like to log the model yourself, you can use the following code:

# get the active mlflow run id
run_uuid = mlflow.active_run().info.run_uuid
# modify the default model log path to include a sub-directory named by the mlflow run id
export_path = FLAGS.train_logdir + '/' + run_uuid
print('Exporting trained model to', export_path)
# create the SavedModel builder
builder = tf.saved_model.builder.SavedModelBuilder(export_path)
# add the graph and variables to export (assumption: `sess` is the session holding the trained model)
builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING])
# save the model
builder.save(as_text=True)
# log the saved model to the MLflow runs directory on the production server
mlflow.log_artifacts(export_path, "model")

Explanation: Most models are written so that they simply dump their checkpoints into a single log directory. For instance, DeepLab saves all its checkpoints into a single directory called train_logdir. As mlflow.log_artifacts() copies the entire contents of the given directory into the artifacts directory, a run would store not only its own artifacts but also those of all previous runs. The workaround is to append the MLflow run id to the model export path, so that the model for a particular run is saved in its own sub-directory and only that model is logged to the production server.
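The effect of the per-run sub-directory can be seen without MLflow or TensorFlow by simulating a shared log directory on disk. In the sketch below, the run ids, file names and helper functions are all made up for illustration; in real code the run id would come from mlflow.active_run().info (run_uuid in the code above, also available as run_id in newer MLflow versions).

```python
import os
import tempfile


def per_run_export_path(log_dir, run_id):
    """Return the sub-directory of log_dir reserved for one run."""
    return os.path.join(log_dir, run_id)


def files_under(directory):
    """List files below directory, relative to it, in sorted order."""
    found = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            found.append(os.path.relpath(os.path.join(root, name), directory))
    return sorted(found)


# Simulate a shared log directory holding output from two runs.
log_dir = tempfile.mkdtemp()
for run_id in ("run_a", "run_b"):
    export_path = per_run_export_path(log_dir, run_id)
    os.makedirs(export_path)
    with open(os.path.join(export_path, "saved_model.pbtxt"), "w") as f:
        f.write(run_id)

# Logging the shared directory would pick up the files of both runs,
shared_files = files_under(log_dir)
# while logging only run_b's sub-directory picks up just its own file.
run_b_files = files_under(per_run_export_path(log_dir, "run_b"))
print(shared_files)
print(run_b_files)
```

Passing the per-run path to mlflow.log_artifacts() therefore keeps each run's artifacts limited to what that run produced.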

In the next article, we will look into MLflow Projects.
