This is the fourth article in my MLflow tutorial series:

  1. Setup MLflow in Production
  2. MLflow: Basic logging functions
  3. MLflow logging for TensorFlow
  4. MLflow Projects (you are here!)
  5. Retrieving the best model using Python API for MLflow
  6. Serving a model using MLflow

Whether you create a new project or clone an existing one, you can turn it into an MLflow project by adding two YAML files to its root directory: an MLproject file and a Conda environment file.

This step is optional but highly recommended: it improves the reproducibility of your models and links each run to a specific version of the code (its git commit hash). That link is useful because, if later changes to the code affect its functionality or its results, you can simply git checkout the commit that produced a given run.
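
To make the link between a run and its commit concrete, here is a minimal sketch. It assumes the commit hash is available under the run tag `mlflow.source.git.commit`, and that the tags dictionary was fetched beforehand (for instance via `mlflow.tracking.MlflowClient().get_run(run_id).data.tags`). The helper name `checkout_command` and the example hash are made up for illustration.

```python
# Tag under which MLflow records the commit hash of a run's source code.
GIT_COMMIT_TAG = "mlflow.source.git.commit"


def checkout_command(run_tags):
    """Return the `git checkout` command for the commit recorded in a run's tags."""
    commit = run_tags.get(GIT_COMMIT_TAG)
    if commit is None:
        raise ValueError("run has no recorded git commit")
    return ["git", "checkout", commit]


if __name__ == "__main__":
    # Hypothetical tags; in practice they would come from the tracking server.
    example_tags = {GIT_COMMIT_TAG: "0123456789abcdef0123456789abcdef01234567"}
    print(" ".join(checkout_command(example_tags)))
```

The function only builds the command; running it (for example with `subprocess.run`) is left to the caller so that nothing touches the working tree by accident.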

An example of an MLproject file for training DeepLab on the Cityscapes semantic segmentation dataset:

name: deeplab
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      training_number_of_steps: {type: int, default: 900}
      output_stride: {type: int, default: 16}
      decoder_output_stride: {type: int, default: 4}
      train_batch_size: {type: int, default: 1}
      dataset: {default: 'cityscapes'}
      train_logdir: {default: /home/sumeet/models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train}
      dataset_dir: {default: /home/sumeet/models/research/deeplab/datasets/cityscapes/tfrecord}
    command: "python \
              --logtostderr \
              --training_number_of_steps={training_number_of_steps} \
              --train_split='train' \
              --model_variant='xception_65' \
              --atrous_rates=6 \
              --atrous_rates=12 \
              --atrous_rates=18 \
              --output_stride={output_stride} \
              --decoder_output_stride={decoder_output_stride} \
              --train_crop_size='769,769' \
              --train_batch_size={train_batch_size} \
              --dataset={dataset} \
              --train_logdir={train_logdir} \
              --dataset_dir={dataset_dir}"
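
The `{placeholders}` in the command are filled from the declared parameters, with defaults used for anything you do not override. The sketch below imitates that substitution in plain Python so you can see the effect. It is not MLflow's own implementation: the function name `render_command` is hypothetical, and the template is a shortened subset of the flags above.

```python
# Illustrative only: mimics how the {placeholders} in an MLproject command
# are filled in; this is not MLflow's own implementation.

# Defaults copied from the parameters of the MLproject file.
DEFAULTS = {
    "training_number_of_steps": 900,
    "output_stride": 16,
    "decoder_output_stride": 4,
    "train_batch_size": 1,
    "dataset": "cityscapes",
}

# A shortened command template (a subset of the flags, chosen for brevity).
COMMAND_TEMPLATE = (
    "--training_number_of_steps={training_number_of_steps} "
    "--output_stride={output_stride} "
    "--decoder_output_stride={decoder_output_stride} "
    "--train_batch_size={train_batch_size} "
    "--dataset={dataset}"
)


def render_command(template, defaults, overrides=None):
    """Merge overrides into the defaults and substitute them into the template."""
    params = dict(defaults)
    for name, value in (overrides or {}).items():
        default = defaults.get(name)
        # Cast to the type of the default, e.g. "1000" -> 1000 for int parameters.
        params[name] = type(default)(value) if default is not None else value
    return template.format(**params)


if __name__ == "__main__":
    print(render_command(COMMAND_TEMPLATE, DEFAULTS, {"training_number_of_steps": "1000"}))
```

With MLflow itself, the equivalent override is passed on the command line when launching the project from its root directory, for example `mlflow run . -P training_number_of_steps=1000`.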

An example of a Conda environment file:

name: production_env
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - ca-certificates=2019.8.28=0
  - certifi=2019.9.11=py37_0
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - ncurses=6.1=he6710b0_1
  - openssl=1.1.1d=h7b6447c_2
  - pip=19.2.3=py37_0
  - python=3.7.4=h265db76_1
  - readline=7.0=h7b6447c_5
  - sqlite=3.30.0=h7b6447c_0
  - tk=8.6.8=hbc83047_0
  - xz=5.2.4=h14c3975_4
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - absl-py==0.8.1
    - alembic==1.2.1
    - astor==0.8.0
    - attrs==19.2.0
    - backcall==0.1.0
    - bleach==3.1.0
    - chardet==3.0.4
    - cityscapesscripts==1.1.0
    - click==7.0
    - cloudpickle==1.2.2
    - configparser==4.0.2
    - cycler==0.10.0
    - databricks-cli==0.9.0
    - decorator==4.4.0
    - defusedxml==0.6.0
    - docker==4.1.0
    - entrypoints==0.3
    - flask==1.1.1
    - gast==0.2.2
    - gitdb2==2.0.6
    - gitpython==3.0.3
    - google-pasta==0.1.7
    - gorilla==0.3.0
    - grpcio==1.24.1
    - gunicorn==19.9.0
    - h5py==2.10.0
    - idna==2.8
    - imdbclassifier==0.6.6
    - importlib-metadata==0.23
    - ipykernel==5.1.2
    - ipython==7.8.0
    - ipython-genutils==0.2.0
    - ipywidgets==7.5.1
    - itsdangerous==1.1.0
    - jedi==0.15.1
    - jinja2==2.10.3
    - joblib==0.14.0
    - jsonschema==3.1.1
    - jupyter==1.0.0
    - jupyter-client==5.3.4
    - jupyter-console==6.0.0
    - jupyter-core==4.6.0
    - keras==2.3.1
    - keras-applications==1.0.8
    - keras-preprocessing==1.1.0
    - kiwisolver==1.1.0
    - mako==1.1.0
    - markdown==3.1.1
    - markupsafe==1.1.1
    - matplotlib==3.1.1
    - mistune==0.8.4
    - mlflow==1.3.0
    - more-itertools==7.2.0
    - nbconvert==5.6.0
    - nbformat==4.4.0
    - notebook==6.0.1
    - numpy==1.17.2
    - opt-einsum==3.1.0
    - pandas==0.25.1
    - pandocfilters==1.4.2
    - parso==0.5.1
    - pexpect==4.7.0
    - pickleshare==0.7.5
    - pillow==6.2.0
    - prettytable==0.7.2
    - prometheus-client==0.7.1
    - prompt-toolkit==2.0.10
    - protobuf==3.10.0
    - ptyprocess==0.6.0
    - pygments==2.4.2
    - pyparsing==2.4.2
    - pyrsistent==0.15.4
    - python-dateutil==2.8.0
    - python-editor==1.0.4
    - pytz==2019.3
    - pyyaml==5.1.2
    - pyzmq==18.1.0
    - qtconsole==4.5.5
    - querystring-parser==1.2.4
    - requests==2.22.0
    - scikit-learn==0.21.3
    - scipy==1.3.1
    - send2trash==1.5.0
    - setuptools==41.4.0
    - simplejson==3.16.0
    - six==1.12.0
    - sklearn==0.0
    - smmap2==2.0.5
    - sqlalchemy==1.3.9
    - sqlparse==0.3.0
    - tabulate==0.8.5
    - tensorboard==1.15.0
    - tensorflow==1.15.0
    - tensorflow-estimator==1.15.1
    - tensorflow-gpu==1.15.0
    - termcolor==1.1.0
    - terminado==0.8.2
    - testpath==0.4.2
    - tornado==6.0.3
    - traitlets==4.3.3
    - urllib3==1.25.6
    - wcwidth==0.1.7
    - webencodings==0.5.1
    - websocket-client==0.56.0
    - werkzeug==0.16.0
    - wheel==0.33.6
    - widgetsnbextension==3.5.1
    - wrapt==1.11.2
    - zipp==0.6.0
prefix: ~/anaconda3/envs/production_env

You can create the above file manually or, if you already have a stable Conda environment, export it to a file with the following command:

conda env export > conda.yaml
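
An exported file ends with a `prefix:` line that records where the environment lives on the exporting machine. If you would rather not commit that machine-specific path alongside the project, one option is to drop the line after exporting. The following is a small sketch of that idea using only the standard library; the function name `drop_prefix_line` and the sample text are illustrative.

```python
def drop_prefix_line(exported_yaml):
    """Return the exported environment text without its top-level `prefix:` line."""
    kept = [
        line
        for line in exported_yaml.splitlines()
        # Only an unindented `prefix:` key is removed; dependencies are untouched.
        if not line.startswith("prefix:")
    ]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = (
        "name: production_env\n"
        "dependencies:\n"
        "  - python=3.7.4\n"
        "prefix: ~/anaconda3/envs/production_env\n"
    )
    print(drop_prefix_line(sample), end="")
```

In practice you would read `conda.yaml`, pass its contents through the function, and write the result back.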

For more information, refer to the MLflow Projects documentation.

In the next article, I will show you how to retrieve the best model from an experiment in MLflow using the Python API.



