MLflow Projects

Sumeet Gyanchandani
Published in Analytics Vidhya
2 min read · Nov 6, 2019

This is the fourth article in my MLflow tutorial series:

  1. Setup MLflow in Production
  2. MLflow: Basic logging functions
  3. MLflow logging for TensorFlow
  4. MLflow Projects (you are here!)
  5. Retrieving the best model using Python API for MLflow
  6. Serving a model using MLflow

Whether you create a new project or clone an existing one, you can turn it into an MLflow project by adding two files to the project's root directory: an MLproject file and a Conda environment file, both written in YAML.

This step is optional but highly recommended: it makes your models easier to reproduce and links each run to a specific version of the code (its git commit hash). That link is very useful. If later changes to the code affect its functionality or its results, you can simply git checkout the commit recorded for an earlier run.
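As a minimal sketch of how that hash could be read back, the snippet below asks the tracking server for a run and returns the value of the mlflow.source.git.commit tag. The function name commit_for_run is my own, the run ID you pass in is whatever ID your run received, and the tag is only present if the run was launched from inside a git repository.

```python
# Tag under which MLflow stores the git commit of the code that produced a run.
GIT_COMMIT_TAG = "mlflow.source.git.commit"


def commit_for_run(run_id, client=None):
    """Return the git commit hash recorded for a run, or None if there is none.

    `commit_for_run` is a hypothetical helper name, not part of MLflow.
    """
    if client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        from mlflow.tracking import MlflowClient
        client = MlflowClient()
    run = client.get_run(run_id)
    # run.data.tags is a plain dict of tag name -> tag value.
    return run.data.tags.get(GIT_COMMIT_TAG)
```

With the hash in hand, running git checkout followed by that hash in the project directory should restore the code as it was for that run.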

An example of an MLproject file for DeepLab on the Cityscapes semantic segmentation dataset:

name: deeplab
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      training_number_of_steps: {type: int, default: 900}
      output_stride: {type: int, default: 16}
      decoder_output_stride: {type: int, default: 4}
      train_batch_size: {type: int, default: 1}
      dataset: {default: 'cityscapes'}
      train_logdir: {default: /home/sumeet/models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train}
      dataset_dir: {default: /home/sumeet/models/research/deeplab/datasets/cityscapes/tfrecord}
    command: "python train.py \
      --logtostderr \
      --training_number_of_steps={training_number_of_steps} \
      --train_split='train' \
      --model_variant='xception_65' \
      --atrous_rates=6 \
      --atrous_rates=12 \
      --atrous_rates=18 \
      --output_stride={output_stride} \
      --decoder_output_stride={decoder_output_stride} \
      --train_crop_size='769,769' \
      --train_batch_size={train_batch_size} \
      --dataset={dataset} \
      --train_logdir={train_logdir} \
      --dataset_dir={dataset_dir}"
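To make the role of the parameters block concrete, here is a small sketch of how the {placeholders} in the command get filled: defaults first, then any values supplied at launch time. It only approximates what MLflow does internally; render_command, DEFAULTS and COMMAND are names I made up, and the template is a shortened copy of the one above.

```python
import shlex

# A subset of the parameter defaults from the MLproject example above.
DEFAULTS = {
    "training_number_of_steps": 900,
    "output_stride": 16,
    "train_batch_size": 1,
    "dataset": "cityscapes",
}

# Shortened version of the command template from the same file.
COMMAND = (
    "python train.py "
    "--training_number_of_steps={training_number_of_steps} "
    "--output_stride={output_stride} "
    "--train_batch_size={train_batch_size} "
    "--dataset={dataset}"
)


def render_command(template, defaults, overrides=None):
    """Fill the {placeholders} with defaults, letting overrides win."""
    params = dict(defaults)
    params.update(overrides or {})
    # Quote each value so spaces or shell metacharacters stay one argument.
    quoted = {name: shlex.quote(str(value)) for name, value in params.items()}
    return template.format(**quoted)


if __name__ == "__main__":
    # Override a single parameter; the rest fall back to their defaults.
    print(render_command(COMMAND, DEFAULTS, {"training_number_of_steps": 2000}))
```

In a real project you would not render the command yourself. Running mlflow run . -P training_number_of_steps=2000 from the project root passes the override, and MLflow builds and executes the command for you.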

An example of a Conda environment file:

name: production_env
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - ca-certificates=2019.8.28=0
  - certifi=2019.9.11=py37_0
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - ncurses=6.1=he6710b0_1
  - openssl=1.1.1d=h7b6447c_2
  - pip=19.2.3=py37_0
  - python=3.7.4=h265db76_1
  - readline=7.0=h7b6447c_5
  - sqlite=3.30.0=h7b6447c_0
  - tk=8.6.8=hbc83047_0
  - xz=5.2.4=h14c3975_4
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - absl-py==0.8.1
    - alembic==1.2.1
    - astor==0.8.0
    - attrs==19.2.0
    - backcall==0.1.0
    - bleach==3.1.0
    - chardet==3.0.4
    - cityscapesscripts==1.1.0
    - click==7.0
    - cloudpickle==1.2.2
    - configparser==4.0.2
    - cycler==0.10.0
    - databricks-cli==0.9.0
    - decorator==4.4.0
    - defusedxml==0.6.0
    - docker==4.1.0
    - entrypoints==0.3
    - flask==1.1.1
    - gast==0.2.2
    - gitdb2==2.0.6
    - gitpython==3.0.3
    - google-pasta==0.1.7
    - gorilla==0.3.0
    - grpcio==1.24.1
    - gunicorn==19.9.0
    - h5py==2.10.0
    - idna==2.8
    - imdbclassifier==0.6.6
    - importlib-metadata==0.23
    - ipykernel==5.1.2
    - ipython==7.8.0
    - ipython-genutils==0.2.0
    - ipywidgets==7.5.1
    - itsdangerous==1.1.0
    - jedi==0.15.1
    - jinja2==2.10.3
    - joblib==0.14.0
    - jsonschema==3.1.1
    - jupyter==1.0.0
    - jupyter-client==5.3.4
    - jupyter-console==6.0.0
    - jupyter-core==4.6.0
    - keras==2.3.1
    - keras-applications==1.0.8
    - keras-preprocessing==1.1.0
    - kiwisolver==1.1.0
    - mako==1.1.0
    - markdown==3.1.1
    - markupsafe==1.1.1
    - matplotlib==3.1.1
    - mistune==0.8.4
    - mlflow==1.3.0
    - more-itertools==7.2.0
    - nbconvert==5.6.0
    - nbformat==4.4.0
    - notebook==6.0.1
    - numpy==1.17.2
    - opt-einsum==3.1.0
    - pandas==0.25.1
    - pandocfilters==1.4.2
    - parso==0.5.1
    - pexpect==4.7.0
    - pickleshare==0.7.5
    - pillow==6.2.0
    - prettytable==0.7.2
    - prometheus-client==0.7.1
    - prompt-toolkit==2.0.10
    - protobuf==3.10.0
    - ptyprocess==0.6.0
    - pygments==2.4.2
    - pyparsing==2.4.2
    - pyrsistent==0.15.4
    - python-dateutil==2.8.0
    - python-editor==1.0.4
    - pytz==2019.3
    - pyyaml==5.1.2
    - pyzmq==18.1.0
    - qtconsole==4.5.5
    - querystring-parser==1.2.4
    - requests==2.22.0
    - scikit-learn==0.21.3
    - scipy==1.3.1
    - send2trash==1.5.0
    - setuptools==41.4.0
    - simplejson==3.16.0
    - six==1.12.0
    - sklearn==0.0
    - smmap2==2.0.5
    - sqlalchemy==1.3.9
    - sqlparse==0.3.0
    - tabulate==0.8.5
    - tensorboard==1.15.0
    - tensorflow==1.15.0
    - tensorflow-estimator==1.15.1
    - tensorflow-gpu==1.15.0
    - termcolor==1.1.0
    - terminado==0.8.2
    - testpath==0.4.2
    - tornado==6.0.3
    - traitlets==4.3.3
    - urllib3==1.25.6
    - wcwidth==0.1.7
    - webencodings==0.5.1
    - websocket-client==0.56.0
    - werkzeug==0.16.0
    - wheel==0.33.6
    - widgetsnbextension==3.5.1
    - wrapt==1.11.2
    - zipp==0.6.0
prefix: ~/anaconda3/envs/production_env

You can create the above file by hand or, if you already have a stable Conda environment, export it to a file with the following command:

conda env export > conda.yaml
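The exported file ends with a prefix: line that records a path on the machine where the export ran. If you would rather not commit that path, one option is to filter it out while exporting. The sketch below is only one way to do it, assuming conda is on your PATH and the target environment is active; drop_prefix and export_env are hypothetical helper names.

```python
import subprocess


def drop_prefix(env_yaml):
    """Return the exported YAML text without its top-level `prefix:` line."""
    kept = [
        line for line in env_yaml.splitlines()
        if not line.startswith("prefix:")
    ]
    return "\n".join(kept) + "\n"


def export_env(path="conda.yaml"):
    """Run `conda env export` for the active environment and save the result."""
    exported = subprocess.run(
        ["conda", "env", "export"],
        check=True,
        capture_output=True,  # needs Python 3.7 or newer
        text=True,
    ).stdout
    with open(path, "w") as handle:
        handle.write(drop_prefix(exported))
```

Calling export_env() from the project root would then write conda.yaml next to the MLproject file.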

For more information, refer to the MLflow Projects documentation.

In the next article, I will show you how to retrieve the best model from an experiment in MLflow using the Python API.

Sumeet Gyanchandani
Analytics Vidhya

Associate Director at UBS | Former Machine Learning Engineer at Apple, Microsoft Research, Nomoko, Credit Suisse | Master of Science in Artificial Intelligence