End-to-End Machine Learning Pipeline With MLOps Tools (MLflow + DVC + Flask + Heroku + EvidentlyAI + GitHub Actions)

  • Cookiecutter: Data science project structure
  • Data Version Control (DVC): To version the data assets and build the pipeline
  • GitHub: For code version control
  • GitHub Actions: To create the CI-CD pipeline
  • MLflow: For experiment tracking and the model registry
  • Heroku: To deploy the application
  • Flask: To create a web app
  • EvidentlyAI: To evaluate and monitor ML models in production
  • Pytest: To implement the unit tests
conda create -n churn_model python=3.7 -y 
conda activate churn_model
pip install cookiecutter
cookiecutter https://github.com/drivendata/cookiecutter-data-science

Answer the prompts as follows:

  1. project_name: churn_model
  2. repo_name: churn_model
  3. author_name: shanaka_chathuranga
  4. description: End to End Machine learning pipeline with MLOps tools
  5. Select open_source_license: select MIT (option 1)
  6. s3_bucket / aws_profile [Optional]: just press enter
  7. Select python_interpreter: python3 (option 1)
cd churn_model
Project folder structure
git init 
git add .
git commit -m "Adding cookiecutter template"
git remote add origin <your_github_repo>
git branch -M main
git push -u origin main
pip install dvc 
dvc init
dvc add train.csv
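dvc add creates a lightweight train.csv.dvc pointer file and git-ignores the data file itself; commit the pointer so each dataset version is tied to a Git commit:

git add train.csv.dvc .gitignore
git commit -m "Track train.csv with DVC"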
  • params.yaml
    This will store all the configurations related to this project.
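    As a rough sketch, params.yaml could look like the following (the keys, paths, and values here are illustrative assumptions, not the project's exact file):

# params.yaml (illustrative sketch)
external_data_config:
  external_data_csv: data/external/train.csv
raw_data_config:
  raw_data_csv: data/raw/train.csv
  target: churn
  train_test_split_ratio: 0.2
  random_state: 42
processed_data_config:
  train_data_csv: data/processed/churn_train.csv
  test_data_csv: data/processed/churn_test.csv
mlflow_config:
  remote_server_uri: http://localhost:1234
  experiment_name: churn_experiment
  run_name: random_forest
  registered_model_name: random_forest_model
random_forest:
  max_depth: 20
  n_estimators: 100
model_dir: models/model.joblib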
  • load_data.py
    This script loads the external train.csv file into the raw folder. Only six numerical features are used in this model for simplicity, so the new CSV in the raw folder contains those six features plus the target column churn. The main focus of this project is the MLOps tooling, so very little effort went into the modeling itself.
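    A minimal sketch of what load_data.py does, assuming the params.yaml layout sketched above and a read_params helper (illustrative, not the project's exact code):

# load_data.py (illustrative sketch)
import argparse
import pandas as pd
import yaml

def read_params(config_path):
    # Load the pipeline configuration from params.yaml.
    with open(config_path) as f:
        return yaml.safe_load(f)

def load_data(config_path):
    config = read_params(config_path)
    df = pd.read_csv(config["external_data_config"]["external_data_csv"])
    target = config["raw_data_config"]["target"]
    # Keep only the numerical feature columns plus the churn target
    # (the article keeps six numerical features).
    numeric_cols = df.drop(columns=[target]).select_dtypes(include="number").columns.tolist()
    df[numeric_cols + [target]].to_csv(config["raw_data_config"]["raw_data_csv"], index=False)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="params.yaml")
    load_data(parser.parse_args().config)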
  • split_data.py
    This script splits train.csv in the raw folder and creates new churn_train and churn_test files inside the processed folder.
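    A sketch of split_data.py along the same lines (again assuming the params.yaml keys above):

# split_data.py (illustrative sketch)
import argparse
import pandas as pd
import yaml
from sklearn.model_selection import train_test_split

def read_params(config_path):
    with open(config_path) as f:
        return yaml.safe_load(f)

def split_data(config_path):
    config = read_params(config_path)
    raw_cfg = config["raw_data_config"]
    df = pd.read_csv(raw_cfg["raw_data_csv"])
    # Stratify on the target so train and test keep the original churn ratio.
    train, test = train_test_split(
        df,
        test_size=raw_cfg["train_test_split_ratio"],
        random_state=raw_cfg["random_state"],
        stratify=df[raw_cfg["target"]],
    )
    train.to_csv(config["processed_data_config"]["train_data_csv"], index=False)
    test.to_csv(config["processed_data_config"]["test_data_csv"], index=False)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="params.yaml")
    split_data(parser.parse_args().config)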
  • train_model.py
    Model training is done in this script. The model-related settings live in the params.yaml file, so we can experiment with several ML models simply by changing or adding parameters there. MLflow is used to track model performance, which can easily be inspected on the MLflow dashboard.
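    A sketch of the training script with MLflow tracking (the model choice, metric, and config keys are assumptions consistent with the sketches above):

# train_model.py (illustrative sketch)
import argparse
import mlflow
import mlflow.sklearn
import pandas as pd
import yaml
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def read_params(config_path):
    with open(config_path) as f:
        return yaml.safe_load(f)

def train(config_path):
    config = read_params(config_path)
    target = config["raw_data_config"]["target"]
    train_df = pd.read_csv(config["processed_data_config"]["train_data_csv"])
    test_df = pd.read_csv(config["processed_data_config"]["test_data_csv"])

    mlflow_cfg = config["mlflow_config"]
    mlflow.set_tracking_uri(mlflow_cfg["remote_server_uri"])
    mlflow.set_experiment(mlflow_cfg["experiment_name"])

    with mlflow.start_run(run_name=mlflow_cfg["run_name"]):
        params = config["random_forest"]
        model = RandomForestClassifier(**params)
        model.fit(train_df.drop(columns=[target]), train_df[target])
        accuracy = accuracy_score(test_df[target], model.predict(test_df.drop(columns=[target])))
        # Log hyperparameters and the metric so runs can be compared in the MLflow UI,
        # and register the model so the selection script can promote the best run later.
        mlflow.log_params(params)
        mlflow.log_metric("accuracy", accuracy)
        mlflow.sklearn.log_model(model, "model", registered_model_name=mlflow_cfg["registered_model_name"])

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="params.yaml")
    train(parser.parse_args().config)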
  • production_model_selection.py
    This script selects the best-performing model from the model registry and saves it to the model directory. Here the best model is chosen by accuracy score.
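    A sketch of how the best registered model can be promoted and exported with the MLflow client (the accuracy-based selection matches the text; everything else is an assumption):

# production_model_selection.py (illustrative sketch)
import argparse
import joblib
import mlflow
import mlflow.sklearn
import yaml
from mlflow.tracking import MlflowClient

def read_params(config_path):
    with open(config_path) as f:
        return yaml.safe_load(f)

def promote_best_model(config_path):
    config = read_params(config_path)
    mlflow_cfg = config["mlflow_config"]
    mlflow.set_tracking_uri(mlflow_cfg["remote_server_uri"])
    client = MlflowClient()
    model_name = mlflow_cfg["registered_model_name"]
    # Pick the registered model version whose run logged the highest accuracy.
    best = max(
        client.search_model_versions(f"name='{model_name}'"),
        key=lambda mv: client.get_run(mv.run_id).data.metrics["accuracy"],
    )
    client.transition_model_version_stage(name=model_name, version=best.version, stage="Production")
    # Save a plain joblib copy for the Flask app to load at serving time.
    model = mlflow.sklearn.load_model(f"runs:/{best.run_id}/model")
    joblib.dump(model, config["model_dir"])

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="params.yaml")
    promote_best_model(parser.parse_args().config)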
Start the MLflow tracking server, then run the DVC pipeline:

mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./artifacts --host 0.0.0.0 -p 1234
dvc repro
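dvc repro executes the stages defined in dvc.yaml in order, rerunning only the stages whose dependencies changed. A sketch of what that pipeline file can look like, assuming the cookiecutter script layout (stage names and paths are illustrative):

# dvc.yaml (illustrative sketch)
stages:
  load_data:
    cmd: python src/data/load_data.py --config=params.yaml
    deps:
      - src/data/load_data.py
      - data/external/train.csv
    outs:
      - data/raw/train.csv
  split_data:
    cmd: python src/data/split_data.py --config=params.yaml
    deps:
      - src/data/split_data.py
      - data/raw/train.csv
    outs:
      - data/processed/churn_train.csv
      - data/processed/churn_test.csv
  train_model:
    cmd: python src/models/train_model.py --config=params.yaml
    deps:
      - src/models/train_model.py
      - data/processed/churn_train.csv
      - data/processed/churn_test.csv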
Run the unit tests with:

pytest -v
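GitHub Actions can run the same tests on every push as the CI part of the pipeline; a minimal workflow sketch (the file name .github/workflows/ci.yml and the steps are assumptions, and a Heroku deployment job would typically follow on success):

# .github/workflows/ci.yml (illustrative sketch)
name: CI
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.7"
      - run: pip install -r requirements.txt
      - run: pytest -v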
Finally, create a new file named Procfile. This file acts as the entry point Heroku uses to start the web app:

web: gunicorn app:app
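The Flask app referenced by the Procfile can be as small as loading the saved model and exposing a predict route; a minimal sketch of app.py (the route, input format, and file paths are assumptions):

# app.py (illustrative sketch)
import joblib
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
# Load the model exported by production_model_selection.py.
model = joblib.load("models/model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [v1, v2, v3, v4, v5, v6]}.
    features = np.array(request.json["features"]).reshape(1, -1)
    return jsonify({"churn": int(model.predict(features)[0])})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)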
The EvidentlyAI report monitors the model in production and covers:

  • Input data drift (X)
  • Target drift (y)
  • Target behavior by feature
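A sketch of how such a report can be generated with Evidently's Report API (the API version, file paths, and column mapping are assumptions; the article may have used an earlier Evidently interface):

# drift_report.py (illustrative sketch)
import pandas as pd
from evidently import ColumnMapping
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset
from evidently.report import Report

# Reference = the data the model was trained on; current = fresh production data.
reference = pd.read_csv("data/processed/churn_train.csv")
current = pd.read_csv("data/production/new_batch.csv")

report = Report(metrics=[DataDriftPreset(), TargetDriftPreset()])
report.run(reference_data=reference, current_data=current, column_mapping=ColumnMapping(target="churn"))
report.save_html("reports/drift_report.html")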
