Full ML pipeline: From data collection to deployment using Prevision.io

Zeineb Ghrib
Prevision.io
7 min read · Oct 20, 2020

--

When I joined Prevision.io, I realized how easy it is to solve machine learning use cases in a short time. Thanks to the auto-ml platform offered by Prevision, all we have to do is discuss business insights and potential features or additional data with the client, and let the platform handle the whole machine learning process.

However, the remaining question is how to let our clients consume the models once their performance is satisfactory. Fortunately, in addition to the auto-ml Studio, Prevision offers a cloud platform service called Store, which lets you deploy your models, or web apps that interact with your pre-trained models already hosted on the Studio. The Store of your instance is reachable at: https://xxx.prevision.io/store (xxx being your instance name)

The Store takes care of the DevOps aspect, so you can focus on improving your front-end apps.
In this post we will go through all the steps to deploy a dash web app on the Store. We will use the IBM Watson Marketing Customer Value Data gathered from Watson Analytics to create a dash web app simulator that lets you change feature values and get an updated score at each simulation.

PS: Prevision Store supports different ways to deploy a model: here is a detailed explanation of the different ways to deploy your models using the Store.

First of all, we create a new repo on github or gitlab (here we use gitlab, but the steps are the same on github). All the source code is available here.

I created a new repo as follows, then I cloned it:

Pre-requisites

First of all, we have to install some pre-requisites that allow us to interact with our instance. Prevision.io makes it possible to create use cases, train models and predict from python code, using the sdk package. So we need to download and install the package as shown below:

git clone https://github.com/previsionio/prevision-python
cd prevision-python
python setup.py install

Then, we have to set some environment variables. To interact with your prevision.io instance via the sdk, you have to connect to your instance with the corresponding URL (https://xxx.prevision.io, with xxx the name of your instance) and a valid MASTER_TOKEN (here is explained how to get your token).

One simple way to set an environment variable is to use the export command:

$ export YOUR_VARIABLE=VALUE

then use os.environ.get('YOUR_VARIABLE') inside the script to read your variable.
However, I would recommend a better way to manage your environment variables: the dotenv python package.

  1. Store your config variables in a special file
    Create a .env file in the project root folder and add it to your .gitignore. Then add your environment variables as follows:
URL=https://XXX.prevision.io
TOKEN=YOUR_MASTER_TOKEN

  2. Use the dotenv package to load these variables automatically, then connect to your instance
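The embedded snippet for this step is not reproduced in this text, so here is a minimal sketch of what it could look like. It assumes the python-dotenv package (load_dotenv) and assumes that the sdk exposes its connection call as pio.client.init_client(url, token); check this against the sdk version you installed. The helper names read_credentials and connect are hypothetical.

```python
import os


def read_credentials(env=os.environ):
    """Return (url, token) from an environment mapping, failing early if one is missing."""
    url = env.get("URL")
    token = env.get("TOKEN")
    if not url or not token:
        raise RuntimeError("URL and TOKEN must be set (for example in the .env file)")
    return url, token


def connect():
    """Load the .env file, then open a session on the Prevision.io instance."""
    # Imported lazily so read_credentials stays usable without these packages.
    from dotenv import load_dotenv  # pip install python-dotenv
    import previsionio as pio

    load_dotenv()  # copies KEY=VALUE pairs from .env into os.environ
    url, token = read_credentials()
    # Assumption: this is the sdk's client initialisation call.
    pio.client.init_client(url, token)
```

Calling connect() once at the top of a script should be enough; failing early on a missing variable gives a clearer error than a rejected API call later on.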

(Here is a really good post showing some best practices for Machine Learning projects.)

Now we can start the project ^_^

Launch a new use case:

We will launch a Prevision auto-ml use case using the sdk (here are more extensive examples of how to use the sdk).

Inside the /src folder, I created a python file called pio_training.py. In this script we implement all the steps required to launch a new use case on Prevision.

  1. Download the raw dataset if it does not exist locally
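As a rough illustration of this step (the original snippet is not reproduced here), a download guarded by an existence check could look like the sketch below. The function name download_if_missing is hypothetical, and the dataset URL is left as a parameter because it is not given in this post.

```python
import urllib.request
from pathlib import Path


def download_if_missing(url, dest):
    """Download url to dest unless dest already exists; return True if a download happened."""
    dest = Path(dest)
    if dest.exists():
        return False
    # Create the target folder (for example data/) on first use.
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest))
    return True
```

Running the script twice then only hits the network the first time.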

  2. Create and register train/test subsets in order to launch our use case.
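The split itself can be sketched with the standard library only, to stay dependency-free; pandas or scikit-learn would work just as well. The function name split_rows, the 20% test fraction and the seed are assumptions made for illustration, not values taken from the original code. Registering the two subsets on the platform is done through the sdk and is not shown here.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train, test) lists."""
    shuffled = list(rows)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

The rows can come, for instance, from csv.DictReader applied to the raw file; a fixed seed keeps the train and test subsets identical from one run to the next.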

PS: in the source code I added a function that waits until the platform has finished all the required computations on a dataset once it is created (for further information about the platform's background dataset processing, consult this page)
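Such a waiting function usually boils down to a polling loop. The sketch below is generic and is not the function from the repository: wait_until is a hypothetical name, and the is_ready callable stands in for whatever status check the sdk offers for a dataset.

```python
import time


def wait_until(is_ready, timeout=600.0, interval=5.0):
    """Poll is_ready() until it returns True; raise TimeoutError after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met after {timeout} seconds")
        time.sleep(interval)  # avoid hammering the platform with status requests
```

A timeout is worth having so that a stuck dataset does not block the training script forever.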

  3. Set the use case config and launch a new classification use case:

With the code snippet above, we gave the platform the following information:

  • The target column is called ‘Response’
  • There is an ID column called ‘Customer’
  • For this use case we only want to try random forest and linear model algorithms
  • Use all feature engineering ops offered by the platform except pio.Feature.PCA , pio.Feature.kMeans and pio.Feature.PolynomialFeatures
  • Use the AUC metric for training
  • Don’t train any simple model
  • The training dataset is train_fe stored in the previous step
  • The holdout dataset is test_fe

Create your dash app

The dash code can be found here

Now that our models are trained, we will take care of implementing our dash app. We can schematize our app as follows:

Our Dash app uses a Flask server under the hood, and will be deployed on our Store (a PaaS service provided by Prevision), which supports Flask apps. Once deployed, it will interact with our pre-trained model, which is hosted on the Studio, via API calls packaged in the python sdk.

Now we have to design and implement how the dynamic elements will be displayed in the web browser.
We opted to create the app as follows (it can be improved :p ):

Notice that the sdk provides many utilities that make the app easier to implement; we used the following properties/methods:

  • uc.schema : a property containing the feature specifications: name, type, options (list of categories for the categorical features), min and max for numerical features.
  • best.feature_importance : a property containing a data frame of feature importances for the given model (sorted by descending score)
  • best.get_dynamic_performances() : a method that returns the model performance for a given decision threshold (we set it to the optimal threshold, the one that maximizes the f1-score)
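To make the notion of "optimal threshold" concrete, here is a small standalone sketch that scans candidate thresholds and keeps the one with the highest f1-score. It only illustrates the idea and is not how the sdk or the platform computes it; the function names are hypothetical, and labels are assumed to be encoded as 0/1.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1-score when every score >= threshold is predicted positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) maximising F1 over the distinct observed scores."""
    candidates = sorted(set(scores))
    return max(((t, f1_at_threshold(y_true, scores, t)) for t in candidates),
               key=lambda pair: pair[1])
```

For example, with labels [0, 0, 1, 1] and scores [0.1, 0.4, 0.35, 0.8], the best threshold is 0.35, where two true positives and one false positive give an f1-score of 0.8.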

Now all we have to do is combine the HTML elements we created within the app's main layout:

The className attributes refer to CSS classes that are defined in this gist.
The CSS link must be set in the external_stylesheets parameter so that the dash app knows it has to include the CSS from an external resource.

Notice that I added a new html division with id=prediction_result. Now we need to make the different HTML components interact with each other. Dash makes this possible via app.callback functions. We will implement a callback function that dynamically updates the HTML element identified by prediction_result every time the value of one of the other elements changes, without having to reload the page:
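The original callback is not reproduced in this text; the sketch below shows the general pattern under a few assumptions. It uses the Dash 2.x import style, a single hypothetical input with id income, a predict callable standing in for the sdk call to the model hosted on the Studio, and a css_url parameter standing in for the CSS link mentioned above. Only the prediction_result id comes from the post.

```python
def format_prediction(probability, threshold=0.5):
    """Turn a model score into the text shown in the prediction_result div."""
    label = "Yes" if probability >= threshold else "No"
    return f"Predicted response: {label} (score {probability:.2f})"


def build_app(predict, css_url):
    """Create a small Dash app whose output refreshes whenever the input changes."""
    # Imported lazily so format_prediction can be tested without Dash installed.
    from dash import Dash, Input, Output, dcc, html  # Dash 2.x import style

    app = Dash(__name__, external_stylesheets=[css_url])
    app.layout = html.Div([
        dcc.Input(id="income", type="number", value=50000),
        html.Div(id="prediction_result"),
    ])

    @app.callback(Output("prediction_result", "children"), Input("income", "value"))
    def update_prediction(income):
        # predict is a stand-in for the call to the model hosted on the Studio.
        return format_prediction(predict({"Income": income}))

    return app
```

Dash re-runs update_prediction each time the input value changes and writes the returned text into the prediction_result div. The underlying Flask instance is available as the server attribute of the object returned by build_app.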

One thing remains: in the project root folder you have to create a python file that MUST be named run.py, in which you import your app server from your dash app file:

from src.app_dash import app

And that’s it!!!
The overall project structure is as follows:

Now, before we deploy our app to the Store, we need to make sure that it runs correctly in a production-like environment.
For that we use gunicorn instead of the Flask built-in server, because the latter was not designed to run in a production environment (official doc).

Install the gunicorn package and run the following command:

$ gunicorn --bind 0.0.0.0:4000 --threads 10 -w 2 --timeout 120 --limit-request-line 0 --access-logfile - run:app

If everything works, commit and push to your gitlab/github repo (you can clone mine)

Deploy on the Store:

Prevision Store lets you deploy your web app directly from your gitlab/github repository. Two conditions must be satisfied for the final deployment step to succeed:

  • The main program must be named run.py and must import the dash server, which must be named app
  • A requirements.txt file containing all the required packages must be present in your repo; you can generate it with the following command:
$ pip freeze > requirements.txt

Now go to the store front end :

  • Go to Applications (horizontal tab)
  • Select Web-application (vertical tab)

Then set up the application-name and select the application language (python in our case). In the File Selection section, select gitlab (or github), then select the related repository and the corresponding branch. Then click on the deploy the web-application button.

Wait a little until the Deployed status turns green:

Then, when you click on your app, you will be redirected to a page that looks like this:

Finally, click on your Application url to get your app:

Here is the link to the app (I made it public)

Conclusion:

Thanks for reading my post ^_^. I am planning to write another post about how to implement the exact same application, this time using open-source solutions: scikit-learn for modeling and the heroku PaaS for deployment. You will then notice the difference in difficulty.

All the source code is available here. Note that you can test out my code by logging in to our public instance: https://cloud.prevision.io/

If you have any question or run into an issue while running the app, don’t hesitate to report it to me.
My Email : zeineb.ghrib@prevision.io
