Machine Learning Pipeline in Azure ML Studio

Gabriel Reversi
6 min read · May 4, 2023

Step-by-step tutorial on creating a sample machine learning pipeline in Azure ML Studio.

The purpose of this article is to show you how to build a sample pipeline for machine learning projects in Azure ML Studio. Before you start, you need to prepare your Azure environment so that ML Studio works correctly. My previous article explains how to do that step by step: How to prepare your Azure environment to use Azure ML Studio.

Azure Machine Learning Studio (or Azure ML Studio, for short) is a cloud-based Machine Learning platform provided by Microsoft Azure. It enables users to create, train, evaluate, and deploy Machine Learning models at scale using a drag-and-drop graphical interface.

Azure ML Studio has a variety of features and tools that allow users to build and deploy Machine Learning pipelines, train Machine Learning models using popular algorithms, evaluate models using performance metrics, and deploy trained models in a variety of production environments.

Users can choose from a variety of pre-configured modules, such as data preprocessing, feature selection, model training, and model deployment modules. They can also use the Python or R programming languages to customize their modules and extend the functionality of Azure ML Studio.

After that short summary of Azure ML, let’s get to work.

Creating Machine Learning Pipeline

The dataset I’m using comes from Kaggle and contains customer churn data, but you can use any other dataset.
To start, inside ML Studio, click on Pipeline and then on New Pipeline.

Here you can see some ready-to-use pipeline templates, with steps already built for each type of machine learning problem, such as classification, regression, and recommendation systems. For this case, we will create a new, empty pipeline.

After that, the pipeline creation area will appear. First, click on the Data tab. You will see the datasets that you imported from the container into Azure ML Studio. Drag and drop your dataset onto the pipeline area, as in the image below.

Before applying a model, we need to prepare the data in a form that the model accepts. This step is common to almost every machine learning project.

The dataset I’m using has a column called ID Customer, which carries no useful information for our churn prediction. Let’s remove it.

Search for Select Columns in Dataset in the search bar and drag and drop the box onto the pipeline area.

Connect the dataset to the new step, as in the image below. After that, open the column selector and leave out the columns that you won't use for training the model. In my case, that is the ID Customer column.

Once all the necessary columns are selected, click on Save.
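For comparison, the same column removal can be sketched in Python with pandas. This is only an illustration of what the step does, not what Azure ML Studio runs internally. The sample rows and the Tenure column are made up for the example; only ID Customer and Churn come from the dataset described above.

```python
import pandas as pd

# Hypothetical sample rows; "Tenure" is an illustrative column name.
df = pd.DataFrame({
    "ID Customer": [101, 102, 103],
    "Tenure": [12, 5, 30],
    "Churn": [0, 1, 0],
})

# Drop the identifier column so the model does not train on it.
# drop() returns a new DataFrame and leaves df unchanged.
selected = df.drop(columns=["ID Customer"])

print(list(selected.columns))
```

Keeping the original DataFrame untouched makes it easy to bring the ID back later, for example to match predictions to customers.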

The next step deals with another major player in machine learning projects: missing data. To check whether any fields have missing data, right-click on your first step, choose the Preview Data option, and open the Profile tab. You will see a lot of information about your columns, including how much data is missing.

This check is faster to do in Python than by clicking through the columns one by one. Still, here is where you can see it in the Studio: select a column and look for its count of missing values.

Because my dataset has a lot of columns, I used the Python code df.isnull().sum() to list the fields with missing data. I’ll replace the missing values with the mean in some columns and with zero in others by adding the Clean Missing Data step in Azure ML Studio.

Replace by Mean value
Replace by zero
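A minimal pandas sketch of the same idea, counting missing values and then filling them with the mean or with zero, could look like the following. The column names and values are hypothetical, and which columns get which treatment is a choice you make per dataset.

```python
import pandas as pd

# Hypothetical columns with some missing values (None becomes NaN).
df = pd.DataFrame({
    "MonthlyCharges": [20.0, None, 40.0, 60.0],
    "SupportCalls": [1.0, None, 3.0, None],
})

# Count the missing values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Replace missing values with the column mean (mean() skips NaN).
df["MonthlyCharges"] = df["MonthlyCharges"].fillna(df["MonthlyCharges"].mean())

# Replace missing values with zero.
df["SupportCalls"] = df["SupportCalls"].fillna(0)

print(df)
```

Mean replacement suits numeric columns where a typical value is a reasonable guess. Zero replacement fits columns where a missing value plausibly means "none".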

The next step is transforming categorical data into numerical data. This is necessary because the model we’ll use doesn’t accept string data. In Python, we would use scikit-learn's LabelEncoder or OneHotEncoder. Search for Convert to Indicator Values in the search bar, add it to the pipeline, and select the categorical columns.
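To make the transformation concrete, here is a small sketch of one-hot encoding using pandas.get_dummies, chosen here for brevity instead of the scikit-learn encoders. The Contract column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data with one categorical (string) column.
df = pd.DataFrame({
    "Contract": ["Monthly", "Yearly", "Monthly"],
    "Tenure": [12, 5, 30],
})

# Create one 0/1 indicator column per category and drop the original column.
encoded = pd.get_dummies(df, columns=["Contract"], dtype=int)

print(encoded)
```

Each category becomes its own column, so the model receives only numbers and no artificial ordering is implied between categories.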

Now it is time to train the model with our data. This takes three steps:

  • Split Data: This step splits the data into two parts. One part is used for training and the other for evaluating the model results.
  • Two-Class Logistic Regression: The target column that the model will predict is binary, with values 0 and 1, which is why I used this step. For datasets with more than two classes, you can use the Multiclass Logistic Regression step instead.
  • Train Model: Here you select the column that you want to predict, in my case the column called Churn.
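The three steps above map roughly onto the following scikit-learn sketch. It uses scikit-learn's LogisticRegression as a stand-in for the Two-Class Logistic Regression step, not the Studio's own implementation. The data is synthetic and generated only for the example, and the 70/30 split ratio is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: 200 rows, 3 numeric features, binary target (0 or 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Split Data: keep 70% for training and 30% for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Two-Class Logistic Regression + Train Model: fit on the training part only.
model = LogisticRegression()
model.fit(X_train, y_train)

# Accuracy on the held-out part, which the model has not seen during training.
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")
```

Fixing random_state makes the split reproducible, and stratify=y keeps the proportion of churn and non-churn rows similar in both parts.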

Finally, we reach the step of checking the model results. Add the Score Model and Evaluate Model steps to the pipeline and connect them as in the image below: Score Model takes the trained model from Train Model and the held-out data from Split Data, and Evaluate Model takes the output of Score Model.

Executing the Pipeline

With everything in place, you can run the pipeline and see whether it succeeds or raises any errors. Click on Submit, give your experiment a name, and click on Submit again.

This may take a few minutes.

If the run finishes without errors, you will see all the steps in green in the Jobs tab on the sidebar.

To evaluate the trained model, click on the last step, Evaluate Model, open the Outputs + logs tab, and choose Preview Data.

Azure ML Studio then shows metrics such as the ROC curve, the precision-recall curve, the lift curve, F1 score, AUC, and accuracy.
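To get a feel for what some of these numbers mean, the sketch below computes accuracy, precision, recall, F1 score, and AUC by hand in plain Python. The labels and scores are made-up toy values, and the 0.5 threshold is an assumption for the example.

```python
# Toy ground-truth labels and predicted probabilities (made up for illustration).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]

# Turn probabilities into class predictions with a 0.5 threshold.
threshold = 0.5
y_pred = [1 if s >= threshold else 0 for s in scores]

# Confusion matrix counts.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# AUC: share of (positive, negative) pairs where the positive row gets
# the higher score; ties count as half.
positives = [s for t, s in zip(y_true, scores) if t == 1]
negatives = [s for t, s in zip(y_true, scores) if t == 0]
wins = sum(
    1.0 if p > n else 0.5 if p == n else 0.0
    for p in positives
    for n in negatives
)
auc = wins / (len(positives) * len(negatives))

print(accuracy, precision, recall, f1, auc)
```

With these toy values, accuracy, precision, recall, and F1 all come out to 0.75, and AUC to 0.9375. Unlike the other four, AUC does not depend on the chosen threshold.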

Conclusion

The goal of this article was to provide a basic introduction to the process of training Machine Learning models in Azure ML Studio, using a simple churn dataset, without going into too much detail about feature engineering, parameter optimization, and metric evaluation. The generated model may therefore not be optimized or accurate enough for use in real-world scenarios.

While Azure ML Studio is a powerful and user-friendly platform, it is important to highlight that the process of training Machine Learning models can be complex and involve several important steps to ensure the final model is accurate and robust. Feature engineering, parameter optimization, and metric evaluation are just a few of the crucial steps that should be considered to obtain a high-quality model.

If you run into an error in any step while training the model and don't know how to solve it, please let me know; maybe I can help.

Thank you for reading.

Gabriel Reversi

Hi, I'm a data analyst and data scientist. Here I share content about data, tools, methods, and business. https://www.linkedin.com/in/gabrielreversi/