Introduction to Linear Regression and Azure ML Studio
Hi! I’ve been working as a data analyst for 11 months, so data was already part of my job, and I had been curious about machine learning for a while. Since last week I’ve been trying to dive deeper into it, and I think I’ve put together a good reading list on the topic. In this story, we will try to understand what linear regression is and what we can do with it. After that, we will take a look at Azure ML Studio and create our first linear regression model.
As we know, there are 3 types of machine learning. Regression belongs to supervised learning, which needs labelled data. We split the dataset into two parts, train and test: we train the model on the train dataset and we evaluate it on the test dataset.
In this story I won’t explain the other types of machine learning or go deeper into what supervised learning is.
We use regression models to predict the value of a dependent variable from one or more independent variables. Linear regression does this by fitting a straight line through the data.
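To make that a bit more concrete, here is the usual textbook form of simple linear regression with one independent variable. The symbols are my own notation, not something ML Studio shows: x is the independent variable, y is the dependent variable, the betas are the intercept and slope, and epsilon is the error the line cannot explain.

```latex
% Model: a straight line plus an error term
y = \beta_0 + \beta_1 x + \varepsilon

% Ordinary least squares estimates from n labelled pairs (x_i, y_i),
% where \bar{x} and \bar{y} are the sample means
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

Once the two estimates are known, a prediction for a new x is simply the intercept plus the slope times x.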
Azure Machine Learning Studio
Let’s start building our first machine learning model. For this, I will use Azure ML Studio. It is free up to a point. I don’t know this platform very well yet, but I can explain the steps of creating a simple model.
I’m going to download datasets from here. There are 2 datasets on that page. I will download both of them, and in ML Studio I will concatenate them into a single dataset.
If you don’t know what Kaggle is, you should check this story. In short, it is a site where you can download datasets for free.
When you sign in to ML Studio you will be on the Experiments page.
You will probably see a blank page if you haven’t created an experiment before. I go to the Datasets page and click the ‘New’ button at the bottom left. Now we can import the datasets that we downloaded from Kaggle. We need to follow this path: Dataset -> From Local File -> Select datasets.
Once the datasets have been imported successfully, we can start to create the experiment.
Create an Experiment
I go back to the ‘Experiments’ page and click the ‘New’ button.
We can see some already built examples provided by Azure ML Studio. I am going to continue by clicking on the ‘Blank Experiment’ option.
We can create models and workflows by using components that are located on the left sidebar.
Let’s Start to Create a Simple Model
First, we need to bring in the datasets that we will use. For this, I’ll use the ‘Saved Datasets’ component. You can search for components by using the search bar in the left sidebar. I drag each dataset to the middle and drop it.
I dropped the 2 datasets. We could use them separately, but I want to use them together: first I’ll concatenate them, and after that I’ll split the result again into 2 datasets, train and test.
Concatenate 2 Datasets
I’ll use the ‘Add Rows’ component for this. I drag a connection from the output point at the bottom of each dataset to one of the input points of the ‘Add Rows’ component. This works because both datasets have the same columns. We could use them without concatenating, but I wanted to show how we can concatenate 2 datasets. We’ll split the result again into train and test datasets.
If you hover over the output point at the bottom of the ‘Add Rows’ component after clicking the ‘Run’ button, you can see the ‘Visualize’ option. If you use it, you will see 1000 rows. If you do the same thing for the 2 original datasets, you’ll see 300 and 700 rows respectively.
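If you prefer to see the idea in code, this is roughly what ‘Add Rows’ does. It is only a sketch in plain Python: the two lists are toy stand-ins that I made up with the same row counts (the Kaggle files themselves are not reproduced here), and `add_rows` is my own helper name, not an ML Studio function.

```python
# Toy stand-ins for the two uploaded datasets (made-up values, same row counts).
# Each row is a dict with the same two columns, 'x' and 'y'.
first_dataset = [{"x": float(i), "y": 2.0 * i + 1.0} for i in range(300)]
second_dataset = [{"x": float(i), "y": 2.0 * i + 1.0} for i in range(300, 1000)]


def add_rows(top, bottom):
    """Append the rows of `bottom` to `top`; both must have the same columns."""
    if top and bottom and set(top[0]) != set(bottom[0]):
        raise ValueError("datasets must have the same columns")
    return top + bottom


combined = add_rows(first_dataset, second_dataset)
print(len(first_dataset), len(second_dataset), len(combined))
```

The column check mirrors the reason the concatenation works in the first place: both inputs share the same columns.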
Now we need to split the dataset into 2 parts, train and test data. We will train the model with the train dataset and evaluate the model with the test dataset. I’ll use the ‘Split Data’ component for this. I drag and drop it under the ‘Add Rows’ component and connect the output point at the bottom of ‘Add Rows’ to the ‘Split Data’ component.
On the right side, we can set what fraction of the rows will be training data. In this case, I set 75% (0.75) of the main dataset for training data, so the test data will be 25%. ML Studio divides it into 2 datasets.
There are 2 output points, 1st and 2nd. We use 1st for the train dataset and 2nd for the test dataset.
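The same 75/25 split can be sketched in a few lines of Python. I am assuming a random split here, and the function name, the seed and the toy rows are my own choices for illustration.

```python
import random


def split_data(rows, fraction=0.75, seed=42):
    """Shuffle a copy of `rows` and return (train, test).

    `fraction` is the share of rows that goes to the first (train) output.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy data: 1000 (x, y) pairs, made up for this example.
rows = [(float(i), 2.0 * i + 1.0) for i in range(1000)]
train, test = split_data(rows, fraction=0.75)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so the model sees the same training rows every time you rerun it.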
We split our dataset successfully. Now we need to train a model. I’m going to use the ‘Train Model’ and ‘Linear Regression’ components.
I’ll connect the 1st output point of the ‘Split Data’ component to the 2nd input point of the ‘Train Model’ component. After that, I’ll connect the output point of the ‘Linear Regression’ component to the 1st input of the ‘Train Model’ component. Then I’ll choose the ‘y’ column in the ‘Train Model’ component as the column to predict. In our case, we will try to predict ‘y’ by using ‘x’.
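To get a feeling for what training means here, below is a minimal sketch that fits a line with ordinary least squares in plain Python. This is my assumption about the general technique, not a claim about what the ‘Linear Regression’ component runs internally, and the five training points are made up.

```python
def fit_linear_regression(xs, ys):
    """Fit y = intercept + slope * x with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up training data: 'x' is the feature, 'y' is the label we want to predict.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_linear_regression(train_x, train_y)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The trained model is nothing more than these two numbers, which is why a simple linear regression is a nice first model to study.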
Scoring the Model
We have now trained our model. Next we need to score (test) it. I am going to use the ‘Score Model’ component. As you probably remember, we have the test dataset, which we’ll use now. I’ll connect the ‘Train Model’ component’s output to the 1st input point of the ‘Score Model’ component, and the 2nd output of the ‘Split Data’ component to the 2nd input point of the ‘Score Model’ component.
We can see the predicted values under the ‘Scored Labels’ column.
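Scoring is just applying the trained line to rows the model has not seen. Here is a small sketch, assuming a model that is an (intercept, slope) pair. The numbers and the `score_model` helper are hypothetical, and I named the new column after the one ML Studio shows.

```python
def score_model(model, rows):
    """Return copies of `rows` with a predicted value added to each one."""
    intercept, slope = model
    return [
        {**row, "Scored Labels": intercept + slope * row["x"]}
        for row in rows
    ]


# Hypothetical trained model and made-up test rows.
model = (1.0, 2.0)  # (intercept, slope)
test_rows = [
    {"x": 0.0, "y": 1.2},
    {"x": 1.0, "y": 2.9},
    {"x": 2.0, "y": 5.1},
]

scored = score_model(model, test_rows)
for row in scored:
    print(row)
```

Keeping the real ‘y’ next to the prediction in the same row is what makes the evaluation step possible.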
Evaluating the Model
Now we can evaluate the model. For this, I’ll use the ‘Evaluate Model’ component. I’ll connect the output point of the ‘Score Model’ component to the 1st input point of the ‘Evaluate Model’ component.
If we hover over the output point of the ‘Evaluate Model’ component, we can see the ‘Visualize’ button.
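For a regression model, evaluation comes down to comparing the real values with the predicted ones. The sketch below computes three common metrics in plain Python: mean absolute error, root mean squared error and the coefficient of determination. The values are made up and the `evaluate` helper is my own, so treat it as an illustration of the metrics rather than a copy of what ‘Evaluate Model’ reports.

```python
import math


def evaluate(actual, predicted):
    """Compute MAE, RMSE and R^2 for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)           # unexplained variation
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up test labels and predictions.
actual = [1.0, 3.0, 5.0, 7.0]
predicted = [1.5, 2.5, 5.5, 6.5]

metrics = evaluate(actual, predicted)
print(metrics)
```

Lower error values are better, while a coefficient of determination close to 1 means the line explains most of the variation in ‘y’.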
I hope this story was helpful and you enjoyed it. As I said, I haven’t been on my machine learning journey for long. I’m playing with tools like ML Studio to understand the core of machine learning, and I wanted to introduce Azure ML Studio to you.