Published in Analytics Vidhya

Introduction to Linear Regression and Azure ML Studio

Hi! I’ve been interested in machine learning for a while, but since last week I’ve been trying to dive deeper into it. I’ve been working as a data analyst for 11 months, and data was part of my job even before that, so I was already involved in data work. I think I put together a good reading list on this topic. In this story, we will try to understand what linear regression is and what we can do with it. After that, we will take a look at Azure ML Studio and create our first linear regression model on two datasets from Kaggle.

Linear Regression

Machine learning is commonly divided into 3 types. Regression belongs to supervised learning, which needs labelled data. We split the dataset into two parts, train and test data: we train the model with the train dataset and we evaluate it with the test dataset.

In this story I won’t explain the other types of machine learning or go deeper into what supervised learning is.

We use regression models to predict a numeric quantity (the dependent variable) from one or more independent variables. Linear regression does this by fitting a straight line through the data.
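To make the idea concrete, here is a minimal sketch of fitting a straight line by least squares in plain Python. The function name `fit_line` and the sample points are made up for illustration; they are not part of Azure ML Studio or of the datasets used later.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up points that lie exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(xs, ys)
prediction = intercept + slope * 5.0  # predict y for an unseen x = 5
print(intercept, slope, prediction)   # 1.0 2.0 11.0
```

Here x is the independent variable and y is the dependent variable we want to predict.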

Azure Machine Learning Studio

Let’s start building our first machine learning model. For this, I will use Azure ML Studio. It is free up to a point. I don’t know this platform deeply yet, but I can explain the steps for creating a simple model.

Import Data

I’m going to download the datasets from here. You will find 2 datasets there. I will download both of them, and in ML Studio I will concatenate them into a single dataset.

If you don’t know what Kaggle is, you should check this story. In short, you can download datasets for free from Kaggle.

When you sign in to ML Studio you will be on the Experiments page.

You will probably see a blank page if you haven’t created an experiment before. I go to the Datasets page and click the ‘New’ button at the bottom left. Now we can import the datasets that we downloaded from Kaggle. We need to follow this path: Dataset -> From Local File -> Select datasets.

Once the datasets are imported successfully, we can start creating the experiment.

Create an Experiment

I go back to the ‘Experiments’ page and click the ‘New’ button.

We can see some pre-built examples provided by Azure ML Studio. I am going to continue by clicking on the ‘Blank Experiment’ option.

We can create models and workflows by using the components in the left sidebar.

Let’s Start to Create a Simple Model

First, we need to add the datasets that we will use. For this, I’ll use the ‘Saved Datasets’ component. You can search for components by using the search bar in the left sidebar. I drag each dataset to the middle of the page and drop it there.

I dropped the 2 datasets. We could use them separately, but I want to use them together. First I’ll concatenate them, and after that I’ll split the result again into 2 datasets: train and test.

Concatenate 2 Datasets

I’ll use the ‘Add Rows’ component for this. I’ll connect the output point at the bottom of each dataset to the input points of the ‘Add Rows’ component. This works because the 2 datasets have the same columns. We could use them without concatenating, but I wanted to show how we can concatenate 2 datasets. We’ll split the result again into 2 datasets, train and test.

If you click the ‘Run’ button and then hover over the output point at the bottom of the ‘Add Rows’ component, you can see the ‘Visualize’ option. If you use it, you will see 1000 rows. If you do the same thing for the 2 original datasets, you’ll see 300 and 700 rows respectively.
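As a rough illustration of what ‘Add Rows’ does, the sketch below appends the rows of one dataset to another after checking that the columns match. The helper `add_rows` and the placeholder rows are hypothetical; the real datasets come from the Kaggle files and their values are not reproduced here.

```python
def add_rows(first, second):
    """Append the rows of `second` to `first`; both must share the same columns."""
    if first and second and first[0].keys() != second[0].keys():
        raise ValueError("datasets must have the same columns")
    return first + second


# Placeholder rows standing in for the two downloaded files (values are made up).
dataset_a = [{"x": float(i), "y": 2.0 * i} for i in range(700)]
dataset_b = [{"x": float(i), "y": 2.0 * i} for i in range(300)]

combined = add_rows(dataset_a, dataset_b)
print(len(dataset_a), len(dataset_b), len(combined))  # 700 300 1000
```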

Split Dataset

Now we need to split the dataset into 2 parts: train and test data. We will train the model with the train dataset and evaluate the model’s score with the test dataset. I’ll use the ‘Split Data’ component for this. I drag and drop the component under the ‘Add Rows’ component, and I connect the output point at the bottom of ‘Add Rows’ to the ‘Split Data’ component.

On the right side, we can set the fraction of rows that will become training data. In this case, I set 75% (0.75) of the main dataset as training data, so the test data will be 25%. ML Studio divides it into 2 datasets.

There are 2 output points, 1st and 2nd. We use the 1st for the train dataset and the 2nd for the test dataset.
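The split can be sketched in a few lines of Python. This assumes a randomized split with a fixed seed; the helper name `split_data`, the seed value, and the placeholder rows are assumptions made for illustration, not ML Studio’s own implementation.

```python
import random


def split_data(rows, fraction=0.75, seed=42):
    """Shuffle a copy of `rows` and return (first_output, second_output)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder rows standing in for the 1000-row combined dataset.
rows = [{"x": float(i), "y": 2.0 * i} for i in range(1000)]

train, test = split_data(rows, fraction=0.75)
print(len(train), len(test))  # 750 250
```

The first returned list plays the role of the 1st output point (train) and the second list the 2nd output point (test).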

Training Model

We split our dataset successfully. Now we need to train a model. I’m going to use the ‘Train Model’ and ‘Linear Regression’ components.

I’ll connect the 1st output point of the ‘Split Data’ component to the 2nd input point of the ‘Train Model’ component. After that, I’ll connect the output point of the ‘Linear Regression’ component to the 1st input point of the ‘Train Model’ component. Then I’ll choose the ‘y’ column as the label in the ‘Train Model’ settings. In our case, we will try to predict ‘y’ by using ‘x’.
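The wiring above, an untrained algorithm on the 1st input, training data on the 2nd input, and a chosen label column, can be mimicked with a small sketch. The class `LinearRegression` and the function `train_model` here are hypothetical stand-ins written for this story, and the single `feature_column` parameter is a simplification for our one-feature case.

```python
class LinearRegression:
    """Untrained learner: parameters stay None until training fills them in."""

    def __init__(self):
        self.intercept = None
        self.slope = None


def train_model(learner, rows, label_column, feature_column):
    """Fit the learner on `rows` by least squares and return it."""
    xs = [row[feature_column] for row in rows]
    ys = [row[label_column] for row in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    learner.slope = sxy / sxx
    learner.intercept = mean_y - learner.slope * mean_x
    return learner


# Made-up training rows that follow y = 3 + 0.5x.
train_rows = [{"x": float(i), "y": 3.0 + 0.5 * i} for i in range(10)]

model = train_model(LinearRegression(), train_rows,
                    label_column="y", feature_column="x")
print(model.intercept, model.slope)
```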

Scoring the Model

We have created our model. Now we need to score (test) it. I am going to use the ‘Score Model’ component. As you probably remember, we have the test dataset, which we’ll use now. I’ll connect the output of the ‘Train Model’ component to the 1st input point of the ‘Score Model’ component, and the 2nd output point of the ‘Split Data’ component to the 2nd input point of the ‘Score Model’ component.

We can see predicted values under the ‘Scored Label’ column.
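Scoring simply means applying the trained line to each test row and storing the prediction next to the real value. A minimal sketch, assuming an already-trained model given as a made-up (intercept, slope) pair and using the column name from this story:

```python
def score_model(model, rows, feature_column="x"):
    """Return copies of `rows` with the prediction added as 'Scored Label'."""
    intercept, slope = model
    return [
        {**row, "Scored Label": intercept + slope * row[feature_column]}
        for row in rows
    ]


model = (1.0, 2.0)  # assumed already-trained (intercept, slope), values made up
test_rows = [{"x": 5.0, "y": 11.2}, {"x": 6.0, "y": 12.7}]

scored = score_model(model, test_rows)
for row in scored:
    print(row)
```

The original rows are left unchanged; each scored row keeps ‘x’ and ‘y’ and gains the predicted value.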

Evaluating the Model

Now we can evaluate the model. For this, I’ll use the ‘Evaluate Model’ component. I’ll connect the output point of the ‘Score Model’ component to the 1st input point of the ‘Evaluate Model’ component.

If we hover over the output point of the ‘Evaluate Model’ component, we can see the ‘Visualize’ button.
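To get a feeling for what an evaluation of a regression model measures, the sketch below computes three common metrics from the actual and predicted values: mean absolute error, root mean squared error, and the coefficient of determination (R²). The function `evaluate_model` and the sample rows are hypothetical, and this is not necessarily the exact set of metrics ML Studio displays.

```python
import math


def evaluate_model(scored_rows, label_column="y", scored_column="Scored Label"):
    """Compare actual and predicted values and return common regression metrics."""
    actual = [row[label_column] for row in scored_rows]
    predicted = [row[scored_column] for row in scored_rows]
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "root_mean_squared_error": math.sqrt(ss_res / n),
        "coefficient_of_determination": 1 - ss_res / ss_tot,
    }


# Made-up scored rows: two predictions are exact, two are off by 1.
scored_rows = [
    {"y": 3.0, "Scored Label": 2.0},
    {"y": 5.0, "Scored Label": 5.0},
    {"y": 7.0, "Scored Label": 8.0},
    {"y": 9.0, "Scored Label": 9.0},
]

metrics = evaluate_model(scored_rows)
print(metrics)
```

Lower error values and an R² closer to 1 mean the predictions are closer to the real ‘y’ values.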

Final Thoughts

I hope this story was helpful and you enjoyed it. As I said, my machine learning journey has only just started. I’m playing with tools like ML Studio to understand the core of machine learning, and I wanted to introduce Azure ML Studio to you.

Kind regards.

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Baysan

Lifelong learner & Freelancer. I use technology that helps me. I’m currently working as a Business Intelligence Developer. github.com/mebaysan
