Linear Regression: The Easier Way
Linear Regression is a great entry point into the amazing world of Machine Learning, and one of the simplest algorithms to learn.
I will focus more on the code than the theory because code is more fun :) but first, a quick refresher to fire up the resting linear regression neurons in your brain.
Linear Regression: a method for finding a pattern in your data with a “Best Fit Line” (hence “Linear”, get it?).
It looks something like this (for any x and y axes):
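To make the "Best Fit Line" idea concrete, here is a minimal sketch using NumPy. The five points are made up for illustration (they are not from the housing dataset), and `np.polyfit` with `deg=1` is just one convenient way to get a least-squares line.

```python
import numpy as np

# Made-up points that lie roughly along y = 2x + 1 (illustrative data only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# With deg=1, np.polyfit returns the least-squares slope and intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# The best-fit line gives a predicted y for every x.
y_line = slope * x + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The fitted line will not pass through every point; it is the straight line with the smallest total squared vertical distance to the points.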
In this example we will use the Boston housing prices dataset, loaded with load_boston from scikit-learn. Note that load_boston was removed in scikit-learn 1.2, so the code here needs an older release.
Full Code :)
Code insight:
The code above is divided into five steps:
1. Load and pretty-print the dataset.
from sklearn.datasets import load_boston
load = load_boston()
The first line imports the load_boston function from the datasets module of the sklearn library.
The second line calls it and stores the returned dataset in the load variable for later use. Simple enough…
import pprint
pprint.pprint(load)
We import the pprint library, which makes the output pretty and readable, and then simply pretty-print the dataset.
Note: You don't need these last two pretty-print lines when you are predicting. Printing the dataset is only there to help you understand what it contains.
2. Create the DataFrames.
Here we create two DataFrames using the pandas library. A DataFrame is just a fancy name for a labelled table of data. df_x holds the data, i.e. the features of the houses, with columns = load.feature_names (an array of column names that ships with the dataset, used to make the table easier to read), and df_y holds the corresponding target prices.
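As a rough sketch of this step, the snippet below builds the two DataFrames from a tiny hand-written stand-in instead of the real dataset. The arrays `data`, `feature_names`, and `target` and all their values are made up for illustration; they only mimic the shape of what the loaded dataset provides.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded dataset: three houses, two features.
data = np.array([[6.5, 15.3], [6.4, 17.8], [7.1, 17.8]])
feature_names = ["RM", "PTRATIO"]
target = np.array([24.0, 21.6, 34.7])

# Features go into df_x with readable column names; prices go into df_y.
df_x = pd.DataFrame(data, columns=feature_names)
df_y = pd.DataFrame(target)

print(df_x.shape, df_y.shape)
```

Both DataFrames have one row per house, so row i of df_x lines up with row i of df_y.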
3. Selecting the Model (i.e. LinearRegression)
Simple as can be: we import the LinearRegression model from sklearn.linear_model and assign an instance of it to the variable model for use in the rest of the code.
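In code, this step is likely just two lines. This is a minimal sketch with default settings; the variable name `model` follows the text above.

```python
from sklearn.linear_model import LinearRegression

# Create an untrained ordinary least-squares model with default settings.
model = LinearRegression()
```

Nothing has been learned at this point; the model only gets its coefficients once fit is called on training data.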
4. Splitting Test and Train Datasets with randomness.
Here we use train_test_split from the sklearn library. In the code above it is given four arguments: the data DataFrame, the target DataFrame, test_size, and random_state (a seed that makes the random shuffle reproducible).
Note: train_size is automatically set to 0.8 because test_size = 0.2, so test_size + train_size = 0.2 + 0.8 = 1.
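Here is a small sketch of the split on synthetic stand-in data, so the 0.8 / 0.2 proportions are easy to see. The column names, the ten made-up rows, and `random_state=4` are arbitrary choices for illustration, not values taken from the original code.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 10 rows, 2 feature columns, 1 target column.
df_x = pd.DataFrame(np.arange(20).reshape(10, 2), columns=["f1", "f2"])
df_y = pd.DataFrame(np.arange(10), columns=["target"])

# test_size=0.2 keeps 20% of the rows for testing; random_state fixes the shuffle.
x_train, x_test, y_train, y_test = train_test_split(
    df_x, df_y, test_size=0.2, random_state=4
)

print(len(x_train), len(x_test))  # 8 rows for training, 2 for testing
```

Running it again with the same random_state gives the same split; changing the number gives a different shuffle.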
5. Train and Predict.
In the code above, the fit function (aka TRAINING) takes two parameters, x_train and y_train, and trains the model on them.
Next, we predict the prices for the x_test dataset using the selected model (i.e. LinearRegression()) and put the predicted array of values in the result variable. Then we just print one predicted value, result[5], and the full y_test dataset so the two can be compared. You can change the 5 to something else; just remember that counting starts from 0, not 1, so result[5] is actually the sixth value in the list.
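The train-and-predict step can be sketched end to end on synthetic data. Everything about the data here is an assumption for illustration: the formula y = 3*x1 + 2*x2 + 5, the noise level, and the seeds are made up, and NumPy arrays are used in place of the DataFrames.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: y = 3*x1 + 2*x2 + 5 plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 3 * X[:, 0] + 2 * X[:, 1] + 5 + rng.normal(0, 0.1, size=100)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

model = LinearRegression()
model.fit(x_train, y_train)      # training
result = model.predict(x_test)   # one prediction per test row

# Index 5 is the sixth prediction; print it next to its true value.
print(result[5], y_test[5])
```

Because the synthetic data really is linear, the prediction and the true value should land very close together; on real housing data the gap is usually larger.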
The Results
As you can see, the first value, [ 25.43573299 ], is the prediction result[5], and the matching entry in y_test is row 70 with a price of 24.2. Pretty close. If you are wondering what the first column of seemingly random numbers is: it is the row index (the serial number) of each y_test value in the original dataset. The numbers look shuffled because the test rows were selected randomly in step 4.
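The row index behaviour is easy to reproduce on a toy example. This is a small sketch with made-up prices and an arbitrary random_state; it only shows that the labels printed next to y_test are the original row numbers, while `.iloc` picks values by position in the same way as result[5] does for the predictions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: 10 houses, with row labels 0..9 assigned by pandas.
df_x = pd.DataFrame({"rooms": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
df_y = pd.DataFrame({"price": [10.0, 20.0, 30.0, 40.0, 50.0,
                               60.0, 70.0, 80.0, 90.0, 100.0]})

x_train, x_test, y_train, y_test = train_test_split(
    df_x, df_y, test_size=0.2, random_state=4
)

# The left-hand column of this printout holds the original row labels.
print(y_test)

# Position 0 in y_test is whichever original row was shuffled into first place.
first_label = y_test.index[0]
first_price = y_test.iloc[0, 0]
print(first_label, first_price)
```

So when comparing a prediction with its true value, match them by position in the test set, not by the label shown in the first column.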
Just a gif of Linear Regression to understand it.