Machine Learning: Linear Regression

Sathiyajith Babu
Published in Analytics Vidhya
5 min read · Aug 27, 2020

“Data is a powerful entity and machine learning is the art of extracting useful information from the data set”

To practise this art, machine learning offers a variety of techniques and algorithms. A machine learning problem is defined by three characteristics: Task, Performance, and Experience. Performance at a task improves with experience, i.e. previous data.

Machine learning is classified into three major parts:

1) Supervised Learning

When a given data set has a predefined set of labeled inputs and outputs, it is easier to train a model to find relationships between the entities. Examples: house price prediction, email spam classification.

2) Unsupervised Learning

When a given data set doesn’t have predefined labels, we can train a model to group the data based on characteristics and similarities. Examples: anomaly/cancer detection, flagging fraudulent transactions.

3) Reinforcement Learning

When the necessary data is not given, a series of experiments is performed and the data is collected along the way. The collected data must represent the whole population in order to get accurate results. Example: an agent learning to play computer games, improving through a reward and penalty system.

Data Pre-Processing:

Data preprocessing in machine learning refers to the technique of preparing (cleaning and organizing) raw data to make it suitable for building and training machine learning models. It typically involves:

  • Collecting a large amount of relevant data
  • Replacing nulls with median values and keeping a consistent data type in each column
  • Dropping unnecessary features
  • Representing characters/strings as numbers using LabelEncoder
  • Data normalization: this technique is required when features have different scales of value, e.g. human-recognizable units such as a person's age = 25 and the speed of a car = 80 km/h, which would otherwise confuse the model. Data normalization solves this problem. A common formula, consistent with the output range stated below, is mean normalization: x' = (x − mean(x)) / (max(x) − min(x))

The scale of the data after this pre-processing step lies in the range (−1 to +1)

  • Separating input and output from raw data
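The steps above can be sketched in Pandas. The column names and values here are hypothetical, invented purely for illustration; LabelEncoder comes from scikit-learn, which the original does not name but is the usual source of that class.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw data; column names and values are illustrative only.
df = pd.DataFrame({
    "area":  [1200.0, 1500.0, np.nan, 2000.0],
    "city":  ["Chennai", "Mumbai", "Chennai", "Delhi"],
    "id":    [1, 2, 3, 4],            # unnecessary feature
    "price": [60.0, 85.0, 70.0, 110.0],
})

df["area"] = df["area"].fillna(df["area"].median())    # replace nulls with the median
df = df.drop(columns=["id"])                           # drop unnecessary features
df["city"] = LabelEncoder().fit_transform(df["city"])  # strings -> numbers

# Separate input and output, then apply mean normalization to the inputs,
# which maps each column into the range (-1, +1).
X = df.drop(columns=["price"])
X = (X - X.mean()) / (X.max() - X.min())
y = df["price"]
```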

Frameworks:

  • Pandas — Manipulating raw data.
  • NumPy — Mathematical calculations.
  • Matplotlib (pyplot) — Plotting graphs.

Mathematics:

Why matrices: a list is represented as a vector, and multiple vectors form a matrix. In this form it is easier to do calculations, especially when we have huge amounts of data.

Matrix Addition
Matrix Multiplication
Matrix Transpose
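These three operations can be tried directly in NumPy; the small 2×2 matrices below are just example values.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

addition = A + B     # element-wise matrix addition
product = A @ B      # matrix multiplication (rows of A times columns of B)
transpose = A.T      # matrix transpose (rows become columns)

print(addition)      # [[ 6  8] [10 12]]
print(product)       # [[19 22] [43 50]]
print(transpose)     # [[1 3] [2 4]]
```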

Why differentiation: to measure how small changes affect the model while building it. The derivative applies to single-variable functions, and the partial derivative to multivariate functions. When calculating a partial derivative, we vary just one variable while keeping the others constant.
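As a quick illustration (the function f here is my own example, not from the article), a partial derivative can be approximated numerically by nudging one variable while holding the other fixed:

```python
# Partial derivative of f(x, y) = x**2 * y with respect to x,
# holding y constant, approximated by a small finite difference.
def f(x, y):
    return x**2 * y

h = 1e-6
x, y = 3.0, 2.0
df_dx = (f(x + h, y) - f(x, y)) / h   # analytic answer: 2*x*y = 12
print(round(df_dx, 3))                # ≈ 12.0
```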

Supervised Learning can be classified into two types:

  • Regression — the output is a continuous range of values.
  • Classification — the output is discrete; the model predicts which one of the groups an example belongs to.

All supervised learning models have data, and it should be divided into two parts: input (X) and output (Y). To understand the data and find a suitable model, use a scatter plot.
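Splitting a data array into X and Y might look like this; the data values are hypothetical, with the convention (assumed here) that the last column holds the output:

```python
import numpy as np

# Hypothetical data set: each row is one example,
# the last column is the output (label), the rest are inputs.
data = np.array([
    [1.0, 2.0, 10.0],
    [2.0, 1.0, 12.0],
    [3.0, 3.0, 20.0],
])

X = data[:, :-1]           # input features
Y = data[:, -1]            # output values
print(X.shape, Y.shape)    # (3, 2) (3,)
```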

Regression Model:

Looking at the data points below, we tend to describe such data with a line.

Equation of a line: y = Θ0 + Θ1·x

In order to predict accurate values, we need a best-fit line representing the whole data set. The variables are renamed as commonly used in a machine learning context:

  • x — input parameter
  • Θ0 — bias: meaning taking sides; in ML, we choose one generalization over another from the set of possible generalizations
  • Θ1 — weight (slope): a parameter learned while fitting the model to the data (the learning rate, by contrast, is a hyperparameter tuned for a given predictive modeling problem)
  • y — output

Error Function/Cost:

Every ML model's objective is to reduce the error as close to 0 as possible.

Minimize the error

To aggregate all the individual errors, we sum their squares (the squared error function), so that positive and negative errors do not cancel out.

Model: X*Θ^T

Minimize Error: We can minimize the error by trying out different Θ1 values in our model.
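That trial-and-error search over Θ1 can be sketched as follows; the data points and the candidate range are invented for illustration, with the bias Θ0 fixed at 0 to keep the model one-dimensional:

```python
import numpy as np

# Toy data that roughly follows y = 2x (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.0])

def squared_error(theta1):
    """Sum of squared differences between predictions and actual outputs."""
    predictions = theta1 * x          # simple model with the bias fixed at 0
    return np.sum((predictions - y) ** 2)

# Try several theta values and keep the one with the smallest error.
candidates = np.arange(0.0, 4.0, 0.1)
errors = [squared_error(t) for t in candidates]
best = candidates[int(np.argmin(errors))]
print(best)   # close to 2.0 for this data
```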

Gradient Descent:

The graph shown here plots multiple theta values against their respective squared-error values.

No matter how high our error value is, we need to bring it to a minimum. To do that, we can use the slope of the error curve (the partial derivative of the error with respect to theta), as shown below. A negative gradient ensures we are going down the curve, and ‘C’ is the learning rate, which controls how far we step the value of theta at each update.

Gradient descent update: Θ := Θ − C · ∂(error)/∂Θ

Choosing a ‘learning rate’ is important for a model, and in practice it is found by trial and error. The gradient descent update itself, however, is universal: the same formula applies across machine learning models.
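Putting the pieces together, here is a minimal gradient-descent sketch for the line y = Θ0 + Θ1·x; the data values, learning rate, and iteration count are all illustrative choices, not from the article:

```python
import numpy as np

# Toy data roughly following y = 2x + 1 (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 9.0])

theta0, theta1 = 0.0, 0.0   # bias and slope, both starting at zero
C = 0.01                    # learning rate ('C' in the formula above)
m = len(x)

for _ in range(5000):
    predictions = theta0 + theta1 * x
    error = predictions - y
    # Partial derivatives of the mean squared error w.r.t. each theta.
    grad0 = (2.0 / m) * np.sum(error)
    grad1 = (2.0 / m) * np.sum(error * x)
    # Step against the gradient, scaled by the learning rate.
    theta0 -= C * grad0
    theta1 -= C * grad1

print(theta1, theta0)   # theta1 ≈ 2, theta0 ≈ 1.05 for this data
```

Too large a C makes the updates overshoot and diverge; too small a C converges painfully slowly, which is why the learning rate is tuned by trial and error.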
