Linear Regression

Senih Berkay Akın
Sep 2, 2022


When working with data, we encounter two kinds of problems: regression and classification. In classification problems, the aim is to predict which of several groups (classes) a data point belongs to; there may be two or more classes or categories.

However, some problems call for a different strategy. Consider, for instance, predicting the gross revenue of a large corporation at the end of the year. Rather than classifying, we use regression to estimate a quantity, which means that in theory we are interested in an infinite range of values. A classic linear regression plot is shown in the image below: the red line represents the fitted regression line, and the blue points are the actual data.

Linear regression models the relationship between one or more observed features and the target variable. From the data we have, we try to derive a function that produces the desired output. This function defines a line that we can use to look up feature values and read off predicted target values (the red line above).
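
As a quick, minimal sketch of this idea, the snippet below fits such a line to made-up synthetic data using scikit-learn's LinearRegression (the data and numbers are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a noisy line y ≈ 3x + 4 (made up for illustration)
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 1))   # a single feature
y = 3 * X.ravel() + 4 + rng.normal(0, 2, size=50)

model = LinearRegression()
model.fit(X, y)

# The fitted line: slope (coef_) and intercept (intercept_)
print(model.coef_[0], model.intercept_)

# Use the line to predict targets for unseen feature values
print(model.predict(np.array([[2.5], [7.0]])))
```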

What we need is a mathematical description of the underlying relationship between the features and the target. Conceptually, this relationship should be representable by a line drawn through the scatter plot of our known data, and at the same time it should let us accurately predict values that are not directly observed in our data.
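
In the simple single-feature case, this line is conventionally written as follows, where β₀ is the intercept, β₁ is the slope, and ε is a noise term capturing what the line cannot explain:

```latex
y = \beta_0 + \beta_1 x + \varepsilon
```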

To obtain the best-fit line in linear regression, we can simply apply the OLS (Ordinary Least Squares) method without any iterative technique. However, this closed-form solution becomes impractical when there are many features or when additional regularization is required. In that case, we minimize the cost function with the Gradient Descent method instead.
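
As a minimal sketch of the closed-form route, the snippet below solves the OLS problem directly with NumPy via the normal equation (the synthetic data and names are illustrative):

```python
import numpy as np

# Synthetic data (illustrative): y ≈ 3x + 4 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3 * x + 4 + rng.normal(0, 2, size=50)

# Design matrix with a column of ones for the intercept term
X = np.column_stack([np.ones_like(x), x])

# Normal equation: solve (X^T X) beta = X^T y for beta
beta = np.linalg.solve(X.T @ X, X.T @ y)
intercept, slope = beta
print(intercept, slope)  # should land close to 4 and 3

# In practice, np.linalg.lstsq(X, y, rcond=None) is the numerically
# stabler way to solve the same least-squares problem.
```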

Some real-world examples:

  • Stock Price Prediction
  • House/Car Price Prediction, etc.

Keywords and definitions for understanding linear regression:

  • Simple Linear Regression: a linear regression model with a single explanatory variable.
  • OLS (Ordinary Least Squares): a type of linear least squares method for estimating the unknown parameters in a linear regression model.
  • Residuals: the differences between the observed values and the values estimated by the model (for example, the vertical distances from the data points to the fitted line).
  • Cost Function: a function that measures the difference between the estimated and true values for an instance of data; in statistics, a loss function of this kind is typically used for parameter estimation. For linear regression, the cost is usually the mean squared error of the residuals.
  • Gradient Descent: a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient (or an approximate gradient) of the function at the current point, because this is the direction of steepest descent (see the sketch after this list).
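
As a minimal sketch, assuming a mean-squared-error cost and the same kind of synthetic single-feature data as above, gradient descent for linear regression can look like this (the learning rate and iteration count are illustrative choices, not prescriptions):

```python
import numpy as np

# Synthetic single-feature data (illustrative): y ≈ 3x + 4 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3 * x + 4 + rng.normal(0, 2, size=50)

# Parameters of the line y_hat = intercept + slope * x
intercept, slope = 0.0, 0.0
lr = 0.01          # learning rate (illustrative)
n = len(x)

for _ in range(5000):
    y_hat = intercept + slope * x
    error = y_hat - y
    # Gradients of the mean squared error cost w.r.t. each parameter
    grad_intercept = (2 / n) * error.sum()
    grad_slope = (2 / n) * (error * x).sum()
    # Step in the opposite direction of the gradient
    intercept -= lr * grad_intercept
    slope -= lr * grad_slope

print(intercept, slope)  # should converge near 4 and 3
```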

