Linear Regression: Understanding the Linear Relationship

Adeyemi Testimony
Testimony’s Data Journey
Nov 19, 2023

Linear regression is a statistical technique that models the relationship between a dependent variable and one or more independent variables. It assumes a linear relationship between the variables, meaning that the dependent variable changes at a constant rate as the independent variable changes. In plain English, it assumes that two things are connected: as one goes up, the other either goes up with it or goes down, at a steady rate.

A basic example is the age and height of children: as children grow older, they grow taller, so we know that age and height are correlated.

Real-life Example of Linear Regression

Consider the relationship between the amount of fertilizer used on a crop and its yield. A farmer might collect data on the amount of fertilizer used on different plots of land and the yield of each plot. They could then use linear regression to model the relationship between fertilizer use and yield. The model would produce an equation that could be used to predict the yield of a plot of land based on the amount of fertilizer used.
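The farmer's workflow can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the fertilizer and yield numbers are invented, and the helper names `fit_line` and `predict` are hypothetical. The fit uses the standard least squares formulas for a single independent variable.

```python
# Hypothetical data, invented for illustration: fertilizer applied (kg per plot)
# and the crop yield (kg per plot) observed on five plots.
fertilizer = [10.0, 20.0, 30.0, 40.0, 50.0]
crop_yield = [27.0, 44.0, 66.0, 84.0, 104.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the dependent variable (yield) for a given amount of fertilizer."""
    return slope * x + intercept


slope, intercept = fit_line(fertilizer, crop_yield)
print(f"yield = {slope:.2f} * fertilizer + {intercept:.2f}")
print(f"predicted yield at 35 kg: {predict(35.0, slope, intercept):.1f}")
```

The printed equation is the model the farmer would use: plug in an amount of fertilizer, get back a predicted yield. Predictions are most trustworthy inside the range of fertilizer amounts that were actually observed.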

Terms Used in Linear Regression

  • Dependent variable: The variable that is being predicted. In the example above, the yield of the crop is the dependent variable.
  • Independent variable: The variable that is used to predict the dependent variable. In the example above, the amount of fertilizer used is the independent variable.
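  • Coefficient of determination (R-squared): A measure of how well the model fits the data, defined as the proportion of the variation in the dependent variable that the model explains. For a least squares fit with an intercept, R-squared values range from 0 to 1, with higher values indicating a better fit.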
  • Intercept: The value of the dependent variable when the independent variable is 0.
  • Slope: The rate of change of the dependent variable as the independent variable changes.
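To make these terms concrete, here is a small sketch that computes R-squared for a line with a given slope and intercept. The data values and the line itself are assumptions chosen for illustration (the line is not fitted here), and `r_squared` is a hypothetical helper name.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    # Variation the model fails to explain.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the dependent variable around its mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0]   # independent variable
y = [3.0, 5.0, 7.0, 10.0]  # dependent variable

# An assumed line: slope 2 (y rises by 2 per unit of x), intercept 1 (y when x is 0).
slope, intercept = 2.0, 1.0
predicted = [slope * xi + intercept for xi in x]

print(f"R-squared: {r_squared(y, predicted):.3f}")
```

A value close to 1 means the line accounts for most of the variation in y; predictions that match the data exactly give an R-squared of exactly 1.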

Mathematical Explanation/Foundations of Linear Regression

With a single independent variable (simple linear regression), linear regression is based on the following equation:

y = mx + b

where:

  • y is the dependent variable
  • x is the independent variable
  • m is the slope
  • b is the intercept

To find the values of m and b, we use the least squares method. This method chooses the m and b that minimize the sum of the squared differences between the values of y predicted by the equation and the actual values of y.
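For a single independent variable, this minimization has a well-known closed-form solution. As a sketch, write (x_i, y_i) for the n data points and x̄, ȳ for the means of x and y (this notation is introduced here for illustration). The quantity being minimized, S, and the resulting slope and intercept are:

```latex
\begin{aligned}
S(m, b) &= \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2 \\
m &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
b &= \bar{y} - m\,\bar{x}
\end{aligned}
```

The formulas for m and b come from setting the partial derivatives of S with respect to m and b to zero. The second formula also shows that the fitted line always passes through the point (x̄, ȳ).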

When Should You Use Linear Regression?

Linear regression is a good choice when:

  • There is a linear relationship between the dependent variable and the independent variable.
  • The data is clean and free of extreme outliers, which can distort a least squares fit.
  • You want to make predictions about the dependent variable based on the independent variable.

When Not to Use Linear Regression

Linear regression is not a good choice when:

  • There is not a linear relationship between the dependent variable and the independent variable.
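  • The data is very noisy or contains extreme outliers. Because the method squares each error, a single extreme point can pull the line noticeably.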
  • You want to capture complex relationships between multiple variables, such as interactions or non-linear effects, that a straight-line model cannot represent.

In these cases, other machine learning techniques, such as non-linear regression or tree-based methods, may be more appropriate.
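The sensitivity to outliers is easy to see in a small experiment. The sketch below fits a least squares line twice: once to made-up data that lies exactly on y = 2x, and once to the same data with the last value replaced by an extreme outlier. The data and the helper name `fit_line` are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Hypothetical data lying exactly on the line y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
clean_y = [2.0, 4.0, 6.0, 8.0, 10.0]

# The same data, with the last value replaced by an extreme outlier.
outlier_y = [2.0, 4.0, 6.0, 8.0, 30.0]

clean_fit = fit_line(x, clean_y)
outlier_fit = fit_line(x, outlier_y)

print("clean data:   slope=%.1f intercept=%.1f" % clean_fit)
print("with outlier: slope=%.1f intercept=%.1f" % outlier_fit)
```

In this example, changing one point out of five moves the slope from 2 to 6, so the fitted line no longer describes the other four points well. Plotting the data before fitting is a simple way to spot this kind of problem.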
