Finding Weights in Regression Models

Can Benli
5 min read · Sep 30, 2023


The main goal of parameter estimation is to determine the constant and the weights that produce the smallest error. For this, a Cost Function is used, typically based on the Mean Square Error (MSE). Our aim is to examine the error values computed for various combinations of constant and weights and to choose the combination that gives the smallest value of the Cost Function.

For this purpose, techniques such as the Normal Equations Method and the Gradient Descent Method are used to find the optimum weights and constant. These methods allow the parameters of a regression model to be determined so that the model fits the data as well as possible.

Method of Normal Equations (Least Squares Method): As an analytical solution, it is based on differentiation and is expressed in matrix form. In the simple case of one dependent and one independent variable, the partial derivatives of the cost function are taken with respect to each parameter, for example b₀ and b₁, and set equal to zero; solving these equations gives each parameter of the model. If a multivariate model is being studied, the partial derivatives with respect to all the parameters are calculated in the same way. These partial derivatives form a system of equations written in matrix form, and solving it provides the estimated β values.
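
For the multivariate case, these normal equations have the well-known closed form β = (XᵀX)⁻¹Xᵀy, where X is the matrix of observations (with a leading column of ones for the constant) and y is the vector of actual values. As a rough sketch, this can be computed in a few lines of NumPy; the numbers below are made up purely for illustration:

```python
import numpy as np

# Hypothetical data: m = 4 observations of one independent variable
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])

# Prepend a column of ones so the constant b0 is estimated along with b1
X_b = np.c_[np.ones((X.shape[0], 1)), X]

# Normal equations: beta = (X^T X)^(-1) X^T y
beta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(beta)  # beta[0] ≈ b0 (constant), beta[1] ≈ b1 (weight)
```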

The Normal Equations Method involves inverting a matrix when analyzing multivariate models. This inversion becomes expensive as the number of observations and, especially, the number of variables grows. Therefore, in large data sets or high-dimensional analyses, optimization-based approaches such as the Gradient Descent Method become important. Gradient Descent can work faster and more efficiently on large data sets and multidimensional models because it approaches the minimum by updating the parameters step by step. Choosing the right method is therefore important for estimating model parameters effectively.

Gradient Descent Method (Optimization Solution): It is a frequently used method in machine learning and optimization. Its main purpose is to find parameter values that minimize (or maximize) a function, and it can be applied to any differentiable function.

The Gradient Descent method starts by taking the partial derivatives of the function with respect to each parameter. These partial derivatives form the gradient, which points in the direction of the function's steepest increase. At each iteration, the method updates the parameter values by moving in the negative direction of the gradient, so that they converge towards the minimum of the function.

In other words, the Gradient Descent Method reaches the minimum (or maximum) of a function by iteratively updating the parameter values along the negative gradient, the so-called “steepest descent” direction. In regression models and other optimization problems, it is therefore an effective tool for minimizing the Cost Function and finding the optimum parameter values.
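
As a tiny illustration of the idea on an arbitrary differentiable function, consider f(x) = (x − 3)², chosen here only for demonstration; repeatedly stepping against its derivative drives x towards the minimum at x = 3:

```python
def f_prime(x):
    return 2 * (x - 3)  # derivative of f(x) = (x - 3)^2

x = 0.0      # starting point
alpha = 0.1  # learning rate (step size)
for _ in range(100):
    x -= alpha * f_prime(x)  # step in the negative gradient direction
print(x)  # very close to 3, the minimum of f
```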

In gradient descent, this cost function can be expressed in a different notation:
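
J(θ₀, θ₁) = (1/2m) · Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²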

Here h_θ(x⁽ⁱ⁾) represents the predicted values and y⁽ⁱ⁾ the actual values. The error is measured by squaring the difference between the actual and the predicted values, so this is a variant of MSE (Mean Square Error). The sum is divided by 2m, where m is the number of observations; the extra factor of 2 is there so that it cancels the 2 that appears when the square is differentiated.
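
Taking the partial derivative with respect to a parameter θⱼ shows why: the 2 produced by differentiating the square cancels the 2 in the denominator, leaving

∂J/∂θⱼ = (1/m) · Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xⱼ⁽ⁱ⁾, with xⱼ⁽ⁱ⁾ = 1 for the constant term θ₀.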

Imagine a graph showing how the cost function changes as the weight values change. The Gradient Descent method moves along this graph to find the optimal weight, aiming to reach its lowest point. The cost function is written as J(θ₀, θ₁) and is minimized with respect to the parameters θ₀ and θ₁. To do this, the partial derivatives with respect to each parameter are first calculated. Then the weight values are updated in the negative direction of these partial derivatives. This step is called the “Update Rule”. The size of each update is controlled by a value α, called the “Learning Rate”, which is set by the user.
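
In symbols, the Update Rule is θⱼ := θⱼ − α · ∂J(θ₀, θ₁)/∂θⱼ, applied to both parameters at every iteration. A minimal sketch of this loop for simple linear regression could look like the following; the function name and data are illustrative only:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, n_iters=1000):
    """Fit theta0 (constant) and theta1 (weight) with the Update Rule."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iters):
        predictions = theta0 + theta1 * X   # h_theta(x)
        errors = predictions - y            # h_theta(x) - y
        grad0 = errors.sum() / m            # dJ/dtheta0
        grad1 = (errors * X).sum() / m      # dJ/dtheta1
        theta0 -= alpha * grad0             # Update Rule: step against
        theta1 -= alpha * grad1             # the gradient, scaled by alpha
    return theta0, theta1

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
print(gradient_descent(X, y, alpha=0.05, n_iters=5000))
```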

A large Learning Rate (α) can cause sudden, extreme changes, with the risk that the algorithm quickly moves in the wrong direction or even diverges. If the Learning Rate is too small, progress towards the goal is very slow and far too many updates are needed, wasting time. Choosing an appropriate Learning Rate lets us reach the minimum as quickly and accurately as possible.
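
With the sketch above, the effect of α can be seen by trying a few values on the same data; the specific numbers are again only illustrative:

```python
for alpha in (0.001, 0.05, 1.0):
    theta0, theta1 = gradient_descent(X, y, alpha=alpha, n_iters=200)
    print(f"alpha={alpha}: theta0={theta0:.3f}, theta1={theta1:.3f}")
# alpha=0.001: progress is very slow; far more iterations would be needed.
# alpha=0.05:  steadily approaches the least-squares solution.
# alpha=1.0:   the updates overshoot and the parameters diverge.
```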
