Non-Linear Regression

Mehmet Toprak
4 min read · Apr 2, 2020

In this post, I’ll cover the basics of non-linear regression. Non-linear regression is a method for modeling a non-linear relationship between the dependent variable and a set of independent variables.

So, let’s get started.

The data points in this example correspond to China’s gross domestic product (GDP) from 1960 to 2014.

The first column is the year and the second is China’s corresponding annual GDP in US dollars for that year. Looking at these data points, we have a couple of interesting questions.

First, can GDP be predicted based on time?

Second, can we use a simple linear regression to model it?

Not very well. If the data shows a curved trend, linear regression will not produce very accurate results compared to a non-linear regression, simply because, as the name implies, linear regression presumes that the data is linear.
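
To make this concrete, here is a minimal sketch that fits a straight line to curved data and inspects the errors. The data is synthetic (y = x squared), not the real GDP figures, and the helper name `fit_line` is just an illustrative choice.

```python
# Synthetic, illustrative data with a curved trend (not real GDP figures).
xs = list(range(11))            # 0, 1, ..., 10
ys = [x ** 2 for x in xs]


def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(xs, ys)

# Residual = observed value minus the value predicted by the straight line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print([round(r, 2) for r in residuals])
```

The residuals come out positive at both ends and negative in the middle. That kind of systematic pattern, rather than random scatter, is a hint that a straight line has the wrong shape for the data.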

The scatter plot shows that there seems to be a strong relationship between GDP and time, but the relationship is not linear. The growth starts off slowly, then from 2005 onward it becomes very significant. Finally, it decelerates slightly in the 2010s. It looks like either a logistic or an exponential function. So, it calls for the special estimation methods of non-linear regression.

Many different regressions exist that can be used to fit whatever the dataset looks like. Quadratic and cubic regression curves are two examples, and the degree can keep going up from there. In essence, we can call all of these polynomial regression, where the relationship between the independent variable X and the dependent variable Y is modeled as an Nth-degree polynomial in X. With many types of regression to choose from, there’s a good chance that one will fit your dataset well. Remember, it’s important to pick a regression that fits the data well.

What is polynomial regression?

Polynomial regression fits a curved line to your data. A simple example of a polynomial of degree three is ŷ = θ0 + θ1x + θ2x² + θ3x³, where the thetas are parameters to be estimated so that the model fits the underlying data as closely as possible. Though the relationship between X and Y is non-linear here and polynomial regression can fit it, a polynomial regression model can still be expressed as linear regression.

Given the third-degree polynomial equation, defining x1 = x, x2 = x², and x3 = x³ converts the model to a linear regression in the new variables: ŷ = θ0 + θ1x1 + θ2x2 + θ3x3. This model is linear in the parameters to be estimated, right? Therefore, polynomial regression is considered a special case of traditional multiple linear regression, and you can use the same mechanism as linear regression to solve such a problem. In particular, polynomial regression models can be fit using the method of least squares. Least squares is a method for estimating the unknown parameters in a linear regression model by minimizing the sum of the squares of the differences between the observed dependent variable in the given dataset and the values predicted by the linear function.
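
As a rough sketch of this idea, the snippet below treats 1, x, x², and x³ as separate input columns and solves the resulting least squares problem through the normal equations. The data is synthetic, generated from a cubic I made up for illustration, and the function names `solve_linear_system` and `fit_polynomial` are hypothetical helpers, not part of any library.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least squares polynomial fit via (X^T X) theta = X^T y."""
    # Each row holds the "new variables" 1, x, x^2, ..., x^degree.
    design = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data from a known cubic: y = 1 + 2x - 0.5x^2 + 0.25x^3
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1 + 2 * x - 0.5 * x ** 2 + 0.25 * x ** 3 for x in xs]

thetas = fit_polynomial(xs, ys, degree=3)
print([round(t, 6) for t in thetas])
```

Because the data has no noise, the estimated thetas should land very close to the coefficients used to generate it. In practice you would more likely reach for a library routine such as `numpy.polyfit`, which is also numerically more robust than hand-rolled normal equations.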

What is non-linear regression exactly?

First, non-linear regression is a method to model a non-linear relationship between the dependent variable and a set of independent variables.

Second, for a model to be considered non-linear, Y hat must be a non-linear function of the parameters theta, not necessarily of the features X. A non-linear equation can take an exponential, logarithmic, or logistic shape, among many other types. In all of these equations, the change in Y hat depends on changes in the parameters theta, not only on X. That is, in non-linear regression, the model is non-linear in its parameters. In contrast to linear regression, we cannot use the ordinary least squares method to fit the data in non-linear regression; the parameters are usually estimated with an iterative procedure that starts from an initial guess. In general, estimation of the parameters is not easy. Let me answer two important questions here.
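
To give a feel for what iterative estimation looks like, here is a minimal sketch of a damped Gauss-Newton fit for the exponential model ŷ = a·exp(b·x). This is one common approach, not the only one. The data is synthetic, the starting guess is arbitrary, and `fit_exponential` and `sse` are hypothetical helper names of my own.

```python
import math


def sse(a, b, xs, ys):
    """Sum of squared errors for the model y_hat = a * exp(b * x)."""
    return sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))


def fit_exponential(xs, ys, a, b, iterations=50):
    """Damped Gauss-Newton for y_hat = a * exp(b * x), starting at (a, b)."""
    for _ in range(iterations):
        # Accumulate J^T J (2x2) and J^T r (2x1), where J holds the partial
        # derivatives of the model with respect to a and b.
        jaa = jab = jbb = ga = gb = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e
            da, db = e, a * x * e
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        det = jaa * jbb - jab * jab
        if det == 0.0:
            break
        # Solve the 2x2 system (J^T J) step = J^T r with Cramer's rule.
        step_a = (jbb * ga - jab * gb) / det
        step_b = (jaa * gb - jab * ga) / det
        # Halve the step while it would increase the error (simple damping).
        current = sse(a, b, xs, ys)
        scale = 1.0
        while (scale > 1e-8 and
               sse(a + scale * step_a, b + scale * step_b, xs, ys) > current):
            scale /= 2
        a += scale * step_a
        b += scale * step_b
    return a, b


# Synthetic data generated from a = 2.0, b = 0.5 (made up for illustration).
xs = [0.5 * i for i in range(9)]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

a_hat, b_hat = fit_exponential(xs, ys, a=1.0, b=0.3)
print(round(a_hat, 6), round(b_hat, 6))
```

Unlike the linear case, there is no single formula that returns the answer: the loop keeps adjusting the parameters until the error stops improving, and a poor starting guess can slow it down or send it somewhere unhelpful. For real work, a library routine such as `scipy.optimize.curve_fit` is the more usual choice.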

First, how can I know if a problem is linear or non-linear in an easy way?

To answer this question, we have to do two things. The first is to visually figure out if the relation is linear or non-linear. It’s best to draw bivariate plots of the output variable against each input variable. You can also calculate the correlation coefficient between the independent and dependent variables; as a rule of thumb, if it is 0.7 or higher for all variables, there is a linear tendency, and fitting a non-linear regression is probably not appropriate. The second thing is to use non-linear regression instead of linear regression when we cannot accurately model the relationship with linear parameters.
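
The correlation check can be sketched in a few lines. The example below computes the Pearson correlation coefficient by hand on two small synthetic datasets; `pearson_r` is a hypothetical helper name, and the data is made up purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # perfectly linear relationship
curved = [x ** 2 for x in xs]      # strong, but purely curved, relationship

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(round(r_linear, 3), round(r_curved, 3))
```

The linear dataset gives a coefficient of 1, while the symmetric curved one gives 0 even though Y is completely determined by X. That is why the plot comes first: the coefficient only measures the linear part of a relationship.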

The second important question is, how should I model my data if it looks non-linear on a scatter plot?

Well, to address this, you have to either use a polynomial regression, use a non-linear regression model, or transform your data.
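
The third option, transforming the data, is the easiest to show. As a minimal sketch, assume the data follows y = a·exp(b·x) with positive y. Taking the logarithm gives ln(y) = ln(a) + b·x, which is a straight line in x, so ordinary simple linear regression applies. The data below is synthetic and the variable names are my own.

```python
import math

# Synthetic data generated from y = 2 * exp(0.5 * x) (illustrative only).
xs = [0.5 * i for i in range(9)]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Transform: the log of an exponential trend is a straight line in x.
log_ys = [math.log(y) for y in ys]

# Closed-form simple linear regression of log(y) on x.
n = len(xs)
mean_x = sum(xs) / n
mean_log_y = sum(log_ys) / n
slope = (sum((x - mean_x) * (ly - mean_log_y) for x, ly in zip(xs, log_ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_log_y - slope * mean_x

# Map the line back to the original model: a = exp(intercept), b = slope.
a_hat = math.exp(intercept)
b_hat = slope
print(round(a_hat, 6), round(b_hat, 6))
```

One caveat worth keeping in mind: the fit minimizes squared errors on the log scale, so with noisy data the recovered parameters can differ from what a direct non-linear fit on the original scale would give.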

Thanks for reading.
