A tutorial on how to curve/data fit a set of data points using Least Squares Fitting in GNU Octave

4 min readJul 2, 2018

Given a set P containing data points (x,y), how would you form a mathematical function such that it fits all data points and it models the relationship between the variables?

This problem would be a lot easier if the |P| (the size of set P) = 2 or 3 as we would’ve formed a line or quadratic equation respectively; however in real life, data sets usually contain h̶u̶n̶d̶r̶e̶d̶s̶, t̶h̶o̶u̶s̶a̶n̶d̶s̶,m̶i̶l̶l̶i̶o̶n̶s̶ or even billions of data points.

Thanks to the great minds that have formed the techniques to fit even large amount of data points, we now have MULTIPLE WAYS on our hands to fit data points and use this model in practical applications. (research more about curve fitting to know more 😉)

In this article, I’ll be showing a step-by-step procedure on how to code the method of Least Squares Fitting — one of the curve fitting techniques — in GNU Octave. (Here’s the Github repository link for reference)

Before diving into the coding part let’s define the
Least Squares Polynomial Curve Fitting Theorem:

Given the n data points (x₁, y₁),(x₂, y₂), …, (xₙ, yₙ), the least squares polynomial of degree m of the form

Pₘ(x) = c₁ + c₂ x +c₃ x² + … + cₘ xᵐ⁻¹ + cₘ₊₁ xᵐ

that fits the n data points is obtained by solving the following linear system

Figure 1: Linear system we’ll solve to get the function that will fit the set of data points

The matrix equation is in the form of:
Ax = b

Procedure:

Install GNU Octave

sudo apt-add-repository ppa:octave/stable
sudo apt-get update
sudo apt-get install octave

2. Be familiar with the data set used. We’ll be using the population table below. Note that the independent variable and the dependent variable is the relative year (time) and population respectively.

3. Following the Least Squares Polynomial Curve Fitting Theorem, setup the corresponding linear system (matrix) of the data set.

In the code above, we build the matrices A and b corresponding to the formula given in figure 1.

Figure 2: Corresponding Matrix Setup of the data set from Figure 1 (note that n = size of | relative # of years | and m=3 <3 is arbitrary>)

Output:

Curve Fitting
1.) Matrices formed in least-squares polynomial curve fitting
A matrix:
A =   5.0000e+00   1.0000e+02   3.0000e+03   1.0000e+05
   1.0000e+02   3.0000e+03   1.0000e+05   3.5400e+06
   3.0000e+03   1.0000e+05   3.5400e+06   1.3000e+08
   1.0000e+05   3.5400e+06   1.3000e+08   4.8900e+09b matrix:
b =   3.1687e+02
   7.8019e+03
   2.4968e+05
   8.6474e+06

4. Perform the Cholesky decomposition on matrix A and then solve for the x vector in figure 1 (which contains the coefficients/weights of the polynomial curve fitting the data points) through left division <lines 26 & 30>. (If solved manually, you can solve y through forward substitution and x through backward substitution)

Note: We use Cholesky decomposition because matrix A is a Hermitian, positive-definite matrix.

The equations above are the formulas in getting the x vector.

Output:

2.)The following cubic polynomial interpolates the data points:
-0.00027083*x^3 + 0.023229*x^2 + 0.95244*x^1 + 35.805
where:
    a_0 = -0.000271
     a_1 = 0.023229
     a_2 = 0.952440
     a_3 = 35.804714
SOLUTION: Cholesky LDLt
L =       1       0       0       0
      20       1       0       0
     600      40       1       0
   20000    1540      60       1D =Diagonal Matrix          5          0          0          0
          0       1000          0          0
          0          0     140000          0
          0          0          0   14400000L_transpose =       1      20     600   20000
       0       1      40    1540
       0       0       1      60
       0       0       0       1Ly = b // below is the solved y
y =
   316.87
   1464.50
    977.00
  -3900.00D(Lt)x = y  // below is the solved x
x = 
   3.5805e+01
   9.5244e-01
   2.3229e-02
  -2.7083e-04See the first graph containing the cubic polynomial which fits the data points

Figure 3: Graph of the cubic polynomial curve fitting the data points

Population(year) = -0.00027083*x^3 + 0.023229*x^2 + 0.95244*x^1 + 35.805

5. [Optional] Now that we have generated the curve that fits the data points, let’s use this to compare the interpolated population from the years 1975, 1985, 1995, & 2005 (relative years from 1970 = [ 5,15,25,35]) to the real values as seen in the table below.

Link to code : Check lines 108–112

Output:

3.) Interpolation the population in years(5,15,25,35) using the model VS real value:
interpolated_values =41.114   54.404   69.902   85.983real_value =41.290   54.320   69.830   86.270See the second graph containing the cubic polynomial interpolating the data points

Figure 4: Graph of the polynomial curve with the extra data points for years (1975, 1985, 1995, & 2005)

Observe that the error/difference between the interpolated values using the model and the real value is a matter of decimal point. This is only the case here as we have a particularly small data set (which has data points that aren’t that erratic) and note that this may not apply to other data set.

Take a look at this graph I generated when I used least square fitting to another data set:

Here’s the end of the tutorial. 👌
Thank you!

A tutorial on how to curve/data fit a set of data points using Least Squares Fitting in GNU Octave

In this article, I’ll be showing a step-by-step procedure on how to code the method of Least Squares Fitting — one of the curve fitting techniques — in GNU Octave. (Here’s the Github repository link for reference)

The matrix equation is in the form of:Ax = b

Written by Rich Yap

The matrix equation is in the form of:
Ax = b