Introduction to Splines

Kiwi
Apr 9, 2022

How do we fit data in an appropriate way? For a student majoring in statistics, the first answer is linear regression. In simple linear regression, we use all the data points to fit a straight line through the scatter plot. It is simple and often reasonable, but a straight line cannot follow every pattern. Therefore, I want to introduce a slightly more complex numerical method for interpolating data, called “splines”.

Splines Appearance

A spline is a method for finding the most appropriate smooth trajectory through the given data. The name comes from a craftsman’s tool: a flexible thin strip of wood or metal used to draft smooth curves. Several weights are applied at various positions, and the strip bends according to their number and position.

Definition of Splines

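Assume that the unknown function f is represented by a spline function with a fixed knot sequence and fixed degree d. It can then be written as a linear combination of basis functions:

$$f(x) = \sum_{k=1}^{K+d+1} \beta_k B_k(x)$$

where the B_k are the basis functions, the \beta_k are the associated spline coefficients, and K is the number of knots.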

This is the general form of a spline. Different choices of basis lead to different splines with special properties. It closely resembles the familiar linear regression: the model is still linear in the coefficients, but the fitted curve is no longer restricted to a straight line in x.
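A spline of degree d with K knots has K + d + 1 free coefficients. As a quick sketch of the count: the K knots split the range into K + 1 polynomial pieces with d + 1 coefficients each, and at every knot the function and its first d - 1 derivatives must agree, which gives d constraints per knot.

```latex
\underbrace{(d+1)(K+1)}_{\text{coefficients of all pieces}}
\;-\;
\underbrace{d \cdot K}_{\text{continuity constraints}}
\;=\; K + d + 1
```

For a cubic spline (d = 3) with K = 3 knots this gives 16 - 9 = 7 coefficients.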

Truncated power basis

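The truncated power basis of degree d is defined by

$$B_1(x) = 1,\; B_2(x) = x,\; \ldots,\; B_{d+1}(x) = x^d,\; B_{d+2}(x) = (x - \tau_1)_+^d,\; \ldots,\; B_{K+d+1}(x) = (x - \tau_K)_+^d$$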

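The subscript + denotes the positive part, (u)_+ = max(u, 0), so values below 0 are set to 0. The points \tau_1, ..., \tau_K are representative points chosen from the data, called knots, and the fixed degree d is the degree of the polynomial pieces. We usually set d = 3, and here we use K = 3 knots. Then our cubic spline equation is

$$f(x) = \beta_1 + \beta_2 x + \beta_3 x^2 + \beta_4 x^3 + \beta_5 (x - \tau_1)_+^3 + \beta_6 (x - \tau_2)_+^3 + \beta_7 (x - \tau_3)_+^3$$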

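The coefficients are then estimated numerically, for example by least squares. Each truncated term (x - \tau_k)_+^d is nonzero for every x above its knot, so the fit at a given point depends on many knots at once; in other words, the basis functions are not local. Thus, the knots we choose are pretty important. Once our data include some outliers, the estimate can differ considerably from the one obtained without them. The basis also has a “numerical instabilities” problem, especially when the number of knots is large.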
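To make the construction concrete, here is a minimal NumPy sketch that builds the truncated power basis for d = 3 and K = 3 knots and estimates the coefficients by ordinary least squares. The toy data, the knot positions, and the function name `truncated_power_basis` are made up for illustration.

```python
import numpy as np

def truncated_power_basis(x, knots, degree=3):
    """Design matrix with columns 1, x, ..., x^d, (x - tau_1)_+^d, ..., (x - tau_K)_+^d."""
    x = np.asarray(x, dtype=float)
    # Polynomial part: x^0, x^1, ..., x^d
    poly = [x ** p for p in range(degree + 1)]
    # Truncated part: max(x - tau, 0)^d for each knot tau
    trunc = [np.maximum(x - tau, 0.0) ** degree for tau in knots]
    return np.column_stack(poly + trunc)

# Hypothetical toy data: a noisy sine curve
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

knots = [2.5, 5.0, 7.5]              # K = 3 knots, chosen by hand
B = truncated_power_basis(x, knots)  # shape (50, K + d + 1) = (50, 7)

# Ordinary least squares for the K + d + 1 = 7 spline coefficients
beta, *_ = np.linalg.lstsq(B, y, rcond=None)
y_hat = B @ beta                     # fitted values at the data points
```

Every truncated column is nonzero for all x to the right of its knot, which is why changing one knot or one data point can move the whole right-hand side of the curve.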

B-spline

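The B-spline basis is based on the knot sequence

$$\xi_1 \le \ldots \le \xi_d \le \xi_{d+1} < \xi_{d+2} < \ldots < \xi_{d+K+1} < \xi_{d+K+2} \le \xi_{d+K+3} \le \ldots \le \xi_{2d+K+2}$$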

Here d is the degree of the spline function and K is the number of inner knots. The inner knots are the knots with subscripts d+2 to d+K+1, and the boundary knots are the knots with subscripts d+1 and d+K+2. For example, if we choose [2, 5, 6, 8] (the 2nd, 5th, 6th and 8th points sorted by value), the inner knots are 5 and 6 and the boundary knots are 2 and 8.

What is special about B-splines is the additional knots before and after the boundary knots. A common strategy is to set the additional knots (subscripts 1 to d and d+K+3 to 2d+K+2) equal to the boundary knots. Alternatively, if the inner knots and boundary knots are chosen to be equidistant, the additional knots may be placed at the same distance outside the boundaries. For example, if we choose [6, 7, 8] and set d = 3, the knot sequence is extended to [3, 4, 5, 6, 7, 8, 9, 10, 11]. For d > 0, the basis functions of degree d are calculated by

$$B_k^d(x) = \frac{x - \xi_k}{\xi_{k+d} - \xi_k} B_k^{d-1}(x) + \frac{\xi_{k+d+1} - x}{\xi_{k+d+1} - \xi_{k+1}} B_{k+1}^{d-1}(x), \quad k = 1, \ldots, K+d+1$$

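The recursion begins with the basis functions of degree 0:

$$B_k^0(x) = \begin{cases} 1, & \xi_k \le x < \xi_{k+1} \\ 0, & \text{otherwise} \end{cases}$$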

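B-splines have the advantage that the basis functions have local support: each basis function of degree d is nonzero only between d + 2 neighboring knots. This makes the basis numerically stable, and each part of the fitted curve is decided only by the knots near it.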
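The recursion and the local-support property can be checked numerically. The following is a minimal pure-Python sketch of the standard Cox-de Boor recursion, using the extended knot sequence [3, 4, ..., 11] from the example above. The function name `bspline_basis` and the 0-based index k are choices made here for illustration, and terms with a zero denominator (repeated knots) are treated as 0 by convention.

```python
def bspline_basis(k, d, x, knots):
    """Value at x of the k-th B-spline basis function of degree d (k is 0-based here)."""
    if d == 0:
        # Degree 0: indicator of the half-open interval [knots[k], knots[k+1])
        return 1.0 if knots[k] <= x < knots[k + 1] else 0.0
    left_den = knots[k + d] - knots[k]
    right_den = knots[k + d + 1] - knots[k + 1]
    # Convention: a term whose denominator is zero (repeated knots) is dropped
    left = 0.0 if left_den == 0 else (
        (x - knots[k]) / left_den * bspline_basis(k, d - 1, x, knots))
    right = 0.0 if right_den == 0 else (
        (knots[k + d + 1] - x) / right_den * bspline_basis(k + 1, d - 1, x, knots))
    return left + right

d = 3
knots = [3, 4, 5, 6, 7, 8, 9, 10, 11]  # extended knot sequence for [6, 7, 8], d = 3
n_basis = len(knots) - d - 1           # K + d + 1 = 5 basis functions

x = 6.5                                # a point between the boundary knots 6 and 8
values = [bspline_basis(k, d, x, knots) for k in range(n_basis)]
```

At x = 6.5 only the first four basis functions are nonzero, because the fifth one lives on [7, 11), and the values sum to 1 between the boundary knots.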

Why Do We Use Splines

Simple linear regression is straightforward, and the relationship between x and y is easy to interpret. Once we choose to use splines, we need to understand knots and bases and learn some more complex settings. Still, splines are often closer to real life. First, real-world data have more noise than we might expect. If we fit such data with simple linear regression, every outlier enters the calculation, and the line can be dominated by outliers. Second, big data bring a lot of information that cannot easily be described by a single line, because many complex relationships exist. Thus, splines can explain more detail or reveal special patterns in your data.

The display of splines

So far, I have not found a Python package for interpolation with the truncated power basis. Instead, I tried a B-spline and a natural cubic spline to display the different interpolation effects. As you can see, the outlier is largely ignored after switching to a natural cubic spline or a B-spline.
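As a rough sketch of this kind of comparison (not the original experiment), the code below fits a straight line and a least-squares cubic B-spline to the same data. It assumes SciPy is installed and uses `scipy.interpolate.make_lsq_spline`; the toy data, the single artificial outlier, and the knot positions are made up, and only the B-spline case is shown.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

# Hypothetical toy data: a noisy sine curve with one artificial outlier
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 60)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)
y[30] += 5.0

k = 3                                   # cubic B-spline
inner = [2.5, 5.0, 7.5]                 # inner knots, chosen by hand
# Repeat each boundary knot k + 1 times (the "equal to the boundary knots" strategy)
t = np.r_[(x[0],) * (k + 1), inner, (x[-1],) * (k + 1)]
spl = make_lsq_spline(x, y, t, k=k)     # least-squares fit, returns a BSpline

# Simple linear regression on the same data, for comparison
slope, intercept = np.polyfit(x, y, deg=1)
y_line = slope * x + intercept
y_spline = spl(x)

rss_line = float(np.sum((y - y_line) ** 2))
rss_spline = float(np.sum((y - y_spline) ** 2))
```

Plotting `y_line` and `y_spline` against the data is one way to see the difference. A least-squares spline is not a robust estimator, so the outlier can still pull the curve in its neighborhood, but thanks to local support its influence fades away from that region.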

Reference

Aris Perperoglou, Willi Sauerbrei, Michal Abrahamowicz, Matthias Schmid, on behalf of TG2 of the STRATOS initiative. A review of spline function procedures in R.
