In any statistical observations, a set of measured variables might be call inputs. These inputs have some influence on one or more other variables, called outputs. The goal of supervised learning is to use the inputs to predict the values of the outputs. In the statistical literature of machine learning, we have
- Inputs are often called the predictors, also known as independent variables.
- The outputs are called the responses, or classically the dependent variables.
- A function that maps or explains the relation between input and output. in supervised learning, they are well known as regression functions. They correspond to the case where the outputs are quantitative.
In the above model, we need to predict the amount of glucose of ith person using the infrared absorption spectrum of his/her blood. This prediction can be done in two different ways, mainly.
- If we have a pre-defined model, we can use a parametric approach.
- If we do not have any model, we use a non-Parametric approach
A parametric model presumes that the form of a function is known. When an experimenter chooses one family of curves and inputs the choices into the inferential process, the information that data can supply concerning the model development is then restricted from the data under this assumed parametric form. The experimenter relies less on the data for information about the model.
The assumption of the model without the knowledge of the accuracy of his or her assumptions can be dangerous in the sense of producing misleading inferences about the regression curve.
If we infer the above-given equation with unknown function while making few assumptions as possible, basically as a non-parametric model, it can be as
Why the non-parametric model?
- It allows great flexibility in the possible form of the regression curve and makes no assumption about a parametric form.
- It only assumes that the regression curve belongs to them some infinite-dimensional collection of functions.
- It heavily relies on the experimenter to supply only qualitative information about the function and let the data speak for itself concerning the actual form of a regression curve.
- It is best suited for inference in a situation where there is very little or no prior information about the regression curve.
The parametric form of regression is used based on historical data; non-parametric can be used at any stage as it doesn’t take any presumption. However, parametric and non-parametric regression is not a contradiction of each other. While the subsequent computation of non-parametric form, based on the first result can be used as a parametric form, any type of parametric regression can be used as non-parametric to test the accuracy of the model that we are using for parametric regression.
Further reading for non-parametric regression.
 Larry Wasserman, All of Nonparametric Statistics, Springer Texts in Statistics
 Randall L. Eubank, Nonparametric Regression and Spline Smoothing (1999), MARCEL DEKKER, INC.