An Introduction to Degrees of Freedom in Machine Learning and Statistics
INTRODUCTION
Degrees of freedom is an important concept in statistics and in data science fields such as machine learning.
It is often used to summarize the number of values involved in the calculation of a statistic, such as a sample statistic or a statistic in a statistical hypothesis test.
In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.
In short, it represents the number of points of control of a system, model, or calculation.
Each independent parameter that can change is a separate dimension in a d-dimensional space that defines the scope of values that may influence the system, where the specific observed or specified values are a single point in that space.
Mathematically, the degrees of freedom is often represented using the Greek letter nu (ν), which looks like a lower-case "v".
It may also be abbreviated as "d.f." or simply "df".
Degrees of Freedom in Statistics
In statistics, the degrees of freedom is the number of values used in the calculation of a statistic that can change.
Degrees of freedom: Roughly, the minimum amount of data needed to calculate a statistic. More practically, it is a number, or numbers, used to approximate the number of observations in the data set for the purpose of determining statistical significance. -Jason Brownlee
degrees of freedom = number of independent values - number of statistics
The most commonly used formula is:
df = N - 1
For example, suppose we have 100 independent samples and we want to calculate a statistic of the sample, such as the mean. All 100 samples are used in the calculation and there is one statistic, so the number of degrees of freedom for the mean in this case is:
df = N - 1
df = 100 - 1 = 99
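The calculation above can be sketched in Python with NumPy. The sample values here are illustrative (drawn from a normal distribution with an arbitrary seed); NumPy's `ddof` ("delta degrees of freedom") argument makes the connection explicit, since the sample variance divides by N - ddof, i.e. by df when ddof=1.

```python
import numpy as np

# 100 independent samples (the values themselves are illustrative)
rng = np.random.default_rng(0)
samples = rng.normal(loc=5.0, scale=2.0, size=100)

N = len(samples)
df = N - 1  # one statistic (the mean) is estimated, so df = 100 - 1 = 99

# Sample variance with ddof=1 divides the sum of squared deviations
# by df = N - 1 instead of N (Bessel's correction).
sample_variance = samples.var(ddof=1)

print(df)               # 99
print(sample_variance)
```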
Degrees of freedom is very important in data distributions and statistical hypothesis tests.
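To illustrate the role of degrees of freedom in hypothesis testing, here is a small sketch using SciPy: Student's t distribution, used in the t-test, is parameterized directly by df, and the critical value for a given confidence level depends on it. The sample size of 100 follows the example above.

```python
from scipy import stats

# Two-tailed critical value of Student's t at the 95% confidence level.
# The t distribution is parameterized by df = N - 1.
N = 100
df = N - 1
t_critical = stats.t.ppf(0.975, df=df)

print(round(t_critical, 3))  # close to 1.96, the normal-distribution value
```

As df grows, the t distribution approaches the standard normal, which is why the critical value is near 1.96 here.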
Degrees of Freedom in Machine Learning
In predictive modeling, the degrees of freedom often refers to the number of parameters in the model that are estimated from data.
Let’s learn it with the example of linear regression. Consider a two-variable linear regression:
yhat = x1 * beta1 + x2 * beta2
This linear regression model has two degrees of freedom because there are two parameters in the model that must be estimated from a training dataset. Adding one more variable to the data would add one more degree of freedom for the model.
model degrees of freedom = number of parameters estimated from data
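The two-variable model above can be sketched with an ordinary least squares fit in NumPy. The data here is synthetic (coefficients 1.5 and -0.7 are chosen arbitrarily for illustration); the point is that the model's degrees of freedom equals the number of estimated parameters, and the residual error has N minus that many degrees of freedom.

```python
import numpy as np

# Fit yhat = x1 * beta1 + x2 * beta2 by least squares (no intercept),
# matching the two-variable model in the text.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))                      # 50 rows, 2 input variables
y = X @ np.array([1.5, -0.7]) + rng.normal(scale=0.1, size=50)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Model degrees of freedom = number of parameters estimated from data
model_df = len(beta)
print(model_df)   # 2

# Residual (error) degrees of freedom = N - number of parameters
error_df = len(y) - model_df
print(error_df)   # 48
```

Adding a third input variable would add a third coefficient, raising the model degrees of freedom to 3.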
It is common to describe the complexity of a model fit from data based on the number of parameters that were fit.
There are many other regression concepts that use degrees of freedom.
The same parameter-counting idea also applies to deep learning neural networks.
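In a neural network, every trainable weight and bias is a parameter estimated from data, so each contributes one degree of freedom under the parameter-counting view above. A minimal sketch, counting the parameters of a small fully connected network (the layer sizes 4, 8, 1 are chosen for illustration):

```python
# A small fully connected network with layer sizes 4 -> 8 -> 1.
# Each weight matrix entry and each bias is one trainable parameter.
layer_sizes = [4, 8, 1]

params = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params += n_in * n_out  # weights connecting the two layers
    params += n_out         # one bias per output unit

print(params)  # (4*8 + 8) + (8*1 + 1) = 49
```

Even this tiny network has 49 degrees of freedom, which hints at why deep models with millions of parameters can fit very complex functions.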
Contact me through:
LinkedIn: https://www.linkedin.com/in/shivam-mishra-a03815185/
Email: shivammishra2186@yahoo.com
Twitter: https://twitter.com/ishivammishra17