Python for Sport Scientists: Descriptive Statistics Part 2 — Standard Deviation & Variance

John Cothran
Jan 17, 2018

Note: This is Part 2 of a series on descriptive statistics using Python. If you didn’t read the first one, feel free to catch up here: http://bit.ly/2B6Qa63

Part 1 of this series essentially introduced the idea of writing functional and compose-able Python code using the mean and median functions. I must emphasize that Python has libraries that do this work for you (in this case you would just include from statistics import mean, median at the top of your code), but these are exercises that can lay the groundwork for doing some pretty special things later on.

Part 2 will demonstrate more clearly how we can build on the functions we’ve already written to compose new, more powerful functions. As you may have guessed, this part will focus on two new calculations: Variance and Standard Deviation.

These two calculations are closely related to each other: variance is the square of the standard deviation (or, put the other way, standard deviation is the square root of the variance). With compose-able functions, this is almost too easy!

Variance

We can start with variance, since all we will have to do is take its square root to get the standard deviation. The mean function we defined in Part 1 will also come in handy, as well as Python’s built-in len function (for length).

http://www.statisticshowto.com/wp-content/uploads/2013/09/Variance_Formula.png
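In case the image does not load, here is the standard sample variance formula, which is what the code in this section implements. The symbol names are the conventional ones and are my assumption, not copied from the image: x_i is each value, x-bar is the mean, and n is the number of values.

```latex
s^2 = \frac{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}{n - 1}
```

The numerator is the sum of squared differences from the mean, and the denominator is one less than the number of values.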

Looking at the equation for sample variance, every difference from the mean gets squared, so it seems like a good idea to go ahead and define a function that squares a value:

# Python 3
def square(x):
    return x * x

square(4)
# 16

The next bit is slightly trickier, but we can evaluate the numerator with a function called, say, sumOfSquaredDifferences:

# mean is the function from Part 1 (or: from statistics import mean)
def sumOfSquaredDifferences(arr):
    xBar = mean(arr)
    differences = map(lambda x: x - xBar, arr)
    squares = map(square, differences)
    return sum(squares)

sprintEfforts = [88, 56, 51, 34, 50, 22, 61, 79, 90, 49]
sumOfSquaredDifferences(sprintEfforts)
# 4444

This function finds the mean, subtracts it from each value, squares each of those differences, and then adds the squared differences together. Simple enough! We could break this function into smaller pieces, instead of including the lambda function, but it’s good enough for me.
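For anyone curious what "smaller pieces" might look like, here is one possible sketch that swaps the lambda for a named helper. The helper name differences_from_mean and the snake_case names are hypothetical, and statistics.mean stands in for the mean function from Part 1.

```python
from statistics import mean  # stand-in for the mean function from Part 1


def square(x):
    return x * x


def differences_from_mean(values):
    """Return each value minus the mean of all values (hypothetical helper)."""
    x_bar = mean(values)
    return [x - x_bar for x in values]


def sum_of_squared_differences(values):
    """Add up the squared differences from the mean."""
    return sum(square(d) for d in differences_from_mean(values))


sprint_efforts = [88, 56, 51, 34, 50, 22, 61, 79, 90, 49]
result = sum_of_squared_differences(sprint_efforts)
print(result)  # 4444
```

Whether this reads better than the lambda version is a matter of taste; the named helper is mostly useful if the differences are needed elsewhere.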

Now finding the sample variance is easy:

def variance(x):
    n = len(x)
    return sumOfSquaredDifferences(x) / (n - 1)

variance(sprintEfforts)
# 493.78 (rounded)
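The n-1 in the denominator is what makes this the sample variance. As a rough illustration of the difference, the sketch below uses the standard library's statistics.variance (divides by n-1) and statistics.pvariance (divides by n) on the same sprint data. The variable names are mine, not from the original code.

```python
from statistics import pvariance
from statistics import variance as sample_variance

sprint_efforts = [88, 56, 51, 34, 50, 22, 61, 79, 90, 49]

# Sample variance: sum of squared differences divided by n - 1
s2 = sample_variance(sprint_efforts)

# Population variance: same numerator, divided by n instead
sigma2 = pvariance(sprint_efforts)

print(round(s2, 2))      # 493.78
print(round(sigma2, 2))  # 444.4
```

With only ten values the two differ noticeably; the gap shrinks as the dataset grows.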

Standard Deviation

The variance is great and all, but it is expressed in squared units, and we would like a measure that is a little more familiar to us: one in the same units as the data. Luckily, calculating Standard Deviation from Variance is a breeze, since it is simply the square root of Variance:

def sqrt(x):
    return x ** (1 / 2)

def stDev(x):
    return sqrt(variance(x))

stDev(sprintEfforts)
# 22.2
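A small aside on the square root: the x ** (1 / 2) trick relies on Python 3, where 1 / 2 is 0.5 (in Python 2 it evaluates to 0). The sketch below compares it with math.sqrt from the standard library, using the sprint variance from earlier written out as 4444 / 9. The variable names are only for illustration.

```python
import math

variance_value = 4444 / 9  # sample variance of the sprint efforts

root_by_power = variance_value ** (1 / 2)  # the approach used in sqrt above
root_by_math = math.sqrt(variance_value)   # standard-library alternative

print(round(root_by_power, 2))  # 22.22
print(round(root_by_math, 2))   # 22.22
```

Either works here; math.sqrt just makes the intent a little more obvious to a reader.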

Other than Python’s wacky syntax for the power operator (**), this is super easy. Let’s apply it to the dataset of GPS scores from Part 1:

data = [
    {"name": "John", "distance": 5602, "high-speed-running": 504},
    {"name": "Mike", "distance": 5242, "high-speed-running": 622},
    {"name": "Chad", "distance": 4825, "high-speed-running": 453},
    {"name": "Phil", "distance": 611, "high-speed-running": 500},
    {"name": "Tyler", "distance": 5436, "high-speed-running": 409}
]

stDev(list(map(lambda x: x['high-speed-running'], data)))
# 79.63 (rounded)
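If we wanted the standard deviation of more than one metric, pulling the field out could become its own small function. This is only a sketch: pluck is a hypothetical helper name, and statistics.stdev stands in for the stDev function defined above so the block runs on its own.

```python
from statistics import stdev  # stand-in for the stDev function defined above

data = [
    {"name": "John", "distance": 5602, "high-speed-running": 504},
    {"name": "Mike", "distance": 5242, "high-speed-running": 622},
    {"name": "Chad", "distance": 4825, "high-speed-running": 453},
    {"name": "Phil", "distance": 611, "high-speed-running": 500},
    {"name": "Tyler", "distance": 5436, "high-speed-running": 409},
]


def pluck(records, key):
    """Return the value stored under key in each record (hypothetical helper)."""
    return [record[key] for record in records]


hsr_sd = stdev(pluck(data, "high-speed-running"))
distance_sd = stdev(pluck(data, "distance"))

print(round(hsr_sd, 2))  # 79.63
print(distance_sd)
```

The same two-step pattern (pluck a field, then summarize it) works for any numeric key in the records.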

Closing thoughts

Combining small, pure functions that have a single purpose is proving to be useful in bringing basic statistical concepts to life. Statistical calculations are perfect for practicing good programming principles.

I’ve used map quite a bit, along with lambda functions. These are foundational for working with lists of data, and they stay useful in the real world when working with libraries like pandas and its DataFrames. It is a good idea to try to master and understand higher-order functions like map, filter, and reduce in any language in order to most effectively build compose-able, functional code.
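As a quick refresher, here is a minimal sketch of all three higher-order functions on the sprint data. The threshold of 60 and the variable names are arbitrary choices for illustration. Note that in Python 3, reduce lives in functools rather than being a built-in.

```python
from functools import reduce

sprint_efforts = [88, 56, 51, 34, 50, 22, 61, 79, 90, 49]

# map: transform every element (here, square each value)
squared = list(map(lambda x: x * x, sprint_efforts))

# filter: keep only the elements that pass a test (here, efforts above 60)
high_efforts = list(filter(lambda x: x > 60, sprint_efforts))

# reduce: fold the list down to a single value (here, a running total)
total = reduce(lambda acc, x: acc + x, sprint_efforts, 0)

print(high_efforts)  # [88, 61, 79, 90]
print(total)         # 580
```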

I recommend importing libraries that do the heavy-lifting for you (in the case of this series, use import statistics). With functions like mean and stDev out of the way and taken care of, we can explore some more interesting programming concepts to solve more complex problems.
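To convince ourselves the hand-rolled versions agree with the library, something like the following sketch could serve as a sanity check. The two functions here are compact re-implementations written so the block runs on its own; they are not the exact functions from this post.

```python
import math
import statistics


def variance(x):
    """Sample variance: squared differences from the mean over n - 1."""
    x_bar = sum(x) / len(x)
    return sum((v - x_bar) ** 2 for v in x) / (len(x) - 1)


def st_dev(x):
    """Sample standard deviation: square root of the sample variance."""
    return variance(x) ** (1 / 2)


sprint_efforts = [88, 56, 51, 34, 50, 22, 61, 79, 90, 49]

matches_variance = math.isclose(
    variance(sprint_efforts), statistics.variance(sprint_efforts)
)
matches_stdev = math.isclose(
    st_dev(sprint_efforts), statistics.stdev(sprint_efforts)
)

print(matches_variance, matches_stdev)  # True True
```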

Note: Thank you for reading the second part of my series on Descriptive Statistics. There are many ways to apply these concepts to describe a data-set, and these functional concepts are pretty effective in doing so. Again, if you missed Part 1, please read it here: http://bit.ly/2B6Qa63

Check out Part 3!
