Mainly on the Plane

Alex Hallam
10 min read · Dec 28, 2016

--

This article was inspired by a comment I saw on Twitter:

“Statistical software long ago democratized statistics by making it easy to run t-tests, but people are still wildly confused about the math.”

I began thinking about this “wild confusion”. My guess is that when people learn statistics in the classroom or from a book, they learn an algebraic approach. That is one way to approach the subject, but to a newcomer it can make statistics look like a convoluted bag of equations. For this reason it might be fun to dust off the old geometric approach to statistics. I believe that if you know some geometry and linear algebra you have access to the deepest and most fundamental ideas in statistics.

Why Geometry

Why use geometry in statistics? Won’t this make a difficult subject even harder? My answer is no.

  • Geometry provides clarity by displaying a summary of the problem in the form of a picture. In many instances a picture is easier to remember than a set of algebraic formulas, which statistics has a reputation for collecting.
  • Geometry can also serve to unify topics that seem quite different, such as aspects of t-tests, ANOVA, and regression. For the sake of brevity I will not be able to cover these connections in this article; the interested reader can learn about them from the references at the bottom.
  • Geometry may have been the way in which statistical tests were originally developed. It has been suggested that R.A. Fisher derived his statistical insight from thinking in geometric pictures.

This approach may not be for everyone. For me, geometry provides a picture to complement sometimes terse formulas.

Objectives

I want to walk through a statistical argument. Let's start with a question.

Question: Are short chopsticks more efficient than long chopsticks?

We will walk through this problem twice.

  1. The first will be the “democratized” approach: a one-line command that gives you an immediate answer.
  2. The second will be the geometric approach. This will take considerably more time than the one-line command in R. We will start the problem with just two samples, because the dimensions are low and the pictures are easier to draw.

Problem Set Up

An experiment published in Applied Ergonomics (data found here) had a group of volunteers pinch food with 5 different chopstick lengths ranging from 180 mm to 330 mm and recorded their efficiency.

The y-axis is food pinching efficiency. The x-axis is chopstick length in millimeters.

For our purposes let's compare the food pinching efficiency of just two groups: the 180 mm (red) and 240 mm (green) groups. Just from visual inspection it seems that the 240 mm group may be slightly better, but it is hard to know for sure.

Here is the data:

To summarize the data, both groups had 31 subjects. The average efficiency of the 240 mm group was 26.3. The average efficiency of the 180 mm group was 24.9. The difference between the means of the groups is 1.4. Now we have a more specific question than the one previously stated.

Is an increased efficiency of 1.4 sufficient evidence to say that 240 mm chopsticks are more efficient at pinching food than 180 mm chopsticks?

We frame this question as a debate between two people. One person, the skeptic, says that perhaps 1.4 is not enough of a difference to say that 240 mm chopsticks make you more efficient.

The other person, the advocate, says that when compared to the variation in the sample, 1.4 is enough of a difference to say that 240 mm chopsticks make you more efficient, and that a difference that large would rarely occur by chance.

Resolving the debate with one line of code

The t.test() function in R tells us everything we need to know. The results confirm the advocate's argument: the 95% confidence interval does not include 0, the t-value is greater than the critical value of 1.69, and the p-value is less than 0.05. There we have it, another statistical argument resolved! The problem is that there is not a lot of intuition about what happened.
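For readers following along in Python rather than R, SciPy offers the same kind of one-liner: a paired t-test is a one-sample t-test on the differences. The sketch below is illustrative only, since the full 31-pair dataset is not reproduced in this post; it uses just the two differences (samples 16 and 1) that appear later in the geometric walkthrough.

```python
from scipy import stats

# Paired t-test == one-sample t-test on the differences (240 mm minus 180 mm).
# Illustrative only: just two of the 31 paired differences from the article.
diffs = [1.08, 1.79]

res = stats.ttest_1samp(diffs, popmean=0)
print(res.statistic, res.pvalue)  # with only two pairs, p is well above 0.05
```

With the full data this call would reproduce the t.test result; with two pairs it previews the small-sample verdict reached below.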

Resolving the debate with Geometry

The key idea of the geometric approach is the following.

Compare the length of the projection of the observation vector onto the direction of the hypothesis with the average of the lengths of the projections of the observation onto the orthogonal direction in the error space.

If that made no sense then you are in the right place! Most people are not trained to think of statistics like this. We will break the above statement down into pictures.

As stated above, we are comparing just the chopsticks of length 180 and 240 mm. Below is a list of the data. There are 4 columns: Individual, Efficiency.240, Efficiency.180, and diff (the difference between the groups).

Food pinching efficiency for chopsticks of length 240 mm and 180 mm. The efficiency differences for samples 1 and 16 are highlighted in red.

Let's take two individuals (samples): numbers 16 and 1. The corresponding differences are 1.08 and 1.79. These values make up our observation vector

y = [1.08, 1.79]′

where the prime (′) represents the transpose. This can be represented as a vector.

Now we draw the direction of our hypothesis. The direction of the hypothesis is an equiangular (45 degree) line from the origin. We are literally drawing the argument of the advocate: if the difference in mean chopstick efficiency is significant, the observation should point in the direction of the advocate's argument and lie far enough away from the origin (the skeptic's position).

The dashed line is in the direction of the unit vector U1. This vector is a necessary part of forming our first projection vector.

Now we are going to project the observation vector onto the direction of the advocate's argument.

Orthogonal projection is a linear algebra topic. Visually, we imagine a light source perpendicular to the vector we are projecting onto; the shadow cast on that vector is the projection. In our case we imagine a light source perpendicular to the dashed line. The observation vector casts a shadow onto the dashed line, and that shadow is the projection vector. We will call this vector the model vector.

The model vector is the result of the observation vector being projected onto the unit vector U1.

Now we have a picture of the model vector, the result of projecting the observation vector onto the unit vector U1. This gives us both the observation vector and the model vector.

Notice that each element in the model vector is the average of the values in the observation vector.
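The projection arithmetic is easy to verify in a few lines of Python (a sketch using only the two differences from our example):

```python
import math

y = [1.08, 1.79]                 # observation vector: differences for samples 16 and 1
u1 = [1 / math.sqrt(2)] * 2      # equiangular unit vector U1, the hypothesis direction

# Scalar projection of y onto u1: the length of the model vector.
A = sum(yi * ui for yi, ui in zip(y, u1))

# The model vector itself: length A in the direction u1.
model = [A * ui for ui in u1]

mean = sum(y) / len(y)
print(model, mean)  # each coordinate of the model vector equals the mean of y
```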

Now we need to quantify the error. To do this we will use another orthogonal coordinate U2. This vector is perpendicular to the model vector.

Once again we project the observation vector onto the orthogonal coordinate.

Moving the tail of the second projection vector to the head of the model vector gives the following “statistical triangle”.

A visual summary of this triangle building process is below.

Earlier we stated that we need to “compare the length of the projection of the observation vector onto the direction of the hypothesis with the average of the lengths of the projections of the observation onto the orthogonal direction in the error space”. Hopefully the picture above makes clear what that means. Roughly speaking, we are simply comparing the length of the model vector against the length of the error vector.

If the skeptic is correct the model vector and the error vector will be similar. If the advocate is correct the model vector will be significantly larger than the error vector.
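Because the model vector and the error vector are orthogonal, the statistical triangle obeys the Pythagorean theorem: the squared length of the observation vector splits exactly into a model part and an error part. A quick check in Python, again using just the two differences from our example:

```python
import math

y = [1.08, 1.79]                                  # observation vector
mean = sum(y) / len(y)

A = sum(y) / math.sqrt(len(y))                    # length of the model vector
B = math.sqrt(sum((yi - mean) ** 2 for yi in y))  # length of the error vector

# Pythagoras: ||y||^2 = A^2 + B^2
obs_sq = sum(yi ** 2 for yi in y)
print(obs_sq, A ** 2 + B ** 2)
```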

Statistical Tests

The algebraic representation of the t-test statistic for paired samples looks something like this:

t = d̄ / (s / √n)

Algebraic: the t test statistic, where d̄ is the mean of the paired differences, s is their standard deviation, and n is the sample size.

Let's contrast that with a formula built from our statistical triangle:

t = ± A / (B / √q)

Geometric: the t test statistic using the model vector length (A) and error vector length (B).

Here A is the length of the model vector, B is the length of the error vector, and q is the degrees of freedom, which is equal to the number of orthogonal coordinates needed to build the error vector. Since we used just two samples, q = 1: one of the two dimensions is taken up by the hypothesis direction, leaving one for the error space. The plus/minus sign captures the direction of the observation vector relative to the hypothesis direction.
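The formula above translates directly into code. Below is a Python sketch (the helper names geometric_t and textbook_t are my own, not from the original post) that computes both the geometric statistic and the textbook paired formula so the two can be compared:

```python
import math

def geometric_t(diffs):
    """t = A / (B / sqrt(q)): model length over average error length."""
    n = len(diffs)
    mean = sum(diffs) / n
    A = sum(diffs) / math.sqrt(n)                       # length of the model vector
    B = math.sqrt(sum((d - mean) ** 2 for d in diffs))  # length of the error vector
    q = n - 1                                           # orthogonal coordinates in the error space
    return A / (B / math.sqrt(q))

def textbook_t(diffs):
    """Standard paired t statistic: mean / (s / sqrt(n))."""
    n = len(diffs)
    mean = sum(diffs) / n
    s = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean / (s / math.sqrt(n))

print(geometric_t([1.08, 1.79]), textbook_t([1.08, 1.79]))  # the two agree
```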

This puts us in a position to address the debate between the skeptic and the advocate.

If the advocate is right, then the projection length (the model vector) should be considerably greater than the error vector. In other words, A >> B.

If the skeptic is right, then the true difference between chopsticks is 0, and A should be similar to B.

Two samples

Let's plug in the two samples from our chopsticks example to show that the two formulas give the same output. The differences are 1.08 and 1.79, so d̄ = 1.435 and s ≈ 0.502, while A ≈ 2.029 and B ≈ 0.502 with q = 1.

Textbook: t = d̄ / (s / √n) = 1.435 / (0.502 / √2) ≈ 4.04

Geometric: t = A / (B / √q) = 2.029 / (0.502 / √1) ≈ 4.04

At this point we would have to look up the t-value in a table to see what the p-value is. As a side note, for the interested reader, you can use an alternative statistic that takes the angle between the observation and the model vector to calculate the p-value directly! This technique is found in Graham R. Wood, 2002.
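To see the angle's role concretely, here is a rough sketch using only the identity tan θ = B / A, so that t = √q · A / B = √q / tan θ (this is just the triangle geometry, not Wood's full method):

```python
import math

y = [1.08, 1.79]                                  # observation vector
mean = sum(y) / len(y)

A = sum(y) / math.sqrt(len(y))                    # model vector length
B = math.sqrt(sum((yi - mean) ** 2 for yi in y))  # error vector length
q = len(y) - 1

theta = math.atan2(B, A)            # angle between the observation and model vectors
t = math.sqrt(q) / math.tan(theta)  # the same t statistic, recovered from the angle
print(math.degrees(theta), t)
```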

If we have a table or software handy we will see that we need a t-value greater than 6.31 if it is acceptable to have a 5% probability of incorrectly rejecting the null hypothesis (Type I error). So with just two samples we would accept the skeptic's argument and say that there is no difference between the two chopsticks' food pinching efficiencies. Of course, two samples is a small sample size, so let's start increasing the number of samples.

Three samples

Now let's look at three samples by adding sample 24. So now we are testing whether there is a significant difference in food pinching efficiency with three samples. We could draw a three-dimensional graph, with each axis representing another observation. Our equation is still the same, now with q = 2.

With three samples the t-value would have to be greater than 2.92 to be significant. With these three samples our result is significant, since 6.72 > 2.92. We now reject the skeptic's argument and say that the two chopstick lengths' food pinching efficiencies are significantly different.

All 31 samples

Finally, let's take a look at the general case with all 31 samples:

You will notice that this is the same value we got using the t.test command. It is also significant, as the t-value needed for significance with 31 samples is 1.697. Since 2.24 > 1.697, we reject the skeptic's argument.

Now let's do a quick derivation to show that the geometric approach is equivalent to the standard approach. The model vector's length is

A = y · U1 = (Σ dᵢ) / √n = √n · d̄

the error vector's length is

B = √( Σ (dᵢ − d̄)² ) = s √(n − 1)

and q = n − 1. Substituting into the geometric formula:

t = A / (B / √q) = (√n · d̄) / (s √(n − 1) / √(n − 1)) = d̄ / (s / √n)

where n is the sample size, s is the square root of the sample variance of the differences, and the numerator d̄ is the sample mean of the differences. This is exactly the textbook statistic from before.

Remarks

This little post was meant to give the reader a taste of the geometric approach to statistics. An astounding amount of information was left out. I was not able to address the breadth of the geometric approach: the statistical triangle we built is used, albeit with some adjustments, in ANOVA, independent t-tests, regression, and more. I was also not able to provide enough depth to give readers a working knowledge of the geometric approach, especially in how to find the orthogonal coordinates of the error vectors. For those who would like to explore these ideas further, please see the references below.

Code used for this post can be found on my GitHub.

References

Papers

Geometry, Statistics, Probability: Variations on a Common Theme

A new angle on the t-test

Books

Statistical Methods: The Geometric Approach — The most complete book on the geometric approach to statistics. Covers a wealth of material.

Statistical Methods: A Geometric Primer — This is the sister book to the above text. It was written for people who want a small introduction to geometric topics but do not want to dive into a larger text. It covers much less than the original, but gives readers a working knowledge of the techniques presented.

Foundations of Linear and Generalized Linear Models (Sections 2.2 and 2.3) — This is not a book dedicated to geometric statistics; as the title states, it is a book on generalized linear models. It does have some sections on the linear algebra associated with linear models. In my opinion, once you have a solid foundation in geometric representations of statistics, the linear algebra representation makes much more sense. The author, Alan Agresti, is known for writing great books on statistics.

*The book links are Amazon affiliate links.
