My big discovery

As a young student memorizing equations named after famous scientists, I would fantasize about someday making a discovery and naming it after myself. Well, I finally made an important discovery, but I didn’t name it after myself. Let me tell you about it.

I found a way to determine if a reported measure of variability is mathematically possible.

What does that mean?

It means I can take a statistic such as a standard deviation, of any size and reported to any number of decimal places, and tell you whether it could possibly have been reported correctly.

How is this possible?

I found that variances of discrete data follow a “simple” pattern. On top of this, I found that the means of the data sets also follow a pattern and match only certain variances. I named the consistency test that uses these facts the GRIMMER (Granularity-Related Inconsistency of Means Mapped to Error Repeats) test.
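To see why means and variances are tied together at all, here is the standard textbook identity (ordinary algebra, not the pattern from the paper), assuming integer data and the usual sample variance with an n − 1 denominator. Both statistics depend on the same integer sum S, and the variance additionally depends on the integer sum of squares Q:

```latex
\bar{x} = \frac{S}{n}, \qquad
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}
    = \frac{Q - S^2/n}{n-1}
    = \frac{nQ - S^2}{n(n-1)},
\qquad \text{where } S = \sum_{i=1}^{n} x_i,\quad Q = \sum_{i=1}^{n} x_i^2 .
```

So for integer data, n(n − 1)s² is always an integer, and fixing the mean fixes S, which restricts which numerators nQ − S² (and therefore which variances) can occur.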

Am I sure I am the first person to discover this?

My friends would ask me this incredulously every time I showed them my discovery, and I don’t blame them. You would think we would know everything there is to know about variances, given that the term appears in the literature as far back as 1918. But my searches turned up no evidence that this phenomenon had been described before.

“Student” (William Sealy Gosset), of Student’s t-test fame, did describe a pattern in standard deviations for a sample size of 4 back in 1908, and he even noted that a given standard deviation matches only certain means. However, I could not reproduce his findings, and he did not describe patterns for any other sample size.

The GRIM test also received quite a bit of publicity, and no one came forward to claim it had already been published. Since the GRIMMER test is much more complicated than the GRIM test, I think I’m in the clear.

And besides, even if the patterns have been described before, I’m pretty sure no one has used them to develop a consistency test. Everyone knows that means of discrete data are granular; in fact, Student noted in his 1908 paper that with a sample size of 4 and integer data, the granularity is 0.25. And yet no one exploited that fact until the authors of the GRIM test came along.
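The granularity idea behind the GRIM test fits in a few lines. The sketch below is my own illustration, not the GRIM authors’ code: it assumes a single integer-valued item, a reported mean given as a string (so the number of decimals is known), and round-half-up reporting. The function names `round_half_up` and `grim_consistent` are hypothetical. With n = 4, only multiples of 0.25 can be reproduced, exactly as Student observed.

```python
import math
from fractions import Fraction


def round_half_up(value: Fraction, decimals: int) -> Fraction:
    """Round an exact value to `decimals` places, sending halves away from zero."""
    scaled = value * 10 ** decimals
    rounded = math.floor(abs(scaled) + Fraction(1, 2))
    return Fraction(rounded if scaled >= 0 else -rounded, 10 ** decimals)


def grim_consistent(mean: str, n: int) -> bool:
    """Return True if some integer total S makes S/n round to the reported mean."""
    decimals = len(mean.split(".")[1]) if "." in mean else 0
    target = Fraction(mean)
    half_unit = Fraction(1, 2 * 10 ** decimals)
    # Any valid total must lie within n half-units of n * target.
    low = math.floor((target - half_unit) * n)
    high = math.ceil((target + half_unit) * n)
    return any(
        round_half_up(Fraction(total, n), decimals) == target
        for total in range(low, high + 1)
    )
```

For example, `grim_consistent("2.25", 4)` is True, while `grim_consistent("2.30", 4)` is False because no four integers average to anything that rounds to 2.30. Exact `Fraction` arithmetic is used so that binary floating point never decides a borderline case.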

So how did I make my discovery?

It all started right here on Medium. I read James Heathers’ post about the GRIM test and subsequently added the test to PrePubMed. I then discovered that the other author of the GRIM test, Nicholas Brown, was in town, so I invited him over to my place for some BBQ. He told me he was trying to extend the GRIM test to standard deviations, but that enumerating all the possibilities was computationally intensive.

I was fascinated by the math behind the GRIM test and thought it was very clever, so I decided to try my hand at making a GRIM test for standard deviations.

I went through stacks of scratch paper trying to understand standard deviations, and I quickly developed delusions of grandeur that I’d find a simple equation for checking them. I did discover some interesting properties of standard deviations, but ultimately I was overwhelmed by their complexity.

I resigned myself to following the strategy Nicholas Brown was using: a brute-force approach. I figured I would just enumerate all the possibilities and store the results. I knew this would be computationally intensive, but I have a lot of experience working with large data sets and computing clusters, so I felt I was in a good position to complete this project.

I then set about making the process as efficient as possible. I found that functions I wrote myself were much faster than NumPy or Python’s built-in statistics module. I also decided to speed things up by calculating variances instead of standard deviations.
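A hypothetical sketch of that brute-force setup follows; it is not the author’s actual code, and the function names are my own. It assumes integer data in a bounded range [lo, hi], computes each sample variance from running sums without ever taking a square root, and stores, for every achievable mean, the set of variances that mean can pair with. Samples are enumerated as multisets with `itertools.combinations_with_replacement`, since the order of values doesn’t affect the mean or variance.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def sample_variance(values):
    """Exact sample variance (n - 1 denominator) from running sums; no square root."""
    n = len(values)
    total = 0
    total_sq = 0
    for v in values:
        total += v
        total_sq += v * v
    # (n*Q - S^2) / (n*(n-1)) is an exact Fraction for integer inputs.
    return Fraction(n * total_sq - total * total, n * (n - 1))


def enumerate_mean_variance(n, lo, hi):
    """Map every achievable exact mean to the set of exact variances it can pair with."""
    table = {}
    for sample in combinations_with_replacement(range(lo, hi + 1), n):
        mean = Fraction(sum(sample), n)
        table.setdefault(mean, set()).add(sample_variance(sample))
    return table
```

For instance, `sample_variance((1, 2, 3, 4))` gives `Fraction(5, 3)`, and `enumerate_mean_variance(2, 1, 3)[Fraction(2)]` is `{0, 2}`, coming from the samples (2, 2) and (1, 3). Exact fractions are chosen here for clarity rather than speed; the number of samples grows combinatorially with n and the range width, which is what makes this approach expensive.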

This turned out to be the key to everything. Even though my earlier attempts to write an equation for standard deviations had failed, I thought that having some data to look at might give me a hint about what such an equation could look like.

So I looked at the sorted variances, and a pattern quickly became apparent. Even with these patterns in hand, I still could not derive an equation that described them. Regardless, I knew the patterns were just as good as an equation, except that they had to be determined empirically for each sample size.
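For readers who want to stare at the same kind of sorted list, here is a small exploratory sketch. It is only an illustration under my own assumptions (integer data in a hypothetical range [lo, hi], sample variance with an n − 1 denominator) and does not reproduce the specific patterns described in the paper. It collects the distinct values of n(n − 1)s², which are always integers, and prints them with the gaps between consecutive values.

```python
from itertools import combinations_with_replacement


def scaled_variances(n, lo, hi):
    """Sorted distinct values of n*(n-1)*s^2 over all integer samples of size n in [lo, hi]."""
    seen = set()
    for sample in combinations_with_replacement(range(lo, hi + 1), n):
        s = sum(sample)
        q = sum(v * v for v in sample)
        seen.add(n * q - s * s)  # equals n*(n-1)*s^2, always an integer
    return sorted(seen)


if __name__ == "__main__":
    for n in (2, 3, 4):
        values = scaled_variances(n, 0, 6)
        gaps = [b - a for a, b in zip(values, values[1:])]
        print(n, values[:12], gaps[:12])
```

For n = 2 the scaled values are simply squared differences, so `scaled_variances(2, 0, 3)` returns `[0, 1, 4, 9]`. Keep in mind that the largest values in each list are also shaped by the chosen range, so only the low end reflects the sample size alone.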

And so I set off on an odyssey of identifying patterns, learning about all kinds of floating-point issues along the way. After these arduous adventures, I was determined to produce a publication worthy of such an important discovery, so I picked up some LaTeX, and you can read the paper here.
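Two well-known Python behaviors show the kind of rounding trap that comes up when checking reported decimals; I’m not claiming they are the specific issues the paper deals with. The built-in `round()` rounds halves to even (`round(2.5)` is 2), and binary floats can’t represent most decimals exactly, so `round(2.675, 2)` gives 2.67. The sketch below avoids both by working with exact fractions and rounding half-up by hand, then uses that in a brute-force check of a reported mean and standard deviation. It assumes integer data in a hypothetical range [lo, hi] (say, a 1 to 7 scale), an n − 1 denominator, and half-up reporting; the function names are my own. This is the slow enumeration strategy, not the pattern-based GRIMMER procedure from the paper.

```python
import math
from fractions import Fraction
from itertools import combinations_with_replacement


def round_half_up(value: Fraction, decimals: int) -> Fraction:
    """Round an exact value to `decimals` places, sending halves away from zero."""
    scaled = value * 10 ** decimals
    rounded = math.floor(abs(scaled) + Fraction(1, 2))
    return Fraction(rounded if scaled >= 0 else -rounded, 10 ** decimals)


def decimals_of(reported: str) -> int:
    """Number of digits after the decimal point in a reported statistic."""
    return len(reported.split(".")[1]) if "." in reported else 0


def sd_rounds_to(variance: Fraction, reported_sd: str) -> bool:
    """Check whether sqrt(variance) rounds half-up to reported_sd, without floats.

    Rounding to t means t - h <= sqrt(variance) < t + h, with h half a unit in the
    last reported place; squaring the nonnegative bounds keeps the test exact.
    """
    t = Fraction(reported_sd)
    h = Fraction(1, 2 * 10 ** decimals_of(reported_sd))
    lower, upper = t - h, t + h
    return (lower <= 0 or variance >= lower * lower) and variance < upper * upper


def mean_sd_consistent(mean: str, sd: str, n: int, lo: int, hi: int) -> bool:
    """Brute force: does any integer sample of size n in [lo, hi] match both values?"""
    target_mean = Fraction(mean)
    mean_places = decimals_of(mean)
    for sample in combinations_with_replacement(range(lo, hi + 1), n):
        s = sum(sample)
        if round_half_up(Fraction(s, n), mean_places) != target_mean:
            continue
        q = sum(v * v for v in sample)
        if sd_rounds_to(Fraction(n * q - s * s, n * (n - 1)), sd):
            return True
    return False
```

As an example, the sample (1, 2, 3, 4) has mean 2.50 and standard deviation of about 1.291, so `mean_sd_consistent("2.50", "1.29", 4, 1, 7)` is True, while `mean_sd_consistent("2.50", "1.30", 4, 1, 7)` is False because none of the nine possible samples with that mean has a standard deviation that rounds to 1.30.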

This should be seen as a huge victory for preprints. The GRIM test has still not been published, and yet because its authors posted a preprint, I was already able to complete a follow-up publication.

I also can’t help mentioning that this discovery was only possible because I left my MD/PhD program. I’d rather have my name in the history books than have a name with a couple of extra letters next to it that is quickly forgotten.