SPRITE Case Study #2: The Case of the Polarizing Porterhouse (And Some Updates)

In my previous post, I very briefly introduced SPRITE through a worked example — the Carthorse Child.


That was the most straightforward case study I could find, and also the jolliest, due to the heckin’ good horse/child crossover.

The case study explored the basic question: given the regular summary statistics (mean, SD and n) of a sample, what are the properties of a hypothetical sample fitting that description?

The answer in this case was that if you have 45 children who take an average of 19.4 carrots (SD=19.9), then most samples meeting that description have at least one child who was served at least 60 carrots, and hence is a horse.

Lots of people had objections and points:

How can you be sure the ‘carrots’ were whole carrots?

A few reasons. I hope you find them convincing:

a) They’re just called ‘carrots’ the whole time in this research, and in lots of other similar research by the same author. Sometimes they’re referred to explicitly as ‘baby carrots’ (just not in this paper). But no other language (sliced, diced, miniature, etc.) is ever used in the paper of interest either. In America, in my experience (as a dirty job-stealing immigrant), when people refer to ‘carrots’ they invariably mean baby carrots.

b) Here is a literal description of the study by the author — the SPECIAL SUPER SPACE CARROTS are shown in both a photo and a dish. They’re right there. Look at their orange glory.

Now, I think, I hope, that I’m being reasonable. I’ve also spent far longer thinking about carrots than I normally would. I didn’t want to be hasty, but also very much wanted something current and appropriate to finally introduce SPRITE — which I’ve been playing with on and off for ages — and then the horse-child showed up. How often do you think I have a good hook to introduce people to forensic meta-science techniques??

If I understand SPRITE correctly, can’t you solve many of the problems you list with appropriate statistical derivations? Some samples are just impossible without your loop-heavy method.

YES, you absolutely can find appropriate statistical relationships to solve SPRITE problems. (And we can go into them in detail sometime.) YES, my approach is also kind of ugly. It is also a) multi-purpose b) intuitive c) easy.

(I am working on some optimization at present.)

The thing is: I am not a statistician. I’m not even sure how to develop more arcane tools, and even if I did, I’m not sure other people would use them. In my experience, people will gravitate towards things they understand.

Do you know all the assumptions behind the shuffling method that you use? What properties does it confer on your solutions? Will it generate a CLASS of answers or ALL the answers?

Very good question, not 100% sure yet. But — I have more than one method, and they converge fairly heavily on the same sorts of solutions. I will get through all the work of explaining them when I can.

Have you seen similar approach XYZ?

I have now! I have received FOUR emails about this, both from people who have worked on the problem (and all of them in different ways) and people with additional analytical suggestions. I appreciate you all sending me these things, I’ve learned a lot (even in a few days) and I want to talk to everyone involved. Please and thank you.

Your horse and carrot jokes are bad and/or inappropriate.

“The proposition before this committee is: no more horse jokes. All those in favour?”

*a few scattered ayes*…

“All those against?”


(Yes, I went there. But I’m wasting time now.)

Case Study #2 — The Case of The Polarizing Porterhouse

Today, we’re going to do something only very slightly more complicated than the carrots example, and that is a) work without an SD, instead sampling over a range of SDs to establish a plausible one, and b) introduce a simple constraint to the above.

For SPRITE to come up with useful information, constraints help a great deal. They allow us to reduce very large sets of possible samples to much smaller ones. More or less any constraint can be harnessed by SPRITE to illuminate the data in some way.

The following example is from the paper “Exploring comfort food preferences across age and gender” by Wansink et al. (2003). It presents an investigation into comfort food preferences across gender and age. The paper has 330 citations at present. Here are the results from Table 2:

In case it isn’t obvious, this is a table of means. It concerns a massive survey of Americans, who rated whether the presented foods qualified as ‘comfort foods’. The scale runs from 1 to 5, so 1 is eaten standing up in an alleyway after committing a felony, and 5 is eaten under a duvet at home watching Netflix.

Ignore the other mistakes; let’s talk about meat (‘steak or beef’).

Steak, for men, has a mean of 3.2, which, I guess, means it’s ‘sort of’ comforting.

And for women, a mean of 2.8, which is ‘slightly less than sort of comforting’.


According to the table, when you compare those two values with a one-way ANOVA, you get a meaty F value of 17.8. With only two groups, the ANOVA’s F is exactly the square of the t statistic from a pooled-variance (Student’s) t-test: t² = F. In other words, a regular old t-test on the same groups would return a t-value of √17.8 ≈ 4.22. That’s a big old difference.
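If you want to convince yourself of the t² = F identity, here is a quick numerical check, as a sketch. The ratings are random, hypothetical data (not the paper’s), and `pooled_t` and `one_way_f` are helper names made up for this example; only the Python standard library is used.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Student's two-sample t statistic using a pooled (equal-variance) SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / math.sqrt(pooled_var * (1 / na + 1 / nb))


def one_way_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [x for g in groups for x in g]
    grand = statistics.fmean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical 1-5 ratings, only to check the identity numerically.
rng = random.Random(0)
group_a = [rng.randint(1, 5) for _ in range(40)]
group_b = [rng.randint(1, 5) for _ in range(60)]

t = pooled_t(group_a, group_b)
f = one_way_f(group_a, group_b)
print(f"t^2 = {t * t:.6f}, F = {f:.6f}")      # the two numbers agree

# Going the other way with the paper's F:
print(f"t = {math.sqrt(17.8):.4f}")           # t = 4.2190
```

The identity only holds for two groups and the pooled-variance t-test; a Welch t-test with unequal SDs will not match the ANOVA F exactly.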

So, for convenience’s sake, let’s start by assuming identical SDs between these groups, and back-calculate that common SD.
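Under that equal-SD assumption, the pooled t statistic can be rearranged to give the common SD s directly. Here m and w label men and women, and the group sizes are the ones used for this example (401 men, 602 women):

```latex
t = \frac{\bar{x}_m - \bar{x}_w}{s\sqrt{\frac{1}{n_m} + \frac{1}{n_w}}}
\quad\Longrightarrow\quad
s = \frac{\bar{x}_m - \bar{x}_w}{t\sqrt{\frac{1}{n_m} + \frac{1}{n_w}}}
  = \frac{3.2 - 2.8}{\sqrt{17.8}\,\sqrt{\frac{1}{401} + \frac{1}{602}}}
  \approx \frac{0.4}{4.219 \times 0.06446}
  \approx 1.47
```

Because the published means and F are themselves rounded, this only pins s down to roughly 1.47.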

(We don’t need SPRITE yet, but we can use it to reproduce the above exactly: just run it without the scale limits, and change the SD iteratively until we get to the solution SD=1.47, which gives a t of 4.2179… lovely!)
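For readers who want to play along, here is a minimal SPRITE-style search in Python. It is a hypothetical sketch written for these numbers, not the actual SPRITE code: the function name `sprite_like`, the flat starting sample, the pairwise ±1 moves and the 0.005 tolerance are all assumptions. The idea is to hold the sum fixed (so the mean stays put) and nudge pairs of values apart or together until the sample SD lands within a tolerance of the target.

```python
import math
import random
from collections import Counter


def sprite_like(mean, sd, n, lo, hi, tol=0.005, max_iter=100_000, seed=None):
    """Hypothetical SPRITE-style search (not the author's implementation).

    Looks for n >= 2 integers in [lo, hi] summing to round(mean * n) whose
    sample SD is within `tol` of `sd`. Returns a sorted list, or None.
    """
    rng = random.Random(seed)
    total = round(mean * n)              # a fixed sum pins the mean down
    if not lo * n <= total <= hi * n:
        return None
    base, extra = divmod(total - lo * n, n)
    # Start as flat as possible: every value is lo + base, `extra` of them one higher.
    data = [lo + base + (1 if k < extra else 0) for k in range(n)]
    sum_sq = sum(x * x for x in data)    # updated incrementally; the sum never changes

    def current_sd():
        return math.sqrt((sum_sq - total * total / n) / (n - 1))

    for _ in range(max_iter):
        now = current_sd()
        if abs(now - sd) <= tol:
            return sorted(data)
        i, j = rng.randrange(n), rng.randrange(n)
        if data[i] > data[j]:
            i, j = j, i                  # make data[i] the smaller value
        a, b = data[i], data[j]
        if now < sd and i != j and a > lo and b < hi:
            # Spread out: lower the smaller value, raise the larger one.
            data[i], data[j] = a - 1, b + 1
            sum_sq += 2 * (b - a) + 2
        elif now > sd and b - a >= 2:
            # Pull together (a gap of at least 2 is needed to lower the SD).
            data[i], data[j] = a + 1, b - 1
            sum_sq -= 2 * (b - a) - 2
    return None


samples = {}
for label, mean, n in [("men", 3.2, 401), ("women", 2.8, 602)]:
    samples[label] = sprite_like(mean, 1.47, n, 1, 5, seed=1)
    print(label, sorted(Counter(samples[label]).items()))
```

On a 1 to 5 scale each move changes the sum of squares by at most 6, which is small next to the ±0.005 window at these sample sizes, so the search settles quickly here; with a tighter tolerance or a much smaller n it may wander and return None.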

So, now our assumed dataset is:

Men: mean = 3.2, SD = 1.47, n = 401

Women: mean = 2.8, SD = 1.47, n = 602

The only problem with that is… ?

Trick question. It’s 100% A-OK.

Here is an example SPRITE solution for each:

Well, that’s the end of that, then.

Right? Course not.

In fact, the paper goes on to put its foot in it completely:

Another way to examine the general tendency for males and females to rate comfort foods differently is to construct a surrogate measure of percentage acceptance by coding people who rated a food as 4 = agree or 5 = strongly agree as someone who accepts the food as a comfort food [93]. In doing this, it is found that females had a higher acceptance percentage of candy and chocolate [69% vs. 58%; χ² = 4.8; P …] but a lower acceptance percentage of meal-related foods such as steak or beef [52% vs. 77%].

And that’s our constraint.

Let’s start with men: we have to take those rather flat distributions above and somehow shove 77% of the individual values into the 4 or 5 bins.

Is that even possible?

Well, the mean is below 4, so in our most constrained possible case there would be no 5’s: a 5 uses up more of the fixed total than a 4 does, and both count towards the 77%. Obviously, this is silly, because a lot of American men have what approaches a steamy romance with steak, but we’ll let that pass for now.

If we run SPRITE on just the values from 1 to 4, we get the solution with as many 4s as possible, and it is truly bizarre:

That is the literal best the data can do — nothing but 1s and 4s. The mean is still where it has to be (3.2) but even in this totally loopy and utterly extreme case, we only have a standard deviation of 1.33.

But you might remember from before, we made that SD up. So violating it isn’t such a problem. What IS a problem is that this is the absolute best case for getting values of 4 or above, and it isn’t 77%: it tops out at 73.3%.
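The 73.3% ceiling doesn’t actually need a search; it falls out of the arithmetic. A sketch, assuming the sample total is the reported mean times n rounded to a whole number (`max_count_at_or_above` is a name made up for this example):

```python
import math


def max_count_at_or_above(threshold, total, n, lo):
    """Most values >= threshold that n integers, each >= lo, summing to `total` can hold.

    Cheapest way to fill the top bins: every qualifying value is exactly
    `threshold` and every other value is `lo`, so k qualifying values need
    k * threshold + (n - k) * lo <= total.
    """
    if total < n * lo:
        raise ValueError("no sample of this size can have this sum")
    return min(n, (total - n * lo) // (threshold - lo))


n_men = 401
total_men = round(3.2 * n_men)                 # 1283, the sum implied by the mean
k_max = max_count_at_or_above(4, total_men, n_men, 1)
needed = math.ceil(0.765 * n_men)              # lowest count whose share still rounds to 77%
print(k_max, f"{k_max / n_men:.1%}", needed)   # 294 73.3% 307
```

Even feeding in the largest total that still rounds to a mean of 3.2 (1303, a mean just under 3.25) only lifts the ceiling to 300 of 401 (74.8%), still short of 77%. The same call with the women’s figures (total 1686, n = 602) gives 361 of 602, just under 60%, comfortably above 52%.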

The female data, however, can exist. SPRITE pops it right out.

… It’s just funny-looking.

Deeply silly-looking, yes, but we did in fact get what we came for — fully 59.4% of the values are 4 or 5 (a.k.a. all 4s), over the 52% threshold.

So, instead of maxing out our distribution, let’s mandate that exactly 52% of our sample be 4s. This is easy to do with SPRITE: we start with 313 4s (52% of 602), then find the parameters of the rest of the sample by splitting the remaining sum amongst the remaining n.

Specifically: we now need to find a sample of n=289 with a mean of 1.5, and when we add it to the huge stack of n=313 4’s, we get our overall mean.
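The bookkeeping for that split is short enough to sketch. It assumes the women’s total is 2.8 × 602 rounded to a whole number (1686); since the published mean is itself rounded, the true total is only known to within that rounding.

```python
n_women = 602
total_women = round(2.8 * n_women)    # 1686, the sum implied by the mean
fours = round(0.52 * n_women)         # 313 values fixed at 4
rest_n = n_women - fours              # 289 values left over
rest_total = total_women - 4 * fours  # 434 left to share among them
print(fours, rest_n, rest_total, round(rest_total / rest_n, 2))  # 313 289 434 1.5
```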

So, here are two solutions:

Now, the first case above will give the maximum SD (it’s all about the 1s), and even that is a hair under the value we assumed at the start: SD=1.39, against 1.47.
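That 1.39 can be reproduced by building the most spread-out sample allowed under this setup. A sketch, assuming (as above) exactly 313 4s, the other 289 values between 1 and 3, and a total of 2.8 × 602 rounded to 1686; `sample_sd` is a helper written for this example.

```python
import math


def sample_sd(values):
    """Sample (n - 1) standard deviation."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))


n_women = 602
total_women = round(2.8 * n_women)                              # 1686
fours = 313
rest_n, rest_total = n_women - fours, total_women - 4 * fours   # 289 values summing to 434

# Push the remaining values to the ends of the 1-3 range to maximize the spread.
threes, leftover = divmod(rest_total - rest_n, 2)   # each 3 costs 2 more than a 1
twos = leftover                                     # at most one 2 absorbs an odd remainder
ones = rest_n - threes - twos
sample = [4] * fours + [3] * threes + [2] * twos + [1] * ones

print(ones, twos, threes, fours, round(sample_sd(sample), 2))   # 216 1 72 313 1.39
```

Any other arrangement of the 289 lower values pulls them towards the middle and lowers the SD further, so under these assumptions 1.39 is the ceiling.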

However, we are now faced with the prospect of a sample with a punishingly small number of 3’s, and no 5’s whatsoever. Do you think these distributions, or similar ones where you shuffle the values from 1–3, are realistic? They aren’t. But they’re the only ones that work.


The ‘men’ case is impossible as stated; the ‘women’ case is simply monumentally unlikely.

Even with steak being the most bizarrely polarizing foodstuff ever, the fomenter of wars and sedition the world over, splitting people into a kind of culinary Atreides and Harkonnen (showing my age here), there is no possible way to reconstruct the numbers given in the paper.

There are other inconsistencies in the paper, but they will have to wait.

P.S. Next time, we’ll look at something a bit more complicated.