Notes On Anomaly Detection #2: Full Fat or Low-Calorie Fraud?

Last time I outlined why plagiarism is dumb.

Guess what else is dumb?

Data fabrication is dumb.

Let’s get to the specifics.

First, I am reasonably convinced that scientific data ‘fraud’ is fairly rare.

By ‘fraud’, I mean the old-school kind: sit in your basement, lick your pencil and then jot down fictitious numbers which then are represented as scientific measurements. And then, well, what do you know — all those pesky numbers that never quite behave themselves are suddenly in the right place. Slap the calculations around a bit until everything fits just the way you like it, foist it on some unlucky co-author, publish. The way Diederik Stapel did it.

In some respects, this sounds easy. What could be more straightforward (and faster, and cheaper) than just sitting around and knocking up some data? Think of the efficiency!

And how would you get caught if no-one saw?

In actual fact, and this is important:

  1. The above is a hideously bad way to do fraud,
  2. … so only idiots do it
  3. … and this is another reason why fraud detection techniques alert us almost exclusively to the presence of idiots. [1]

A much, much better way of doing fraud starts with actually doing the experiments and then changing what you need from there.

It’s Fraud-Lite. It’s low-calorie fraud. “2% fraud, for your figure”. I Can’t Believe It’s Not Dishonest.

I suspect (but cannot prove) that clever, dishonest people know the difference between full-fat and skim in this case. They might know it explicitly or implicitly, to a greater or lesser extent, and it may be acknowledged (“this is how we get results”) or not (“what do you mean ‘a problem with my methods’, this is how everyone does it”).

But they do know.

The reasons that this is the case range from the straightforward and bog-obvious to the counter-intuitive.

10 Reasons Fiddling > Faking

1. Inventing realistic fake data lies somewhere between challenging and impossible, depending on the context.
It is difficult to invent realistic summary statistics, and even more difficult to invent realistic data. Data points have a sub-structure, complex inter-relationships, and mathematical implications far beyond what an author may consider when presenting them. Consider the papers of Larry Sanna: the data were found to be extremely unlikely because there was too little difference in the variation between groups (i.e. the SD of the SDs was impossibly low). People who are superficial enough to invent their own results are, in general, not clever enough to see something like that coming.
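A minimal sketch of that kind of check, in the spirit of the analyses applied to such cases (not the original method, and with entirely hypothetical reported values), might look like this: simulate many ‘innocent’ experiments that share a common true SD, and ask how often the group SDs come out as similar to one another as the reported ones.

```python
# A rough "SD of the SDs" check. All reported values below are hypothetical,
# and this resampling approach is an illustration, not the original analysis.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reported group SDs from one multi-group experiment,
# each group containing n observations.
reported_sds = np.array([1.02, 1.05, 1.03, 1.04])
n = 15

observed_spread = reported_sds.std(ddof=1)        # how similar the SDs are
pooled_sd = np.sqrt(np.mean(reported_sds ** 2))   # rough common-SD estimate

# Simulate many 'innocent' experiments (normal data, one true SD) and ask
# how often the group SDs end up at least this similar to one another.
sims = 20_000
spreads = np.empty(sims)
for s in range(sims):
    group_sds = [rng.normal(0.0, pooled_sd, n).std(ddof=1)
                 for _ in range(len(reported_sds))]
    spreads[s] = np.std(group_sds, ddof=1)

p_this_similar = np.mean(spreads <= observed_spread)
print(f"Observed SD of SDs: {observed_spread:.3f}")
print(f"Proportion of simulations at least this similar: {p_this_similar:.4f}")
```

If next to no simulated experiments produce group SDs that homogeneous, the reported values deserve a very hard look.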

2. Real (but fiddled) data has a realistic pedigree, as all other experimental activities proceed as normal
Subjects are recruited, cells are tested, drugs are administered, lab notebooks are kept, videos are recorded, tests are taken, reagents are bought, and so on. Records exist of experimental activity, and this is easily established.

“Hey, Researcher X! Did you make this up?”
“Gosh, no. It cost us a million dollars over three years.”

One of the central problems Stapel had was that he could not account for where his survey data came from. His book opens with a frantic scene, in which he is driving around the country, trying to fabricate backstories for data he previously invented:

“So, between one train station and two college campuses, I’ve made quite a lot of progress with my story. But why did I make it so difficult for myself? Why the heck did I tell people that the research had been done in Zwolle and Groningen, hundreds of miles away, instead of somewhere close to home? Is this some kind of game I’m playing? Am I trying to make it hard for myself so that I have to try extra hard to win, so I can be proud of what I’ve done…
Is this going to work? It’s still a pretty weird story. A busy professor who designs a study, and then gets into his car and drives to the other end of the country to hand out questionnaires himself in public places, alone, with no students or research assistants. Who’s going to believe something like that?”
Faking Science, Chapter 1

3. The procedure of fraudulently fiddling values closely resembles dealing with the legitimate problems researchers face.
Data points may be lost or abandoned, they may be incorrectly coded or recoded, and group designation or typographical errors may exist. Researchers may often doubt the veracity of measurements they make entirely in good faith. Amplifiers fail, wires cross, ambient temperatures change, contamination occurs, patients lie and drop out, and undergraduates say weird things. There is a bucket of plausible deniability available for massaging away ‘bad values’. Even if this massaging is extensive and dishonest, it is the first cousin of a necessary and regulation task: determining if weird (or inconvenient) data represents measurement error.

4. Fiddling may not even require conscious engagement. Fooling other people runs along an uneasy continuum with fooling ourselves. 
Fraud requires deliberate misrepresentation — only in the most unusual of circumstances could a researcher accidentally invent data. But if you have a clear expectation of an outcome, unblinded data and strong pressure to produce desired or expected results … well, you may exclude datapoints from samples in a variety of ways in the service of confirmation bias. “See? I knew that was the answer all along!”

I see you, little man! You gon’ die!

5. The practice of outlier identification is non-standardised, so who’s to even say you ‘cleaned the data the wrong way’? 
If a procedure for removing outliers is reported, there are several methods/thresholds for identifying outliers which may be reported on an as-needs basis (Grubbs’ Test, z score of 3 or 3.1, 1.5 IQR, a proportion of the median absolute deviation and so on). It’s generally true to say you can pick whichever the hell one you want.

This is also indicative of a broader attitude to methodology, especially within the social sciences, where quantification is … I’ll be nice and say ad hoc. Have a careful read of this.
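To put a number on ‘pick whichever the hell one you want’: here is a toy vector (invented for illustration, not taken from any real paper) run through three common outlier rules.

```python
# Three common outlier rules applied to the same toy vector.
import numpy as np

x = np.array([4.1, 4.3, 4.6, 4.8, 5.0, 5.1, 5.2, 5.4, 5.5, 9.0])

# Rule 1: |z| > 3, using the sample mean and SD
z = (x - x.mean()) / x.std(ddof=1)
flag_z = np.abs(z) > 3

# Rule 2: outside 1.5 * IQR from the quartiles (Tukey's fences)
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
flag_iqr = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)

# Rule 3: more than 3 scaled median absolute deviations from the median
mad = np.median(np.abs(x - np.median(x)))
flag_mad = np.abs(x - np.median(x)) / (1.4826 * mad) > 3

for name, flags in [("z > 3", flag_z), ("1.5 IQR", flag_iqr), ("3 MAD", flag_mad)]:
    print(f"{name:8s} flags: {x[flags]}")
```

On this vector, the z-score rule keeps the extreme value while the Tukey fences and the MAD rule both throw it out. Pick your rule, pick your result.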

6. Reporting of the above (outlier identification) is also uncommon, and not formally required as a condition of publication.
The majority of scientific work is published without much information on how outlier removal / ‘dropped value’ / ‘cleaning’ procedures have been applied. I know I am always the only pedant requesting such information in my reviews. In my work, this generally means someone took an ECG and then analysed it, so my questions are straightforward: “How was this cleaned? How many heart beats were removed? How many values were utilised per person? Did you see any ectopy?”

Bleep bloop. My life in a panel.

The answer often comes back: “No to all of the above. We saw no error.”

Wrong answer! What you mean is (a) “My software did the cleaning without me looking, and I don’t know how it works” or occasionally (b) “I just left the bad values in, because I don’t know what bad values look like, and consequently my data sucks”.

We often hear about the file drawer for completed studies which find no significant results, but bad values within published studies often have their own little separate sub-compartment in that drawer. Sometimes no-one even knows that compartment is there.

7. The details can be aggressively buried. 
If you make up summary statistics which are impossible or problematic, they sit right there in the paper, unambiguously reported, and you’ll have a marvelous time explaining them. Put it like this: if you say “1 plus 1 is 9” then no-one needs to see your data to know you’re wrong. But, if you have real data, you can obfuscate and suppress. And of course when you DO send the data you can send a gruesome slather of numbers which are poorly formatted, unintelligible or difficult to analyse. You can send only the half of the numbers which works.

A priming researcher prepares a dataset for sharing.

(And, of course, you can claim that the people you want the data from are operating ‘in bad faith’ and then refuse to share them at all.)

8. Fraud is the Mark of Cain. Fiddling is just for naughty boys and girls.
No-one in science has a career post-fabrication.

However, ‘lesser sins’ are decidedly more fluid. ‘Her research was shamefully amateurish but she eventually learned better’, well, she’s probably OK. She certainly can be redeemed. But ‘she decided to stop faking her research one day, apologised and is now honest’ … sorry, lady. Bad scientists get a black eye or two, but frauds are blackballed. There’s a lot further to fall, and you’re far more likely to get pushed.

9. You’re in much better company.
Regardless of the outcome, and for whatever reason, it’s also much more common to ‘fudge a bit’ than to just make up your measurements: Martinson (2012) reports “falsifying or ‘cooking’ research data” to be about 30 times less common than such assorted naughtiness as “dropping observations or data points from analyses based on a gut feeling that they were inaccurate”. More eye-opening is John et al.’s (2012) finding that 62% of psychologists report “deciding whether to exclude data after looking at the impact of doing so on the results”.

Where ‘questionable research practices due to the removal of bad observations’ end and ‘fraud due to the manipulation of real data’ begins is anyone’s guess.

10. Making up descriptive or summary statistics is dangerous. Making up data is fraught. Fiddling existing data ranges from ‘only looks a bit iffy’ to damn-near undetectable.
If you make up means, standard deviations/SEMs, F-values, p-values, t-values, Chi-squared values, and so on, then we have a few options.

First, we know how they all fit together: we know if any one value implies an error in any other value. That’s what StatCheck does: it back-calculates what it has to (from the df, p-value and test statistic) and matches the stated information up. Simple and effective.
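A minimal sketch of that kind of consistency check (not the StatCheck code itself, and using made-up reported values) looks like this:

```python
# A minimal sketch of the kind of consistency check StatCheck automates
# (this is not the StatCheck code itself, and the tolerance below is a
# crude rounding allowance rather than its actual decision rule).
from scipy import stats

def check_t_report(t_value: float, df: int, reported_p: float,
                   tol: float = 0.005) -> bool:
    """Does the reported two-tailed p match the one implied by t and df?"""
    recomputed_p = 2 * stats.t.sf(abs(t_value), df)
    consistent = abs(recomputed_p - reported_p) <= tol
    print(f"t({df}) = {t_value}: recomputed p = {recomputed_p:.4f}, "
          f"reported p = {reported_p} -> {'OK' if consistent else 'MISMATCH'}")
    return consistent

# Hypothetical reported results, for illustration only
check_t_report(t_value=2.10, df=28, reported_p=0.045)   # consistent
check_t_report(t_value=2.10, df=28, reported_p=0.010)   # not consistent
```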

We know (subject to some restrictions) if your means can exist in the first place. That’s what GRIM does. And likewise, if your SDs and SEMs can exist.
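For integer-valued measures (Likert items and the like), a GRIM-style check is almost embarrassingly simple. The sketch below is a toy version with hypothetical numbers, not the canonical implementation: a reported mean has to be reachable as an integer total divided by n, once rounded to the reported precision.

```python
# A toy GRIM-style check for integer-valued items (e.g. Likert responses):
# the reported mean must be reachable as an integer total divided by n,
# once rounded to the reported precision. Not the canonical implementation.
def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """Can any integer total over n integer-valued items give this mean?"""
    target = f"{reported_mean:.{decimals}f}"
    candidate = round(reported_mean * n)          # nearest integer total
    return any(f"{total / n:.{decimals}f}" == target
               for total in (candidate - 1, candidate, candidate + 1))

# Hypothetical reported means from n = 28 integer responses
print(grim_consistent(5.19, 28))   # False: no integer total rounds to 5.19
print(grim_consistent(5.18, 28))   # True:  145 / 28 = 5.1786 rounds to 5.18
```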

And if your mean and SDs can exist, we can (again, subject to restrictions) reconstruct realistic versions of them with SPRITE or similar, more sophisticated methods. If you’ve made up a mean/SD pair that isn’t possible, we have a good chance of detecting that.
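The sketch below gives the flavour of that reconstruction. It is a heavily simplified stand-in for SPRITE, not the published algorithm, and the reported values are hypothetical: it builds a bounded integer sample that hits the reported mean exactly, then nudges pairs of values (keeping the sum fixed) until the SD lands close to the reported one.

```python
# A heavily simplified SPRITE-style search (a sketch of the idea, not the
# published algorithm): find a bounded integer sample that matches a reported
# mean exactly, then nudge pairs of values (keeping the sum fixed) until the
# sample SD is close to the reported one. All inputs below are hypothetical.
import random
import statistics

def sprite_like(mean, sd, n, lo, hi, tol=0.01, max_iter=20000, seed=1):
    rng = random.Random(seed)
    target_total = round(mean * n)
    if not (lo * n <= target_total <= hi * n):
        return None                                # mean impossible on this scale
    # Start from a flat sample that hits the target total exactly
    values = [target_total // n] * n
    for i in range(target_total - sum(values)):
        values[i] += 1
    for _ in range(max_iter):
        current = statistics.stdev(values)
        if abs(current - sd) <= tol:
            return sorted(values)                  # a plausible reconstruction
        i, j = rng.sample(range(n), 2)
        if values[i] < hi and values[j] > lo:
            values[i] += 1                         # sum (and mean) stay fixed
            values[j] -= 1
            if abs(statistics.stdev(values) - sd) >= abs(current - sd):
                values[i] -= 1                     # move didn't help; undo it
                values[j] += 1
    return None                                    # nothing found within budget

# Hypothetical report: mean 2.20, SD 1.30, n = 25, on a 1-5 scale
print(sprite_like(2.20, 1.30, 25, 1, 5))
```

If the search keeps coming back empty, or only returns samples that look absurd for the measure in question, the reported mean/SD pair deserves scrutiny.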

And the problem all of the above poses for the fabricator is that you can’t make data up after the fact. If you already chose impossible parameters, the numbers will just straight-up refuse to fit them.

Conclusion

Let’s not worry about plain old sloppiness and mistakes at all right now, and just pursue the above to its logical conclusion.

Fraud is uncommon, and pretty stupid.

However, “low-calorie fraud” happens more and matters more.

It is more common than outright data fabrication, it is harder to detect, it is possible to do unconsciously, and obviously for some people it feels entirely compatible with maintaining their self-respect. My suspicion is that it is dramatically more common than the outright bollocks we often detect.

However, we still need fraud-spotting and anomaly detection techniques, because finding bad actors in science does a lot of things, and one of them is to generate an insane amount of publicity and attention to the issue of poor scientific behaviour. It isn’t good publicity, but people like me would argue that it is entirely necessary, and damned well justified. High-profile retractions due to identified data manipulation generate infinitely more smoke than any quiet corrigendum ever can. They have an outsized influence on justifying the proposition that the body of scientific publishing/practice is pretty ill, and requires a fairly major operation.

Trust me: say “13% of papers contain values which are mathematically unlikely” and no-one will pay the slightest bit of attention. Say “Senior professor XYZ resigned in disgrace after he came down with a nasty case of the frauds” and the world will come banging on your door.

But, and I am repeating this yet again, never forget that fraud detection just finds the people who really goofed, and there aren’t that many of them.

It does not address the vast bulk of the problem — where normal practices bleed over into bad ones, and where work which cannot be reproduced or trusted is continuously minted into the scientific consciousness.

These problems will not be solved with fraud detection techniques. They require a change in culture, in incentives, in data management — in short, a change in basic practice.

  [1] “fraud detection techniques alert us almost exclusively to the presence of idiots”… SO FAR. This does not mean we aren’t working on new techniques and so on. We are. Or, at least, I am.