I previously wrote a post, “Crap spotted at bioRxiv”, on 3 preprints which are the worst scholarly work I’ve ever come across. While looking through the preprints I index at PrePubMed I noticed another series of 3 preprints (not the same authors as before), but this time at F1000Research. If the name F1000Research sounds familiar that’s because they were the journal at the center of a plagiarism scandal, and they allow authors to publish tools based on promised features that never get added.
I get bombarded by crap research every day, and I don’t typically take the time to blog about it, or even tweet about it, but I think looking at these papers is instructive.
The most obvious problem with these papers is that they are a textbook case of salami slicing, which is turning what should be a single publication into several smaller publications. Salami slicing does not necessarily mean the publications will be low quality, but the practice is problematic for multiple reasons. First of all, salami slicing is suggestive of p-hacking: it hints that the authors tested a bunch of different hypotheses and published whatever was significant (as individual papers). We saw this with Brian Wansink when he turned one pizza data set into four publications, and the p-hacking was confirmed by his own blog post, and his own emails.
However, this series of publications was on a clinical trial, which had to be registered, and some of the papers did not even report a significant finding, so in this case salami slicing is not associated with p-hacking. But even in the absence of p-hacking salami slicing is a questionable research practice (QRP).
When you stretch a single data set into paper-thin publications you are just wasting everyone’s time. The authors are wasting their own time, time which could have been spent performing some actually useful research. The authors are wasting the reviewers’ time, since more sets of reviewers have to review basically identical papers. And the authors are wasting the readers’ time, since additional papers have to be read.
Upon opening these 3 papers, the small sample size in the trial immediately stands out. So we’ve got a trial with a small sample size, which then got salami sliced into 3 papers, and published in the same journal on the same day as preprints waiting to be reviewed. At this point you might be wondering “Why should I even read these papers?” The answer is you probably shouldn’t, except for entertainment purposes, and I like to be entertained. As I’ve seen in my career as a data thug, finding one QRP often leads to finding more QRPs, so I decided to go ahead and see what I might find.
I should re-emphasize that the trial was registered, and I didn’t take a look at the raw data so I have no reason to believe there was any p-hacking or misreporting of data.
But looking at the papers I wonder why they settled on only 3 papers. The first paper could have easily been separated into a paper about weight, a paper about fat mass, and a paper about muscle mass. The second paper could have been sliced into a paper about GSH and a paper about MDA. If ya gonna slice you might as well go big or go home, amirite?
The conclusion from the first paper is literally, and I quote:
Calories deficit with either high protein or standard protein for 8 weeks brought about significant reduction in body composition
What should have just been a footnote instead became an entire publication.
I guess at this point I should summarize what the authors did. They recruited a mix of overweight and obese individuals and had them go on a diet which was either high in protein or standard in protein. At the end of 8 weeks they then compared various measures to their pre-diet levels, and compared the differences between the groups.
As I mentioned, the trial had a small sample size, sometimes very small for certain comparisons. But that didn’t stop the authors from claiming in the second paper:
In order that this study could achieve a two-tailed α of 0.05 as well as a power of 80%, 13 samples for each group were needed to detect mean differences between these groups.
This is absolutely ridiculous. 80% power with that sample size would require extremely little variation, or an extreme effect. For the MDA differences the authors used an independent t-test. Going to this calculator, and bootstrapping the SDs, you would need a difference which is around 25% of the baseline MDA levels.
The authors did a two-tailed power calculation, so they don’t even know which direction the effect will go, but they are confident that a 6% difference in protein consumption will result in a plasma marker of oxidative stress changing by 25%. They are either delusional, or they are lying. Interestingly, their sample size happened to be just barely high enough for 80% power.
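To get a feel for how demanding "13 per group, two-tailed α of 0.05, 80% power" is, here is a minimal sketch of the arithmetic. It is not the calculator I used and not the authors' method: it uses the normal approximation for a two-sample comparison of means, and the function names are my own. The approximation slightly understates what a proper t-based calculation requires, so if anything it is being generous.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def _z_sum(alpha: float, power: float) -> float:
    """z_(1 - alpha/2) + z_(power) for a two-tailed test."""
    return _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)


def min_detectable_d(n_per_group: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest standardized difference (Cohen's d) detectable with the given
    power when comparing two equal-sized independent groups.
    Normal approximation, so treat the result as a rough lower bound."""
    return _z_sum(alpha, power) * sqrt(2 / n_per_group)


def n_per_group_for(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size needed to detect a standardized difference d
    (same normal approximation, rounded up)."""
    return ceil(2 * (_z_sum(alpha, power) / d) ** 2)


if __name__ == "__main__":
    # Effect size, in units of the pooled SD, needed with 13 per group.
    print(round(min_detectable_d(13), 2))
    # Per-group sample size needed for a moderate effect (d = 0.5).
    print(n_per_group_for(0.5))
```

With 13 per group the detectable difference comes out at about 1.1 pooled standard deviations, which is a huge effect by any convention. A moderate effect of d = 0.5 would need somewhere in the low 60s per group under this approximation.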
Frankly, I consider presenting a misleading power calculation to be scientific misconduct. Imagine someone is actually interested in these findings (hard to imagine, but roll with me). They’ll see that these authors did not find an effect with 80% power, so they’d be discouraged from pursuing this line of research. However, maybe there actually is a moderate effect, which just needs a larger sample to see.
So there we have it, a clear QRP after identifying the first QRP of salami slicing, and it really only took a couple minutes to notice.
Although I think this clinical trial was a complete waste of time, I find this series of 3 publications interesting for a couple reasons. The 3 papers were linked together by the journal, so the journal was complicit in the salami slicing. I guess it makes sense, triple the fee. If I was the journal I would have encouraged the authors to sharpen their blade. As I said, the first paper easily could have been 3 papers, and the second paper 2 papers, so there could have been 6 total papers! 6 times the fee, holla at a balla.
I’m also very curious to watch the review process play out, which is open at F1000Research. The 3 papers are basically identical to each other, so if you accept one I don’t understand why you wouldn’t accept the other 2 as well, unless you believe some of the power calculations are correct while others aren’t.
In addition to these clear QRPs, while reading these papers I noticed a large number of minor errors.
First of all, it is clear English isn’t the authors’ first language, as the articles are filled with grammatical errors. I don’t know whether this should be the responsibility of the authors to fix or should be part of the journal’s editing services.
In the first paper the abstract claims there are 45 participants, but the figure, tables, and text show 54 participants.
In the tables the authors alternate between using commas and decimals, sometimes within the same table.
In the tables the superscripts aren’t always defined. For example, in Table 3 of article 2, the “p” superscript isn’t even red, but we know from the first article that it signifies a paired-samples t-test.
Instead of using an independent t-test to test the difference of differences, they should have used a repeated measures ANOVA.
I have an issue with the authors’ definition of weight cycling. They claim weight cycling is not defined, so they define it as having lost and regained ≥ 2 kg (4.41 pounds). Every time I go to the gym I weigh myself and my weight fluctuates by 5 pounds every couple days, so I guess I would meet their definition. LeBron James gained 7 pounds during a basketball game, so I guess it would only take him 48 minutes to meet their definition. The point is instead of studying overweight/obese individuals with a history of weight cycling, the authors essentially just studied overweight/obese individuals, who may or may not have a history of weight cycling.