How I lost faith in Science (and how post-publication peer-review can restore it)

Science was my hero. Tragically, I remain infatuated with the scientific method, but modern Science is also an institution, and I have learned that the institution is what matters. Here is my story:

I was accepted into the PhD program in philosophy at the University of Wisconsin-Madison because I had perfect scores on the quantitative and analytical portions of the Graduate Record Examination (GRE). It’s rare even for mathematics students to score so high, much less philosophy students. I chose the philosophy program at the UW-Madison, because its faculty included six philosophers of science, especially Elliott Sober, who was highly recommended to me by other luminaries in the field.

Why did I choose philosophy of science? I was on a mission. I had worked as an analyst for over a dozen political campaigns and Fortune 500 companies, and saw how social leaders ignore whatever data doesn’t support their favorite hypotheses. I was disturbed that decision-makers claim to apply the scientific method but follow their intuitions instead. To fix the situation, I would have to change the expectations of people who consume social research. In other words, I would need to promote a new philosophy of science.

While studying at the UW-Madison, I was awarded an internship at a think tank which afforded me a chance to prove that I could produce better social science. A team of engineers confirmed the quality of my new methods and told me that my next step was to institutionalize them. I sincerely embraced the challenge to build Science as an institution, and earned a minor in artificial intelligence because I expected to use that training to automate my process.

I earned A’s in all of my seminars, but flunked out of graduate school over my first preliminary exam. For prelims, I was required to write papers; this was to be practice for writing a dissertation. My first prelim, “Ethics for Artificial Intelligences,” was rejected on the grounds that it was outside the scope of philosophy. Its aim was to motivate and develop ethics that can be implemented by machines. The faculty determined that it didn’t qualify as philosophy because hardly any of my citations were of philosophers, and because they themselves, as philosophers, couldn’t name any relevant citations that were omitted.

Wanting to understand what I would have to do to avoid having my dissertation be similarly rejected, I appealed the decision. As I stood before the entire philosophy department, one faculty member explained that a paper about Eastern Philosophy would have been rejected for the same reasons: No one in that particular department was qualified to teach Eastern Philosophy, and it would be irresponsible to confer a PhD on a student who had no qualified teachers. That was really bad news for me because I entered graduate school to fill a gap — no qualified teachers existed anywhere in the whole world for the scholarship I intended to advance.

Perhaps a miscommunication occurred in my application to the graduate program. I intended to be respected on the basis of my own ability, not on the basis of which teachers taught me. I came to prepare myself to solve problems, but the department intended to provide only the education required to perpetuate current legacies. I might have stuck it out and jumped through more hoops if I thought I were on a path to become a scholar who could cross disciplines, but I no longer trusted that path. I felt like the victim of a bait-and-switch, and doubted that the hoops would ever end.

Years later, I learned that my prelim had been found on the Internet before I flunked out and that it had become required reading for Yale’s first course on machine morality (perhaps the first course on this topic in the world). Afterwards, philosophers from Yale published entire books about why we need ethics that can be implemented by machines. I shared this validation of my prelim with a faculty member at UW-Madison who once again explained the department’s decision: The department needed to get me out of their program so they could free up resources for students who wanted to perpetuate the legacies of the faculty.

Fair enough. PhD programs are for students who want to perpetuate the legacies of the faculty. They are not for the advancement of knowledge. I mention that Elliott Sober was a leader of the department to emphasize that my situation cannot be blamed on “bad faculty” — anyone who knows Elliott knows that he would not be among bad faculty. I am quite certain that every PhD program flunks anyone as idealistic as me, and that many other drop-outs produced scholarship just as valid as mine even if they were not lucky enough to get the validation I did. That’s OK because scholarship doesn’t require credentials, right?

To further advance knowledge, I wrote another paper, building on the work of the philosophers from Yale. This paper, published in a book by Springer, explained how various approaches to ethics are interdependent. It is an important paper for the social sciences because it advanced the process of identifying universal elements of sociology, much as the introduction of the periodic table did for chemistry. The social sciences should extend to any kind of society, including to alien societies or to societies of intelligent machines, just as physics and chemistry extend to all planets, and this paper demonstrated how that could be possible.

I took my new paper to a professor in the UW-Madison School of Business. Like a good scientist, he insisted that my theory should be tested and that a good first step would be to develop a survey instrument that could categorize humans into these interdependent types. He also insisted that I do that work myself.

This was another disillusioning experience for me. Somehow, I expected professional scientists to be compelled to test any reasonable hypothesis I brought forth. I expected to find some sort of suggestion-box system which would route my hypothesis to whichever academic is responsible for testing it. If true and important hypotheses fall by the wayside just because no academic will take responsibility for them, then why does academia qualify as a non-profit and receive public funding? That’s like publicly funding a medical system which leaves some patients indefinitely searching for a doctor willing to see them.

I wrote to many scientists and philosophers about my paper. Only the most famous responded. Perhaps it is easier to respond when you have the handy excuse that you are too busy to offer more than encouragement. In the end, it fell to me to test my own hypothesis.

Although it is common to speak about types of people, it turns out to be challenging to develop rigorous measures of such types. That’s why most modern measures of personality refer to dimensions instead of types. Measures which do refer to types, such as the Myers-Briggs Type Indicator (MBTI), are criticized for failing to produce bimodal distributions: if people really fell into two distinct types, their scores should cluster around two separate peaks rather than one. They fail the test of rigor. I developed an instrument I called the “GRINSQ” to measure the types which my theory predicted, and it did produce bimodal distributions.
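To make “bimodal” concrete, here is a minimal Python sketch of one common heuristic, Sarle’s bimodality coefficient, which combines skewness and kurtosis and is often read as hinting at more than one peak when it exceeds 5/9 (the value for a uniform distribution). This is an illustration only, not the analysis used for the GRINSQ. The function name and both sets of scores are invented for the example, and the coefficient is a rough screen rather than a formal test, since strong skew alone can inflate it.

```python
import math

# Value of the coefficient for a uniform distribution; larger values are
# commonly taken as a hint of bimodality (a heuristic, not a formal test).
THRESHOLD = 5 / 9


def bimodality_coefficient(scores):
    """Sarle's bimodality coefficient with the finite-sample correction."""
    n = len(scores)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(scores) / n
    # Central moments of the sample.
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    if m2 == 0:
        raise ValueError("scores have zero variance")
    # Bias-corrected sample skewness and excess kurtosis.
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (skew ** 2 + 1) / (kurt + correction)


# Hypothetical questionnaire totals: two separated clusters ("types") ...
two_types = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14, 15, 15, 16, 16, 16, 17, 17, 18]
# ... versus a single hump centred on the middle of the scale ("dimension").
one_hump = [6, 7, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 13, 14]

for label, scores in [("two types", two_types), ("one hump", one_hump)]:
    bc = bimodality_coefficient(scores)
    verdict = "above" if bc > THRESHOLD else "not above"
    print(f"{label}: coefficient {bc:.3f} is {verdict} 5/9")
```

With these made-up numbers, the two-cluster scores land above the 5/9 mark and the single-hump scores land below it. A real instrument would call for a larger sample and a proper test of unimodality before anyone claimed to have found types.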

In addition to that achievement, the GRINSQ showed significant statistical relationships to personality, moral attitudes, religious behavior, political ideology, vocation, gender, identification with team sports, identification with romance, and likelihood to be accused of a crime or other serious betrayal of trust. The professor who challenged me to develop the instrument conducted further research with the GRINSQ and found significant correlations to psychopathy and Machiavellianism but not narcissism. In short, the GRINSQ opened a world of new evidence that demanded explanation.

As is customary in the sciences, I wrote up the results of the GRINSQ research and submitted them for publication in a peer-reviewed journal. My theory had already been published, so there was no reason to think the theory didn’t deserve attention. I expected to have an easier time publishing the GRINSQ research because a theory is even more noteworthy when there is evidence to support it. In addition, I was offering an instrument that others could use to gather more evidence, which seemed especially valuable. But publication is perverse. There is less shame in publishing an untested hypothesis which turns out to be false than in publishing evidence which turns out to be misleading. And the worst shame is to waste the time of a whole bunch of scientists by convincing them to use an instrument which never becomes standard. The more value a study can potentially offer, the more conservatively it is judged and the more readily it is rejected.

As if I were trying to publish a work of fiction, I received rejection after rejection. Over the course of two years, my paper was submitted to ten different peer-review processes. Five rejected it on the basis of scope even before seeking reviewers. Another four gave me as much feedback as they could, but reported that they were unable to find enough volunteers to review it. Of the six blind reviews returned to me from all processes combined, none recommended any changes to the testing procedure, yet none endorsed publication. It was as if the goal of peer-review were not to ensure the rigor of testing, but rather to decide via opinion whether the hypothesis deserves to be tested in the first place.

There are three strategic times for peer-review:

  1. Peer-review before an experiment is conducted (i.e. registered report),
  2. Peer-review as a gateway to publication (thus permitting each journal to establish a specific scope, typically aligned with a research legacy), and
  3. Peer-review after publication (i.e. post-publication, which protects the public from the publication of flawed research).

Just as I honor the right of professors to prioritize PhD candidates who will perpetuate their legacies (which really are valuable!), I honor their right to establish journals with scopes that help them advance those legacies. I recognize the value of the second form of peer-review. However, if PhD programs and journals exist merely to perpetuate legacies, then people had better not be required to get a PhD or to be published in a peer-reviewed journal to be taken seriously.

The final journal to which I submitted was PLOS ONE. It is not a disciplinary journal; each month it publishes about 1834 papers which span all of science, so no one wants to read it all. The only value peer-review can provide such a journal would be to protect the public from the publication of flawed research. In fact, PLOS ONE instructs its reviewers to allow the online community to judge the noteworthiness of submissions post-publication. PLOS ONE reviewers are supposed to reject submissions only if they violate the rules of science.

I like the idea of PLOS ONE, but my experience was that their reviewers judge on the basis of noteworthiness, first by choosing which papers they volunteer to review and second by rejecting papers without identifying any required changes to the testing procedure. “Look at this data that demands explanation!” I say, and they reply, “Until I see an explanation I like, I will have no part in your data.”

In retrospect, I get it. I’m anti-social. My use of experimentation is aimed at promoting a scientific revolution that would dethrone established scientists. Things would go more smoothly for me if I could frame my work in ways that seem to support someone. For PLOS ONE to achieve its vision in spite of people like me, it would need volunteer reviewers who prioritize truth over their own personal interests. Such reviewers might have to sacrifice days of their own research to learn what would be necessary to confidently assert that a given submission is rigorous. We can’t blame PLOS ONE for failing to find enough such volunteers. It’s the process that is to blame.

PLOS ONE uses the wrong kind of peer-review for its purposes. When the goal is merely to protect the public from flawed research, the best kind of peer-review is post-publication. A study could contain flaws which few people in the whole world are qualified to detect at the time a study is conducted. It might even take years before anyone is qualified to recognize the flaw. Post-publication is the only form of peer-review which can protect the public from such flaws.

So I have self-published the GRINSQ research on figshare along with the reviews from PLOS ONE and a letter encouraging people to share all new criticisms through the post-publication peer-review website PubPeer. In theory, that’s the appropriate result of the process, but to restore my faith in Science, I’d have to see that results published in this way get treated as legitimate. I’d have to see that other scientists similarly self-publish their results when journals recommend no changes to the procedures that produced them, and that news media report on such results as they would report on results published in journals.

One measure of this general culture change would be a change in Wikipedia’s policy that self-published sources (even of replicable experiments!) do not qualify as verifiable. The vast majority of experiments published in top peer-reviewed journals have never been independently replicated, and even those that have been replicated will become reinterpreted in future scientific revolutions, so Wikipedia’s potential to distill “knowledge” is laughable. The best anyone can offer is the current state of science, and, as Rudolf Carnap famously pointed out, that requires accounting for all evidence, a significant portion of which will predictably have been rejected by journals.

In the end, my faith in Science is less about the scientific method, or even about scientists, than about the people of my general community. I can apply the scientific method to find truth, but what good will that truth be if I cannot find other people who will accept it? If the scientific method gives me truth that isolates me and dies with me, then I should not use the scientific method. A personal sacrifice that benefits others could be worthwhile, but a sacrifice that benefits no one deserves no faith.