What would happen if scientists followed the scientific method?
One of the great things about the scientific method is its simplicity. Quantum mechanics or advanced calculus may have to wait until specialized university courses, but the scientific method can be summarised in a flow diagram and taught to school children.
Most science textbooks, in the section considering method, provide a title like “The Scientific Method.” This can give students the impression that there is such a thing as the scientific method, which is unfortunate. Any students who do not consult different textbooks may never realise that different textbooks outline different methods. At best, any given textbook shows a scientific method. In our essay on Enlightenment Science there was much crowing about the singularity of the scientific method. The scientific method was supposed to be the guarantor of universality and objectivity. If different people, in different times, or different places, or asking different questions, have different scientific methods — well then the entire coherent, consistent, rock-certain edifice of science seems to have a question mark over it.
Let us not panic. Maybe the different scientific methods are merely variations on a theme; different ways of phrasing the same basic ideas. Maybe one method mentions “hypotheses” and “observations”, and another doesn’t, but they are just using different words to describe the same thing. Maybe. Let us test that hypothesis by considering two highly influential formulations of scientific methodology: one due to Francis Bacon and one due to Karl Popper.
The Baconian Scientific Method
Francis Bacon (1561–1626) worked during Europe’s Early Modern period, and is often credited with helping lay the foundations of Modern science which then rose to prominence in the Enlightenment. Following on from (and in rebellion against) the deductive processes of Aristotle (384–322 BC), Bacon placed the generation of scientific knowledge on an inductive foundation. Put very simply, the Baconian Scientific Method can be summarised as follows:
Make observations and, guided by this, come up with a theory. Observational evidence serves the purpose of potentially verifying the theory.
The intention is simple: make observations. As many, and as various, and as unconstrained by the blinkers of theory as one can. Then, very carefully, cautiously, put forward a theory. By this method — of abundant observation and careful theorising — one has a way of showing which theories are true.
One can see this method at work across science. In astronomy, for example, scientists make observations of planets, stars, galaxies, comets, and so forth. They then devise theories to account for their observations: the paths of planetary motions; the nuclear processes powering stars; the forces holding galaxies together; the origins of comets, and so forth.
Charles Darwin, in developing his theory of evolution, assiduously and consciously followed the Baconian method. He claimed of his own efforts: “I worked on true Baconian principles, and without any theory, collected facts on a wholesale scale.” 
While the Baconian scientific method was hailed as ushering in an age of logic and reason, and while we see scientists such as Darwin conscientiously following it, it suffers from a significant flaw: it is based on a logical fallacy.
Consider two statements: a universal one (such as “All students are lazy”) and a particular one (such as “This student is lazy”). Is it logically possible to infer one from the other? Of course it is! If all students are lazy, and if Alice is a student, then Alice is lazy. This is a logical syllogism, and such deductive methods, moving from the universal to the particular, have been known since at least the time of Aristotle.
But can we go the other way? From particular to universal? If Alice is lazy, and Alice is a student, can we infer that all students are lazy? Of course we cannot! This is a logical fallacy, and such inductive errors, moving from the particular to the universal, have been known to be errors since at least the time of Aristotle.
Here, however, is the rub: we want to go from particular to universal. We want to generalise from the specific to the general. Observing weather patterns year after year, I can reach the end of time and declare that March was always the best time to plant crops. But only being able to say this at the end of time is rather too late! I want to move from a specific observation (say, that March would have been a good time to plant crops last year) to a general one (say, that March will be a good time to plant crops every year). We face a choice: do we go for a method that is logically sound, or a method that is useful? The great advance which Bacon made, to usher in the age of science, was to abandon logical rigour.
The Popperian Scientific Method
Karl Popper (1902–1994) worked in the early 20th century, and is often credited with helping bring about the downfall of positivism. Following on from (and in rebellion against) the inductive processes of Bacon, Popper placed the generation of scientific knowledge on a deductive foundation. Put very simply, the Popperian Scientific Method can be summarised as follows:
Come up with a theory and, guided by this, make observations. Observational evidence serves the purpose of potentially falsifying the theory.
The intention is simple: propose theories. As many, and as various, and as unconstrained by the blinkers of observation as one can. Then, very carefully, cautiously, make measurements. By this method — of abundant theorising and careful observation — one has a way of showing which theories are false.
We started this essay with the hope that apparently different scientific methods might be using different words to describe the same thing. In fact, between Bacon and Popper, they are using the same words to describe different things. And not just slightly different things. Not refinements of a constant underlying idea. They are describing very different things. Their methods are opposites. This is not coincidence. Popper knew that this is how his formulation of science had to look. Justus von Liebig, writing seventy years before Popper published his work, insisted “The real method of natural science… is diametrically the opposite of Bacon’s method.” 
Popper was unwilling to claim for science a rigour it did not possess. He would not countenance the centuries-old scientific acceptance of logical fallacies at the heart of the Baconian scientific method. He therefore returned to the deductive arguments that Bacon had rejected. But a price had to be paid. Logic had not somehow bent to human desires in the intervening time, so that it would give us all we wanted. If Popper wanted logical rigour, he would either have to give up the usefulness of science, or he would have to give up something else. Unwilling to abandon usefulness, he gave up on certainty.
Popper was attempting to square the following circle:
1) Science should make universal statements.
2) Science should be based on observational evidence.
3) We can work from universal statements to particular statements (“All students are lazy,” therefore “this student is lazy”) but not from particular to universal.
4) We can only have observational evidence for particular statements. (I have observed this student. But I have not, and can never, observe all students.)
His solution was to recognise that the negation of a universal statement is a particular statement. (“It is not true that all students are lazy” means “there is at least one student who is not lazy.”) It is in principle possible to have observational evidence to support the particular statement, “There is at least one student who is not lazy.” So it is possible to have observational evidence to deny the universal statement “All students are lazy.” We can never have enough evidence to affirm a universal statement, but we can have enough evidence to deny it.
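The asymmetry can be made concrete with a minimal sketch in Python. The observation records and the function names here are hypothetical, invented purely for illustration. The point to notice is that the function has no branch that returns "verified": a finite list of agreeing observations never covers the students not yet observed.

```python
# Hypothetical observations: (student, is_lazy). Invented for illustration only.
observations = [("Alice", True), ("Bob", True), ("Carol", False)]


def find_counterexample(observed):
    """Return the first observed student who is not lazy, or None if there is none."""
    for name, is_lazy in observed:
        if not is_lazy:
            return name
    return None


def status_of_universal_claim(observed):
    """Say what a finite set of observations implies for 'All students are lazy'."""
    counterexample = find_counterexample(observed)
    if counterexample is not None:
        # A single particular observation is enough to deny the universal statement.
        return "falsified by " + counterexample
    # Agreement so far says nothing about students who have not been observed.
    return "not yet falsified"


print(status_of_universal_claim(observations[:2]))  # prints "not yet falsified"
print(status_of_universal_claim(observations))      # prints "falsified by Carol"
```

However many lazy students are added to the list, the best available verdict remains "not yet falsified".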
Certainty in the truth of any given theory was now lost: we cannot say “My theory is right.” But we can make statements which are logically justified by the evidence: “I have not yet shown that my theory is wrong.” And this, pragmatically speaking, looks like it allows us to make statements that seem to be useful. So Popper was happy.
What does this mean for scientific theories, though? If Darwin developed the theory of evolution using Bacon’s scientific method, it is not scientific under Popper. If he inadvertently developed it in accordance with Popper’s scientific method, he was a bad scientist by the measure of his own day. If the scientific method changes over time, how do we know it will not change again? Having rejected Bacon, will we ultimately reject Popper too? Can a theory be considered ‘good science’ by Popper, but ultimately be rejected as bad science? Or can a theory be ‘bad science’ by Popper’s standards, and still be embraced (by some people, at some times, in some places, regarding some questions) as good science? To answer these questions, let us look more closely at how research into evolution works.
Popper and evolution
Popper’s scientific method can be summarised in a flow diagram, and we can analyse evolutionary research by simply seeing how well it follows the steps. Here we go.
Step 1: Develop a theory.
Darwin did this. Evolution is a theory. Tick.
Step 2: Make a prediction.
Based on his theory, Darwin made numerous predictions. For example, he predicted that the fossil record would show a diversification of species: starting from a few forms in the past, ending with numerous distinct forms today. He sketched this branching “tree of life” in his notebooks, and reproduced it in On the Origin of Species.
Step 3: Collect data.
This is slow, painstaking work, but it can be done. Robert Carroll did it. He went through the available data for the fossil record and counted up, for any given time period, the number of families that existed in each phylum. He came up with the results shown here.
Step 4: Decision: Does the data match the prediction?
Well, there are different ways to slice this.
Darwin’s ‘tree of life’ prediction — which is admittedly schematic — started with five species at the bottom and ended up with eight species at the top. Carroll’s results started in the Vendian period with three phyla, and ended today with ten. That seems like it may fit, at least qualitatively, with Darwin’s prediction.
That said, it does seem a little strange that there were already ten phyla at the end of the Cambrian and yet, 490 million years later, there are still only ten phyla today. Worse still, Darwin’s new species at the top had branched off from what was originally there at the bottom. The new phyla in Carroll’s diagram just turn up from nowhere. So maybe it doesn’t fit the prediction.
Looking at the pattern within phyla, molluscs clearly understood the plan. They started off with a few families, and they kept branching out. Chordates (of which you are one) and arthropods (insects, spiders, and other creepy crawlies) were on pretty good form. With occasional setbacks, they basically did what Darwin expected.
By contrast, porifera (sponges) started off well, but then did less well. And then they did better. And then they did worse. Platyhelminthes (flatworms) just didn’t get the memo. In half a billion years, they did nothing. Still, they did better than brachiopods, which held out until the Devonian, 360 million years ago, but then gave diversification up as a bad job.
Taken overall, it is hard to state conclusively that the data match the prediction. Carroll puts it this way: “Patterns and rates of evolution are much more varied than had been conceived by Darwin or the evolutionary synthesis.” That is a polite way of saying, “The data doesn’t match the predictions.” Claims like that might be expected from a Creationist’s religious tract, but this is published in Trends in Ecology and Evolution. This is mainstream evolutionary-science literature. The data does not fit the predictions.
Step 5: Action: Reject the theory.
We conclude — objectively, rationally, demonstrably, scientifically — that evolution is not true. Popper assures us that it is good to be highly creative in proposing theories, but we must also be ruthless in rejecting them once falsified. Popper tells us that there is nothing wrong with proposing a theory that is later falsified. Darwin can stand tall as a good scientist. But there is everything wrong with holding on to a theory after it has been falsified. The burden is therefore on us. We have no other option. If evolutionary research is to follow Popper’s scientific method, then we must conclude that evolution is wrong. The only way to claim that evolution is true — or even that it might be true — is to violate this scientific method. This is evolution’s Catch 22: we can hold on to the scientific method and reject evolution, or we can hold on to evolution by rejecting the scientific method.
What route does Carroll take? He states, “New concepts and information… need to be integrated into an expanded evolutionary synthesis.” An expanded evolutionary synthesis! Why throw evolution out, when you can expand it? In a straight choice between evolution and Karl Popper’s scientific method, Karl Popper lost.
When someone says that evolution is not scientific, this is not necessarily because that person has somehow failed to grasp the scientific method. They may have listened very carefully to their science class at school. They may have grasped the scientific method they were taught very well. And the scientific method they were taught says that you must either reject evolution as untrue, or accept that evolution is unscientific. From where we have got in this discussion of Popper so far, the puzzle is not “How can people who claim to be scientifically literate reject evolution?” but rather “How can people who claim to be scientifically literate accept evolution?”
The solution to the puzzle is relatively simple. Consider the logic of the situation thus far:
if scientific research follows Popper’s scientific method,
and research on evolution theory does not follow this scientific method,
then evolution theory is not scientific.
This is the position taken by many Creationists, and its logic is perfectly valid. What is to be done by those who desperately want evolution to be scientific, but who have the evidence staring them in the face that esteemed evolutionary scientists, publishing in esteemed evolutionary journals, do not reject the theory when the data does not match the theory’s prediction? Simple: let us follow the logic where it leads:
if evolution theory is scientific,
and research on evolution theory does not follow Popper’s scientific method,
then scientific research does not have to follow Popper’s scientific method.
This logic is also perfectly valid. Provided you are willing to toss Popper aside, you can keep evolution. We already tossed Bacon aside. We already knew that the methods of science are not stable over time. Why should we get cold feet now and balk at doing it again? Admittedly, there will be a price to pay. Popper lost certainty as payment for keeping logical rigour. What will we lose now as payment for keeping evolution? And is it a price we are willing to pay?
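The two arguments above can be checked mechanically. The sketch below is one possible formalisation, not the only one: it assumes each argument can be read as a single conditional over three yes/no claims, and the variable names are labels chosen here for illustration. Under that reading, a truth table shows that both arguments say exactly the same thing: the three claims cannot all be held at once.

```python
from itertools import product


def implies(p, q):
    """Material conditional: 'if p then q'."""
    return (not p) or q


# Each row assigns True/False to three claims (labels are assumptions of this sketch):
#   popper     = scientific research must follow Popper's method
#   follows    = research on evolution follows that method
#   scientific = evolution theory is scientific
rows = list(product([False, True], repeat=3))

# First argument: Popper holds, evolution research does not follow him,
# therefore evolution is not scientific.
first_form = [implies(popper and not follows, not scientific)
              for popper, follows, scientific in rows]

# Second argument: evolution is scientific, its research does not follow Popper,
# therefore science need not follow Popper.
second_form = [implies(scientific and not follows, not popper)
               for popper, follows, scientific in rows]

# What both amount to: the three claims cannot all be true together.
joint_denial = [not (popper and (not follows) and scientific)
                for popper, follows, scientific in rows]

print(first_form == second_form == joint_denial)  # prints True
```

Logic alone fixes only that one of the three claims has to go. Which one is given up is decided by what the reader is least willing to lose.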
Evolution is complicated and controversial. Still, the predicament we face with attempting to hold onto theories after they have been apparently falsified is quite general within science. We can illustrate this from a trivial example in physics.
Consider a mass, initially at rest, sliding down a slope at an angle of 45 degrees to the horizontal, under a uniform gravitational acceleration of 10m/s^2. If Newtonian mechanics is correct, it will take 0.532s to go 1m. (If you want to know why, there is a figure below. If you don’t care why, read on.) This statement is logically identical to the statement, if such a mass does not take 0.532s to go 1m, Newtonian mechanics is not correct.
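For readers who want to check the figure of 0.532s, here is a minimal sketch of the calculation in Python. It assumes the idealisation the essay describes: a point mass starting from rest and sliding without friction. The function name and its parameters are chosen here for illustration.

```python
import math


def slide_time(distance_m, angle_deg, g=10.0):
    """Time for a frictionless point mass, starting at rest, to slide distance_m down the slope."""
    # Only the component of gravity along the slope accelerates the mass.
    a = g * math.sin(math.radians(angle_deg))
    # Constant acceleration from rest: d = (1/2) a t^2, solved for t.
    return math.sqrt(2.0 * distance_m / a)


t = slide_time(1.0, 45.0)
print(f"{t:.3f} s")  # prints "0.532 s"
```

The acceleration along the slope is 10 × sin 45°, or about 7.07 m/s^2, and the time follows from the distance and that acceleration.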
That claim is testable! Not only is it testable; in preparing this essay, I tested it. I set up a 1m long slope at 45 degrees to the horizontal and I rolled a toy car down it. It took 0.90s.
So let us look at this in the light of the scientific method: Newton developed a theory. I used that theory to make a prediction. I collected data. I compared the data to the prediction, and they didn’t match.
so… [drum roll] …
I have no intention of throwing out the theory on the basis of my data.
Does that make me a bad scientist?
Not at all! Because, when we said “You will measure a time of 0.532s if Newtonian mechanics is correct,” this was short-hand for what we really meant. Namely —
You will measure a time of 0.532s
if Newtonian mechanics is correct,
and the timer works,
and the experimenter is honest,
and you got all the parameters right,
and the track is straight,
and friction is negligible,
and rotational inertia is negligible,
and there is nothing else you missed.
In the event of the experimental data not being in agreement with the theoretical prediction, we must conclude the following:
Given you did not measure a time of 0.532s,
either Newtonian mechanics is not correct,
or the timer doesn’t work,
or the experimenter is not honest,
or you didn’t get all the parameters right,
or the track is not straight,
or friction is not negligible,
or rotational inertia is not negligible,
or there was something else you missed.
The experiment tells you that there is a problem with (at least) one of these lines. But it does not tell you which one(s). Some are relatively easy to check: you can make sure the track is straight, and that the distance really is 1m. Others cannot be checked, even in principle: how can you be sure that there is nothing else you missed? For all their prowess with logic and mathematics, the ancient Greeks eschewed experimental science because the set of “other things you might have missed” is infinitely large. As such, they gave up experimentation as a forlorn hope before they started. We may not have to follow the Greeks in abandoning empirical science altogether, but we do need to recognise that falsification is not as simple as it initially seemed.
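How little the failed prediction pins down can be counted directly. The sketch below treats each line of the list as a single true-or-false statement, which is itself a simplifying assumption ("nothing else was missed" really stands for an open-ended set). It then enumerates every combination of true and false that is still compatible with the measurement not matching the prediction.

```python
from itertools import product

assumptions = [
    "Newtonian mechanics is correct",
    "the timer works",
    "the experimenter is honest",
    "all the parameters are right",
    "the track is straight",
    "friction is negligible",
    "rotational inertia is negligible",
    "nothing else was missed",
]

# The prediction is guaranteed only if every assumption holds, so a failed
# prediction rules out exactly one combination: the one where all are true.
consistent = [
    values
    for values in product([True, False], repeat=len(assumptions))
    if not all(values)
]

# Combinations in which Newtonian mechanics (the first entry) is still correct.
newton_survives = [values for values in consistent if values[0]]

print(len(consistent))       # prints 255
print(len(newton_survives))  # prints 127
```

Of the 256 possible combinations, the experiment eliminates one. In 127 of the remaining 255, Newtonian mechanics is untouched and the fault lies somewhere else in the bundle.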
The example of a single bad experiment falsifying the entirety of Newtonian mechanics is admittedly trivial. Popper got round such simple rejections of his method by fiat: such putative falsifying instances can be ignored because one-off events don’t count. On the flip side, Popper listed several things that are not acceptable reasons for ignoring apparent counter-examples. These include claiming that key terms are poorly defined, that the apparatus has some unknown systematic error, or that the experimenter is incompetent or dishonest (unless the experimenter is making claims about the occult, in which case they are probably both incompetent and dishonest). Popper granted that his list of what counts as valid and invalid refutations was incomplete, and he left it to the investigator to guard against the temptation to cheat. Despite the list of acceptable practices being thereby left to the integrity of the individual scientists, he insists that such rules are necessary, otherwise it is “not possible to divide systems of theories into falsifiable and non-falsifiable ones; or rather, such a distinction will be ambiguous. As a consequence, our criterion of falsifiability must turn out to be useless as a criterion of demarcation.”
That bears repeating: Popper himself recognised that if he did not have some way of working out which statements were reliable and which were not, then falsifiability would be useless.
Popper’s nightmare comes true
When philosophers of science at the start of the 20th Century were constructing their ideas of what science should be like, they imagined that we hold a body of facts that we know to be true, and have one additional idea that must be tested. For example, if we wish to test the statement “all swans are white,” they assumed that we know the truth (or falsity) of all other statements. This additional body of knowledge includes statements such as, “my eyes work reliably,” “I know a swan when I see it,” “I know what’s not a swan when I see it,” “I can tell when someone took a tin of black paint and tried to play a trick on me,” and “there are no other confounding factors that I failed to take into account.”
But that is not how life works. Ideas always come bundled together. This is not just true of masses on inclined planes; it is true of everything. Ideas always come bundled together. An experimental result can tell us that something is wrong. It tells us that we are wrong about at least one of the statements in the bundle. But it does not tell us which one. Maybe it really is a black swan. Maybe it is black, but it is not actually a swan. Maybe it is white, but I am hallucinating and think it is black. Attempting to fact-check the background assumptions does not solve the problem; it only moves the problem along. No statement within the background assumptions comes on its own; each is always bundled with other statements. Taking more data does not move us closer to pinning down the background assumptions, because the number of assumptions increases with each data point taken: my eyes worked reliably when I first saw the swan; they still worked reliably the second time I looked…
Given the difficulty of establishing which statement (or statements) are in error, we may try to bring the theory in line with the experiment by changing one (or more) statement(s) within the overall theoretical framework. When testing Newtonian mechanics, you may notice that the angle used in the experiment had been too shallow, and calculating the predicted (now post-dicted) result with the actual angle used brings experiment and theory into agreement. But you cannot stop there, just because there is now agreement. Does the agreement arise because you made no errors, or because you made multiple errors? Maybe the local acceleration due to gravity is actually a little lower, but the stopwatch was also running a little slow. Maybe Newton was wrong, but the effect was perfectly cancelled out by the air resistance you neglected. Maybe Newton was right, but only when someone is looking.
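The difficulty can be illustrated numerically. The sketch below takes the 0.90s timing reported earlier and finds three different single-assumption changes, each of which brings the Newtonian calculation into agreement with it. The sliding-friction term and its coefficient are hypothetical additions for illustration (the toy car rolled rather than slid), and none of the fitted values is claimed to describe the real experiment.

```python
import math

G = 10.0         # m/s^2, the value used in the essay
ANGLE = 45.0     # degrees
DISTANCE = 1.0   # m
MEASURED = 0.90  # s, the toy-car timing


def travel_time(distance, angle_deg, g=G, mu=0.0):
    """Time from rest down a slope, with an optional (hypothetical) sliding-friction coefficient mu."""
    theta = math.radians(angle_deg)
    a = g * (math.sin(theta) - mu * math.cos(theta))
    return math.sqrt(2.0 * distance / a)


# Acceleration along the slope implied by the measurement, from d = (1/2) a t^2.
a_needed = 2.0 * DISTANCE / MEASURED**2
theta = math.radians(ANGLE)

# Three rival adjustments, each changing one assumption to produce a_needed.
angle_fix = math.degrees(math.asin(a_needed / G))            # the slope was shallower
mu_fix = (math.sin(theta) - a_needed / G) / math.cos(theta)  # friction was not negligible
g_fix = a_needed / math.sin(theta)                           # gravity was weaker

results = [
    ("shallower angle", travel_time(DISTANCE, angle_fix)),
    ("sliding friction", travel_time(DISTANCE, ANGLE, mu=mu_fix)),
    ("weaker gravity", travel_time(DISTANCE, ANGLE, g=g_fix)),
]
for label, t in results:
    print(f"{label}: {t:.2f} s")  # each line reports 0.90 s
```

All three adjustments reproduce the measurement equally well, and so would any mixture of them. Agreement between theory and experiment therefore cannot show which adjustment, if any, was the right one.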
Unfortunately, no amount of checking will guarantee that you have found all the mistakes. This is true if you claim to have shown that the theory matches the experiment. It is also true if you claim to have shown that the experiment has falsified the theory. You can never definitively verify a theory. You can also never definitively falsify it. There is always the possibility that you made a mistake somewhere. Any claim that a theory should be accepted or rejected comes down to a judgment call: you can spend more time checking for mistakes, but is it worth it? One person may decide to give up on the central premise of the theory and move on. Another person may decide to keep looking for mistakes around the edges that would rescue the central premise. Importantly, there is no objective measure by which to say that one choice is right and the other is wrong.
This point bears repeating: there is no objective way to say when a scientific theory should be rejected. Science cannot be objective in selecting which theories to accept and which theories to reject.
With each step, our Enlightenment Vision unravels. Popper attempted to rescue the scientific method but gave up certainty. Our story has not yet reached the point of saying what scientific method, if any, will replace the Popperian one. But we do know that it will have a different relationship with demonstrability, and will not be wholly objective. Maybe, though, we are over the worst of it. Maybe the remainder of the Enlightenment Vision can still be salvaged.
Philosopher of science Imre Lakatos suggests that any optimism in this regard may be misplaced: “Few philosophers or scientists still think that scientific knowledge is, or can be, proven knowledge. But few realize that with this the whole classical structure of intellectual values falls in ruins and has to be replaced.”
With those words ringing in our ears, our next essay will look more carefully at the implications of all of this for objectivity in science.
 Charles Darwin (2013). Life and Letters of Charles Darwin, Vol. I. Francis Darwin (Ed.). Gutenberg Book Project. Originally published 1887.
 Justus von Liebig (1863). Über Francis Bacon von Verulam und die Methode der Naturforschung (Munich) p. 48. Quoted (in English translation) by Florian Cajori, “The Baconian Method of Scientific Research,” The Scientific Monthly, 1925, Vol. 20(1), pp. 85–91.
 Karl Popper (2002). The Logic of Scientific Discovery. Abingdon: Routledge Classics. Originally published 1935.
 Robert L. Carroll (2000). “Towards a new evolutionary synthesis.” Trends in Ecology and Evolution 15, 27.
 In biology, things are classified under their kingdom, phylum, class, order, family, genus, and species. Within the animal kingdom, for example, there is a phylum called the chordates. Mammals form a class within this, and primates form an order within that. One family within this is the hominids, or great apes. Orangutans form one genus within this, consisting of three species: the Bornean, Sumatran, and Tapanuli orangutans. Given this system of classification, the question, “How does the total number of species vary over time?” is at least similar to the question, “How does the number of families within a phylum vary over time?”
 Karl Popper (2002). Op. cit. pp. 60–61.
 Imre Lakatos (1980). The Methodology of Scientific Research Programmes: Philosophical Papers Volume 1. Cambridge: Cambridge University Press. p. 8. Lakatos was very much aware of how his claims about certainty related to falsifiability, and how this raised questions for science-and-religion discourse. He made such connections front and centre: the first chapter of this book is called “Falsification and the Methodology of Scientific Research Programs.” The first section in that chapter is called “Science: Reason or Religion.” The quote given here is from the first paragraph of the first section of the first chapter.