# Biostatistics: **Introduction, Scope and Philosophical Framework**

**Introduction**

Biostatistics, a portmanteau of biology and statistics, means just what its etymology suggests: the application of statistics in biology. Historically, the field of statistics emerged and was systematically developed to answer problems in biology, especially in morphometry (the measurement of morphological traits) and population genetics. Only later did the field find applications in other disciplines, notably quantitative fields of the social sciences such as psychology and economics. As the original meaning of statistics steadily expanded, the term biostatistics was coined to refer specifically to biological statistics. The term statistics has now acquired a broader meaning: “branch of mathematics that deals with experimental design, the collection of numerical data, summarization of the data, and analysis and interpretation of the data for drawing inferences on the basis of probability.”

British biologist and population geneticist Sir Ronald Aylmer Fisher is often regarded as the father of statistics because of his many seminal contributions to the discipline (for example, the F-distribution and ANOVA). However, this claim is contested by many. The development of the field arguably began as early as the 17th century, when the British philosopher Francis Bacon published *Novum Organum* (1620), which set out the fundamentals of inductive reasoning, an attribute of statistics that we will discuss later in this module. Other key scientists behind the development of statistics include mathematician and biometrician Karl Pearson, geneticist Sewall G. Wright, population geneticist JBS Haldane, geneticist Charles Davenport, geneticist William Bateson, botanist Wilhelm Johannsen, and morphometricians Raphael Weldon and D’Arcy Thompson.

Morphometrician and anthropologist Prasanta Chandra Mahalanobis is regarded as the father of statistics in India. He was one of the founding members of the erstwhile Planning Commission of India and founded the Indian Statistical Institute (ISI), Kolkata. A long-term friend of the British population geneticist JBS Haldane, Mahalanobis invited him to work at ISI. Haldane worked there from 1957 to 1961 and made several contributions to the development of statistics in India.

Mathematical biology, by contrast, is defined as an interdisciplinary field encompassing all applications of mathematics to biology. Its development has been concurrent with that of biostatistics.

**Scope**

The scope of biostatistics is extensive and covers almost every area of biology that involves generating and analysing numerical data. Biostatistics is used at every stage, from the design of scientific experiments through to data analysis. Its scope includes the principles of scientific methodology, the definition of types of data and studies, levels of measurement, descriptive statistics, inferential statistics and hypothesis testing, and correlation. The field also includes predictive methods and curve or model fitting, including regression analysis, maximum likelihood, Bayesian inference and principal component analysis.

The scope of mathematical biology is equally vast and includes the many applications of mathematics in biology. Besides biostatistics, the field encompasses applications of other mathematical disciplines, including probability, number theory, game theory, set theory, neural networks, mathematical modelling, calculus, fractals and the Fibonacci sequence. Mathematics is used particularly often in population genetics, environmental biology, ecology, psychology, evolutionary analysis and enzyme kinetics.

**Philosophical framework: Logical reasoning and the connection between probability and statistics**

Logical reasoning is a branch of philosophy concerned with drawing valid conclusions from a set of premises (preconditions) through systematic thinking, logical argument, cognition and intellect. For example, consider the following syllogism:

- All vertebrates have a spine (Premise No. 1)
- A snake is a vertebrate (Premise No. 2)

Therefore, a snake has a spine (Conclusion)

The syllogism is an example of deductive reasoning (deductive logic): the process of reasoning from premises, one of which is a general theory or rule, to reach a logically valid conclusion about a specific case. Deductive reasoning deduces (derives, infers or determines) the truth or validity of a specific conclusion or observation from general theories, axioms or rules, so it is a top-down approach (general to specific). It is the form of reasoning used in formal logic (mathematical and philosophical logic). In a valid deductive argument the conclusion follows with certainty from the premises, so outcomes are classed as either true/valid or false/invalid. Most theories in science and mathematics are applied through deductive reasoning, and the validity of a scientific hypothesis can be tested deductively; this is the so-called scientific method. The field of statistics is mostly concerned with deductive reasoning, as it tests the validity of an *a priori* hypothesis (tied to the null hypothesis and the significance level chosen as part of the experimental design) against specific observations.

*A priori* means “from the former”; *a priori* knowledge is knowledge that comes from reasoning based on self-evident truths (rules, theories or axioms) that are independent of experience. For example, the statement “all invertebrates have no vertebrae” (statements of this type are called tautologies in philosophy) is defined in such a way that its truth is self-evident, independent of experience: an animal with vertebrae cannot be called an invertebrate, that is, “an animal without vertebrae”. Deductive reasoning is also used in fields such as chemical nomenclature, taxonomy and the judiciary. The validity of a new species description (a specific case) is tested against the codes of taxonomic nomenclature (the general rules) to accept or reject the proposal. Similarly, a defendant is declared guilty or not guilty on the basis of the evidence and the relevant legal codes (the Indian Penal Code, the Constitution of India and so on).
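The idea of testing an *a priori* hypothesis against specific observations can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a method taken from this module: the null hypothesis (a 1:1 sex ratio), the counts (16 males among 20 offspring), the significance level of 0.05 and the function name `binomial_two_sided_p` are all hypothetical, and the exact binomial test is only one convenient choice of test.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums, under the null hypothesis (success probability p0), the
    probabilities of all outcomes that are no more likely than the
    observed count k out of n trials.
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # The small relative tolerance guards against floating-point ties.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical design choices, fixed BEFORE looking at the data (a priori).
ALPHA = 0.05          # chosen significance level
NULL_PROPORTION = 0.5  # null hypothesis: males and females equally likely

# Hypothetical observations (the specific case).
N_OFFSPRING = 20
N_MALES = 16

p_value = binomial_two_sided_p(N_MALES, N_OFFSPRING, NULL_PROPORTION)
reject_null = p_value < ALPHA

print(f"p-value = {p_value:.4f}; reject null at alpha = {ALPHA}: {reject_null}")
```

The general rule (null hypothesis and significance level) is stated first, and the specific observation is then judged against it, which is the top-down pattern described above.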

In contrast, inductive reasoning proceeds from specific observations to a general conclusion (bottom-up) and is used in everyday arguments and in the formulation of scientific theories (hypothesizing). For example, consider the following argument:

- This swan is white (Premise No. 1)
- That swan is white (Premise No. 2)

Therefore, all swans are white (Conclusion)

Inductive reasoning, though tempting and persuasive, is not deductively valid. In the argument above, the discovery of a single black swan would overturn the conclusion. Inductive arguments are therefore not classed as valid or invalid but as strong (cogent) or weak (uncogent or fallacious). As stated, the conclusion above overreaches; a more defensible conclusion would be “we expect that all swans are white”. How confident are we in this conclusion? That depends on the number of swans we have observed and the proportion of them that were white, and it can be expressed mathematically as a probability. Probability follows inductive reasoning and is an example of *a posteriori* (from the latter) knowledge, that is, knowledge based on experience or empirical evidence. The strength of an inductive argument can therefore be expressed as a probability: a strong argument has a high probability, while a weak argument has a low one. As we will learn later, Bayesian posterior probability is an example of inductive reasoning, as it depends on *a posteriori* knowledge (observations or expectations or evidence).
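One hedged way to make “how confident are we?” concrete is Laplace's rule of succession, which this module does not mention and which is used here only as an illustrative assumption. Starting from a uniform prior over the unknown proportion of white swans, after observing `white` white swans among `total` swans, the probability that the next swan observed is white is (white + 1) / (total + 2). The function name and the counts below are hypothetical.

```python
from fractions import Fraction


def prob_next_is_white(white: int, total: int) -> Fraction:
    """Laplace's rule of succession.

    Posterior probability that the next swan is white, assuming a
    uniform prior on the true proportion of white swans.
    """
    if not 0 <= white <= total:
        raise ValueError("white must be between 0 and total")
    return Fraction(white + 1, total + 2)


# Hypothetical observation records: (white swans seen, total swans seen).
for white, total in [(2, 2), (100, 100), (99, 100)]:
    p = prob_next_is_white(white, total)
    print(f"{white}/{total} white -> P(next is white) = {p} ~ {float(p):.3f}")
```

The estimate concerns the next observation only, and it never reaches 1 however many white swans are seen, which mirrors the point that no number of observations can deductively prove that all swans are white.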

Logical induction is vulnerable to a number of biases, including cognitive biases, the availability heuristic and logical fallacies. The field of critical thinking exposes these biases so that a thinker or investigator is aware of them, and it trains the researcher to be careful not to let them distort the conclusion.

**Summary**

- The field of statistics was originally developed for testing biological hypotheses and later expanded to other disciplines. The term biostatistics is currently used to refer to the application of statistics in biology.
- Mathematical biology is defined as an interdisciplinary field encompassing all applications of mathematics to biology.
- Two pioneers of statistics, Ronald Fisher and Karl Pearson, both worked on biological problems. Anthropologist Prasanta Chandra Mahalanobis is regarded as the father of statistics in India.
- In philosophical reasoning there are two types of logic: deduction and induction. Deduction follows a top-down approach from general to specific, as in statistics, while induction follows a bottom-up approach from specific to general, as in probability.
- Inductive reasoning is known to be influenced by a number of logical fallacies, cognitive biases and mental heuristics.

