Photo by Isaac Smith on Unsplash

Data Summarization

The term data summarization refers to presenting a summary of the generated data in an easily comprehensible and informative manner. Presenting the raw data (the entire collection of individual measurements that was generated) is not practical in many cases.

For example, consider an epidemiological study involving blood glucose measurements from a lakh (100,000) samples, or the human genome (which, if printed in full, would occupy 130 volumes and take 95 years to read). Presenting such complex data would need a great many printed pages and convey no easily comprehensible information. For example, what are the general trends? Is the…
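
As a minimal sketch of what such a summary might look like, the snippet below condenses a lakh of simulated readings into a handful of numbers. The data are randomly generated and purely hypothetical (a normal distribution with an assumed mean of 95 mg/dL and standard deviation of 12 mg/dL), and the helper name `summarize` is an illustration, not something from the original article.

```python
import random
import statistics


def summarize(values):
    """Return a small dictionary of summary statistics for a list of numbers."""
    ordered = sorted(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "n": len(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
        "sd": statistics.stdev(ordered),
    }


# Simulated, hypothetical blood glucose readings (mg/dL) for 100,000 samples.
rng = random.Random(42)
glucose = [rng.gauss(95, 12) for _ in range(100_000)]

summary = summarize(glucose)
for name, value in summary.items():
    print(f"{name:>6}: {value:.6g}")
```

Eight numbers replace a hundred thousand raw values, and questions about general trends (centre, spread, extremes) can be read off directly.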

Photo by Meagan Carsience on Unsplash

Permissible scale transforms

As my earlier article explains, measurements carry little meaningful information if the level of measurement is low; the numbers, however precise, are not accurate. Each level of measurement has a set of permissible transforms: the mathematical operations that preserve that level. No mathematical operation can raise a measurement made at a lower level to a higher one; the familiar dictum GIGO (Garbage In, Garbage Out) applies here as everywhere in statistics. Permissible transforms are hierarchical. …
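
The idea can be illustrated with a small sketch. It assumes the standard textbook classification (monotonic transforms preserve order, positive linear transforms additionally preserve ratios of differences), which the excerpt above does not spell out; the temperature values and the helper names `ranks` and `difference_ratio` are hypothetical.

```python
def ranks(values):
    """Return the rank (0 = smallest) of each value; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) for v in values]


def difference_ratio(values):
    """Ratio of the two successive differences in a three-value list."""
    a, b, c = values
    return (c - b) / (b - a)


# Hypothetical temperatures in degrees Celsius (an interval-level variable).
celsius = [10.0, 20.0, 40.0]

# Positive linear transform (a*x + b): Celsius to Fahrenheit.
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

# Monotonic but non-linear transform: squaring positive values.
squared = [c ** 2 for c in celsius]

# Order survives both transforms.
print(ranks(celsius) == ranks(fahrenheit) == ranks(squared))      # True

# Ratios of differences survive only the linear transform.
print(difference_ratio(celsius) == difference_ratio(fahrenheit))  # True
print(difference_ratio(celsius) == difference_ratio(squared))     # False

# Ratios of the values themselves do not survive the shift by 32,
# so "40 degrees is four times as hot as 10 degrees" is not meaningful.
print(celsius[2] / celsius[0] == fahrenheit[2] / fahrenheit[0])   # False
```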

Photo by William Warby on Unsplash

Measurement is the process of assigning numbers to quantities (variables). The process is so familiar that we often overlook its fundamental characteristics. A single measure of some attribute (for example, weight) of a sample is called a statistic. These attributes have inherent properties that are similar to the properties of the numbers we assign to them during measurement. When we assign numbers to attributes (i.e., during measurement), we can do so poorly, in which case the properties of the numbers do not correspond to the properties of the attributes. In such a case, we achieve only a “low level of measurement” (in other…

Photo by Paolo Nicolello on Unsplash

A variable is a quantity that may vary from object to object. For example, suppose we measure the heights of 50 mango trees in a selected plot and arrange the results in a table. Here, the quantity that varies between objects (trees) is height. Height, therefore, is the only variable in this example. The table containing the collected values of our variable is called a ‘dataset’ or sample.
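
A rough sketch of such a table, using made-up heights (the values, the 4 to 12 metre range, and the column names `tree` and `height_m` are assumptions chosen only for illustration):

```python
import random

rng = random.Random(0)

# One row per object (tree); one column for the single variable (height).
dataset = [
    {"tree": tree_id, "height_m": round(rng.uniform(4.0, 12.0), 1)}
    for tree_id in range(1, 51)
]

# The variable is the column of values that differs from tree to tree.
heights = [row["height_m"] for row in dataset]

print(len(dataset))            # number of objects measured
print(min(heights), max(heights))
```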

Independent vs dependent variables

Let us consider an example. Algal net primary productivity (mass of carbon per unit area per year, g C m^-2 yr^-1) is measured under various temperature and light intensity settings. In this experiment, there are…

Photo by Luke Chesser on Unsplash

Studies can also be grouped by the kind of data analysis performed after the data are collected. John Tukey (1961) defined data analysis as “Procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.”

In a sense, this classification is based on the purpose of collecting the data and conducting the study: is it to confirm an a priori hypothesis? …

Photo by chuttersnap on Unsplash

Experimental Studies

In scientific experiments, investigators deliberately set one or more factors to specific levels. The word ‘experiment’ has a different meaning in statistics and science from that in everyday life, where it is used to mean “try something to see what happens”. In science, an experiment is carefully controlled so as to test a priori scientific hypotheses, as explained in the first module. Because rigorous statistical hypothesis testing based on inductive logic is involved, experimental studies generally lead to stronger scientific inferences than observational studies do. In contrast to observational studies, experimental studies involve deliberate intervention by the…

Photo by Rémy Penet on Unsplash

There are broadly two types of studies in statistics, classified according to how data are collected: observational studies and experimental studies. Similarly, there are two types of studies according to how data are analysed: Exploratory Data Analysis and Confirmatory Data Analysis.

Observational or Exploratory studies

An observational study merely ‘observes’ and collects data from an existing situation without any intervention or manipulation. Most curiosity-driven basic scientific research (also called ‘blue-skies research’) involves this kind of study. For example, a taxonomist exploring the Antarctic to survey and collect ice and snow algal samples, or an astronomer observing the night sky to study…

Photo by Erol Ahmed on Unsplash

Scientific methodology

Scientific methodology is a logical, rational, systematic, reproducible, empirical, and evidence-based approach to testing a hypothesis. Informal scientific methodology evolved in ancient India and Greece. The Indian system of linguistics and grammar sensu Panini was indeed a systematic inquiry into the structure of language. Scientific methodology was developed in a more formal sense with the seminal publication of Novum Organum by the British philosopher Francis Bacon in 1620, and involves inductive reasoning from observations and/or experiments. The Baconian method involves observation without preconceived notions; this alleviates confirmation bias to a large extent. A hallmark of scientific methodology as conceptualized by the British philosopher…

Photo by Juan Rumimpunu on Unsplash

Critical thinking

The field of critical thinking deals with the objective analysis of facts to form a judgement. A major aspect of critical thinking deals with the identification of various cognitive biases and logical fallacies that skew our judgement. For example, our emotions are well known to affect our judgements; it is important to be rational while analysing the data. While critical thinking is required in every aspect of life, it is especially important in statistical inference.

A cognitive bias is an erroneous evaluation, reasoning, remembering, or other cognitive process, often occurring as a result of holding onto one’s preconceived notions, emotions, preferences…

Photo by peter bucks on Unsplash


Biostatistics, a portmanteau of biology and statistics, is defined by its etymology: the application of statistics in biology. Historically, the field of statistics emerged and was systematically developed to answer various problems in biology, especially in morphometry (measurement of morphological traits) and population genetics. It was only later that the field found applications in various other disciplines, notably in quantitative fields of the humanities (psychology and economics), so that the original meaning of statistics steadily expanded, necessitating the coinage of biostatistics to refer to biological statistics. …

Riteshpratap A. Singh

Data scientist — R&D | AI Researcher| Bioinformatician | Geneticist | Yoga practitioner | Writer | Psy-Maths guy devoted his life to make machines intelligent
