Machine Learning Is Changing The Definition Of ‘Science’

Ayush Jain
4 min read · Jul 30, 2022


Machine learning applications in scientific research are facing a ‘reproducibility crisis,’ and with it, the very definition of ‘science’ may be changing.

The relentless pursuit of research and innovation has led to remarkable scientific advances. Across disciplines, AI and machine learning (ML) models are being used to automate parts of scientific research and discovery. The opportunities machine learning offers in medical research, in particular, are immense.

Image by Totojang from Pixabay

Take, for example, the McKinney et al. (2020) study, which demonstrated AI’s potential in medical imaging. The authors developed an AI system for breast cancer screening, and the results were promising, hinting at AI’s superior abilities in cancer detection. A Nature article summarises it best:

The authors assert that their system improves the speed and robustness of breast cancer screening, generalizes to populations beyond those used for training, and outperforms radiologists in specific settings.

Are we then in the golden age of medical science and research? I guess not.

There are problems with the study above, however. A joint response written by over 20 researchers raised several concerns, including:

(i) Insufficient information about the code used to generate the results

(ii) Bias in the data fed into the systems (for example, when the sample covers only a narrow age group, or one gender forms a large majority)

The researchers argued that these gaps make the machine learning process difficult to reproduce and verify, rendering the study unscientific. This problem is known as the ‘reproducibility crisis.’

Without this information, it is impossible to verify the results, even by duplicating the experiment under the same conditions with the same dataset. The code used to produce the results must be made publicly available for verification. In a machine learning model, specifying the data, the conditions, and the code used is paramount to reproducing results and making the AI system reliable. The extent of today’s reproducibility crisis can be gauged from the following finding:

[Kapoor and Narayanan] analysed 20 reviews in 17 research fields and counted 329 research papers whose results could not be fully replicated because of problems in how machine learning was applied.

In their research, the pair highlight the failures of machine learning as applied across the sciences. They attribute many of these failures to something called ‘data leakage.’ Simply put, data leakage occurs “when information from the data set a model learns on includes data that it is later evaluated on.” It is like knowing the questions of an exam beforehand: the system appears to give much better results than it actually would, as the sketch below shows.
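To make the idea concrete, here is a minimal sketch in Python using scikit-learn. It is illustrative only: the dataset, model, and numbers are not drawn from any study mentioned here, just the simplest form of the leak the quote describes, where the rows a model is evaluated on were already part of the data it learned from.

```python
# A minimal, hypothetical sketch of data leakage (not code from any study
# discussed in this article).

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# A synthetic stand-in for real research data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Leaky evaluation: train on ALL rows, then "test" on a subset of those same
# rows. The model has effectively seen the exam questions beforehand.
leaky = RandomForestClassifier(random_state=0).fit(X, y)
print("leaky accuracy:", leaky.score(X_te, y_te))   # inflated, close to 1.0

# Clean evaluation: train only on the training split; the test rows stay unseen.
clean = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("clean accuracy:", clean.score(X_te, y_te))   # the honest number
```

The gap between the two printed scores is exactly the flattery that leakage buys: nothing about the model changed, only what it was allowed to see before the test.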

This is why a model that performs well during the training and testing phases can fail in real-world applications on a completely new dataset. The reproducibility crisis we face in scientific machine learning, then, stems largely from the absence of information about the raw data used during the research phase.
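Part of the remedy is simple bookkeeping. Below is a minimal sketch, assuming a typical Python workflow (none of it comes from the studies discussed), of recording the random seed, the environment, and an exact fingerprint of the raw data alongside a run, so that others can rerun the experiment under the same conditions:

```python
# A minimal sketch of reproducibility bookkeeping for an ML experiment.

import hashlib
import json
import platform
import random

import numpy as np

# Pin the randomness so reruns see the same shuffles and initialisations.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)

def data_fingerprint(path: str) -> str:
    """SHA-256 of the raw dataset file; anyone can verify they hold the same data."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

manifest = {
    "seed": SEED,
    "python": platform.python_version(),
    "numpy": np.__version__,
    # "data_sha256": data_fingerprint("data/raw.csv"),  # hypothetical path
}
print(json.dumps(manifest, indent=2))  # publish this alongside code and results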

We often forget that machine systems are not immune to error. The studies mentioned above are cases in point: systems designed to predict the future from past data often already contain information about that future, embedded either in the code or in the dataset.

This discussion is critical in light of the fact that scientific research was already facing a reproducibility crisis before AI entered the picture. A 2015 effort to replicate 100 published psychology studies managed to reproduce only 39 of them; the rest, when tested again, failed to show the results described in the original research. If such is the case, it is fair to say that AI is only fuelling the already existing reproducibility crisis in scientific research.

This has implications: perhaps the definition of what we understand as ‘scientific’ merit is changing.

Years ago, the Austrian-British philosopher Karl Popper argued that a hypothesis is scientific if and only if it is falsifiable. We can say that proposition X is scientifically valid if there is some way, through experimentation or calculation, by which it could be proven false. This criterion is known as ‘falsifiability,’ and it goes hand in hand with a particular methodology of science:

Begin with an observation, acquire background knowledge, formulate a hypothesis, design a procedure, conduct an experiment, analyze its data, and conclude whether the original hypothesis has been falsified based on this data or whether it is supported by its results.

The scientific process aims at a relational understanding of signs and symbols, building toward an abstract conceptualisation of them. The machine learning model, by contrast, focuses on results: it proceeds by induction, working out how what it learned from its training data can be applied to new data with the highest possible accuracy. We are therefore moving away from a measure of ‘science’ determined by falsification, since AI systems cannot readily be measured that way. Here, the accuracy of the system determines its scientific credit.
