Towards an Unbiased Development of New Bioinformatics Methods

Remy Lau
Published in The Citadel
3 min read · May 12, 2021
Image by Gerd Altmann from Pixabay

Advances in artificial intelligence and machine learning have led to many new methods in bioinformatics and computational biology. Common applications include gene classification [1], sample annotation [2], and enzyme engineering [3], among others.

The problem of optimistic bias

Many papers that propose new computational methods conclude with a similar claim: “the proposed method outperforms the current state of the art”. A recent paper [4] published in Genome Biology discussed this issue. The authors pointed out that such statements are likely to be biased. In particular, they are often optimistically biased, i.e. the method is intentionally or unintentionally oversold, for fear that the paper would otherwise not be published.

For example, during the development of a new method, if the evaluation shows that the new method performs badly, one would actively search for a potential bug in the code. Meanwhile, if a competing method performs badly, the result would generally be accepted without further checking.

Another major issue is the selective use of datasets that favor the proposed method. It is common for a computational method to perform better on datasets with particular properties than on others. Even so, the less favorable results should still be presented, discussed, and explained, so that they can better guide future directions and method design.

The authors [4] also carried out a study using benchmarking results from 27 studies on the HumanMethylation450K dataset. They considered the results of a method reported in its original publication to be “non-neutral”, and the results of that same method reported in another publication for comparison to be “neutral”, since there is no conflict of interest when the evaluation is done by a third party who is not actively seeking to improve that specific method. The main findings are two-fold:

1) newer methods are on average superior to older ones;

2) there is a noteworthy optimistic bias in favor of new methods in the papers introducing them.
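
To make the neutral versus non-neutral comparison concrete, here is a minimal sketch of the idea. It is not the analysis from [4]: the records, the field names, and the `optimism_gap` helper are all hypothetical, and the toy ranks are made up purely for illustration. The sketch assumes each paper ranks the methods it compares (1 = best) and measures how much better a method ranks in the paper that introduced it than in papers written by others.

```python
from statistics import mean

# Hypothetical toy records (NOT real data): one entry per method per paper.
# "rank" is the method's rank in that paper's benchmark, 1 = best.
# An evaluation is "non-neutral" when the paper is the one that introduced the method.
records = [
    {"method": "A", "introduced_in": "paper_A", "paper": "paper_A", "rank": 1},
    {"method": "A", "introduced_in": "paper_A", "paper": "paper_B", "rank": 2},
    {"method": "A", "introduced_in": "paper_A", "paper": "paper_C", "rank": 3},
    {"method": "B", "introduced_in": "paper_B", "paper": "paper_B", "rank": 1},
    {"method": "B", "introduced_in": "paper_B", "paper": "paper_C", "rank": 2},
    {"method": "C", "introduced_in": "paper_C", "paper": "paper_C", "rank": 1},
]


def optimism_gap(records):
    """Return, per method, mean neutral rank minus mean non-neutral rank.

    A positive gap means the method ranks better (lower number) in its own
    paper than in third-party evaluations. Methods lacking either kind of
    evaluation are skipped.
    """
    gaps = {}
    for method in sorted({r["method"] for r in records}):
        own = [r["rank"] for r in records
               if r["method"] == method and r["paper"] == r["introduced_in"]]
        others = [r["rank"] for r in records
                  if r["method"] == method and r["paper"] != r["introduced_in"]]
        if own and others:
            gaps[method] = mean(others) - mean(own)
    return gaps


gaps = optimism_gap(records)
for method, gap in gaps.items():
    print(f"method {method}: neutral rank minus non-neutral rank = {gap}")
```

In this toy setup, method C is skipped because it has no third-party evaluation yet. A real analysis would also need to account for how many methods each paper compares and for the age of each method, which this sketch ignores.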

Moving forward

So what do we do now? Should we just accept this trend and live with the fact that newly introduced methods are oversold in some way? The authors of [4] suggested that journals and editors should be less critical of methods that do not show “groundbreaking” discoveries or improvements over existing methods. Meanwhile, authors should be more comfortable revealing the weaknesses of the proposed method, providing insightful comments, and fostering discussion on potential strategies for improvement.

In addition, given that third-party evaluations are more “neutral” than those presented in the original paper, benchmarking papers that analyze and compare multiple methods across a wide range of evaluation settings are highly beneficial, because they provide an unbiased view of related methods.

Bottom line

  • Evaluations of many new computational methods are optimistically biased
  • Benchmarking analyses provide an unbiased view of related methods

References

[1] Renming Liu, Christopher A Mancuso, Anna Yannakopoulos, Kayla A Johnson, Arjun Krishnan, Supervised learning is an accurate method for network-based gene classification, Bioinformatics, Volume 36, Issue 11, June 2020, Pages 3457–3465, https://doi.org/10.1093/bioinformatics/btaa150

[2] Sheng Wang, Angela Oliveira Pisco, Aaron McGeever, Maria Brbic, Marinka Zitnik, Spyros Darmanis, Jure Leskovec, Jim Karkanias, Russ B. Altman, Unifying single-cell annotations based on the Cell Ontology, bioRxiv 810234; doi: https://doi.org/10.1101/810234

[3] Stanislav Mazurenko, Zbynek Prokop, and Jiri Damborsky, Machine Learning in Enzyme Engineering, ACS Catalysis 2020 10 (2), 1210–1223, https://doi.org/10.1021/acscatal.9b04321

[4] Buchka, S., Hapfelmeier, A., Gardner, P.P. et al. On the optimistic performance evaluation of newly introduced bioinformatic methods. Genome Biol 22, 152 (2021). https://doi.org/10.1186/s13059-021-02365-4


Remy Lau
The Citadel

Computational Mathematics | Bioinformatics | Network Science | Deep Learning | linkedin.com/in/remy-liu-a24780213/