Boosting Our Understanding of Microbes — with Software Repurposing

Inside IBM Research
Published in The Startup
4 min read · Jan 25, 2021


By Ritesh Krishna and Katia Moskvitch

Developing new software for a specific scientific task can be time-consuming and costly. Software repurposing can help — at times it can even improve the results of the task compared to the traditional methods. This is exactly what our global team from IBM Research Daresbury in the UK, and Almaden and Yorktown in the US has achieved.

In our latest paper, “Repurposing software for functional characterization of the microbiome,” published in the journal Microbiome, we propose a way to improve the speed, sensitivity and accuracy of what’s known as microbial functional profiling: determining what microbes in a specific environment are capable of. Our method is based on clever reuse of bioinformatic tools that were originally developed for a different task.

Microbial functional profiling can help improve our limited understanding of the world of the teeny tiny organisms that live all around and also inside us — our microbiome. When microbes throw a party, we can get stomach aches, bloating and other issues, but doctors may find it hard to treat them effectively.

Microbes are all around us, and understanding them better can help us make sense of various diseases and the environment, and so keep our health in check.

Functional profiling can help. It’s part of metagenomics, a data-intensive science that involves sampling an environment with genomic technologies. Metagenomics takes raw data on a computational journey to give scientists the information they need to draw biologically relevant insights about the nature of microbes in a specific environment and assess what they are capable of.

But because it’s so computationally intensive, it can take hours and sometimes days to perform a metagenomic analysis. Each metagenomics experiment can generate several gigabytes of data that have to be processed in computational workflows.

These workflows consist of multiple steps and tools. Typically, the first steps after quality control and filtering include taxonomic classification — identifying which microbes are present — and functional profiling. Functional profiling is often more relevant for practical applications, but the computational effort to run it can be massively higher than that of taxonomic classification.

This is where our research can be of use. We have developed computational techniques that could help improve our limited knowledge of the microbiome by making it much easier and less computationally intensive to run microbial functional profiling. And we did it using existing software that is well known within the scientific community.

How repurposing started

The inspiration for software repurposing came from our previous work on a classifier we dubbed PRROMenade, as well as on IBM’s Functional Genomics Platform. PRROMenade uses a tree-shaped data structure to propose direct, one-step functional annotation for metagenomics reads. It is powered by k-mer-based algorithms (a k-mer is a short DNA subsequence of length k) that enable several well-known taxonomic profiling tools, and relies on variable-length sequence matching, which is more flexible than fixed-size k-mer methods.

We knew from our experience that the k-mer-based algorithms were much faster than traditional functional profiling methods. That’s because they rely on computationally simpler string-matching operations, often performed in memory thanks to the smaller size of the prerequisite look-up database. So we decided to test whether it was possible to repurpose the commonly used taxonomic profiling tools to perform both taxonomic and functional profiling.
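To make the idea of k-mer string matching against an in-memory look-up table more concrete, here is a toy Python sketch. It is not the method from our paper, nor the implementation of any particular profiling tool. The reference sequences, the labels "function_A" and "function_B", the helper names and the choice of k=5 are all made up for illustration, and the simple majority vote is an assumption about how hits might be combined.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of labels whose reference contains it."""
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index


def classify(read, index, k):
    """Return the label matched by the most k-mers of the read, or None."""
    votes = Counter()
    for kmer in kmers(read, k):
        # Plain dictionary look-up: no alignment, just exact string matching.
        for label in index.get(kmer, ()):
            votes[label] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy references; real databases hold far more sequences.
references = {
    "function_A": "ATGGCGTACGTTAGC",
    "function_B": "TTGACCGGATCCAAT",
}
index = build_index(references, k=5)

print(classify("GCGTACGTT", index, k=5))  # function_A
print(classify("CCGGATCC", index, k=5))   # function_B
print(classify("AAAAAAAA", index, k=5))   # None
```

The same look-up works whether the labels are taxa or functions, which is the intuition behind repurposing: only the labels attached to the reference sequences change, not the matching step.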

First, we compared the microbiomes of several people with plant- and animal-based diets, where diet has a visible impact on the gut microbiome and its functions. This takes the saying “you are what you eat” to a whole new level: it’s not just the person who is affected by his or her every meal but their gut bacteria as well. We also compared soil bacterial communities across the globe, linking antioxidant and nutrient reservoir activity with geographical influences. Insights into keeping a healthy soil microbiome can be critical for food security and tackling climate change — soil is a vast carbon sink, effective in removing CO2 from the atmosphere and storing it as carbon via the microbiome.

Our tests showed improvements in both the speed and the accuracy of functional profiling. We found that repurposed software helps cut down the processing time and removes the need for an extra tool. Another advantage is that these tools can run on large machines as well as standard laptops.

We believe that our results could help speed up an important computational step in metagenomics data processing. They also show that software repurposing is not only possible in metagenomics but also has the potential to diversify the usage of existing tools, effectively cutting down the time spent on software development and adaptation. Next, we aim to investigate a diverse range of samples to gain biological insight into microbes’ behavior in different environments.

We hope that our research results could help push the limits of scientists’ understanding of the secret lives of microbes so that we are able to deal with them much more effectively than ever before. The wheel really doesn’t have to be reinvented every time there is a new problem. Software repurposing can help cut down development time, reduce the learning curve and improve the quality of results compared to traditional methods through clever algorithmic improvisations — and we should do it more often.

This story was first published on the IBM Research blog.