Exploring the Potential of Machine Learning in Bioprospecting: Unveiling Nature’s Treasures

Sam Salmasi
5 min readNov 30, 2023

--

A Confluence of Ecology, Evolution, and Pharmaceutical Discovery

Bioprospecting, the systematic exploration of biological systems for valuable products, has been a cornerstone of pharmaceutical discovery. More than 60% of the small molecule drugs that have been introduced in the past 25 years were either natural, naturally-derived, or semisynthetic derivatives of natural products.

This underscores the continued importance and influence of natural sources in the development of new pharmaceuticals. It’s clear that nature remains a reservoir of inspiration and resources in the quest for novel therapeutic agents.

In recent years, the pharmaceutical industry has pivoted towards robotics and computational methods for lead discovery. I explored this topic, as well as a general overview of drug discovery methods and strategy in my last publishing, so be sure to check that out. Such high-throughput techniques aim to expedite the discovery process through rapid trial-and-error learning.

These methods are constrained by the size of available sample libraries or data banks. Astonishingly, it’s estimated that a mere 6% of higher-level plant species have been screened for biological activity, and less than 15% have been evaluated for phytochemical activity.

This is where ML-inspired bioprospecting strategies come into play. Such applications can be integrated with high-throughput methods to selectively enhance the sample libraries available for pre-clinical screening and hit-identification. This approach hinges on our understanding of ecology and evolution. By harnessing the power of nature and leveraging technological advancements, we can continue to uncover the vast potential of the natural world in the realm of drug discovery.

Plant Defense Mechanisms in Drug Discovery

In biological terms, a medically relevant property often refers to a plant’s defense mechanism that has potential therapeutic applications. Plants have evolved a variety of defense mechanisms to protect themselves from herbivores and pathogens. These mechanisms can be physical, such as thorns and tough leaves, or chemical, involving the production of bioactive compounds that deter predators or inhibit the growth of pathogens.

Photo by Jarryd Du Toit on Unsplash

These bioactive compounds, often referred to as secondary metabolites, are not directly involved in the growth, development, or reproduction of the plant, but they play a crucial role in the plant’s survival and adaptation to its environment. Secondary metabolites encompass a wide range of chemical compounds including alkaloids, terpenoids, phenolics, and others. Many of these compounds have been found to have medicinal properties, such as anti-inflammatory, antimicrobial, antiviral, or anticancer activities.

Darwin’s theory of…

Evolution posits that an organism producing a medically relevant property does so as a result of selective environmental pressures. Consequently, it’s plausible for two similar but geographically distinct environments to apply similar selective pressues, giving way to the emergence of analogous traits.

It is important to note that a trait is an abstract idea, and nature can give rise to these abstract ideas in many ways.

Fish and whales are prime examples of convergent evolution. Despite vastly different origins, both groups evolved traits such as streamlined bodies and fins to thrive in aquatic environments- despite the fact that their skeletal frameworks are vastly different!

I mention this to showcase how strikingly similar adaptations can independently arise in response to similar ecological pressures. These evolutionary trends, known as convergent evolution, forms the basis of our predictive models in bioprospecting.

Applying the Science

Bioprospecting strategies, rooted in Ecological Theory and evolutionary principles, work to enhance sample libraries crucial for discovering potential drug candidates. ML-inspired bioprospecting pairs AI’s capacity to work with large, complex data sets with scholarly strategies to produce guiding models.

These models predict instances of convergent evolution, directing exploration toward promising areas within the vast array of natural compounds.

This fusion of ecology, evolution, and pharmaceutical discovery signifies a new era in bioprospecting, poised to expedite the pursuit of the next generation of therapeutic agents.

Phylogenetic Implications

Phylogenetics examines the evolutionary relationships among organisms, sheding light on their shared ancestry and evolutionary history. It employs various methodologies to construct phylogenetic trees, showcasing the evolutionary paths of organisms.

An example of a phylogenetic tree.

When creating an ambitious model to analyze the evolution of some trait, the approach should center on each instance of the trait’s emergence rather than its presence in closely related species.

This distinction is pivotal: examining occurrences of the trait across different lineages provides a more accurate representation of its evolution and independent acquisition.

By focusing on each separate evolutionary occurrence of a trait, the model accommodates diverse evolutionary paths and convergences across various lineages. This approach aids in distinguishing whether the trait emerged multiple times in distinct lineages due to convergent evolution or originated from a common ancestor, persisting in related species due to shared ancestry.

Machine Learning’s Value

Machine learning techniques offer an efficient means to analyze extensive datasets and derive significant insights. Applying these techniques to the current challenge allows for predicting the probability of similar traits appearing in diverse species or environments.

Rather than concentrating solely on related species sharing a trait, these models adopt a broader perspective. They consider instances where a trait evolves independently across various lineages.

Consider the trait of toxin production in plants, used as a defense mechanism against herbivores. Machine learning models leverage environmental (like sunlight availability) and ecological data (such as local herbivore presence), along with evolutionary relationships among plant species. These models identify commonalities in environmental conditions where this trait has independently emerged.

This predictive analysis extends to foreseeing the emergence of similar traits, even in species not closely related to those already possessing this feature.

Through this fusion of machine learning, phylogenetics, and ecological theory, exploration occurs on how environmental factors influence the independent evolution of traits across diverse lineages. This approach offers predictive insights into the presence of traits in distinct biological contexts.

Photo by Wolfgang Hasselmann on Unsplash

Next time

In my upcoming post, I’m thrilled to present a real-life example showcasing how machine learning revolutionizes our approach to uncovering nature’s secrets.

I’ll take you through my journey of using this cutting-edge technology to predict regions of interest for new mental health therapeutics. It’s a fascinating exploration of how we leverage AI to navigate environmental data, unravel evolutionary connections, and forecast where specific characteristics might emerge next.

It’s like gaining a glimpse into Earth’s crystal ball, so be sure to stay tuned!

--

--