The Art of Biological Data Science

How algorithms that manipulate data are ushering in the era of generative biology, and how data science works in biology

Dzmitry Hramyka
Axioma AI Journal
Jul 2, 2022

Photo by Luke Chesser on Unsplash

If you use Google Search, you have probably noticed that Google somehow predicts what you want to type. Google’s algorithms can suggest whole sentences and even write whole stories. All this is possible thanks to powerful algorithms and their skillful application to data. But how well does this kind of approach work for biological data?

This story is about what Data Science looks like in biology today, which trends stand out, and why it matters so much for the future of humanity.

Why do we need Data Science in Biology?

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data and apply knowledge from data across a broad range of application domains.
Definition from Wikipedia

What if we could recognize epidemics early and stop them before they spread? What if we could test new drugs on computers instead of laboratory mice? What if we could predict which molecules might make new plastics for folding phone screens before ever testing them in the lab? These are the questions that data science is trying to answer.

Since data science works with big data and builds different pipelines for different purposes, biology is a natural target for this approach.

What specific challenges does Data Science face in Biology?

1. Biological data is not Big Data. It is Gigantic Data!

In terms of data, what distinguishes biology from other fields is its enormous size. Take the average bacterial genome (about 4 million base pairs), add annotations of the functions behind its sequences, then the proteins and biochemical pathways that follow from them, and you can see that the data can take up far more than a gigabyte!
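To see why sizes add up so quickly, here is a back-of-the-envelope sketch. The 4 million base pairs come from the text; the 100x sequencing depth and the roughly 2 bytes per sequenced base in FASTQ files (one character for the base, one for its quality score) are illustrative assumptions, and per-read header lines are ignored.

```python
# Back-of-the-envelope storage estimate for one bacterial genome.
# Every constant here is an illustrative assumption, not a measurement.

GENOME_BP = 4_000_000  # ~4 million base pairs, as in the text
COVERAGE = 100         # hypothetical sequencing depth (each base read ~100 times)


def estimate_sizes(genome_bp: int, coverage: int) -> dict:
    """Return approximate sizes in bytes for a few representations of a genome."""
    return {
        # A, C, G, T fit in 2 bits each, so 4 bases per byte.
        "packed_2bit": genome_bp * 2 // 8,
        # Plain text (e.g. FASTA) spends one ASCII byte per base.
        "plain_text": genome_bp,
        # Raw FASTQ reads store a base plus a quality character for every
        # sequenced base (~2 bytes); per-read header lines are ignored here.
        "raw_reads_fastq": genome_bp * coverage * 2,
    }


for name, size in estimate_sizes(GENOME_BP, COVERAGE).items():
    print(f"{name:>16}: {size / 1_000_000:,.0f} MB")
```

Under these assumptions the genome itself fits in a few megabytes, but the raw reads needed to assemble it already approach a gigabyte, before any annotation, protein or pathway data is added.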

Integrating All of Biology into a Public Neo4j Database

All of this data can be used to design a desired product. For example, by looking at the substances a bacterium produces, you can work out which modifications are needed to make it produce the substance you want.
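To make "which modifications are needed" a little more concrete, here is a minimal sketch that treats a bacterium's metabolism as a directed graph of reactions and searches for a route to a desired product that requires as few newly introduced enzymes as possible. The network, compound names and enzyme names are hypothetical placeholders, and the search (a 0-1 breadth-first search, where native reactions cost 0 and foreign ones cost 1) is just one simple choice, not a description of how real metabolic-engineering tools work.

```python
from collections import deque

# Toy metabolic network. All compound and enzyme names are hypothetical.
# Each reaction: (substrate, product, enzyme, native_to_host)
REACTIONS = [
    ("substrate", "compound_A", "enzyme_1", True),
    ("compound_A", "compound_B", "enzyme_2", True),
    ("compound_B", "compound_C", "enzyme_3", True),
    ("compound_A", "compound_D", "enzyme_4", False),  # would have to be introduced
    ("compound_C", "target", "enzyme_5", False),
    ("compound_D", "target", "enzyme_6", False),
]


def plan_modifications(reactions, start, target):
    """Find a route from start to target that needs the fewest non-native enzymes.

    0-1 BFS: native reactions cost 0, non-native reactions cost 1.
    Returns the non-native enzymes on the best route, or None if unreachable.
    """
    graph = {}
    for substrate, product, enzyme, native in reactions:
        graph.setdefault(substrate, []).append((product, enzyme, native))

    best = {start: 0}    # fewest non-native steps found so far per compound
    added = {start: []}  # non-native enzymes used on that best route
    queue = deque([start])
    while queue:
        compound = queue.popleft()
        for product, enzyme, native in graph.get(compound, []):
            cost = best[compound] + (0 if native else 1)
            if cost < best.get(product, float("inf")):
                best[product] = cost
                added[product] = added[compound] + ([] if native else [enzyme])
                # Zero-cost edges go to the front so cheaper routes expand first.
                if native:
                    queue.appendleft(product)
                else:
                    queue.append(product)
    return added.get(target)


print(plan_modifications(REACTIONS, "substrate", "target"))
```

For this toy network, the cheapest route runs through native reactions to compound_C and needs only enzyme_5 to be introduced, rather than the two foreign enzymes on the route through compound_D.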

2. Black box problem

Some parts of biology are well studied, and it is easy to train a naive ML model on them, only to rediscover what any biologist could tell you.

Photo by Milad Fakurian on Unsplash

For everything else, researchers need to build tools to explore the vast unknowns of biology. Biology is extremely complicated: there is still no good understanding of how even a single cell works, even in yeast or E. coli, the most studied laboratory organisms in the world, because cells are so extraordinarily complex. Trying to understand living matter, let alone design it, means searching an enormous space. It is still a big black box.

What is possible, and what are we doing?

Biology has evolved from an observational science into a descriptive one, and only in recent decades has it presented itself as an inventive science. We are no longer just studying what exists; we are creating the future! The use of data analysis in biology is the best example of this.

One established direction in bioinformatics is structural biology, specifically modeling the structures of proteins with specific functions. This makes it possible, for example, to design substances against diseases or to find biomaterials with extraordinary properties.
I have written several other articles on this topic.

Beyond the biology of materials, data analysis plays a central role in genomics, where a main goal is to uncover the root causes of diseases. DNA analysis can help detect and prevent extremely serious diseases, even cancer.
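As a very simplified illustration of the kind of comparison DNA analysis relies on, the sketch below compares a sample sequence with a reference and lists single-base substitutions. Both sequences are made up, and the sketch assumes they are already aligned and of equal length; real variant-calling pipelines first align sequencing reads and then use statistical models to decide which differences are genuine.

```python
# Toy illustration: report single-base substitutions between a reference
# sequence and a sample. Both sequences below are made up.

REFERENCE = "ATGGCCATTGTAATGGGCCGC"
SAMPLE = "ATGGCCATTGTCATGGGCAGC"


def find_substitutions(reference: str, sample: str) -> list:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes the sequences are already aligned and have equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


for pos, ref_base, alt_base in find_substitutions(REFERENCE, SAMPLE):
    print(f"position {pos}: {ref_base} -> {alt_base}")
```

For these two sequences it reports substitutions at positions 12 (A to C) and 19 (C to A).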

Photo by National Cancer Institute on Unsplash

Data Science also appears in population biology (the study of entire populations of organisms), ecology, neuropsychology and many other areas. Each of them deserves more than one article, so if you want to learn more, leave a comment and let me know which topics interest you.

Finally, I want to say that every application of computer science should work for people, and applications in biology are among the best examples of how people can move forward and open new horizons!

With knowledge, data and powerful methods, humanity can not only prevent many terrible things but also create new opportunities for a better future: curing diseases, protecting species diversity, expanding human capabilities and much more.

Be human, do science 🕊

I am a research student in Bioinformatics/Molecular Biology, highly interested in AI/ML/Technology. I love making tools for humans and sharing my opinions here.