AI and the Future of Computational Chemistry

Deep learning’s role in the hard sciences


We all know that companies like Google and Facebook have been acquiring artificial intelligence (AI) companies in an effort to stay competitive with each other. These acquisitions bring in the expertise needed to make voice recognition crisper and my maps app more accurate.

But can these technologies create the same level of innovation outside of the tech industry, say, in the consumer care space?

What I’m starting to realize is that we are at the cusp of a new evolution in the way we approach chemistry.

Let me explain.

Conducting lab experiments is labor-intensive work. It’s not for the faint of heart. During my PhD, I saw firsthand the stress my colleagues went through when their experiments failed. When over 90% of experiments, each a small incremental change to past work, end in failure, it’s no wonder the empirical process wastes so much time and so many resources.

Back in 2013, computational chemistry revolved around mechanistic modeling systems that used a rigid methodology to determine how the body reacts to chemicals. These in silico tools simply didn’t add enough value for researchers during experimental design.

The standard practice was to fit parameters to observations from a handful of experimental conditions. Unfortunately, there is no general formula to follow: a model fit to one experimental condition needs a new set of parameters, and often a new set of algorithms, to fit another set of conditions, as the sketch below illustrates.
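To make that concrete, here is a minimal sketch, not the actual workflow from any lab, of what per-condition fitting looks like in Python. The dose-response form and the data are illustrative assumptions; the point is simply that each condition yields its own parameter set, and none of them transfers to the next.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical dose-response model: parameters must be re-fit for every condition.
def dose_response(dose, emax, ec50):
    return emax * dose / (ec50 + dose)

# Illustrative data for two experimental conditions (e.g., two pH levels).
doses = np.array([0.1, 0.5, 1.0, 5.0, 10.0])
responses = {
    "condition_A": np.array([0.08, 0.31, 0.48, 0.79, 0.88]),
    "condition_B": np.array([0.02, 0.10, 0.19, 0.52, 0.70]),
}

for name, y in responses.items():
    # Each condition produces its own (emax, ec50); neither set predicts the other.
    params, _ = curve_fit(dose_response, doses, y, p0=[1.0, 1.0])
    print(name, "emax=%.2f ec50=%.2f" % tuple(params))
```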

At the time, AI was growing quickly in the technology and financial sectors, but hadn’t quite made its leap to the physical sciences.

I couldn’t let the power of machine learning, and of newer techniques such as deep learning, go unused in a space riddled with inefficiency. So after my PhD, I set out to explore how these techniques could help build new materials for the body. I knew deep learning could handle the complexity of how the body’s biological and chemical systems interact far more efficiently than an engineer rigidly trying to model every aspect of the body by hand.

The first step toward this vision was to build a database of liver, brain, spleen, and lung receptors and how they interact with materials that had previously been tested. By first creating linear and empirical models, I was able to repurpose their features for a machine learning system. Once that system was in place, I applied support vector machines (SVMs) to test the features and models. The approach delivered strong accuracy, so I dug deeper.
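As a rough illustration of that step, not the production pipeline, here is what testing engineered features with an SVM can look like in scikit-learn. The synthetic feature table, label, and hyperparameters are placeholders assumed for the example.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative stand-in for engineered material/receptor features:
# rows = tested materials, columns = physicochemical descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
# Binary label, e.g. "interacts with the organ receptor" vs "does not".
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# RBF-kernel SVM on standardized features, scored by cross-validation.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
scores = cross_val_score(model, X, y, cv=5)
print("mean CV accuracy: %.2f" % scores.mean())
```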

To maximize the capabilities of our feature engineering and databases, I used deep learning to generalize these techniques and make them work at a larger scale. This was exciting, as it was the first time deep learning had been applied to the hard sciences.
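For readers who want a sense of what "deep learning on engineered features" means in practice, the sketch below trains a small multi-layer network on the same kind of feature table. It is a generic PyTorch illustration, not the architecture we used; the layer sizes, synthetic data, and training settings are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative feature table: 500 materials x 12 descriptors,
# predicting a continuous uptake value for 4 organs (liver, brain, spleen, lung).
X = torch.randn(500, 12)
true_w = torch.randn(12, 4)
y = X @ true_w + 0.1 * torch.randn(500, 4)

# A small fully connected network standing in for the deep model.
model = nn.Sequential(
    nn.Linear(12, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(300):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print("final training MSE: %.4f" % loss.item())
```

The appeal over the per-condition fitting shown earlier is that a single model is trained across all of the data at once, rather than a separate parameter set per condition.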

Once the models were accurate to my team’s liking, it was time to test them in the real world. The problem with researchers in academia is that they’re so competitive about publishing that they hate sharing their data. What they did want to share were their failed-experiment datasets, in the hope of gleaning insight into what went wrong. After countless conversations, I was able to test our algorithms on 152 datasets, all describing how nanoparticles penetrate various animal organ tissues in vivo.

[Figure: A few of the institutions we worked with.]
[Figure: Our results across the 152 experiments, reported as sensitivity/recall, ROC, and precision, with the Laplacian score used as the feature-selection method for building the models.]
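Since the Laplacian score comes up in that caption, here is a minimal NumPy sketch of how the score is typically computed, following He, Cai, and Niyogi’s formulation: build a nearest-neighbor similarity graph and rank each feature by how well it preserves that local structure, with lower scores being better. The graph settings and random data are assumptions for illustration, not the configuration we used.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_scores(X, n_neighbors=5, t=1.0):
    """Return the Laplacian score of each column of X (lower = more informative)."""
    # k-nearest-neighbor graph with heat-kernel (RBF) weights.
    dist = kneighbors_graph(X, n_neighbors, mode="distance", include_self=False)
    S = dist.copy()
    S.data = np.exp(-S.data ** 2 / t)
    S = S.toarray()
    S = np.maximum(S, S.T)            # symmetrize the similarity matrix
    D = np.diag(S.sum(axis=1))        # degree matrix
    L = D - S                         # graph Laplacian
    ones = np.ones(X.shape[0])

    scores = []
    for r in range(X.shape[1]):
        f = X[:, r]
        # Center the feature against its degree-weighted mean.
        f_tilde = f - (f @ D @ ones) / (ones @ D @ ones) * ones
        scores.append((f_tilde @ L @ f_tilde) / (f_tilde @ D @ f_tilde))
    return np.array(scores)

# Example: rank 10 random features; in practice X would be the descriptor table.
X = np.random.default_rng(0).normal(size=(100, 10))
print(np.argsort(laplacian_scores(X)))  # feature indices, best first
```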

Honestly, skin science fascinates me. I began researching the limitations of the computational techniques in the space and found that it was hard to predict the bioavailability of actives in the skin after application. After meeting with several researchers, my team and I heard the same story: it took several complementary tests to determine the concentration of actives in the skin.

This was an opportunity to run an initial skin study with them to test our algorithms on a compound called hydroquinone. We saw a 97% correlation between our concentration predictions and their empirical results. Knowing that the dataset was fairly specific, we then tested our algorithms further with a researcher working on sustained release, helping her tune a liposome-to-ligand ratio toward a level better suited to her objectives. Overall, the researchers were impressed by the accuracy these algorithms delivered in a fraction of the time required by current methods.
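For what it’s worth, a correlation figure like the one above is the kind of number you get from a straightforward predicted-versus-measured comparison such as the snippet below; the concentration values here are made up purely to show the calculation.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical skin-concentration values (e.g., ug/cm^2) for one study.
predicted = np.array([0.42, 0.65, 0.88, 1.10, 1.35, 1.62])
measured  = np.array([0.40, 0.70, 0.85, 1.15, 1.30, 1.70])

r, p_value = pearsonr(predicted, measured)
print("Pearson correlation: %.2f (p=%.3g)" % (r, p_value))
```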

This validated technology went on to be called NuSilico, a nod to our new approach to in silico modeling.

I want to conclude by saying that AI is extremely important for advancing our understanding of the body and the complexities of its biological and chemical systems. The capabilities of such a technology can be applied across multiple business verticals, including understanding and adding context to the vast amounts of data being generated by users across the globe. Analyzing this information with AI will shed light on the uniqueness of each individual and allow us to build new products and therapies to fit their future needs.

If you’d like to learn more about this effort, please check out the presentation I gave at the SF Artificial Intelligence in Health MeetUp:

-Shalini Ananda, PhD.