Real World Deep Neural Network Architectures for Pharma Industry

deepakvraghavan
11 min read · May 15, 2018

The pharmaceutical industry is at an inflection point and is ripe for disruption if it is to stay relevant in today’s competitive market. The world economy is projected to double in the next ten years, with 3 billion consumers in emerging markets. The population of people over 65 is also increasing steadily: it is projected to reach 16% of the total by 2050, compared to 8% in 2010. This drives demand for faster and better drug development techniques. A typical drug development cycle takes 15 years and costs approximately $2.6 billion per drug, so even marginal improvements in timeline and cost translate into potentially huge savings. In this article we will look at how Machine Learning (specifically Deep Neural Networks) can be used to meet this growing demand and disrupt the pharma industry.

Specifically, we will cover three areas where Machine Learning can be used in the Pharma industry to address current challenges.

Drug Discovery

There are many life-threatening diseases which are yet to find a cure, such as cancer, HIV, and Alzheimer’s. A recent study found that Alzheimer’s and other forms of dementia cost on average more than $287,000 over five years, compared to $175,000 for heart disease and $173,000 for cancer. An incredible amount of time and money is spent in the medical pipeline, from research through production, to find better, sustainable cures for these diseases. Machine Learning, and Deep Learning in particular, is being leveraged to find better cures and achieve faster drug discovery, which will eventually bring down the cost of treatment.

Enumerating candidate molecules and the combinations in which they can be used in new drugs is a complex mathematical problem, given the size and variety of the data. To get an idea of the scale of the drug discovery problem, take a look at the interactive drug demo at https://dash-drug-explorer.plot.ly/

This demo shows a large list of drugs and their chemical structure at the molecular level, using a 3D graph plotted against solubility, acid strength (pKa), and Log P. Log P is a component of Lipinski’s Rule of Five (also known as Pfizer’s rule of five or RO5), a rule of thumb for evaluating drug-likeness, i.e. whether a chemical compound with a certain pharmacological or biological activity has the properties that would make it a likely orally active drug in humans.
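The RO5 criteria can be sketched as a simple screen. This is a minimal, illustrative version: the property values for each candidate are assumed inputs here (in practice they would come from a cheminformatics toolkit such as RDKit), and the `candidate` molecule below is hypothetical.

```python
# Minimal sketch of a Lipinski Rule-of-Five (RO5) screen. A compound is
# typically considered drug-like if it violates at most one criterion.

def passes_ro5(mol):
    """Return True if a molecule violates at most one RO5 criterion."""
    violations = 0
    if mol["mol_weight"] > 500:        # molecular mass <= 500 Da
        violations += 1
    if mol["log_p"] > 5:               # octanol-water partition coefficient
        violations += 1
    if mol["h_bond_donors"] > 5:       # H-bond donors (OH + NH groups)
        violations += 1
    if mol["h_bond_acceptors"] > 10:   # H-bond acceptors (N + O atoms)
        violations += 1
    return violations <= 1

# Hypothetical candidate with drug-like property values
candidate = {"mol_weight": 320.4, "log_p": 2.1,
             "h_bond_donors": 2, "h_bond_acceptors": 5}
print(passes_ro5(candidate))  # True: no violations
```

A screen like this is cheap to run over millions of candidates, which is why RO5 is often used as an early filter before the expensive steps of the pipeline.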

In addition to the chemical structure, each of the drugs has a detailed sheet which lists all the individual molecules in it. There is a detailed taxonomy on how these different molecules are classified based on their composition and how they are used in chemical drugs. For example, for the drug Levobupivacaine, the drug detail sheet is at https://www.drugbank.ca/drugs/DB01002

Because of the innumerable possibilities of molecules in the drug discovery process, the timeline is long, as shown in the self-explanatory graphic.

The image below shows the different phases of this clinical pipeline. To make things challenging, 90% of all clinical trials in humans fail even after the molecules have been successfully tested in animals.

This timeline can be typically broken down into four key phases:

  • Study Medical Literature: The first step is to mine associations between drugs, diseases, and proteins published in prior literature to identify current gaps.
  • Choose properties of the desired drug: The next step is to find drugs with a similar structural backbone (identified using Structure-Activity Relationship (SAR) approaches and chemical scaffolds) but improved properties that can be used to treat the disease under consideration.
  • Identify molecules with the desired properties: Once we have identified the properties of the drug to treat a certain disease, the next step is to identify a molecule which closely matches these characteristics. This is a computationally complex step. To give an example, one standard database consists of 72 million different molecules, the space of all possible molecules is estimated to be somewhere in the range of 10⁶³ to 10²⁰⁰, and synthesizing a new molecule may cost tens of thousands of dollars.
  • Experimentation: Once the set of candidate molecules is determined, the next step is to go through clinical trials on animals and humans to get the drug FDA approved.

Given data this extensive in variety and volume, and the time and cost of going through different permutations when picking molecules (an extremely computationally intensive task), AI and machine learning algorithms can be used to choose molecules with the desired properties for a drug. Beyond the first two steps, there is also potential for leveraging AI and Deep Learning in a simulation setting during the clinical trial process (note that currently there is no use of simulation or computation in the clinical trial step). We will see how Deep Learning has proven effective, thanks to its rapid learning rate on the available public drug data sets. Here are three examples, with the neural network architectures of models created for drug discovery.

Bioactivity Modeling using Recurrent Neural Networks (RNN)

There are different notations for molecular structures; one of them is SMILES, which stands for Simplified Molecular Input Line Entry Specification. It is a line notation in which atoms are represented by their element symbols, with hydrogen usually left implicit. For example, benzene (C6H6) is represented as c1ccccc1. As you can tell, the repetitive nature of this molecular representation (especially for hydrocarbons) resembles the sequence data commonly seen in Machine Learning, which makes it a good problem space for RNNs or Long Short-Term Memory (LSTM) networks.
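Treating a SMILES string as a sequence starts with tokenization. This simplified sketch treats every character as one token (real SMILES tokenizers also group two-character element symbols such as "Cl" and "Br"):

```python
# Character-level tokenization of SMILES strings, the usual
# preprocessing step before feeding them to an RNN/LSTM.

def build_vocab(smiles_list):
    """Map every distinct character to an integer index."""
    chars = sorted(set("".join(smiles_list)))
    return {ch: i for i, ch in enumerate(chars)}

def encode(smiles, vocab):
    """Turn a SMILES string into the integer sequence an RNN consumes."""
    return [vocab[ch] for ch in smiles]

benzene = "c1ccccc1"           # benzene, as given in the text
vocab = build_vocab([benzene])
print(encode(benzene, vocab))  # [1, 0, 1, 1, 1, 1, 1, 0]
```

The resulting integer sequences are what the recurrent model is trained on, one symbol at a time, exactly like character-level language modeling over text.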

In this paper, the chemical language model was trained on a SMILES file containing 1.4 million molecules from the ChEMBL database. This database records whether a compound was active or inactive against a given target (which makes this a supervised classification Machine Learning use case). With this mapping between drugs and diseases, it is possible to predict whether a new drug can be used to treat a disease. The authors employed an RNN with three stacked LSTM layers, each with 1024 dimensions and each followed by a dropout layer with a dropout ratio of 0.2 to help the model generalize. To generate novel molecules, 50,000,000 SMILES symbols were sampled from the model symbol by symbol. The architecture is shown below.

Result: By using the LSTM architecture and filtering out duplicates, the authors obtained 847,955 novel molecules!
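The filtering step above, keeping only generated molecules that are unique and unseen during training, can be sketched with plain Python sets. The SMILES strings below are hypothetical examples:

```python
# De-duplication of sampled SMILES: a generated string counts as a
# novel molecule only if it is unique and absent from the training set.

def novel_molecules(generated, training_set):
    """Unique generated SMILES not already seen during training."""
    return set(generated) - set(training_set)

training = ["c1ccccc1", "CCO", "CC(=O)O"]             # known molecules
generated = ["CCO", "CCN", "c1ccccc1", "CCN", "CCC"]  # sampled output
print(sorted(novel_molecules(generated, training)))   # ['CCC', 'CCN']
```

Note that real pipelines canonicalize SMILES before this comparison, since one molecule can be written as many different SMILES strings; a purely string-level check like this one would miss such duplicates.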

Predict bioactivity of small molecules using Convolutional Neural Networks (CNN)

AtomNet was the first structure-based deep CNN designed to predict the bioactivity of small molecules for drug discovery. CNNs are good at learning spatially local patterns, which is why they excel at classifying images. AtomNet applies the same idea in three dimensions: the 3D structure of a small molecule bound to its protein target is discretized into a grid, and the network learns the local patterns of how the molecule’s atoms interact with the binding site. AtomNet was designed to study and analyze this interaction and recognize the patterns that distinguish active compounds from inactive ones. Below is the architecture for AtomNet.

It consists of an input layer followed by multiple 3D-convolutional and fully-connected layers, topped by a logistic-cost layer that assigns probabilities over the active and inactive classes.
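The core operation in those 3D-convolutional layers is a small cubic kernel sliding over a voxel grid. Here is a pure-Python, single-channel sketch (no padding or stride); real networks apply many kernels, add nonlinearities, and learn the kernel weights:

```python
# A single "valid" 3D convolution over a cubic voxel grid, the building
# block of AtomNet-style networks (one channel, illustrative only).

def conv3d(grid, kernel):
    """Valid 3D convolution of a cubic grid with a cubic kernel."""
    n, k = len(grid), len(kernel)
    m = n - k + 1  # output side length
    out = [[[0.0] * m for _ in range(m)] for _ in range(m)]
    for z in range(m):
        for y in range(m):
            for x in range(m):
                out[z][y][x] = sum(
                    grid[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                    for dz in range(k) for dy in range(k) for dx in range(k))
    return out

# 3x3x3 grid of ones with a 2x2x2 kernel of ones: every output voxel
# sums 8 ones, giving a 2x2x2 block of 8.0s.
grid = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
print(conv3d(grid, kernel)[0][0][0])  # 8.0
```

Because the kernel only ever looks at a small neighborhood, the network learns local chemical interaction motifs regardless of where in the binding site they occur, the 3D analogue of translation invariance in image CNNs.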

Result: Once the mapping of all the different drugs to the diseases in the corpus has been learned, we can predict how a new drug would interact with a certain disease (just as in standard image classification scenarios).

ChemGAN using Generative Adversarial Networks (GAN)

There has been recent research in which the authors of ChemGAN applied Generative Adversarial Networks (GANs) to the problem of drug discovery. GAN architectures use two neural networks that compete with each other. In this case, given an existing drug database, the generator generates a novel (i.e. new) drug, and the discriminator takes the drug produced by the generator and determines whether it is a real drug or a fake one. This process continues until the generator produces data points so good that the discriminator can no longer distinguish fakes from real ones. Both networks are updated using gradient descent, an optimization strategy, and both learn over time until the model reaches a Nash equilibrium.

The authors of ChemGAN proposed an architecture for generating lead molecules based on a variation of GAN called Adversarial Autoencoders (AAE). The idea here is to learn to generate objects from their latent representations. Autoencoders are neural architectures that take an object as input and try to return the same object as output. In the middle of the architecture, the input goes through a middle layer that learns a latent representation (a minimal set of features that encode the input in a way that subsequent layers can decode the object back). Below is the architecture used in ChemGAN.

In this architecture, the autoencoder has to extract the really important features from the input.
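To make the latent-representation idea concrete, here is a toy linear autoencoder in pure Python: 2-D points lying on a line are squeezed through a 1-D bottleneck and reconstructed via gradient descent. This is purely illustrative of the encode/decode principle; ChemGAN’s adversarial autoencoder is far larger, nonlinear, and trained with an additional adversarial loss.

```python
# Toy linear autoencoder: encoder (a, b) maps a 2-D point to a 1-D
# latent code h, decoder (c, d) reconstructs the point from h.

data = [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0)]  # points on the line y = 2x
a, b, c, d = 0.1, 0.1, 0.1, 0.1              # encoder and decoder weights
lr = 0.01                                    # learning rate

def loss(a, b, c, d):
    """Total squared reconstruction error over the dataset."""
    total = 0.0
    for x, y in data:
        h = a * x + b * y                    # 1-D latent code
        total += (c * h - x) ** 2 + (d * h - y) ** 2
    return total

initial = loss(a, b, c, d)
for _ in range(500):                         # plain gradient descent
    ga = gb = gc = gd = 0.0
    for x, y in data:
        h = a * x + b * y
        ex, ey = c * h - x, d * h - y        # reconstruction errors
        gc += 2 * ex * h
        gd += 2 * ey * h
        ga += 2 * (ex * c + ey * d) * x
        gb += 2 * (ex * c + ey * d) * y
    a, b, c, d = a - lr * ga, b - lr * gb, c - lr * gc, d - lr * gd

print(loss(a, b, c, d) < initial)  # True: reconstruction improves
```

Because the data is one-dimensional in disguise, a single latent number is enough to reconstruct each point, which is exactly the sense in which the bottleneck is forced to extract the important features.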

Result: The model was trained to generate fingerprints of molecules using the desired drug properties as conditions. These novel drug outputs generated can then be tested for their effectiveness during the clinical trial process.

Pharmacogenomics

Pharmacogenomics is the study of how genes affect a person’s response to drugs. The field combines pharmacology (the study of drugs) and genomics (the study of genes and their functions) to develop effective, safe medications and dosages tailored to an individual’s genetic makeup. The current “one size fits all” approach to drug administration does not work the same way for everyone: some patients show good progress, whereas others show adverse side effects in addition to a slow response to the drug. Pharmacogenomics offers promise for applications such as optimizing medication for patients based on genotype in diagnostic testing, use as a companion diagnostic (CDx), and drug discovery and development. Pharmaceutical companies, faced with the rising costs and resource investments required for drug development, have begun to recognize the potential of genomics for drug discovery and, to a lesser extent, for stratifying participants in clinical trials to mitigate adverse events and increase efficacy. The continuing growth of the different types of collected data that can improve phenotype-driven therapy via Pharmacogenomics also poses a number of challenges for accurate prediction of treatment response and outcomes, as shown in the picture below.

Deep Learning for Pharmacogenomics

Extracting usable knowledge from large databases requires advanced computational methods that can find patterns; conduct prediction, detection, and classification; and support visual data analytics. Current approaches to knowledge extraction in Pharmacogenomics include statistical methods, machine learning, and, more recently, deep learning. New deep learning-based predictive analytic methods are therefore desirable to accelerate the discovery of new pharmacogenomic markers and to forecast drug efficacy in stratified cohorts of patients, minimizing potential adverse effects of drugs and maximizing the success of treatment.

The above figure shows an idealized, collective example of deep learning applications in Pharmacogenomics. First, deep neural networks are trained on various existing datasets and/or combinations of them. Depending on the type of data and the task at hand, prediction outcomes for a dataset can be known (supervised learning), partially known (semi-supervised learning), or not known at all (unsupervised learning). Due to the flexibility of their architectures, neural networks are capable of multimodal learning, i.e. jointly learning from several different datasets and data types without an explicit definition of common features.

Infectious Disease Control (Epidemiology)

The practice of monitoring infectious-disease processes has traditionally relied heavily on surveillance and expert opinion. Once the surveillance data are collected, public-health officials consult with subject-matter experts, and appropriate measures to control an infectious-disease outbreak are designed and implemented. However, these actions are not always efficiently coordinated and do not occur quickly enough to enable rapid decision making that can minimize morbidity and mortality. Modeling is a tool that fills the void in preemptive infectious-disease decision making by using available data to provide quantitative estimates of outbreak trajectories. While modeling is an improvement over standard documentation, artificial intelligence (AI) technology is evolving at a very rapid rate. The ability of Deep Learning to augment decision-making processes is attributed to the speed of pattern recognition and the robust amount of data that are digested and analyzed for optimal health outcomes.

Seasonal influenza is a major global health issue that affects many people across the world. The model here shows how data-driven machine learning methods are capable of making real-time influenza forecasts that integrate impacts of climate factors and geographical proximity.

This model provides a suitable architecture for time series prediction problems due to its sequential framework. The above diagram shows the network architecture, consisting of unrolled LSTM cells trained by the backpropagation algorithm with a mean-square-error cost function (the training criterion). The LSTM cell at time t−i receives the flu count calculated by its predecessor cell (ot−i−1) and the input xt−i to calculate the flu count at t−i, ot−i. This process is repeated for all the LSTM cells in the model. The number of LSTM cells denotes the number of time steps, T, before the current time; to calculate the flu count at the current time t, the data points from the T previous time steps are used. Climatic variables such as humidity, sun exposure, precipitation, and temperature each have a different degree of impact on influenza spread in a geographical region, and these variables are also fed into the Deep Neural Network model.
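The simplest data-driven analogue of this forecasting setup is an autoregressive model that predicts this week's flu count from last week's, fit by ordinary least squares. The LSTM plays the same role with a much richer, nonlinear memory over T steps plus the climate inputs; the weekly counts below are synthetic, for illustration only.

```python
# Fit a one-lag autoregressive model (next = slope * prev + intercept)
# to a series of weekly flu counts using ordinary least squares.

def fit_ar1(series):
    """Least-squares fit of next = slope * prev + intercept."""
    prev, nxt = series[:-1], series[1:]
    n = len(prev)
    mp, mn = sum(prev) / n, sum(nxt) / n
    slope = (sum((p - mp) * (x - mn) for p, x in zip(prev, nxt))
             / sum((p - mp) ** 2 for p in prev))
    return slope, mn - slope * mp

# Synthetic weekly counts that follow next = 0.5 * prev + 10 exactly,
# so least squares recovers the rule.
counts = [100.0]
for _ in range(9):
    counts.append(0.5 * counts[-1] + 10)

slope, intercept = fit_ar1(counts)
print(round(slope, 3), round(intercept, 3))      # 0.5 10.0
print(round(slope * counts[-1] + intercept, 3))  # next-week forecast
```

A linear model like this captures only one-step persistence; the appeal of the LSTM formulation is that it can learn longer-range seasonal structure and fold in the climatic and geospatial variables at the same time.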

The proposed method offers a promising direction for improving the performance of real-time influenza forecasting models, and it may also be useful for other serious viral illnesses such as Ebola and Zika. In this paper, the authors implemented separate learning components for the climatic variables and for the geospatially proximal variables.

Conclusion

This article has shown the latest cutting-edge practical applications of Deep Neural Networks in three key areas of the pharmaceutical industry: Drug Discovery, Pharmacogenomics, and Epidemiology. There is active research being conducted in academia to further these applications. The future of the Pharma industry looks promising when the principles of Deep Neural Networks and Machine Learning are applied to the complex problems of this domain.
