Published in Bayes Labs

Molecular generation using interpretable substructure — Part 1

Case study 1: Generating new NNMT inhibitor molecules.

In Part II we will focus more on generative modeling and the concepts behind it.

Motivation: Higher NNMT expression and MNA concentrations have been associated with obesity and type 2 diabetes. NNMT inhibitors reduce NNMT activity and MNA levels, and drive insulin sensitization, glucose modulation, and body-weight reduction in animal models of metabolic disease.

To date, there are no reports on the feasibility of using small-molecule modulators of NNMT in preclinical animal models of metabolic disease to validate NNMT as a pharmacological drug target.

Introduction: Nicotinamide N-methyltransferase (NNMT) is a cytosolic enzyme that catalyzes the transfer of a methyl group from the cofactor S-adenosyl-L-methionine (SAM) onto the substrate nicotinamide (NA) to form 1-methylnicotinamide (MNA).

Role of NNMT in NAD+ metabolism and the methionine cycle.

Challenges and Solutions: Generating lead candidates that satisfy multiple property constraints is challenging for the entire pharmaceutical industry. We are applying AI to drug discovery to address this problem.

Workflow: We use a graph-based generative model and reinforcement learning to generate new molecules, following the fragment-based drug discovery approach used by most medicinal chemists.

Workflow for NNMT inhibitor molecule generation

Generating a new therapeutic candidate for disease treatment involves many challenges, from lead identification through lead optimization.

Lead identification: Identifying a lead compound that can bind to a specific protein target is time-consuming, labour-intensive, and costly in resources. For this task, we use deep generative modelling and reinforcement learning to generate new molecules with specific properties.

Lead optimization: After generation, we optimize and filter the compounds against many physicochemical, pharmacokinetic, and toxicity constraints using graph-based predictive modelling.

Virtual screening: To validate that the lead compounds bind the target well and can show therapeutic effects, we run virtual screening with a 3D convolution-based model trained on molecules and proteins from the ChEMBL and PDB databases.

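Measures of protein-ligand binding (IC50 vs Ki):

IC50: The concentration of a drug that is required for 50% inhibition in vitro. Its value depends on the assay conditions, in particular the substrate concentration.

Ki: The inhibition constant, an indication of how potent an inhibitor is. It is the equilibrium dissociation constant of the enzyme-inhibitor complex and, unlike IC50, does not depend on the substrate concentration used in the assay.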

The relation between IC50 and Ki under competitive and uncompetitive inhibition.
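The original figure is not reproduced here. As a rough sketch, the standard Cheng-Prusoff relations for the two inhibition modes can be written as below. This assumes simple Michaelis-Menten kinetics, and the function and parameter names are ours, not taken from the original figure.

```python
def ki_competitive(ic50, substrate_conc, km):
    """Competitive inhibition: Ki = IC50 / (1 + [S] / Km)."""
    return ic50 / (1.0 + substrate_conc / km)


def ki_uncompetitive(ic50, substrate_conc, km):
    """Uncompetitive inhibition: Ki = IC50 / (1 + Km / [S])."""
    return ic50 / (1.0 + km / substrate_conc)


if __name__ == "__main__":
    # Hypothetical assay values, all in the same concentration unit (nM).
    ic50, s, km = 100.0, 10.0, 10.0
    print("competitive Ki:", ki_competitive(ic50, s, km))
    print("uncompetitive Ki:", ki_uncompetitive(ic50, s, km))
```

In both cases Ki is at most IC50. The competitive correction grows with substrate concentration, while the uncompetitive correction shrinks with it.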

Case study: NNMT inhibitor generation

Note: Fragment-based molecular generation was applied.

Data collection: We collected ~100 molecules with known IC50 values against NNMT, in the range 1–12000 nM. The target label is strongly positively skewed and has high variance, with some outliers. We filtered the outliers and applied a log transformation to obtain an approximately normal distribution.

Molecules (SMILES) and their NNMT log(IC50) values.
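A minimal sketch of this label preparation, using only the Python standard library. The article does not say which outlier rule or log base was used, so the 1.5 x IQR rule, the natural log, and the example values below are all assumptions for illustration.

```python
import math
import statistics


def filter_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def prepare_labels(ic50_nm, base=math.e):
    """Drop outliers, then log-transform the remaining IC50 values (in nM)."""
    kept = filter_outliers_iqr(ic50_nm)
    return [math.log(v, base) for v in kept]


if __name__ == "__main__":
    # Hypothetical IC50 values in nM, not the article's dataset.
    ic50_nm = [1, 5, 10, 20, 50, 100, 200, 500, 12000]
    print(prepare_labels(ic50_nm))
```

The log transform compresses the long right tail, which is why a label spanning four orders of magnitude becomes much closer to symmetric afterwards.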

Data preparation: After log-transforming the target label, we generated substructures from the 100 molecules using the Monte Carlo tree search (MCTS) algorithm with a property threshold of log(IC50) < 6.5. This gave ~32 unique rationales (substructures), all in the range of 10–25 atoms.

SMILES and their generated rationales (substructures).

Molecule generation: We first fine-tuned an existing generative model (VAE) using the ~32 generated substructures and property constraints on NNMT inhibition (log(IC50)), synthetic accessibility, and drug-likeness. After fine-tuning, we generated ~1 million compounds. Below are sample valid compounds generated by the model.

Some NNMT inhibitor molecules generated by the generative model (VAE).
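One common way to fold several property constraints into a single reinforcement-learning signal is a binary reward: a generated molecule scores 1 only if every constraint is met. The sketch below illustrates that idea. The article does not state the reward actually used, and the `PropertyScores` container and the synthetic accessibility and drug-likeness thresholds are hypothetical. The log(IC50) threshold reuses the 6.5 value from the data preparation step, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class PropertyScores:
    """Predicted properties for one generated molecule (hypothetical container)."""
    log_ic50: float   # predicted NNMT log(IC50); lower means more potent
    sa_score: float   # synthetic accessibility; lower means easier to make
    qed: float        # drug-likeness in [0, 1]; higher means more drug-like


def reward(scores, max_log_ic50=6.5, max_sa=4.0, min_qed=0.5):
    """Return 1.0 if all property constraints hold, else 0.0 (assumed thresholds)."""
    satisfied = (
        scores.log_ic50 < max_log_ic50
        and scores.sa_score <= max_sa
        and scores.qed >= min_qed
    )
    return 1.0 if satisfied else 0.0


if __name__ == "__main__":
    print(reward(PropertyScores(log_ic50=5.2, sa_score=3.1, qed=0.7)))
    print(reward(PropertyScores(log_ic50=7.8, sa_score=3.1, qed=0.7)))
```

A binary reward keeps the objectives from trading off against each other: a very potent but hard-to-synthesize molecule earns nothing.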

Molecule filtering: To filter for drug-likeness and toxicity, we applied many ADMET property filters, including LogP, solubility, toxicity, CYP450 inhibition (1A2, 3A4, 2C9, 2C19, 2D6), hERG, Caco-2 permeability, blood-brain barrier penetration, and plasma protein binding, among others. In the end, only 2000 molecules satisfied all the filters.

Virtual screening: To validate the molecules, we applied a 3D convolutional model and AutoDock Vina, an open-source program for molecular docking. In the 3D convolutional deep model, the molecule and protein sequence are first converted to a 3D format, and then 3D convolutional and fully connected layers are applied to predict the IC50 values.
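A filtering cascade like this can be sketched as a list of named pass/fail checks applied in sequence, recording how many molecules survive each step. The property names, thresholds, and example molecules below are hypothetical placeholders. In practice each value would come from a trained ADMET predictor.

```python
def apply_filters(molecules, filters):
    """Apply (name, predicate) filters in order; return survivors and a funnel log."""
    survivors = list(molecules)
    funnel = []
    for name, keep in filters:
        survivors = [mol for mol in survivors if keep(mol)]
        funnel.append((name, len(survivors)))
    return survivors, funnel


if __name__ == "__main__":
    # Each molecule is a dict of predicted properties (illustrative values only).
    molecules = [
        {"id": "mol-1", "logp": 2.0, "herg_risk": 0.1},
        {"id": "mol-2", "logp": 6.0, "herg_risk": 0.1},
        {"id": "mol-3", "logp": 3.0, "herg_risk": 0.9},
    ]
    filters = [
        ("LogP <= 5", lambda m: m["logp"] <= 5.0),
        ("hERG risk < 0.5", lambda m: m["herg_risk"] < 0.5),
    ]
    survivors, funnel = apply_filters(molecules, filters)
    print([m["id"] for m in survivors])
    print(funnel)
```

Logging the count after every filter shows which constraint removes the most candidates, which is useful when ~1 million generated compounds shrink to a few thousand.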
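To make the "3D format" step concrete, here is a minimal sketch of voxelization: atom coordinates are binned into a cubic occupancy grid, which is the kind of dense input a 3D convolutional network consumes. The article does not describe its exact featurization, so the grid size, the resolution, and the single occupancy channel are assumptions. Real pipelines usually add one channel per atom type.

```python
def voxelize(coords, grid_size=8, resolution=1.0, center=(0.0, 0.0, 0.0)):
    """Count atoms per voxel in a cubic grid of side grid_size * resolution."""
    half = grid_size * resolution / 2.0
    grid = [
        [[0.0] * grid_size for _ in range(grid_size)]
        for _ in range(grid_size)
    ]
    for atom in coords:
        # Shift so the grid's corner is the origin, then floor to a voxel index.
        idx = [
            int((value - c + half) // resolution)
            for value, c in zip(atom, center)
        ]
        # Atoms falling outside the box are ignored.
        if all(0 <= i < grid_size for i in idx):
            grid[idx[0]][idx[1]][idx[2]] += 1.0
    return grid


if __name__ == "__main__":
    # Hypothetical atom coordinates in angstroms; the last atom lies outside the box.
    atoms = [(0.0, 0.0, 0.0), (0.2, 0.2, 0.2), (100.0, 0.0, 0.0)]
    grid = voxelize(atoms)
    total = sum(v for plane in grid for row in plane for v in row)
    print("atoms inside grid:", total)
```

The resulting nested list can be converted to a tensor and fed to 3D convolutional layers, with the box typically centered on the protein's binding pocket.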





We at Bayes Labs are applying AI and ML to medical research in the therapeutics sector. We are currently optimizing the AI and ML tools required in the drug discovery pipeline.

Indrajeet Kumar

Data Scientist | working in Drug discovery generative model for molecular generation and optimizing the properties(ADMET) of molecules...
