What’s that Smell? GNNose Knows!

Cathy Zhou
Stanford CS224W GraphML Tutorials
May 14, 2023 · 9 min read

By Sarah Chen, Matthew Ding, and Cathy Zhou as part of the Stanford CS224W Winter 2023 course project.


Sniff. Sniff. What’s that smell? It’s a tangy and pungent aroma that’s also… slightly sour? Is it… could it be… yes! It’s CH3COOH, or as it’s more commonly known, vinegar.

For over 70 years, scientists at the intersection of chemistry, neuroscience, and (recently) machine learning have investigated how the molecular structure of a substance affects how it smells to humans. This is a surprisingly difficult task: a small structural change can produce a completely different odor, while molecules with drastically different structures can smell alike [1, 4]. Recent advances in deep learning have enabled progress on this problem via ResNets, GNNs, and other architectures.

Structurally similar molecules do not necessarily have similar odor descriptors. From Sanchez-Lengeling et al. [4], reprinted from Ohloff, Pickenhagen, and Kraft [1].

From a practical standpoint, improvements in molecular odor prediction may contribute to the discovery of new synthetic odorants, reducing the environmental impact of harvesting natural odorants. Additionally, insights from learned models may contribute to our understanding of chemistry and neuroscience.

For our final project, we trained a graph neural network (GNN) to predict the odors of various chemical molecules. Graphs are a natural way to model molecular data, viewing atoms as nodes and bonds between atoms as edges. GNNs trained on this graph representation can capture both the chemical features of individual atoms and bonds and the structural connectivity of the molecule. Although traditional ML approaches such as regularized linear models and random forests provide good predictions [2, 5], GNNs appear to outperform these methods [4] and offer additional tools for interpreting the learned model.

Dataset

We examined two olfactory databases:

  • The Leffingwell dataset
  • The DREAM Olfaction Prediction Challenge dataset [5]

Each entry contains a string in SMILES notation describing a molecule’s structure and a set of labels describing the molecule’s odor.

Example entries from the Leffingwell dataset.

We formalized the task as binary classification of molecules into “pungent” or “not pungent” classes from the graph representation of the molecular structure, inspired by similar work by Distill [3]. The task is made more challenging by class imbalance: both datasets contain only around 200 molecules labeled “pungent.”
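As a concrete example of this setup, here is a minimal sketch of turning one such entry into a labeled graph. It assumes PyTorch Geometric with RDKit installed; from_smiles is PyG’s built-in SMILES-to-graph converter in recent versions, and the label handling is illustrative rather than our exact preprocessing.

import torch
from torch_geometric.utils import from_smiles

def entry_to_graph(smiles, odor_labels):
    data = from_smiles(smiles)                  # nodes = atoms, edges = bonds
    data.y = torch.tensor([1.0 if "pungent" in odor_labels else 0.0])  # binary "pungent" target
    return data

graph = entry_to_graph("CC(=O)O", ["pungent", "sour"])  # acetic acid, a.k.a. vinegar
print(graph)  # Data(x=[...], edge_index=[2, ...], edge_attr=[...], y=[1])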

Model

We first discuss at a high level how graph neural networks work before diving into the details of our architecture. Like all neural networks, GNNs attempt to learn embeddings of graph data in some high-dimensional embedding space. The key insight of GNNs is to use the local graph structure when learning these embeddings. Nodes in the graph start with input features, and each GNN layer then performs a two-step message passing + aggregation update. During the message passing step, each node applies a function to its embedding (the message) and sends the message to all neighbors that share an edge. During the aggregation step, each node collects the messages received from its neighbors and uses them to update its own embedding.

Diagram showing message passing+aggregation steps for a single GNN layer. From CS224W Slides, Lecture 4.
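To make the two steps concrete, here is a tiny framework-free sketch of a single layer over an adjacency list. It is purely illustrative; real GNN libraries vectorize this over the whole graph.

import numpy as np

def gnn_layer(h, neighbors, message_fn, update_fn):
    """One message passing + aggregation step.

    h:          dict mapping node id -> embedding vector (np.ndarray)
    neighbors:  dict mapping node id -> list of neighboring node ids
    message_fn: function applied to a neighbor's embedding (the "message")
    update_fn:  function combining a node's own embedding with the aggregate
    """
    new_h = {}
    for v in h:
        # Message step: every neighbor u of v sends message_fn(h[u]) to v.
        messages = [message_fn(h[u]) for u in neighbors[v]]
        # Aggregation step: combine the received messages (sum here) and update v.
        aggregated = sum(messages) if messages else np.zeros_like(h[v])
        new_h[v] = update_fn(h[v], aggregated)
    return new_h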

For our particular project, we used the Graph Isomorphism Network (GIN) architecture proposed by Xu et al. [6]. GIN is among the most expressive message passing GNNs: its sum-and-MLP neighborhood aggregation can represent an injective function over node neighborhoods, giving it the same discriminative power as the Weisfeiler-Lehman graph isomorphism test. The GIN embedding update rule is:
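h_v^{(k)} = \mathrm{MLP}^{(k)}\!\left( \left(1 + \epsilon^{(k)}\right) \cdot h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right)

where h_v^{(k)} is the embedding of node v after layer k, \mathcal{N}(v) is the set of neighbors of v, and \epsilon^{(k)} is a learnable (or fixed) scalar weighting the node’s own previous embedding.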

Embedding update rule from Xu et al. with added annotations.

The aggregation rule uses a summation followed by a multi-layer perceptron (MLP). To perform binary graph-level classification, we use GIN layers to propagate node features, max-pool the node embeddings across each graph, and apply a final MLP as a classification head.
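A minimal PyTorch Geometric sketch of this architecture follows; hidden sizes, layer counts, and MLP shapes are illustrative defaults, not our tuned configuration.

from torch import nn
from torch_geometric.nn import GINConv, global_max_pool

class GINEncoder(nn.Module):
    """GIN layers followed by graph-level max-pooling: one embedding per graph."""
    def __init__(self, in_dim, hidden_dim=64, num_layers=3):
        super().__init__()
        self.convs = nn.ModuleList()
        dims = [in_dim] + [hidden_dim] * num_layers
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            mlp = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(),
                                nn.Linear(d_out, d_out))
            self.convs.append(GINConv(mlp))

    def forward(self, x, edge_index, batch):
        for conv in self.convs:
            x = conv(x, edge_index).relu()
        return global_max_pool(x, batch)  # max-pool node embeddings across each graph

class PungencyClassifier(nn.Module):
    """Graph encoder plus an MLP head producing one "pungent" logit per graph."""
    def __init__(self, encoder, hidden_dim=64):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                  nn.Linear(hidden_dim, 1))

    def forward(self, x, edge_index, batch):
        return self.head(self.encoder(x, edge_index, batch)).squeeze(-1)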

Contrastive self-supervised pretraining

As part of our project, we also evaluate the benefit of pretraining via contrastive self-supervised learning. Contrastive learning first applies random data augmentations to graphs in the training batch and then trains using contrastive loss. Given that it’s self-supervised, it can be used as a pretraining method to obtain richer embeddings for graph data.

Contrastive learning on graphs was originally proposed by You et al. [7], who outlined an approach that maximizes the agreement between representations of graphs augmented with perturbed edges and dropped nodes.

Illustration of the Graph Contrastive Learning Algorithm

We consider a version of this approach that maximizes the agreement between pairs of unmodified and augmented graphs. In particular, we consider two types of augmentations: edge perturbation and node dropping. For edge perturbation, an original graph (left) is augmented by randomly adding or dropping edges (middle). For node dropping, we randomly drop a fraction of nodes, along with their relevant edges (right).

Augmentations with edge perturbation and node dropping, compared to the original graph
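A rough sketch of these two augmentations on a PyTorch Geometric Data object is shown below. The ratios and sampling strategy are illustrative, GraphSSL’s own implementations differ in details, and edge attributes are ignored for brevity.

import torch
from torch_geometric.utils import subgraph

def perturb_edges(data, ratio=0.1):
    """Randomly drop a fraction of existing edges and add the same number of random ones."""
    num_edges = data.edge_index.size(1)
    num_change = int(ratio * num_edges)
    keep = torch.randperm(num_edges)[: num_edges - num_change]
    random_edges = torch.randint(0, data.num_nodes, (2, num_change))
    out = data.clone()
    out.edge_index = torch.cat([data.edge_index[:, keep], random_edges], dim=1)
    return out

def drop_nodes(data, ratio=0.1):
    """Randomly drop a fraction of nodes along with their incident edges."""
    num_keep = data.num_nodes - int(ratio * data.num_nodes)
    keep = torch.randperm(data.num_nodes)[:num_keep]
    edge_index, _ = subgraph(keep, data.edge_index,
                             relabel_nodes=True, num_nodes=data.num_nodes)
    out = data.clone()
    out.x = data.x[keep]
    out.edge_index = edge_index
    return out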

To train, we use the contrastive loss to incentivize the model to associate a graph with its augmented version (a positive pair) more closely than with the other graphs in the batch (negative pairs). We pool across nodes to encode each graph as a single vector and use the InfoNCE objective as our loss; max-pooling yielded the best performance during hyperparameter tuning.
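For a batch of N graphs with pooled embeddings z_i (original view) and z_i' (augmented view), one standard form of the InfoNCE loss for the i-th positive pair is

\ell_i = -\log \frac{\exp\!\left(\mathrm{sim}(z_i, z_i') / \tau\right)}{\sum_{j=1}^{N} \exp\!\left(\mathrm{sim}(z_i, z_j') / \tau\right)}

where \mathrm{sim}(\cdot,\cdot) is cosine similarity and \tau is a temperature hyperparameter; the batch loss averages \ell_i over all graphs. (Conventions differ slightly on whether the positive pair appears in the denominator.)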

We apply these methods using the GraphSSL repository [8].

Specifically, we train a GIN with max-pooling. We chose the GIN because its expressiveness allows the most positive transfer from contrastive pretraining, according to Hu et al. [10]. A sketch of the pretraining network is shown below.
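Since the exact snippet is tied to the GraphSSL codebase, here is a hedged sketch of that pretraining network, reusing the GINEncoder from the classifier sketch above and adding a small projection head whose output feeds the InfoNCE loss. Class and argument names are illustrative, not the GraphSSL API.

from torch import nn

class ContrastivePretrainer(nn.Module):
    """Wraps the GIN encoder with a projection head for contrastive pretraining."""
    def __init__(self, encoder, hidden_dim=64, proj_dim=64):
        super().__init__()
        self.encoder = encoder  # GINEncoder from the classifier sketch above
        self.proj = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                  nn.Linear(hidden_dim, proj_dim))

    def forward(self, x, edge_index, batch):
        z = self.encoder(x, edge_index, batch)  # pooled graph embedding
        return self.proj(z)                     # projected embedding for the contrastive loss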

We use the obtained weights as a starting point for finetuning a GIN on our downstream pungency classification task.
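In sketch form (building on the illustrative classes above, with a hypothetical feature dimension), the transfer step just copies the pretrained encoder weights into the downstream classifier before finetuning:

num_node_features = 9  # hypothetical atom-feature dimension; depends on the featurization

pretrainer = ContrastivePretrainer(GINEncoder(num_node_features))
# ... contrastive pretraining on pairs of original and augmented graphs ...

classifier = PungencyClassifier(GINEncoder(num_node_features))
classifier.encoder.load_state_dict(pretrainer.encoder.state_dict())
# The classification head stays randomly initialized and is trained on pungency labels.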

Results

We used the following configuration when training the model and tuning hyperparameters:

  • Train/val/test split: 0.7 / 0.1 / 0.2
  • Optimizer: Adam
  • Loss: Weighted Binary Cross-Entropy, due to class imbalance (see the sketch after this list)
  • Evaluation Metrics: F1 score, AUC-ROC
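Since the positive (“pungent”) class is rare, we weight it more heavily in the loss. Below is a minimal sketch of one standard way to set this up with PyTorch’s built-in BCEWithLogitsLoss; the counts are illustrative, not our exact splits.

import torch
from torch import nn

def make_weighted_bce(train_labels):
    """Weight the rare positive ("pungent") class by the negative/positive ratio."""
    num_pos = train_labels.sum()
    num_neg = train_labels.numel() - num_pos
    return nn.BCEWithLogitsLoss(pos_weight=num_neg / num_pos)

# Illustrative counts only: ~200 positives among a few thousand molecules.
labels = torch.cat([torch.ones(200), torch.zeros(3000)])
criterion = make_weighted_bce(labels)
loss = criterion(torch.randn(8), torch.randint(0, 2, (8,)).float())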

We searched for the best set of hyperparameters for both the pretraining and finetuning runs on the Leffingwell and DREAM Olfaction Challenge datasets, running pretraining for 30 epochs and finetuning for 70. Below are the hyperparameter combinations that we obtained:

Below are some loss curves obtained during training:

Train and Validation Loss during pretraining, with epochs = 30, batch_size = 64, lr = 5e-4, dropout = 0.8, weight_decay = 1e-5 on the Leffingwell Dataset
Train and validation loss, F1, and AUC-ROC scores for finetuning on the Leffingwell dataset, with epochs = 70, batch_size = 64, lr = 1e-2, dropout = 0.5, weight_decay = 1e-3

The chart below shows the results obtained by using the best set of hyperparameters. Overall, the model achieves better performance on the Leffingwell dataset. Pretraining did not affect the AUC-ROC score of the predictions but boosted the F1 scores of the models significantly.

Explainability

We used the GNNExplainer [9] to:

  1. Extract the importance of each node and edge attribute contributing to the prediction, and
  2. Derive the structural importance of each part of the molecule

This helps us understand how the model makes predictions, and it may serve as a tool for chemists and scientists to focus on particular chemical properties and generate hypotheses about how molecular structure relates to scent.

The GNNExplainer extracts the components of a graph that most directly impact the prediction by pruning redundant information. It aims to find a subgraph that minimizes the difference between the prediction made from that subgraph and the prediction made from the full computation graph. To do so, it applies “edge masks” and node “feature masks” to the graph, which assign an importance value to each edge and to each attribute of the node features.

Illustration for Edge masks and node feature masks

The GNNExplainer optimizes these masks by maximizing the mutual information between the model’s prediction Y and the selected subgraph G_S (with its associated node features X_S) of the computation graph G:
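\max_{G_S} \; MI\!\left(Y, (G_S, X_S)\right) \;=\; H(Y) \;-\; H\!\left(Y \mid G = G_S,\, X = X_S\right)

Since H(Y) is fixed for a trained model, this is equivalent to minimizing the conditional entropy of the prediction given the selected subgraph and features.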

The graph-level odor prediction task requires the GNNExplainer to take the entire graph into account rather than a single node. The graph-level explanation therefore operates on the union of the computation graphs of all nodes in the graph, i.e., their k-hop neighborhoods, where k is the number of GNN layers.
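In practice, this can be driven through PyTorch Geometric’s explainability interface, roughly as follows. Argument names follow recent PyG releases and may differ slightly by version; model is the trained graph classifier and data is a single molecule.

import torch
from torch_geometric.explain import Explainer, GNNExplainer

def explain_molecule(model, data):
    """Return node-feature and edge importance masks for one molecular graph."""
    explainer = Explainer(
        model=model,
        algorithm=GNNExplainer(epochs=200),
        explanation_type="model",
        node_mask_type="attributes",   # importance score per node feature
        edge_mask_type="object",       # importance score per edge
        model_config=dict(mode="binary_classification",
                          task_level="graph",
                          return_type="raw"),
    )
    batch = torch.zeros(data.num_nodes, dtype=torch.long)  # single-graph batch
    explanation = explainer(data.x.float(), data.edge_index, batch=batch)
    return explanation.node_mask, explanation.edge_mask    # [N, F] and [E] masks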

1. Attribute Features

Using these approaches, we obtained node feature importance and edge feature importance scores for all of the chemical features we extracted. We averaged the feature importances across all molecular graphs to visualize where the model places its emphasis.

The atomic number, degree, formal charge, hybridization of nodes, and the bond type of edges appeared to be key contributors to the prediction.

Feature importance scores averaged across all graphs in the test set for node attributes and edge attributes.

2. Structural Importance

We also visualized the structural importance in the graphs below. In each instance, we highlighted the nodes and edges with higher importance scores in their feature masks.

The graphs below are examples of the structural importance visualizations we generated. The most important nodes tend to be attached to important edges, and the model appears to put more weight on those regions. However, when we examined the distributions of the edge and node mask weights, the weight was spread fairly evenly across the entire molecule, suggesting that the model takes all aspects of the molecular structure into account.

Examples of molecular structure importance visualization. Highlighted nodes and edges have greater than average importance scores.

Discussion

We find that contrastive pretraining often yields positive transfer in our application domain, especially for the F1 score of the model. The main difference between the F1 score and the AUC-ROC is that F1 is computed from thresholded class predictions, while AUC-ROC is computed from predicted scores. With a heavily imbalanced dataset like ours, F1 is therefore the more indicative metric, and contrastive pretraining helped precisely with predictions on the imbalanced classes.

For future explorations, alternative approaches could include pretraining using tasks on both the node and graph levels [10]. In particular, these methods would allow us to leverage supervised data from labels beyond pungency in odor prediction datasets.

More generally, while GNNs seem intuitively appropriate for this setting, recent work on the odor prediction problem has opted for methods beyond pure GNNs, citing the limitations that molecules may have extremely similar structures but very different odors [11].

Below are a Colab notebook and a GitHub link with our code:

References

[1] Ralf Günter Berger. Scent and Chemistry: The Molecular World of Odors. By Günther Ohloff, Wilhelm Pickenhagen and Philip Kraft. Angewandte Chemie International Edition, 51(13):3058–3058, 2012.

[2] Lei Zhang, Haitao Mao, Yu Zhuang, Lu Wang, Linlin Liu, Yachao Dong, Jian Du, Wancui Xie, and Zhihong Yuan. Odor prediction and aroma mixture design using machine learning model and molecular surface charge density profiles. Chemical Engineering Science, 245:116947, 2021.

[3] Benjamin Sanchez-Lengeling, Emily Reif, Adam Pearce, Alexander B. Wiltschko. A Gentle Introduction to Graph Neural Networks, 2021. https://distill.pub/2021/gnn-intro/.

[4] Benjamin Sanchez-Lengeling, Jennifer N. Wei, Brian K. Lee, Richard C. Gerkin, Alán Aspuru- Guzik, and Alexander B. Wiltschko. Machine learning for scent: Learning generalizable perceptual representations of small molecules, 2019.

[5] Andreas Keller, Richard C. Gerkin, Yuanfang Guan, et al. Predicting human olfactory perception from chemical features of odor molecules. Science, 355(6327):820–826, 2017. doi:10.1126/science.aal2014.

[6] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks?, 2019.

[7] Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. Graph contrastive learning with augmentations, 2021.

[8] @paridhimaheshwari2708. GraphSSL. https://github.com/paridhimaheshwari2708/GraphSSL, 2021.

[9] Rex Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, and Jure Leskovec. Gnnexplainer: Generating explanations for graph neural networks, 2019.

[10] Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, and Jure Leskovec. Strategies for Pre-training Graph Neural Networks, 2020.

[11] Yu Wang, Qilong Zhao, Mingyuan Ma, and Jin Xu. Decoding Structure–Odor Relationship Based on Hypergraph Neural Network and Deep Attentional Factorization Machine, 2022.
