Recent Breakthroughs in Human Olfactory Perception Using Machine Learning

Tanoy Debnath
Published in Voice Tech Podcast
7 min read, Sep 17, 2019

The complexity of the human sense of smell is increasingly reflected in complex, high-dimensional data, which opens opportunities for data-driven approaches that complement hypothesis-driven research.

This article is for readers interested in research on human olfactory perception, specifically in identifying odors using machine learning combined with semantic word representations from natural language processing.

· Odor Impression Prediction from Mass Spectra Using Machine Learning

Nozaki and Nakamoto [1] designed a predictive model based on a deep artificial neural network that predicts odor impressions from the mass spectra of chemicals (the mass spectrum is one of the representative physicochemical properties of a chemical substance), and they conducted a series of computational analyses to evaluate its performance. They used two datasets: the results of a sensory evaluation of monomolecular chemicals, and the mass spectra of those chemicals. The first dataset comes from the sensory test previously carried out by Dravnieks, in which 160 odorants were each rated against 146 verbal descriptors on a scale of 0 to 5; the authors used 144 of the original 146 descriptors. For the second dataset, mass spectra measured by electron ionization at an energy of 70 eV were obtained from the Chemistry WebBook provided by the National Institute of Standards and Technology (NIST), USA. Two auto-encoders, one on the input side (mass spectra) and one on the output side (sensory data), extract feature vectors from the original high-dimensional spaces, and these features are used to build a mapping function from the feature space of the mass spectra to the feature space of the sensory data. The predictions of the proposed method reach a correlation coefficient of R ≈ 0.76, as shown in Fig. 1.

Fig. 1: Experimental result for the 9-layer predictive model, which gives a value close to the correlation coefficient of each method. 3024 (= 144 descriptors × 21 samples) data points are plotted (R ≈ 0.76) [1].
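The pipeline described above (compress each space, map between the compressed spaces, then score the predictions with a correlation coefficient) can be sketched on synthetic data. This is only an illustration under stated assumptions: PCA stands in for the paper's auto-encoders, a least-squares map stands in for the deep network, and all data, sizes and helper names (`pca_fit`, `encode`, `decode`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (NOT real data): a shared latent cause drives both
# the "mass spectra" and the "sensory ratings" of each chemical.
n_train, n_test, n_latent = 100, 21, 5
n_spec, n_desc = 50, 144
n_total = n_train + n_test
latent = rng.normal(size=(n_total, n_latent))
spectra = (latent @ rng.normal(size=(n_latent, n_spec))
           + 0.1 * rng.normal(size=(n_total, n_spec)))
sensory = (latent @ rng.normal(size=(n_latent, n_desc))
           + 0.1 * rng.normal(size=(n_total, n_desc)))


def pca_fit(X, k):
    """Return (mean, top-k principal directions). PCA is an assumed
    linear stand-in for an auto-encoder's bottleneck."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def encode(X, model):
    mean, comps = model
    return (X - mean) @ comps.T


def decode(Z, model):
    mean, comps = model
    return Z @ comps + mean


# Fit one compressor per space, using training chemicals only.
spec_model = pca_fit(spectra[:n_train], n_latent)
sens_model = pca_fit(sensory[:n_train], n_latent)

# Learn a map from spectrum features to sensory features.
z_spec = encode(spectra[:n_train], spec_model)
z_sens = encode(sensory[:n_train], sens_model)
mapping, *_ = np.linalg.lstsq(z_spec, z_sens, rcond=None)

# Predict the full descriptor profile of the held-out chemicals.
pred = decode(encode(spectra[n_train:], spec_model) @ mapping, sens_model)
truth = sensory[n_train:]

# Pool every (sample, descriptor) pair into one correlation coefficient,
# analogous to the 144 x 21 = 3024 points plotted in Fig. 1.
pooled_r = np.corrcoef(pred.ravel(), truth.ravel())[0, 1]
print(pred.size, round(float(pooled_r), 3))
```

Because the synthetic data are generated from a low-noise linear model, the pooled correlation here is far higher than on real measurements; the value says nothing about the performance reported in [1].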

· Predicting Human Olfactory Perception from Chemical Features of Odor Molecules

The DREAM Olfaction Prediction Challenge published its results in Science [2], demonstrating that it is possible to predict odor from molecular structure. In this work, the team describes models that predict the odor qualities of a molecule from its structure alone. Such a model provides fundamental insight into how chemicals are transformed into a smell percept in the brain, that is, how the brain perceives odors. Beyond this fundamental biological understanding, the ability to reverse-engineer smells by designing molecules would be a major technological advance for the perfume and flavor industries. Fig. 2 depicts a schematic of this reverse engineering, in which spider plots represent perceptual data for all 21 attributes, with the lowest rating at the center and the highest at the outside of the circle.

More than twenty teams from academia and corporate research organizations around the world accepted the challenge and worked on predicting smells. Algorithms were trained to predict the sensory attributes of molecules from their chemoinformatic features. The challenge supplied a large psychophysical dataset, created by Andreas Keller of Rockefeller University, in which 49 individuals profiled 476 structurally and perceptually diverse molecules [3], together with 4884 physicochemical features for each molecule, including atom types, functional groups, and topological and geometrical properties. Multiple modeling approaches performed remarkably well. One could say that the participants of the DREAM Olfaction Prediction Challenge had a good “nose” for predicting smells. They were able to predict not only how the odor of a molecule would be perceived across the entire group of individuals, but also what a specific person would think the molecule smells like. Surprisingly, regularized linear regression models gave some of the best predictions. The models successfully predicted 8 of the 19 rated semantic descriptors (garlic, fish, sweet, fruit, burnt, spices, flower, sour). It’s a fascinating insight into the science of olfaction.
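Regularized linear regression of the kind mentioned above can be sketched in closed form. The sketch below uses ridge (L2) regularization as one example of a regularized linear model; the article does not say which penalty the teams used, and the data, sizes and the helper `ridge_fit` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data.

    Returns (weights, intercept). `lam` is the L2 penalty strength.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + lam * I) w = Xc^T yc; the penalty keeps the system
    # well-posed even with more features than molecules.
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_features), Xc.T @ yc)
    return w, y_mean - x_mean @ w


rng = np.random.default_rng(0)

# Synthetic stand-in (NOT the DREAM data): fewer molecules than features,
# mirroring the "many descriptors, few molecules" setting.
n_molecules, n_features = 60, 200
X = rng.normal(size=(n_molecules, n_features))
true_w = np.zeros(n_features)
true_w[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]  # only a few features matter
y = X @ true_w + 0.1 * rng.normal(size=n_molecules)  # e.g. a "sweet" rating

w_weak, b_weak = ridge_fit(X, y, lam=1.0)
w_strong, b_strong = ridge_fit(X, y, lam=100.0)
pred = X @ w_strong + b_strong

# A stronger penalty shrinks the weight vector.
print(np.linalg.norm(w_weak), np.linalg.norm(w_strong))
```

In practice the penalty strength would be chosen by cross-validation on held-out molecules rather than fixed by hand as it is here.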


These results represent a giant leap toward solving a major challenge in understanding olfaction. Not only is this an important landmark in the neurobiology of olfaction, but the predictive models the authors describe may also guide the design of new molecules for perfumery, flavor science, and other industrial applications. The authors conclude that there is still much room for improvement, for example by using more odor descriptors and by studying odor mixtures, where stronger nonlinearity is likely.

Fig. 2: Schematic for reverse-engineering a desired sensory profile from molecular features [2].

· Predictive Modeling for Odor Character of a Chemical Using Machine Learning Combined with Natural Language Processing

Nozaki and Nakamoto [4] published the first report combining machine olfaction with natural language processing for odor character prediction. Sensory evaluation is widely used to measure human impressions of smell or taste with verbal descriptors such as “spicy” and “sweet”. However, because it requires significant time and human resources, a large-scale sensory evaluation test is difficult to perform. Their predictive model applies nonlinear dimensionality reduction to mass spectrum data and clusters the descriptors using natural language processing. Through a series of computer simulations, the model successfully predicts a group of descriptors for a target chemical.

Fig. 3: MDS diagrams based on (a) one minus the sample correlation between points, (b) cosine distance based on Word2vec modeling. Descriptors considered to belong to the same group, e.g. “rose”, “jasmine”, “lily”, “iris” and “violet”, are marked in blue; “milk”, “cheese”, and “butter” are enclosed in red [4].

In this paper, they proposed a predictive model that incorporates the language modeling method Word2vec to predict, from mass spectra, the odor characters of chemicals represented as binary values. In the Sigma-Aldrich catalog data used in this study, each descriptor of a molecule’s odor character is treated as exclusive, even when other descriptors represent similar odor characters, so the similarity among descriptors is lost. Although there is a trade-off between the number of clusters and the accuracy of the model, the proposed model achieved a prediction accuracy of 53% for true positives and 85% for true negatives when the number of odor clusters was set to twenty.
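The idea of grouping descriptors by cosine distance and then predicting binary cluster labels can be illustrated with a small sketch. Everything here is an assumption for illustration: the three-dimensional vectors are hand-made stand-ins for real Word2vec embeddings, the threshold-based single-linkage grouping is not necessarily the clustering used in [4], and the reported percentages are read as true positive and true negative rates.

```python
import numpy as np

# Hypothetical 3-d vectors standing in for real Word2vec embeddings.
vectors = {
    "rose":    [0.90, 0.10, 0.00],
    "jasmine": [0.80, 0.20, 0.10],
    "lily":    [0.85, 0.15, 0.05],
    "milk":    [0.10, 0.90, 0.20],
    "cheese":  [0.00, 0.80, 0.30],
    "butter":  [0.10, 0.85, 0.25],
}


def cosine_distance(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cluster(vectors, threshold):
    """Single-linkage grouping: any two words closer than `threshold`
    end up in the same cluster."""
    words = list(vectors)
    label = {w: i for i, w in enumerate(words)}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine_distance(vectors[a], vectors[b]) < threshold:
                old, new = label[b], label[a]
                for w in words:  # merge b's cluster into a's
                    if label[w] == old:
                        label[w] = new
    groups = {}
    for w in words:
        groups.setdefault(label[w], set()).add(w)
    return list(groups.values())


def to_cluster_labels(descriptors, clusters):
    """Binary target per cluster: 1 if any of the chemical's catalog
    descriptors falls in that cluster."""
    return [int(any(d in c for d in descriptors)) for c in clusters]


def tp_tn_rates(y_true, y_pred):
    """Fraction of actual positives and of actual negatives predicted correctly."""
    t = np.asarray(y_true, dtype=bool)
    p = np.asarray(y_pred, dtype=bool)
    return float((t & p).sum() / t.sum()), float((~t & ~p).sum() / (~t).sum())


clusters = cluster(vectors, threshold=0.2)
print(clusters)
print(to_cluster_labels({"jasmine"}, clusters))
print(tp_tn_rates([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

With cluster-level labels, a chemical described only as “jasmine” still counts as a positive for the whole floral group, which restores the similarity that exclusive catalog descriptors discard.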

· Predicting Natural Language Descriptions of Mono-Molecular Odorants

Most people don’t have the right words to describe what they are smelling. Although humans can distinguish about a trillion odors, our vocabulary is limited. Terms like fruity and musky are not only imprecise but also colored by cultural bias. Unlike with the other senses (hearing, sight, touch and taste), we have trouble agreeing on universal terms for smells. Following Nozaki and Nakamoto [4], an IBM study recently published in Nature Communications [5] suggests a promising way to augment our smell vocabulary. Researchers led by computational neuroscientist Guillermo Cecchi used artificial intelligence to create an algorithm that translates between fuzzy descriptive words and the molecular compounds they describe. Fig. 4 shows the construction of a universal perceptual map.

Fig. 4: Construction of a universal perceptual map [5].

Fig. 4(a) describes the similarities and differences between the DREAM and Dravnieks psychophysical olfactory datasets. The Dravnieks dataset used olfaction experts, a smaller set of 128 molecules (58 overlapping with DREAM), and a larger set of 146 descriptors (10 overlapping with DREAM). Fig. 4(b) shows the approach used to predict ratings for the Dravnieks descriptor set.

For molecules that do not overlap between the datasets, a chemoinformatic model predicts rating values for the DREAM set of 19 perceptual descriptors. The authors then use fastText to generate semantic vectors for the DREAM and Dravnieks descriptors based on the co-occurrence of words in sentences, as illustrated in the figure with a fragment of Milton’s Paradise Lost. A model using these vectors is then applied to the DREAM ratings or predictions to generate Dravnieks rating values for 131 descriptors.
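One simple way to see how semantic vectors can carry ratings from one descriptor set to another is a similarity-weighted average. This is a deliberately simplified sketch and not the fitted model from [5]: the two-dimensional vectors, the rating values and the helper `transfer_ratings` are all hypothetical.

```python
import numpy as np

# Hypothetical 2-d semantic vectors; real fastText vectors are much larger.
source_vecs = {"sweet": [1.0, 0.0], "garlic": [0.0, 1.0]}    # rated descriptors
target_vecs = {"vanilla": [0.9, 0.1], "onion": [0.1, 0.9]}   # unrated descriptors

# Hypothetical ratings of one molecule on the source descriptors.
source_ratings = {"sweet": 80.0, "garlic": 10.0}


def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def transfer_ratings(source_ratings, source_vecs, target_vecs):
    """Estimate each target rating as an average of the source ratings,
    weighted by (non-negative) cosine similarity between descriptors."""
    estimates = {}
    for target, t_vec in target_vecs.items():
        weights = {s: max(cosine(t_vec, s_vec), 0.0)
                   for s, s_vec in source_vecs.items()}
        total = sum(weights.values())
        if total == 0.0:
            estimates[target] = float("nan")  # no related source descriptor
        else:
            estimates[target] = sum(
                weights[s] * source_ratings[s] for s in weights) / total
    return estimates


estimates = transfer_ratings(source_ratings, source_vecs, target_vecs)
print(estimates)
```

A molecule rated as strongly “sweet” is estimated to be more “vanilla” than “onion” simply because the “vanilla” vector lies closer to “sweet”; a trained model can learn richer relationships than this fixed weighting.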

The researchers summarized their findings in an “odor wheel” (Fig. 5) that takes the English words most commonly used to describe scent and arranges them in associative order. For instance, “vanilla” is adjacent to “chocolate” and “caramel” on the wheel, signaling to perfumers and chemists that when someone wants a vanilla candle, they likely won’t mind hints of caramel or chocolate.

Fig. 5: Odor wheel [5]: prediction performance for each descriptor, as measured by the correlation across molecules between the ground truth and the predictions.

References

[1] Nozaki Y, Nakamoto T (2016) Odor Impression Prediction from Mass Spectra. PLoS ONE 11(6)

[2] Keller A, Gerkin RC, Guan Y, Dhurandhar A, Turu G, Szalai B, Mainland JD, Ihara Y, Yu CW, Wolfinger R, et al.; DREAM Olfaction Prediction Consortium. 2017. Predicting human olfactory perception from chemical features of odor molecules. Science. 355:820–826.

[3] Keller A, Vosshall LB. 2016. Olfactory perception of chemically diverse molecules. BMC Neurosci. 17:55.

[4] Nozaki Y, Nakamoto T (2018) Correction: Predictive modeling for odor character of a chemical using machine learning combined with natural language processing. PLOS ONE 13(12).

[5] Gutiérrez ED, Dhurandhar A, Keller A, Meyer P, Cecchi GA. 2018. Predicting natural language descriptions of mono-molecular odorants. Nat Commun. 9:4979.

Tanoy Debnath

Machine Learning Researcher @ Yokogawa Electric Corporation (R&D)