In a good smell, Part 1

Almogklein
Jan 7, 2023


Cover image taken from phys.org.

Welcome to the first part of my research journey into the world of scent analysis and classification. I recently had the opportunity to work with unique odor data collected from digital sensors, which has opened up a new realm of possibilities for me to explore.

This post focuses on cleaning, pre-processing, analyzing, and labeling the different smells in the data. In the second part, we will continue the research by visualizing odor samples with dimensionality reduction algorithms and associating new samples with subgroups.

Before diving into the data, it’s essential to understand the distinction between “smell” and “odor.”

“Smell” refers to the general sense of detecting and identifying odors, while “odor” refers specifically to the characteristic quality of a smell. Whether a smell is pleasant or unpleasant depends on individual preferences and past experiences: the smell of freshly baked cookies may be pleasant to some, while the smell of cigarette smoke may be unpleasant to others.

Image: Google Images results for the query “smell and odor”

Alright, alright, alright: DATA!

Now, let’s take a look at the data. When I received it, the data had no column names, so I had to manually clean it up.

The resulting table held readings from ten odor sensors, one column per sensor (each labeled with an “S” and a number). Each row held one reading from every sensor for the same odor sample.
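
Naming the columns is a one-liner in pandas. The sketch below is a minimal example, assuming the raw export is a headerless CSV with one column per sensor; the inline data is synthetic, and the S1 to S10 names simply follow the “S” plus number convention described above.

```python
import io

import pandas as pd

# Synthetic stand-in for the raw sensor export: no header row, ten
# comma-separated readings per line. Replace with the real file path.
raw_csv = io.StringIO(
    "412,388,501,275,330,298,610,455,390,402\n"
    "415,390,499,280,333,301,612,457,392,405\n"
    "420,395,503,284,337,305,615,460,395,409\n"
)

# header=None stops pandas from treating the first data row as column names.
df = pd.read_csv(raw_csv, header=None)

# Name the sensor columns S1..S10 (numbering scheme assumed).
df.columns = [f"S{i}" for i in range(1, df.shape[1] + 1)]

print(df.head())
```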

The data was collected by exposing the digital sensors to an odor sample for about a minute, at a resolution of one reading per second. After each exposure to the material, the sensors were exposed to clean air, and this exposure-and-clean-air cycle, called a “stroke,” was repeated a few times.
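
Because readings arrive at one per second, the row position can double as elapsed time. The sketch below also splits a recording into strokes, assuming (hypothetically) that rows are in acquisition order and that every stroke spans the same fixed number of rows; STROKE_LEN and the synthetic data are placeholders, and real recordings may need stroke boundaries detected from the signal or read from metadata instead.

```python
import numpy as np
import pandas as pd

# Hypothetical: ~60 s of odor exposure plus 30 s of clean air per stroke.
STROKE_LEN = 90  # rows per stroke; adjust to the real protocol
n_strokes = 3

rng = np.random.default_rng(0)
# Synthetic readings for a single sensor.
df = pd.DataFrame({"S1": rng.normal(400, 5, STROKE_LEN * n_strokes)})

# One reading per second, so the row position equals elapsed seconds.
df["elapsed"] = pd.to_timedelta(np.arange(len(df)), unit="s")

# Assign each row to a stroke by integer-dividing its position by the stroke length.
df["stroke"] = np.arange(len(df)) // STROKE_LEN

print(df.groupby("stroke")["S1"].agg(["count", "mean"]))
```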

Each material was sampled with about 3–5 strokes. Each odor sample is identified by a “Contegees Number,” which has a “Base” and a “Batch Index.”
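
Once each reading is tagged with its sample identifiers, counting the strokes recorded per sample is a simple group-by. The column names base, batch_index, and stroke are hypothetical, as is the long-format layout; the point is only to show how the Base and Batch Index pair can serve as a grouping key.

```python
import pandas as pd

# Hypothetical long-format table: one row per reading, tagged with sample
# identifiers. Column names and values are illustrative only.
readings = pd.DataFrame({
    "base":        ["A", "A", "A", "A", "B", "B", "B"],
    "batch_index": [1, 1, 1, 1, 2, 2, 2],
    "stroke":      [1, 1, 2, 3, 1, 2, 2],
    "S1":          [410, 412, 409, 415, 380, 383, 381],
})

# Number of distinct strokes recorded for each (base, batch_index) sample.
strokes_per_sample = readings.groupby(["base", "batch_index"])["stroke"].nunique()
print(strokes_per_sample)
```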

Structure of numerical features

Now that we have a basic understanding of the data, let’s delve into the numerical features and see how many unique values each one has.

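This tells us how many binary, ordinal, and continuous features the dataset contains: a feature with two unique values is binary, one with a small number of discrete values is likely ordinal, and one with many values is continuous. Knowing which is which tells us which plots and transformations suit each feature.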
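
In pandas, DataFrame.nunique() returns the number of distinct values per column. The sketch below pairs it with a rough rule of thumb for labeling columns as binary, ordinal, or continuous; the threshold of 25 is an arbitrary assumption, and the demo frame is synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Synthetic demo frame (not the sensor data), 200 rows per column.
df = pd.DataFrame({
    "S1": rng.integers(0, 2, 200),       # values in {0, 1}
    "S2": rng.integers(1, 6, 200),       # values in {1, ..., 5}
    "S3": rng.normal(400.0, 10.0, 200),  # floats, practically all distinct
})

# Number of distinct values in each column.
unique_counts = df.nunique()


def feature_kind(n_unique: int, ordinal_max: int = 25) -> str:
    """Rough heuristic (assumed threshold) for the type of a numeric feature."""
    if n_unique <= 1:
        return "constant"
    if n_unique == 2:
        return "binary"
    if n_unique <= ordinal_max:
        return "ordinal"
    return "continuous"


kinds = unique_counts.apply(feature_kind)
print(pd.DataFrame({"n_unique": unique_counts, "kind": kinds}))
```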

In addition, the all-black background of the missing-value plot shows that the data is complete and contains no null values.
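
A missing-value check can be done numerically with isna(), and a black-background picture like the one described above can be drawn by plotting the isna() mask as an image. This is a sketch on synthetic data; it assumes a gray colormap in which present values (0) render black and missing values (1) render white, which may differ from the tool used for the original figure.

```python
import numpy as np
import pandas as pd
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
# Synthetic, complete data: 100 rows x 10 sensors.
df = pd.DataFrame(rng.normal(400, 10, (100, 10)),
                  columns=[f"S{i}" for i in range(1, 11)])

# Count of missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# 0 = present (black), 1 = missing (white); fixed vmin/vmax keep the
# color mapping stable even when nothing is missing.
mask = df.isna().to_numpy().astype(int)
fig, ax = plt.subplots(figsize=(6, 4))
ax.imshow(mask, aspect="auto", cmap="gray", vmin=0, vmax=1,
          interpolation="nearest")
ax.set_xticks(range(df.shape[1]))
ax.set_xticklabels(df.columns)
ax.set_ylabel("Row")
ax.set_title("Missing values (white)")
```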

Numerical features

One tool we can use to visualize the numerical features is pandas’ .plot() method, which draws one line per feature, with the row (sample) index on the x-axis and the feature value on the y-axis. These plots make spikes, drifts, and flat stretches easy to spot, which is very helpful for data cleaning and exploratory data analysis.
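
A minimal example of the .plot() call, on synthetic drifting signals standing in for the sensor readings. With subplots=True, pandas gives each column its own panel, which keeps sensors with very different ranges readable; that option is a stylistic choice here, not necessarily what was used for the figures in this post.

```python
import numpy as np
import pandas as pd
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Synthetic readings: 120 rows (seconds) x 3 sensors, random walks around 400.
df = pd.DataFrame(
    rng.normal(0, 1, (120, 3)).cumsum(axis=0) + 400,
    columns=["S1", "S2", "S3"],
)

# x-axis: row index; y-axis: sensor value; one panel per column.
axes = df.plot(subplots=True, figsize=(8, 6), sharex=True)
axes[-1].set_xlabel("Sample index")
plt.tight_layout()
```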

Feature distribution

Another helpful technique for gaining insights into the data is examining the distribution of values for each feature. A quick way to do this for numerical features is to plot histograms.
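
Histograms for every numeric column come from DataFrame.hist(). The sketch uses synthetic data and 30 bins, an arbitrary choice worth tuning; np.histogram is shown alongside to get the same counts as numbers rather than as a picture.

```python
import numpy as np
import pandas as pd
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
# Synthetic data: one roughly normal and one right-skewed sensor column.
df = pd.DataFrame({
    "S1": rng.normal(400, 10, 500),
    "S2": rng.gamma(2.0, 20.0, 500) + 300,
})

# One histogram per numeric column, on a 1 x 2 grid of subplots.
axes = df.hist(bins=30, figsize=(8, 3), layout=(1, 2))
plt.tight_layout()

# The same binning as raw numbers, handy for checking skew or outlier bins.
counts, edges = np.histogram(df["S1"], bins=30)
print(counts)
```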

Most frequent entry:

The above image shows the most frequent entry for each feature, which gives us an idea of the dominant values in the dataset.
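
One way to compute the most frequent entry for each feature, together with how often it occurs, is value_counts(). The small frame below is made up so the result is easy to check by hand. With continuous sensor readings, a single value holding a large share may point to a saturated or stuck sensor, though that interpretation is an assumption to verify against the real data.

```python
import pandas as pd

# Made-up readings, small enough to verify by eye.
df = pd.DataFrame({
    "S1": [400, 400, 400, 401, 402],
    "S2": [300, 301, 301, 302, 303],
})

rows = {}
for col in df.columns:
    # value_counts() sorts by frequency, so the first entry is the mode.
    counts = df[col].value_counts()
    rows[col] = {
        "most_frequent": counts.index[0],
        "count": int(counts.iloc[0]),
        "share": counts.iloc[0] / len(df),
    }

summary = pd.DataFrame.from_dict(rows, orient="index")
print(summary)
```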

Correlation:

The above image shows the correlation between the features in the dataset. Strongly correlated sensor pairs carry largely redundant information, so this view helps us identify relationships between features and inform feature selection or transformation.
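
The correlation matrix itself comes from DataFrame.corr() (Pearson by default). The sketch below draws it with plain matplotlib and lists strongly correlated pairs; the synthetic S2 is deliberately built from S1, and the 0.9 cutoff is an arbitrary assumption.

```python
from itertools import combinations

import numpy as np
import pandas as pd
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

rng = np.random.default_rng(11)
s1 = rng.normal(400, 10, 300)
# Synthetic data: S2 tracks S1 closely, S3 is independent noise.
df = pd.DataFrame({
    "S1": s1,
    "S2": 2 * s1 + rng.normal(0, 1, 300),
    "S3": rng.normal(250, 5, 300),
})

corr = df.corr()  # Pearson correlation matrix

fig, ax = plt.subplots(figsize=(4, 4))
im = ax.matshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax)

# Visit each unordered pair once; keep those above the assumed 0.9 threshold.
strong = [
    (a, b, corr.loc[a, b])
    for a, b in combinations(corr.columns, 2)
    if abs(corr.loc[a, b]) > 0.9
]
print(strong)
```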

PairPlot:

Finally, seaborn’s pairplot function draws a scatter plot for every pair of the selected continuous features, with each feature’s distribution on the diagonal, so we can see their relationships at a glance. However, because the number of subplots grows with the square of the number of features, rendering can take a long time for more than roughly ten features at once.

Summary of the post:

  • Analyzing odors using data collected from digital sensors.
  • Initial steps included cleaning and pre-processing the data and analyzing and labeling the different smells.
  • Numerical features were examined to gain insights into the dataset and inform feature selection and transformation.
  • Tools used for data visualization included pandas’ .plot() function, histogram plots, and seaborn’s pairplot function.

In the next part of this research journey, we will continue exploring this fascinating world of odors and the potential of digital sensors to detect and analyze them.

Stay tuned!

