Engine assessment @CARS24: Sound analysis using Signal Processing & CNN

Published in CARS24 Data Science Blog, Sep 29, 2023

Motivation

Of the many factors that affect the ‘true’ price of a pre-owned car, one of the most significant is the condition of the engine, so monitoring and diagnosing engine faults is critical. However, engine-related flaws are usually hidden behind layers of metal, and judgements about sound-based imperfections depend on the observer's opinion.

To remove this subjectivity and still understand the condition of the engine, we need an unbiased machine that can call out problematic engines before further decisions are made about the car.

Several types of engine-related issues are crucial when assessing a pre-owned car: abnormal sound from the crankshaft or tappet cover, engine vibration, engine misfiring, and blow-by from the engine.

Many of the approaches explored for assessing these issues require sensors, such as vibrometers and temperature sensors, to be in contact with engine components. However, because of the extreme temperatures and exhaust gases in the engine bay, these sensors can be quite expensive and prone to frequent breakdowns. A less intrusive way to capture engine condition is to monitor the engine sound at idle and at 2000 RPM. At CARS24 we have been collecting these recordings for years and now have engine sound data for millions of cars.

Our goal with this model was to train a machine that identifies engine issues directly from sound alone, recorded on our evaluator's mobile phone during the CARS24 inspection process.

Hypothesis behind the approach

When viewed in the frequency domain, engine sound shows a few dominant frequencies: the engine's fundamental frequency and its integer multiples (harmonics). Every rotating part in the engine adds its own frequencies to the overall sound profile, because these components interact and vibrate in step with the engine's cycles. Our hypothesis was therefore that if we could establish a frequency pattern under ideal conditions for different types of engines, and for the critical imperfections mentioned above, we could categorize engines using sound data alone.
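To illustrate where the fundamental and its harmonics come from, we can use a common rule of thumb for piston engines: each cylinder fires once per cycle, and a four-stroke cycle spans two crankshaft revolutions. The sketch below converts RPM and cylinder count into the expected firing frequency and its integer multiples. It is our own illustration, not CARS24 code. It also ignores other rotating parts, such as the crankshaft itself at RPM/60, the valvetrain and accessories, which add their own lines to the spectrum. The `strokes` parameter and the 800 RPM idle speed are assumptions.

```python
def firing_frequency_hz(rpm: float, cylinders: int, strokes: int = 4) -> float:
    """Fundamental firing frequency of a piston engine, in Hz.

    Each cylinder fires once per cycle; a four-stroke cycle spans two
    crankshaft revolutions and a two-stroke cycle spans one.
    """
    revs_per_second = rpm / 60.0
    cycles_per_second = revs_per_second * 2.0 / strokes  # per cylinder
    return cycles_per_second * cylinders


def harmonic_series(rpm: float, cylinders: int, n_harmonics: int = 5) -> list[float]:
    """Fundamental firing frequency followed by its integer multiples."""
    f0 = firing_frequency_hz(rpm, cylinders)
    return [k * f0 for k in range(1, n_harmonics + 1)]


if __name__ == "__main__":
    # Assumed idle speed (800 RPM) vs the 2000 RPM recording, hypothetical 4-cylinder engine
    for rpm in (800, 2000):
        print(rpm, [round(f, 1) for f in harmonic_series(rpm, cylinders=4, n_harmonics=3)])
```

For a hypothetical 4-cylinder four-stroke engine, this gives a fundamental of about 26.7 Hz at 800 RPM and about 66.7 Hz at 2000 RPM. This is one reason engines must be grouped by cylinder count before their dominant frequencies are compared.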

Approach

To check whether engine sound follows a consistent frequency pattern, we performed frequency analysis using Fourier transforms. The frequency-domain representation contains the dominant frequencies and their harmonics, and deviations or irregularities in this structure indicate potential issues or anomalies in engine operation. To make sure we captured the base frequencies correctly, we also grouped engines by displacement, number of pistons, fuel type, turbocharging and manufacturer.
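Here is a minimal sketch of dominant-frequency extraction, assuming a mono clip held in a NumPy array. The post does not say which windowing or peak-picking method was used. The Hann window, the simple local-maximum peak test, and the parameter names `k` and `min_hz` are our assumptions. A synthetic 60/120/180 Hz signal stands in for a real recording.

```python
import numpy as np


def dominant_frequencies(signal, sr, k=5, min_hz=10.0):
    """Return (freqs, magnitudes) of the k strongest spectral peaks, strongest first."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(len(signal))  # taper to reduce spectral leakage
    mag = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)

    # A peak is a bin strictly larger than both neighbours, at or above min_hz (skips DC)
    is_peak = np.zeros(mag.shape, dtype=bool)
    is_peak[1:-1] = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
    is_peak &= freqs >= min_hz

    peak_idx = np.flatnonzero(is_peak)
    top = peak_idx[np.argsort(mag[peak_idx])[::-1][:k]]
    return freqs[top], mag[top]


if __name__ == "__main__":
    sr = 8000
    t = np.arange(2 * sr) / sr  # a 2-second clip
    # Synthetic "engine": 60 Hz fundamental plus two weaker harmonics
    x = (np.sin(2 * np.pi * 60 * t)
         + 0.6 * np.sin(2 * np.pi * 120 * t)
         + 0.3 * np.sin(2 * np.pi * 180 * t))
    f, m = dominant_frequencies(x, sr, k=3)
    print(f)
```

The script should print the three injected frequencies, strongest first. On real recordings, a library peak finder such as `scipy.signal.find_peaks` with `prominence` or `distance` constraints is usually more robust than a plain local-maximum test.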

The plots below show the frequency-magnitude graphs for a car with a crankshaft issue and for a car with no engine issue, respectively:

**Crankshaft issue — frequency / magnitude plot**

**Ideal engine — frequency / magnitude plot**

To assess engine health, we took a hybrid approach that combines signal processing with a CNN.

The first method uses signal processing and clustering to detect differences in the dominant frequencies of engine sound. It applies a clustering and nearest-neighbour approach over an array of dominant frequencies. For reference data, it uses the rich dataset of cars refurbished at the CARS24 mega refurbishment labs (MRL). This dataset contains all refurbishment work-order decisions taken on more than 1 lakh (100,000) engines, together with their recorded sounds.
The second approach uses Mel Frequency Cepstral Coefficient (MFCC) analysis, which gives a more nuanced view of changes in engine sound. MFCCs summarize the spectral envelope of the sound on the perceptually motivated mel scale, capturing timbre and spectral texture. An MFCC spectrogram therefore shows how these frequency-based patterns evolve over time. This makes it especially useful for detecting subtle changes in the spectral content of engine sound.
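For readers unfamiliar with MFCCs, the standard pipeline has five steps:

1. Frame the signal.
2. Take a windowed power spectrum of each frame.
3. Pool the spectrum through triangular filters spaced on the mel scale.
4. Take the log.
5. Decorrelate the result with a DCT.

The NumPy-only sketch below follows that recipe. The frame size, hop, 40 mel bands and 13 coefficients are common defaults chosen for illustration, not values from the post. In practice a library routine such as `librosa.feature.mfcc` would normally be used instead.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters with centres evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii_ortho(n_in, n_out):
    """Orthonormal DCT-II matrix, keeping only the first n_out coefficients."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    mat = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    mat[0] /= np.sqrt(2.0)
    return mat


def mfcc(x, sr, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    """Return an (n_mfcc, n_frames) MFCC matrix for a mono signal."""
    x = np.asarray(x, dtype=float)
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft  # power spectrum per frame
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    log_mel = np.log(mel_energy + 1e-10)  # small floor avoids log(0)
    return dct_ii_ortho(n_mels, n_mfcc) @ log_mel.T
```

For a 2-second clip at 16 kHz with these settings, `mfcc` returns a 13 × 197 matrix (coefficients by frames). This matrix is the "spectrogram" that the image model consumes.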

Architecture

The sound recorded by the technician frequently includes background noise such as car horns, people talking and wind gusts. To address this effectively, we split the problem of selecting the sound vector into two distinct parts.

Step 1: Clean and denoise the original sound signal, separating the core engine audio from the background noise. Setting the signal-to-noise ratio threshold is a critical balance, because too aggressive a threshold removes the engine's actual sound signature.

Step 2: We have 7 seconds of engine sound for each car, but 2 seconds are enough for frequency-domain analysis. This lets us pick the 2-second window with the least external noise left after denoising.

After this preprocessing, we proceed with frequency-domain analysis, as shown in the architecture diagram below:

**Architecture : Engine sound processing & issue detection**

Denoising minimizes the background noise while keeping the original engine sound intact. A short-time Fourier transform (STFT) of the sound stream is used to compute its power spectral density (PSD), from which a noise profile is estimated.
Using this noise estimate, a signal-to-noise ratio (SNR) is calculated for each frequency bin. Bins whose SNR falls below the threshold are treated as noise and removed, and an inverse STFT then returns the sound to its time-series form.
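The denoising step can be sketched as spectral gating. This is a minimal NumPy version built on two assumptions of ours. First, the noise PSD is estimated from a separate noise-only clip, because the post does not say exactly how the noise signal is estimated. Second, each bin is kept or zeroed with a hard mask at a hypothetical 6 dB SNR threshold. `stft`, `istft` and `spectral_gate` are illustrative names, not CARS24 functions.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed short-time Fourier transform, shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft, hop, length):
    """Weighted overlap-add inverse of stft() (least-squares reconstruction)."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    out, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)


def spectral_gate(x, noise_clip, n_fft=512, hop=128, snr_threshold_db=6.0):
    """Zero out time-frequency bins whose SNR against the noise PSD is below threshold."""
    pad = np.zeros(n_fft)
    # Zero-pad both ends so every original sample is covered by full overlapping windows
    padded = np.concatenate([pad, np.asarray(x, dtype=float), pad])
    spec = stft(padded, n_fft, hop)
    # Per-bin noise power spectral density, averaged over the noise-only frames
    noise_spec = stft(np.asarray(noise_clip, dtype=float), n_fft, hop)
    noise_psd = np.mean(np.abs(noise_spec) ** 2, axis=0)
    snr_db = 10 * np.log10(np.abs(spec) ** 2 / (noise_psd + 1e-12) + 1e-12)
    mask = snr_db >= snr_threshold_db  # keep bins that stand clear of the noise floor
    cleaned = istft(spec * mask, n_fft, hop, len(padded))
    return cleaned[n_fft:n_fft + len(x)]
```

Raising `snr_threshold_db` removes more noise but risks cutting into weaker engine harmonics, which is the balance described in Step 1. A soft mask, which scales bins by an SNR-dependent gain instead of zeroing them, usually leaves fewer audible artefacts.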
After removing noise, we check for irregularities caused by the technician's recording behaviour, such as the RPM being changed for 1–2 seconds partway through the recording. To handle this, we calculate the zero-crossing rate over a rolling 2-second window. We then find the frame with the least divergence from the average zero-crossing rate and choose it as the best 2 seconds of the sound signal.
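The post leaves "divergence from the average zero-crossing rate" somewhat open, so the sketch below assumes one reading. It slides a 2-second window over the recording and splits each window into short frames. Each window is scored by how far its frame-level zero-crossing rates (ZCRs) deviate from that window's own average, and the steadiest window wins. The hop, the frame length and the function names are hypothetical.

```python
import numpy as np


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    signs = np.signbit(frame).astype(np.int8)
    return np.count_nonzero(np.diff(signs)) / (len(frame) - 1)


def steadiest_window(x, sr, win_s=2.0, hop_s=0.25, frame_s=0.05):
    """Return (start_sample, segment) for the win_s-second window whose short-frame
    zero-crossing rates diverge least from that window's average ZCR."""
    win, hop, frame = int(win_s * sr), int(hop_s * sr), int(frame_s * sr)
    best_start, best_score = 0, np.inf
    for start in range(0, len(x) - win + 1, hop):
        seg = x[start:start + win]
        zcrs = np.array([zero_crossing_rate(seg[i:i + frame])
                         for i in range(0, win - frame + 1, frame)])
        score = np.mean(np.abs(zcrs - zcrs.mean()))  # mean absolute divergence
        if score < best_score:
            best_start, best_score = start, score
    return best_start, x[best_start:best_start + win]
```

A window that straddles an RPM change mixes low-ZCR and high-ZCR frames and scores badly. The selected segment therefore comes from a stretch where engine speed was held steady. If the intended reading is instead to compare each window's ZCR with the mean over the whole recording, the median is a safer reference. A long RPM change can drag the mean towards the disturbed section.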

The next two sound files show this entire process.

Original Sound

Denoised Sound

Engine Quality Analysis: The plots below show the sound signal after it has been transformed and grouped:

Nearest neighbors approach: We used a nearest-neighbors algorithm for the classification task. It draws on the dominant frequencies extracted from the sound and takes the car's engine type into account. This let us compare an engine's sound profile with those of other engines. We calculated the similarity between the engine under consideration and its top ‘x’ nearest neighbors in the dataset of MRL refurbishment decisions.
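The neighbour lookup can be sketched as follows. We assume each engine is represented by a fixed-length vector of dominant frequencies, and that each reference engine carries an engine-type label and the refurbishment action MRL took on it. Only the nearest-neighbour step is shown, not the clustering. Euclidean distance, the engine-type strings and the action names are all hypothetical, and the text's ‘x’ corresponds to `k`.

```python
from collections import Counter

import numpy as np


def neighbour_refurb_profile(query, query_type, ref_feats, ref_types, ref_actions, k=5):
    """Share of each refurbishment action among the k nearest same-type engines."""
    same_type = np.asarray(ref_types) == query_type  # compare like with like
    feats = np.asarray(ref_feats, dtype=float)[same_type]
    actions = np.asarray(ref_actions)[same_type]
    if len(feats) == 0:
        return {}
    dists = np.linalg.norm(feats - np.asarray(query, dtype=float), axis=1)  # Euclidean
    nearest = np.argsort(dists)[:k]
    counts = Counter(actions[nearest].tolist())
    return {action: n / len(nearest) for action, n in counts.items()}
```

The output is the share of each refurbishment action among the nearest same-type engines, for example a dict such as `{'none': 0.6, 'crankshaft_repair': 0.4}`. In practice the features would usually be standardized first. Raw frequencies in hertz give higher harmonics more weight in the distance, because absolute differences grow with frequency.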

MFCC Spectrogram Analysis: The two-second audio segments are converted into MFCC (Mel Frequency Cepstral Coefficients) spectrograms. It is not feasible for a human to mark which part of a spectrogram corresponds to abnormal engine sound. We therefore trained a classifier on the spectrogram images, using data from previously refurbished cars as ground truth. The MFCC spectrograms are fed into a ResNet-50 model adapted for this audio analysis task. We originally used the model for image classification in a separate damage-detection project that captures imperfections on outside rear-view mirrors (ORVMs) and lights. This lets the ResNet-50 model process MFCC spectrograms as if they were images. The model is trained against ground truth generated by car experts, and it extracts patterns from the spectrograms to predict the probability of engine issues.
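Feeding MFCCs to an image model mostly comes down to shaping them like an image. The sketch below assumes an ImageNet-pretrained backbone. It min-max scales the MFCC matrix to [0, 1] and resizes it to 224 × 224 with nearest-neighbour sampling. It then copies the result into three channels and applies the usual ImageNet channel mean and standard deviation. The post does not describe its preprocessing, so treat every choice here as an assumption.

```python
import numpy as np

# Standard ImageNet normalization constants, assuming an ImageNet-pretrained backbone
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
IMAGENET_STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)


def spectrogram_to_image(spec, size=224):
    """Turn a 2-D MFCC matrix into a normalized (3, size, size) float32 array."""
    spec = np.asarray(spec, dtype=np.float64)
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo) if hi > lo else np.zeros_like(spec)  # to [0, 1]
    # Nearest-neighbour resize by sampling row/column indices
    rows = np.linspace(0, spec.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, spec.shape[1] - 1, size).round().astype(int)
    resized = scaled[np.ix_(rows, cols)]
    image = np.repeat(resized[None, :, :], 3, axis=0)  # grey -> 3 identical channels
    return ((image - IMAGENET_MEAN) / IMAGENET_STD).astype(np.float32)
```

The resulting (3, 224, 224) array could then be batched into a ResNet-50, for example `torchvision.models.resnet50` with its final `fc` layer replaced by a small head that outputs engine-issue probabilities.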

Combination Phase: From the signal processing model, we get the refurbishment actions taken on engines with a similar dominant-frequency distribution. From the ResNet model, we get the probability of an engine imperfection based on the spectrogram. We combine these two outputs and fine-tune the results by optimizing the trade-off between precision and recall.
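One simple way to combine the two signals is a weighted blend, followed by a threshold search on labelled refurbishment outcomes to tune the precision/recall trade-off. The post does not say how the outputs are combined. The linear blend, the weight `w`, and the rule "highest recall subject to a minimum precision" are our assumptions.

```python
import numpy as np


def fuse_scores(p_cnn, knn_issue_share, w=0.5):
    """Hypothetical linear blend of the CNN probability and the neighbour refurb share."""
    return w * np.asarray(p_cnn, dtype=float) + (1 - w) * np.asarray(knn_issue_share, dtype=float)


def pick_threshold(scores, labels, min_precision=0.8):
    """Lowest threshold meeting min_precision, i.e. the highest recall at that precision.

    Returns (threshold, precision, recall), or None if no threshold qualifies.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    for t in np.unique(scores):  # np.unique returns sorted values; recall only falls as t rises
        pred = scores >= t  # at least one prediction, since t is one of the scores
        tp = np.sum(pred & labels)
        precision = tp / pred.sum()
        if precision >= min_precision:
            return float(t), float(precision), float(tp / max(labels.sum(), 1))
    return None
```

Recall can only fall as the threshold rises. Scanning thresholds from low to high and stopping at the first one that meets `min_precision` therefore gives the highest recall available at that precision.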

Performance

The current model is intended to guide the car inspection technician in cases of misreporting. To evaluate the model, we compute a confusion matrix over all cars for which refurbishment decisions have been made.

To modulate procurement risk, we can move the confidence threshold for marking engine imperfections. The two confusion matrices below capture this:
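The effect of moving the threshold can be reproduced with a small helper that builds a 2 × 2 confusion matrix at a given cut-off. The scores and labels in the example are toy values, not CARS24 results.

```python
import numpy as np


def confusion_at(scores, labels, threshold):
    """2x2 confusion matrix [[TN, FP], [FN, TP]] when score >= threshold flags an issue."""
    pred = np.asarray(scores, dtype=float) >= threshold
    truth = np.asarray(labels, dtype=bool)
    return np.array([[np.sum(~pred & ~truth), np.sum(pred & ~truth)],
                     [np.sum(~pred & truth), np.sum(pred & truth)]])


if __name__ == "__main__":
    # Toy model scores and true labels (1 = engine needed refurbishment)
    scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
    labels = [0, 0, 1, 0, 1, 1, 0, 1]
    for t in (0.35, 0.8):
        print(t, confusion_at(scores, labels, t).tolist())
```

On this toy data the two thresholds trade errors in opposite directions:

- At 0.35, the model catches all four faulty engines but flags two healthy ones (`[[2, 2], [0, 4]]`).
- At 0.8, it flags only one healthy engine but misses three faulty ones (`[[3, 1], [3, 1]]`).

Which trade-off is acceptable depends on how much procurement risk the business is willing to carry.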

Further improvements / next steps:

Incorporate sound-wave amplitude into the current approach
Segregate imperfection annotations in the ground-truth data
Most importantly, reject sounds at the inspection stage and ask for a re-recording when the noise is too high for the machine to denoise, e.g. the case below :)

Acknowledgement

The blog has been co-authored by Adarsh (Data Scientist) and Jitesh (Lead Data Scientist). Together with the wider inspection module at CARS24 (Kaushal and Arun in Analytics, Hansneet in Business and Abhishek in Product), they are continuously working to improve our ability to assess the true condition of a car. Their approach is an efficient inspection process augmented by data science and deep learning across image and sound processing.