Sensor Data Filtering Demystified: Finding the Right Technique for Your Needs

Published in Amiral Technologies, Jun 27, 2024

Data smoothing, when applied to time series, can introduce artifacts that distort the data. These artifacts may be significant in either the temporal or the spectral domain, depending on the smoothing technique employed, and data scientists should be mindful of them before applying such preprocessing. A common choice is Moving Average (MA) smoothing, which significantly affects the frequency spectrum while keeping the temporal sharpness of the signal. Conversely, using traditional FIR (Finite Impulse Response) or IIR (Infinite Impulse Response) filters to preserve a specific frequency band may have a significant impact on the time-domain representation of the signal.

This blog post aims to highlight these artifacts to help choose the appropriate filter for the intended application: achieving precision in both the frequency and time domains simultaneously is not possible because of the Gabor limit [1].

To illustrate this, we compare MA smoothing with IIR and FIR filters. To do so, let’s consider a signal composed of three (overlapping) frequency components:

Typically, in such a case, we would aim to filter out the high-frequency component before feeding the denoised signal into a predictive model, for instance.
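As a minimal sketch, a test signal of this kind could be synthesized as follows. Only the 1000 Hz component and the 5000 Hz sampling frequency come from this article; the other two frequencies, the amplitudes and the duration are assumptions chosen for illustration.

```python
import numpy as np

fs = 5000            # Hz, sampling frequency (value used in this article)
duration = 1.0       # s, hypothetical record length
t = np.arange(int(fs * duration)) / fs

# Hypothetical component frequencies: only the 1000 Hz one is named in the text.
f_low, f_mid, f_high = 50.0, 1000.0, 2000.0

# Sum of three sinusoids with assumed amplitudes.
raw_signal = (
    np.sin(2 * np.pi * f_low * t)
    + 0.5 * np.sin(2 * np.pi * f_mid * t)
    + 0.5 * np.sin(2 * np.pi * f_high * t)
)

# One-sided amplitude spectrum, to check where the energy sits.
spectrum = 2 * np.abs(np.fft.rfft(raw_signal)) / len(raw_signal)
freqs = np.fft.rfftfreq(len(raw_signal), d=1 / fs)
peak_freqs = np.sort(freqs[np.argsort(spectrum)[-3:]])
print(peak_freqs)
```

Because the record lasts exactly one second, each component falls on an FFT bin and the three spectral peaks are easy to identify.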

Moving Average smoothing

To achieve this, one may consider using the MA smoothing technique, which is a special case of a FIR filter. To demonstrate this, let’s examine the general expression of the filter.

Here y[n] is the output of the filter, x[n] the input signal, and b and a the filter coefficients. The first term depends only on the current and past input samples, whereas the second term relies on past values of the filtered signal itself, which makes the filter recursive. The MA filter may be written as:

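By identification, we deduce that the MA filter is a FIR filter with N coefficients (order N - 1), with b = [1/N, ..., 1/N] (for N = 5, b = [1/5, 1/5, 1/5, 1/5, 1/5]) and no feedback coefficients a (apart from the normalization a0 = 1).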
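For reference, the two expressions discussed above are usually written as follows. This is a reconstruction in the convention of scipy.signal.lfilter (normalization a_0 = 1); the summation limits M and P are notational assumptions rather than symbols taken from the original figures.

```latex
y[n] = \sum_{k=0}^{M} b_k \, x[n-k] \; - \; \sum_{k=1}^{P} a_k \, y[n-k]
\qquad \text{(general filter)}

y[n] = \frac{1}{N} \sum_{k=0}^{N-1} x[n-k]
\qquad \text{(MA filter: } b_k = 1/N,\; a_k = 0 \text{ for } k \ge 1\text{)}
```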

Building the filter coefficients and applying the filter to the signal is done using scipy:

import numpy as np
from scipy import signal

raw_signal = ...
N = 5  # number of coefficients (window length) of the MA filter
fs = 5000  # Hz, sampling frequency
b_k = 1/N * np.ones(N)  # N equal weights of 1/N
y_lfilter = signal.lfilter(b=b_k, a=1, x=raw_signal)  # a=1: no feedback term

The signal filtered with the MA filter is shown below. We clearly see that the MA filter smooths the time signal at the cost of degrading its frequency content: the component at 1000 Hz, which contains pertinent information, has been degraded, and the high-frequency portion is insufficiently filtered out.

From there, we can easily obtain both the impulse response and the frequency response of the filter using scipy:

imp = signal.unit_impulse(200, 'mid')  # unit impulse: [0, 0, ..., 0, 1, 0, ..., 0]
impulse_response = signal.lfilter(b=b_k, a=1, x=imp)
freq_vec, freq_response = signal.freqz(b=b_k, a=1, fs=fs)

The filter response is plotted below. The rectangular shape of the impulse response explains the name “boxcar filter” sometimes given to the MA filter. The frequency response takes the form of a sinc-like function, as expected (strictly speaking a periodic sinc, since the signal is sampled). The “bumps” (side lobes) of this response explain the undesired artifacts observed in the filtered spectrum shown above.
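The shape of this response can be checked numerically. The sketch below evaluates the gain of the 5-point MA filter at a few frequencies; the probe frequencies are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import signal

fs = 5000                      # Hz, sampling frequency
N = 5                          # number of MA coefficients
b_k = np.ones(N) / N

# Evaluate the response at a few hand-picked frequencies (in Hz).
probe_hz = np.array([0.0, 500.0, 1000.0, 1500.0, 2000.0])
_, H = signal.freqz(b_k, 1, worN=probe_hz, fs=fs)
gain = np.abs(H)

for f, g in zip(probe_hz, gain):
    print(f"{f:6.0f} Hz -> |H| = {g:.3f}")
```

With these settings the response has nulls at multiples of fs/N = 1000 Hz, which is consistent with the degraded 1000 Hz component noted above, while the side lobe around 1500 Hz still passes roughly a quarter of the amplitude.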

Of course, the MA filter may be tuned in order to preserve the 1000 Hz component, but this approach quickly reaches its limits. Classical FIR or IIR filtering techniques should be employed in this scenario, as we will explore shortly.

FIR Filters

To illustrate the FIR filter, the firwin function from scipy will be used. Unlike the MA filter, we start from an ideal filter pattern in the frequency domain (low-pass, high-pass, or band-pass), which is a rectangular function, and deduce its response in the time domain, which is a sinc signal. This response is then truncated to a finite number of coefficients and weighted by a window (Hamming here). The filter coefficients are analytical and depend on (1) the cutoff frequency and (2) the number of coefficients. The higher the number of coefficients, the closer the filter gets to the ideal pattern, at the cost of a longer delay.

fc = 1250      # Hz, cutoff frequency
numtaps = 101  # number of coefficients b_k (filter order = numtaps - 1)
b_firwin = signal.firwin(numtaps,  # more coefficients give a closer match to the ideal pattern
                         cutoff=fc,
                         pass_zero="lowpass",
                         fs=fs,
                         window="hamming")
f, H_firwin = signal.freqz(b_firwin, a=1, fs=fs)  # frequency response

# Apply the filter to the signal
y_lfilter = signal.lfilter(b_firwin, 1, raw_signal)
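The effect of the number of coefficients can be sketched by comparing the gain just below and just above the cutoff for several filter lengths. The helper gain_at, the lengths 11 and 31, and the probe frequencies 1000 Hz and 1500 Hz are hypothetical choices made for this comparison.

```python
import numpy as np
from scipy import signal

fs = 5000      # Hz, sampling frequency
fc = 1250      # Hz, cutoff used in this article

def gain_at(numtaps, freq_hz):
    """Gain of a Hamming-windowed low-pass FIR filter at a single frequency."""
    taps = signal.firwin(numtaps, cutoff=fc, pass_zero="lowpass",
                         fs=fs, window="hamming")
    _, H = signal.freqz(taps, 1, worN=np.array([freq_hz]), fs=fs)
    return float(np.abs(H[0]))

# Hypothetical filter lengths; 101 is the value used in this article.
for numtaps in (11, 31, 101):
    print(numtaps, gain_at(numtaps, 1000.0), gain_at(numtaps, 1500.0))
```

A short filter has a wide transition band, so it attenuates part of the 1000 Hz component and lets part of the 1500 Hz content through; with 101 coefficients the passband gain stays close to 1 and the stopband gain drops to a small fraction.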

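We can clearly observe the sinc function in the time domain now. The delay is significant: for a symmetric (linear-phase) FIR filter it is half the filter length, (101 - 1) / 2 = 50 samples, or 10 ms at fs = 5000 Hz. It can be compensated for in non-streaming applications.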
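One possible way to compensate for this delay offline is to pad the input and shift the output back by the known group delay. This is a sketch under assumptions, not necessarily the method used for the figures in this article: the test signal (a 50 Hz tone plus a 2000 Hz tone) and the variable names are hypothetical.

```python
import numpy as np
from scipy import signal

fs = 5000                           # Hz, sampling frequency
numtaps = 101
taps = signal.firwin(numtaps, cutoff=1250, fs=fs, window="hamming")

# Hypothetical test input: a slow 50 Hz tone plus a 2000 Hz tone to remove.
t = np.arange(2000) / fs
clean = np.sin(2 * np.pi * 50 * t)
noisy = clean + 0.5 * np.sin(2 * np.pi * 2000 * t)

delay = (numtaps - 1) // 2          # group delay of a symmetric FIR, in samples

# Pad the end so the last `delay` output samples are not lost, then shift back.
padded = np.concatenate([noisy, np.zeros(delay)])
filtered = signal.lfilter(taps, 1.0, padded)
aligned = filtered[delay:]          # same length as `noisy`, delay removed

# Ignore the edges, where the filter sees zero padding or has not filled up yet.
core = slice(numtaps, len(noisy) - numtaps)
max_err = np.max(np.abs(aligned[core] - clean[core]))
print("max deviation from the 50 Hz tone:", max_err)
```

This only works when the whole record (or at least 50 future samples) is available, which is why the compensation is restricted to non-streaming applications.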

In this case, we obtain a denoised signal that keeps the low- and medium-frequency components while removing the high-frequency component, which is identified as noise. Here, the time delay has been compensated:

When we focus on the first 500 samples of the filtered signal, we observe that the component at 1000Hz has been preserved:

IIR Filters

The initial step in designing the IIR filter involves expressing the filter’s transfer function. In our case, we aim for a low-pass filter, such as the Butterworth filter.

To obtain the filter coefficients, we approximate the denominator with polynomials of order n. Scipy has two functions that compute those filters: signal.butter and signal.iirfilter. The higher the order of the filter, the steeper the cutoff slope, but the more artifacts are created (and the more likely numerical instability becomes). For IIR filters, scipy strongly recommends using a cascade of second-order sections (SOS):

iir_order = 10
# Get the coefficients as second-order sections
sos = signal.butter(iir_order, fc, 'low', fs = fs, output="sos")
# Get the time impulse response
response = signal.sosfilt(sos, imp)
# Get the frequency response
w, h = signal.sosfreqz(sos, fs=fs)

Since far fewer coefficients are needed than for the FIR filter, the time delay is also reduced.
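The Butterworth design above can be sanity-checked with a short sketch that looks at the pole positions (stability) and at the gain in the passband, at the cutoff and in the stopband. The probe frequencies 1000 Hz and 2000 Hz are assumptions chosen for illustration.

```python
import numpy as np
from scipy import signal

fs = 5000          # Hz, sampling frequency
fc = 1250          # Hz, cutoff frequency
iir_order = 10
sos = signal.butter(iir_order, fc, "low", fs=fs, output="sos")

# Stability: every pole of the cascade must lie strictly inside the unit circle.
_, poles, _ = signal.sos2zpk(sos)
max_pole_radius = np.max(np.abs(poles))

# Gain at a passband, cutoff and stopband frequency (in Hz).
probe_hz = np.array([1000.0, fc, 2000.0])
_, h = signal.sosfreqz(sos, worN=probe_hz, fs=fs)
gain = np.abs(h)

print("largest pole radius:", max_pole_radius)
print("gain at 1000, 1250, 2000 Hz:", gain)
```

The gain at the cutoff should come out close to 1/sqrt(2) (about 0.707, the -3 dB point), with the 1000 Hz component almost untouched and the 2000 Hz component strongly attenuated, using only five second-order sections.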

Visually, the filtered signal is similar to the FIR result.

Conclusion

To conclude, below is a figure summarizing the filter behavior in both the temporal and frequency domains.

The choice comes down to a trade-off between efficiency and stability:

MA (Moving Average): useful if temporal precision is desired and frequency content is not critical (e.g., smoothing model scores over time). Optimal for preserving the temporal sharpness of the signal (e.g., square signals);
FIR (Finite Impulse Response): useful if the preservation of frequency content is required, a stable response is desired and a high order can be afforded (significant memory capacity, significant delay);
IIR (Infinite Impulse Response): useful if the preservation of frequency content is required and real-time processing is important. Can be risky because of the recursion, and may alter the signal because of its non-linear phase (especially with Chebyshev or elliptic filters).

At Amiral Technologies, we specialize in failure prediction and health monitoring for industrial equipment. Our expertise is included in DiagFit software, which integrates our algorithmic and industrial know-how.
For instance, in DiagFit, the trade-off mentioned above (between efficiency and stability) is automatically determined based on factors such as criticality of time components, frequency components, and real-time processing requirements.

References

[1] Gabor, D. (1946). Theory of communication. Journal of the Institution of Electrical Engineers 93, 429–457.

Written by Thibaut Le Magueresse