Time series filtering algorithms: a brief overview

Vladislav Klekovkin
Deelvin Machine Learning
May 24, 2022

In our recent work on developing an algorithm for person segmentation in video, we faced the following problem. When visualizing the result of the segmentation algorithm (signal), we detected pixel jitter at the border of the segmentation mask (noise). It appeared that the segmentation algorithm changed its predictions from frame to frame on the boundary pixels of the segmentation mask, which worsened the visual perception of the result. To solve the jitter problem, we decided to apply an algorithm that would reduce noise in the signal. In this post, we share the techniques we’ve learned while solving this problem.

By signal we mean a sequence of points (generally n-dimensional). The signal can be of a different nature: it can be temperature data for a city over a period of time (where each point is a floating-point number), a video stream (where each point is an image), or the result of video human pose estimation (where each point is a set of key points (joints) detected on a frame).

The causes of noise may also be different. Among them are inaccuracies in conducting experiments, limitations in sensor resolution, or unstable numerical calculations. As a result, noise can negatively affect the operation of systems that receive this signal or worsen the user’s experience. Consequently, different approaches have been developed to address these issues.

There are two approaches to noise reduction: filtering algorithms and smoothing algorithms. In filtering algorithms, signal points arrive sequentially, so only the current and previous points are available to remove noise at the current point. Smoothing algorithms assume that the entire signal has been received, so all signal points, both previous and subsequent, are used to remove noise at the current point. For our task, it was important for the system to work in real time, so we settled on filtering algorithms. We considered moving average (MA) filtering, exponential moving average (EMA) filtering, and one Euro filtering. An overview of these algorithms is given below.

The description of each algorithm is supplemented with visualization of the results of its work on the generated data. For this purpose, we took 450 points on the interval [0, 15], calculated a sinusoid on them (original data) and added Gaussian noise to it (noisy data). For each filtering algorithm, we filter the noisy data and visualize the resulting signal along with the original and the noisy sine wave.
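The test data described above can be generated along these lines (a sketch: the noise scale and random seed are our assumptions, not taken from the notebook):

```python
import numpy as np

# 450 points on [0, 15]; a sinusoid as the clean signal,
# plus Gaussian noise on top of it.
rng = np.random.default_rng(seed=42)

t = np.linspace(0, 15, 450)
original = np.sin(t)
noisy = original + rng.normal(loc=0.0, scale=0.1, size=t.shape)
```

Each filter below is then applied to `noisy` point by point, and the result is plotted against `original`.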

The filter implementations and the visualization notebook are available on GitHub (link).

Moving average filtering

One of the easiest ways to filter out noise is to calculate the average of several previous values. By the Central Limit Theorem, under reasonable assumptions, averaging enough successive values of a noisy signal should produce a better estimate of the true one (link). Formula (1) demonstrates the calculation of the moving average (MA):

X̂ₙ = (Xₙ + Xₙ₋₁ + … + Xₙ₋w₊₁) / w,   (1)

where Xₙ is the n-th member of the time series (noisy signal), Xᵢ-hat is the estimate of the i-th member of the time series (filtered signal), and w is the window size.

It is worth noting that filtering inherently introduces time latency — commonly called lag — which reduces system responsiveness. Lag significantly affects the subjective perception of the signal. The balance between the degree of noise and lag is achieved by adjusting the algorithm parameters.

This algorithm has only one adjustable parameter — the window size (w). By increasing the window size, we reduce the noise, but at the same time increase the lag.
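A streaming moving average can be sketched with a fixed-size window (a minimal scalar version; the class and variable names are ours, not from the notebook):

```python
from collections import deque

class MovingAverageFilter:
    """Streaming moving average over the last w samples."""

    def __init__(self, w):
        # deque with maxlen drops the oldest sample automatically
        self.window = deque(maxlen=w)

    def __call__(self, x):
        self.window.append(x)
        return sum(self.window) / len(self.window)
```

For example, with `w` equal to 3, feeding the points 1.0, 2.0, 3.0, 4.0 yields 1.0, 1.5, 2.0, 3.0: once the window is full, each estimate averages only the three most recent points.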

Fig. 1, Fig. 2, Fig. 3 demonstrate the operation of the moving average filter with different window sizes. The graph in Fig. 1 demonstrates that with w equal to 10, the data is still very noisy, but there is no lag. With w equal to 50, almost all noise is filtered out, but lag appears (Fig. 2). With w equal to 200, the filtered data begins to degenerate into a constant (Fig. 3).

Figure 1. Moving average (w is 10)
Figure 2. Moving average (w is 50)
Figure 3. Moving average (w is 200)

Exponential moving average filtering (low-pass filter)

Unlike the moving average filter, where all elements within the window have the same weight (that is, the "old" samples contribute just as much as the "new" ones), the exponential moving average algorithm assigns different weights to different samples: old values contribute less than new ones.

Formula (2) demonstrates the calculation of the exponential moving average:

X̂ᵢ = α·Xᵢ + (1 − α)·X̂ᵢ₋₁,   (2)

where Xᵢ-hat is the estimate of the i-th member of the time series (filtered signal), Xᵢ is the i-th member of the time series (noisy signal), and α is the smoothing factor. The first term of formula (2) is the contribution of the current input value, and the second term adds inertia from the previous values. Because the contribution of older values decreases exponentially, a low-pass filter has less lag than a moving average filter with a large w.

In this algorithm, only the smoothing factor (α) parameter is configured, which takes values in the interval (0, 1). By decreasing α, we reduce the noise, but at the same time we increase the lag.
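Formula (2) translates almost directly into code. A minimal scalar sketch (the class name and the choice to initialize with the first observation are ours):

```python
class ExponentialMovingAverageFilter:
    """Streaming EMA: x_hat_i = alpha * x_i + (1 - alpha) * x_hat_{i-1}."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.prev = None  # no estimate until the first sample arrives

    def __call__(self, x):
        if self.prev is None:
            self.prev = x  # seed the filter with the first observation
        else:
            self.prev = self.alpha * x + (1 - self.alpha) * self.prev
        return self.prev
```

With α equal to 0.5, feeding 0.0, 2.0, 2.0 yields 0.0, 1.0, 1.5: each new estimate moves halfway from the previous estimate toward the new sample.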

Fig. 4, Fig. 5, Fig. 6 demonstrate the operation of the exponential moving average filter with different values of parameter α. The graph in Fig. 4 demonstrates that with α equal to 0.1, the data is very noisy, but there is no lag. With α equal to 0.05, the noise is reduced better, but lag appears (Fig. 5). At α equal to 0.005, the filtered data begins to degenerate into a constant (Fig. 6).

Figure 4. Exponential moving average (alpha is 0.1)
Figure 5. Exponential moving average (alpha is 0.05)
Figure 6. Exponential moving average (alpha is 0.005)

One Euro filtering

During the filter's operation, the signal may change. For example, the rate of data arrival (the number of points arriving per second) may change, or the rate of change in the data (the difference between the current point and the previous one) may change. The MA and EMA filters cannot react to such changes because the parameters that control noise removal and lag are selected and fixed before the filter is applied; they have no mechanism for changing these parameters during operation. The one Euro filter is designed to solve this problem.

The one Euro filter is a low-pass filter with an adaptive alpha value that depends on the rate of change in the data. The authors of the article (link) use the intuition that at low speeds people are very sensitive to noise (jitter) rather than lag, while as speed increases people become more sensitive to lag rather than noise. Therefore, in this algorithm, alpha increases as the speed increases, combating noise at low speeds and reducing lag at high speeds.

Formula (6) repeats formula (2) for the exponential moving average,

X̂ᵢ = α·Xᵢ + (1 − α)·X̂ᵢ₋₁,   (6)

but the smoothing factor α is calculated dynamically using formula (5):

α = 2π·f𝒸·Tₑ / (2π·f𝒸·Tₑ + 1),   (5)

where Tₑ is the data sampling period (calculated from the data timestamps), which allows the filter to work with data arriving at a variable frequency, and f𝒸 is the cutoff frequency. In turn, f𝒸 is calculated by formula (4):

f𝒸 = f𝒸ₘᵢₙ + β·|Ẋᵢ-hat|,   (4)

where f𝒸ₘᵢₙ is the intercept (responsible for filtering at low speeds) and β is the slope (responsible for reducing lag at high speeds); these are the only two parameters configured in the one Euro filter. Formula (4) uses the derivative (speed) of the signal, which is calculated by formula (3):

Ẋᵢ = (Xᵢ − Xᵢ₋₁) / Tₑ,   (3)

where Xᵢ is the current element of the time series (noisy signal) and Xᵢ₋₁ is the previous one. However, to avoid outliers in the values of the derivative, an exponential moving average filter with a fixed parameter is applied to it; this is why formula (4) uses the estimate of the derivative, Ẋᵢ-hat, rather than the derivative itself.

As mentioned above, this algorithm has 2 adjustable parameters — f𝒸ₘᵢₙ and β. The authors of the article propose the following two-stage procedure for their selection:

  1. Selection of f𝒸ₘᵢₙ. First, β is set to 0 and f𝒸ₘᵢₙ is set to 1. Then check the filter's performance at low speeds and adjust the f𝒸ₘᵢₙ parameter to minimize noise while maintaining an acceptable lag. If there is too much noise, reduce f𝒸ₘᵢₙ. After selecting an acceptable value, fix f𝒸ₘᵢₙ.
  2. Selection of β. With f𝒸ₘᵢₙ fixed, the β parameter is adjusted on high-speed data. Increase β to bring the lag down to an acceptable level, then fix it.
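Putting formulas (3)–(6) together, a minimal scalar sketch of the one Euro filter might look like this (class and parameter names are ours; the cutoff for the derivative's own EMA is assumed fixed at 1 Hz):

```python
import math

class OneEuroFilter:
    """One Euro filter: an EMA whose smoothing factor adapts to the signal's speed."""

    def __init__(self, f_cmin=1.0, beta=0.0, d_cutoff=1.0):
        self.f_cmin = f_cmin      # intercept: minimum cutoff frequency
        self.beta = beta          # slope: speed coefficient
        self.d_cutoff = d_cutoff  # fixed cutoff for smoothing the derivative
        self.x_prev = None
        self.dx_prev = 0.0
        self.t_prev = None

    @staticmethod
    def _alpha(cutoff, t_e):
        # Formula (5): smoothing factor from cutoff frequency and sampling period
        r = 2.0 * math.pi * cutoff * t_e
        return r / (r + 1.0)

    def __call__(self, t, x):
        if self.x_prev is None:
            self.t_prev, self.x_prev = t, x
            return x
        t_e = t - self.t_prev  # sampling period from the timestamps
        # Formula (3): raw derivative, then smooth it with a fixed-alpha EMA
        dx = (x - self.x_prev) / t_e
        a_d = self._alpha(self.d_cutoff, t_e)
        dx_hat = a_d * dx + (1 - a_d) * self.dx_prev
        # Formula (4): cutoff frequency grows with the (smoothed) speed
        cutoff = self.f_cmin + self.beta * abs(dx_hat)
        # Formula (6): EMA of the signal with the adaptive alpha
        a = self._alpha(cutoff, t_e)
        x_hat = a * x + (1 - a) * self.x_prev
        self.t_prev, self.x_prev, self.dx_prev = t, x_hat, dx_hat
        return x_hat
```

Note that, unlike MA and EMA, the filter takes a timestamp along with each sample: Tₑ is recomputed on every call, which is what lets the filter cope with a variable data rate.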

Fig. 7, Fig. 8, Fig. 9 demonstrate operation of one Euro filter with different values of parameters f𝒸ₘᵢₙ and β. The graph in Fig. 7 demonstrates that with f𝒸ₘᵢₙ equal to 1.0, β equal to 0.0, the data is very noisy, but there is no lag. With f𝒸ₘᵢₙ equal to 0.3, β equal to 0.0, noise was removed better, but lag appeared (Fig. 8). With f𝒸ₘᵢₙ equal to 0.3, β equal to 0.007, lag slightly decreased (Fig. 9).

Figure 7. One Euro filter (f_cmin is 1.0, beta is 0.0)
Figure 8. One Euro filter (f_cmin is 0.3, beta is 0.0)
Figure 9. One Euro filter (f_cmin is 0.3, beta is 0.007)

It was mentioned above that the one Euro filter can work with variable-frequency data. To demonstrate this, we generated new data in which the frequency of points decreases over time and applied the MA, EMA, and one Euro algorithms to it. Fig. 10 and Fig. 11 demonstrate that for the MA and EMA algorithms, lag increases as the frequency of arriving points decreases. At the same time, the one Euro algorithm maintains a constant lag throughout the signal (Fig. 12).

Figure 10. Moving average (w is 50)
Figure 11. Exponential moving average (alpha is 0.05)
Figure 12. One Euro filter (f_cmin is 0.3, beta is 0.0)

Notes on other algorithms

Below are some filtering algorithms that are not covered in detail in this article, but are worth mentioning:

  • Double exponential moving average (Holt’s linear trend method). This is an extension of the exponential moving average filter to work with data with a trend. You can learn more about this method here (link).
  • Triple exponential moving average (Holt-Winters’ seasonal method). This is an extension of double exponential moving average for working with trend and seasonal data. You can learn more about this method here (link).
  • Kalman filter. A very popular filtering algorithm, which requires building a mathematical model of the signal. You can learn more about this method here (link).

Interesting links on smoothing algorithms

In this blog post, we do not consider smoothing algorithms, however, I would like to share some links that I found interesting:

  • Article comparing 3 smoothing algorithms (Whittaker, Fourier, Linear Fit) for the Land Cover Classification task (link).
  • A series of articles by one author describing the following algorithms: Gaussian and Butterworth filters (link), Wiener filter and smoothing splines (link), Kalman filter (link), particle filter (link).

Conclusion

We have considered several time series filtering algorithms, described how they work and how their parameters are tuned, and visualized their results. Because this is only a review article, we did not compare the algorithms' quality or speed. Such a comparison should be based on a concrete task, which would drive the choice of an algorithm and its parameters.

Read more articles on Machine Learning from our Deelvin Machine Learning Blog.
