Understanding “Exponential Moving Averages” in the light of Data Science.

Published in Analytics Vidhya · 6 min read · Oct 25, 2021

Beyond a shadow of a doubt, moving averages play a vital role in statistical analysis of any sort. In simple terms, a moving average gives an intuition about the past by averaging out the experience captured in data points. For that matter, I consider “Mathematics” not just a subject but a great toolkit for capturing life’s notions in the form of equations that can be quantified and measured. Isn’t that beautiful? :)

EMA is nothing new, and plenty of articles illustrate the concept, but mostly from a limited perspective: either financial trading (e.g. EMA 12 & EMA 26 [1]) or time series analysis [2]. In this article, I would like to discuss EMA as a general, fundamental concept, seen from multiple perspectives within Data Science.

The article is structured as follows:

EMA in general — Definition and intuitions.
EMA in Time Series.
EMA in Reinforcement Learning.
EMA in Neural Networks/Deep Learning.
Conclusion
References

Definition: In simple terms, a ‘moving average’ is a statistical tool for determining the direction of a trend. It sums a subset of data points, usually those within a specific time period, and divides the total by the number of data points in the subset to yield an average. It is called a moving average because the calculation is repeated as new data points arrive, with the subset sliding forward in time.

The Exponential Moving Average is a type of moving average that gives more weight to the most recent data points than to those further in the past. In other words, it is like giving more importance to the latest experiences or memories than to older ones, assuming those are represented by data points.

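The Exponential Moving Average is computed recursively as:

EMAₜ = α·xₜ + (1 − α)·EMAₜ₋₁

where xₜ is the observation at time t, EMAₜ₋₁ is the previous average, and α is the smoothing parameter, with a value between 0 and 1. A larger α puts more weight on the latest observation. The same recursion also yields the simplest one-step forecast. For example, the forecast of a stock price for the next period is the current forecast corrected by a fraction α of the latest forecast error:

Forecastₜ₊₁ = Forecastₜ + α·(Priceₜ − Forecastₜ)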
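The recursion described above, where each new average blends the latest observation (weight α) with the previous average (weight 1 − α), can be sketched in a few lines of Python. This is only a minimal illustration: the function name `ema`, the choice to seed the average with the first observation, and the sample prices are my own assumptions, not part of any particular library.

```python
def ema(values, alpha):
    """Return the exponential moving average of `values` as a list."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        if not smoothed:
            # Assumption: seed the average with the first observation.
            smoothed.append(float(x))
        else:
            # New average = alpha * latest value + (1 - alpha) * old average.
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


prices = [10.0, 12.0, 11.0, 13.0]  # made-up prices, for illustration only
print(ema(prices, alpha=0.5))      # [10.0, 11.0, 11.0, 12.0]
```

Under this sketch, the last element of the returned list would serve as the one-step forecast for the next period.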

Note: There are other types of moving averages, such as the Weighted Moving Average and the Cumulative Moving Average; however, the focus of this article is only EMA. To summarize,

SMA (Simple moving average) gives equal weights to past values,
WMA (Weighted moving average) gives linearly decreasing weights to past values,
EMA (Exponential moving average) gives exponentially decreasing weights to past values.
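To make the difference between the three weighting schemes concrete, here is a small sketch that prints the weight each one assigns to the last few observations, most recent first. The function names are hypothetical, and the EMA weights are truncated to n terms, so unlike the other two they do not sum exactly to 1.

```python
def sma_weights(n):
    """Equal weights over the last n observations."""
    return [1 / n] * n


def wma_weights(n):
    """Linearly decreasing weights; index 0 is the most recent observation."""
    total = n * (n + 1) / 2
    return [(n - i) / total for i in range(n)]


def ema_weights(n, alpha):
    """Exponentially decreasing weights, truncated to the last n observations."""
    return [alpha * (1 - alpha) ** i for i in range(n)]


print(sma_weights(4))
print(wma_weights(4))
print(ema_weights(4, alpha=0.5))
```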

EMA in Time Series: In time-series methods, EMA is typically used in Exponential Smoothing methods, as shown in the following table. Depending on the components present in the time series being analyzed (such as trend and seasonality), the respective model can be used to yield forecasts with EMA at its core.

Table: State space equations for each of the models in the ETS framework.

Refer to “Forecasting: Principles and Practice” by Rob J Hyndman and George Athanasopoulos [3] for a thorough understanding.
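As one example of EMA sitting at the core of an exponential smoothing model, below is a rough sketch of Holt's linear trend method, the additive-trend, non-seasonal member of the family. Both the level and the trend are updated with EMA-style recursions. The function name, the parameter names and the simple initialization (level from the first point, trend from the first difference) are my own assumptions; a real analysis would estimate these from the data.

```python
def holt_forecast(series, alpha, beta, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear trend method."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Assumption: naive initialization from the first two observations.
    level = float(series[0])
    trend = float(series[1] - series[0])
    for y in series[1:]:
        previous_level = level
        # EMA of the observation against the previous one-step prediction.
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        # EMA of the change in level against the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


# A perfectly linear series should be extrapolated along the same line.
print(holt_forecast([1, 3, 5, 7, 9], alpha=0.8, beta=0.2, horizon=2))
```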

EMA in Reinforcement learning: In the context of Reinforcement Learning (RL), or to be precise in n-Armed Bandit problems (full RL also has ‘states’ and a ‘policy’ to learn with a long-term reward), the estimated value of a chosen action in a non-stationary problem is updated as:

Qₖ₊₁ = Qₖ + α(Rₖ − Qₖ)

where the step-size parameter α ∈ (0, 1] is constant. Here, Qₖ₊₁ is the new estimate, Qₖ is the old estimate and Rₖ is the kth reward for the action.

This results in Qₖ₊₁ being a weighted average of past rewards and the initial estimate Q₁:

Qₖ₊₁ = (1 − α)ᵏ Q₁ + Σᵢ₌₁ᵏ α(1 − α)ᵏ⁻ⁱ Rᵢ

The quantity 1 − α is less than 1, and thus the weight given to Rᵢ decreases as the number of intervening rewards increases. In fact, the weight decays exponentially, which is what makes this an exponential, recency-weighted average. If 1 − α = 0, then all the weight goes on the very last reward, Rₖ, because of the convention that 0⁰ = 1.

In fact, the incremental form Qₖ₊₁ = Qₖ + α(Rₖ − Qₖ) has the same shape as the equation used to forecast price with EMA: the new estimate is the old estimate plus a fraction α of the latest error.
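The equivalence between the incremental update Qₖ₊₁ = Qₖ + α(Rₖ − Qₖ) and the weighted average of past rewards can be checked numerically. In the sketch below I assume the weighted form is (1 − α)ᵏ Q₁ plus the sum of α(1 − α)ᵏ⁻ⁱ Rᵢ for i from 1 to k; the function names and the reward values are made up for illustration.

```python
def incremental_estimate(rewards, alpha, q1=0.0):
    """Apply Q <- Q + alpha * (R - Q) once per reward, starting from q1."""
    q = q1
    for r in rewards:
        q = q + alpha * (r - q)
    return q


def weighted_sum_estimate(rewards, alpha, q1=0.0):
    """Closed form: exponentially weighted average of q1 and all rewards."""
    k = len(rewards)
    total = (1 - alpha) ** k * q1
    for i, r in enumerate(rewards, start=1):
        # Older rewards (small i) get a larger exponent, hence less weight.
        total += alpha * (1 - alpha) ** (k - i) * r
    return total


rewards = [1.0, 0.0, 2.0, 1.5]  # hypothetical rewards for one action
print(incremental_estimate(rewards, alpha=0.1, q1=5.0))
print(weighted_sum_estimate(rewards, alpha=0.1, q1=5.0))
```

Both calls should agree up to floating-point rounding. With alpha=1.0 the closed form returns just the last reward, since Python also evaluates 0.0 ** 0 as 1.0.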

Further, M. D. Awheda and H. M. Schwartz have shown the use of EMA in conjunction with the Q-learning algorithm (one of the RL algorithms) for a multi-agent case in their paper “Exponential moving average Q-learning algorithm” [4].

EMA in Neural networks/Deep learning: In deep learning, EMA forms the basis of several gradient-based optimization algorithms for stochastic objective functions, for example gradient descent with momentum, RMSProp and Adam. The basic idea is to compute an exponentially weighted average of the gradients (or of their squares) and use that, instead of the raw gradient, to update the weights.

Figure: SGD with momentum algorithm. Source: Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization by DeepLearning.AI

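For instance, in SGD with momentum, the velocity ‘v’ is an exponential moving average of the gradients: v = β·v + (1 − β)·dW. The weights are then updated with v in place of the raw gradient, W = W − α·v, where α here denotes the learning rate rather than the smoothing parameter.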
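To see the velocity acting as an EMA of gradients, here is a toy sketch of gradient descent with momentum on a one-dimensional function. It assumes the formulation in which the velocity is β times the old velocity plus (1 − β) times the current gradient. The function name, the default hyperparameters and the quadratic objective are my own choices, and the gradient is exact rather than stochastic to keep the example short.

```python
def sgd_momentum(grad_fn, w0, learning_rate=0.1, beta=0.9, steps=500):
    """Minimize a 1-D function given its gradient, using EMA-style momentum."""
    w = float(w0)
    v = 0.0  # velocity: exponential moving average of past gradients
    for _ in range(steps):
        g = grad_fn(w)
        v = beta * v + (1 - beta) * g   # EMA update of the gradient
        w = w - learning_rate * v       # step along the averaged gradient
    return w


# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = sgd_momentum(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)  # should end up very close to 3
```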

By choosing the value of β, we control roughly how many recent data points the average effectively covers. A simple rule of thumb to remember is 1/(1 − β): with β = 0.9 the average spans roughly the last 10 data points, while with β = 0.98 it spans roughly the last 50.

Averaging over a larger window makes the average adapt more slowly when the data changes, because a lot of weight is given to the previous value and a much smaller weight to the new one.
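One way to sanity-check the 1/(1 − β) rule of thumb: after that many steps, the weight of an observation has shrunk to roughly 1/e (about 37%) of the weight of the newest one. The helper names below are hypothetical; this is a quick numerical check rather than a derivation.

```python
import math


def effective_window(beta):
    """Rule-of-thumb number of points an EMA with decay beta averages over."""
    return 1 / (1 - beta)


def relative_weight_at_window(beta):
    """Weight of a point one effective window old, relative to the newest."""
    n = round(effective_window(beta))
    return beta ** n


for beta in (0.9, 0.98):
    window = round(effective_window(beta))
    ratio = relative_weight_at_window(beta)
    print(f"beta={beta}: window={window}, relative weight={ratio:.3f}, 1/e={1 / math.e:.3f}")
```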

EMA helps not only with convergence but also with compute resources: it is a highly efficient way to calculate an average. Only the previous average has to be kept in memory, and each update takes just a couple of arithmetic operations. In neural networks/deep learning, where training already demands a lot of compute, an optimization of this sort is not just a value add but a necessity.
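The memory argument is easy to see in a streaming setting. In the sketch below, a simple moving average has to keep its whole window of values, while the EMA keeps a single number. Both class names are made up for this illustration, and the EMA is seeded with the first value it sees.

```python
from collections import deque


class RunningSMA:
    """Simple moving average: must remember the last `window` values."""

    def __init__(self, window):
        self.values = deque(maxlen=window)  # O(window) memory

    def update(self, x):
        self.values.append(x)
        return sum(self.values) / len(self.values)


class RunningEMA:
    """Exponential moving average: remembers only the current average."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.value = None  # O(1) memory

    def update(self, x):
        if self.value is None:
            self.value = float(x)  # assumption: seed with the first value
        else:
            self.value += self.alpha * (x - self.value)
        return self.value


sma, ema_stream = RunningSMA(window=2), RunningEMA(alpha=0.5)
for x in [10.0, 12.0, 11.0, 13.0]:
    print(x, sma.update(x), ema_stream.update(x))
```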

Conclusion

To sum up, the exponential moving average makes a lot of sense when seen from the holistic perspective of data science, as is evident from its use in the different methods and concepts discussed above. A deep understanding of this one building block therefore helps to establish a firm grasp of many data science concepts and to see the connections between them.

References:

How to Trade with Python. Trading using Python — Exponential… | by Michael Whittle | Level Up Coding (gitconnected.com)
Time Series From Scratch — Exponentially Weighted Moving Averages (EWMA) Theory and Implementation | by Dario Radečić | Towards Data Science
Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3.
M. D. Awheda and H. M. Schwartz, “Exponential moving average Q-learning algorithm,” 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2013, pp. 31–38, doi: 10.1109/ADPRL.2013.6614986.

Written by Vishal Garg