Time series: what tools are available in Python to analyse them?

First part: visualisation and statistics

Stéphanie Crêteur
Geek Culture
5 min read · Nov 25, 2022


In this article, which I will divide into two parts, I would like to focus on time series: the chronological evolution of a quantity, recorded at regular intervals (hourly, monthly, yearly…). They can represent the evolution of temperature over time, the evolution of the stock market, the monitoring of a heart rate, etc. In any data analyst’s career, processing the information contained in time series is a crucial and recurring part of the work. However, time series can sometimes be confusing and difficult to read. So how do you proceed with this data?

Faced with this challenge, I decided to turn to the literature written on this topic, so this article will be sprinkled with many references that helped me to better understand this subject.

Photo by Maxim Hopman on Unsplash

Since its creation, Python has gathered a growing community of scientists and analysts around it. This can be attributed to how easy it is to write and to the open-source frameworks and libraries developed for it, which allow Python ‘to perform statistical computing, machine learning tasks, develop websites, and much more’ (Peixeiro, 2022, p. 1). Indeed, several of those libraries greatly facilitate the work of any researcher by automating complex procedures and offering many tools to visualise data and compute statistics. This is, of course, true for time series as well.

To help you understand the process better and to make it as concrete as possible, I propose working with real figures taken without context. I chose to leave these values deliberately abstract to be as universal as possible, although knowledge of the sector they come from would obviously be part of a thorough and complete analysis, and would likely allow me to conclude more quickly which method is most relevant and effective. As Chatfield confirms in his book The Analysis of Time Series: An Introduction, ‘The context of a given problem is crucial in time-series analysis’ (Chatfield, 2004, p. 11).

1. Visualisation

In any analysis, the visualisation of data should be the first step. This is confirmed by many professionals: ‘Making plot and static or interactive visualizations is one of the most important tasks in data analysis.’(McKinney, 2018, p. 219) Janert in his book Data analysis with open source tools explains: ‘Looking at data, you will notice things — the way data points are distributed, or the manner in which one quantity varies with another, or the large number of outliers, or the total absence of them.’(Janert, 2011, p. 1)

This turns out to be even more important for time series, as Chatfield confirms: ‘Anyone who tries to analyse a time series without plotting it first is asking for trouble. A graph will not only show up trend and seasonal variation, but will also reveal any ‘wild’ observations or outliers that do not appear to be consistent with the rest of the data.’ (Chatfield, 2004, p. 6)

If I plot the data as a scatter plot (figure 1), no specific trend is visible at first glance.

However, if I use a line graph instead, as shown in the graph below, I get a totally different picture. The line graph, by showing the change between two consecutive points, lets us glimpse what seems to be a ‘regularly recurring feature’ (Janert, 2011, p. 80), looking almost like a sinusoidal function. In the scatter plot, this pattern is buried in the surrounding noise, which is what makes that plot confusing. A line graph is useful when we need to see not only a general trend (which in this case is too hard to make out) but also the local trend between a group of points (‘Line Graphs and Scatter Plots,’ 2005). Here it allows us to see the repetitive pattern. Choosing the right visualisation is therefore an important step, and different criteria have to be taken into account, such as ‘the amount of noise and outliers in the data’ (Wang et al., 2018) or whether the data is discrete or continuous.
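To make the comparison concrete, here is a minimal sketch of the two plots side by side. The original dataset is not included in this article, so the code builds a hypothetical stand-in: a noisy sine wave with 400 points between -20 and 20. The amplitude, noise level, column names and output file name are all assumptions chosen for illustration, not the real data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the article's data (assumption): a noisy sine wave
# sampled at 400 equally spaced points between -20 and 20.
rng = np.random.default_rng(seed=0)
x = np.linspace(-20, 20, 400)
y = 0.4 * np.sin(x) + rng.normal(scale=0.05, size=x.size)
df = pd.DataFrame({"x": x, "y": y})

# Draw the same data twice: once as isolated points, once as a connected line.
fig, (ax_scatter, ax_line) = plt.subplots(1, 2, figsize=(12, 4), sharey=True)

ax_scatter.scatter(df["x"], df["y"], s=8)
ax_scatter.set_title("Scatter plot")

ax_line.plot(df["x"], df["y"], linewidth=1)
ax_line.set_title("Line graph")

for ax in (ax_scatter, ax_line):
    ax.set_xlabel("x")
ax_scatter.set_ylabel("y")

fig.tight_layout()
fig.savefig("scatter_vs_line.png")  # file name is arbitrary
```

Sharing the y-axis between the two panels keeps the comparison fair: the only thing that changes is whether consecutive points are connected.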

The visualisation gives us a first hint at one of the main components of our series: the presence of seasonality. As for the trend (another important component of a time series), which is defined as a ‘long-term change in the mean level’ (Chatfield, 2004, p. 12), the series doesn’t seem to have one. However, we will need to dig a bit deeper into the data to confirm that.

2. Statistics

Haslwanter says in his book that ‘[s]tatistics provides us with the tools to extract the maximum amount of knowledge from a given data.’ (Haslwanter, 2021, p. 123) Summary statistics can therefore be of great help in understanding our series. It is, however, important to treat this information with caution: Chatfield warns that summary statistics of time series can be misleading (Chatfield, 2004, p. 11).

The data consists of 400 points, with the x-axis going from -20 to 20 and the y-axis from -0.511 to 0.486. It has a mean value of -0.019. Each value on the x-axis has only one corresponding value on the y-axis and they are equally spaced. Using the describe() function which returns the statistical summary of our dataframe, I get the following results:
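As a rough sketch of how such a summary can be obtained, the code below calls pandas' describe() on one column and checks the two structural properties mentioned above (one y value per x value, equal spacing). Since the original data is not available here, it again uses a hypothetical noisy sine wave as a stand-in, so the numbers it prints will not match the ones quoted in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the article's data (assumption): a noisy sine wave
# sampled at 400 equally spaced points between -20 and 20.
rng = np.random.default_rng(seed=0)
x = np.linspace(-20, 20, 400)
y = 0.4 * np.sin(x) + rng.normal(scale=0.05, size=x.size)
df = pd.DataFrame({"x": x, "y": y})

# describe() returns count, mean, std, min, quartiles (25%, 50%, 75%) and max.
summary = df["y"].describe()
print(summary)

# Each x value should appear only once...
one_y_per_x = df["x"].is_unique

# ...and the gaps between consecutive x values should all be the same.
steps = np.diff(df["x"].to_numpy())
equally_spaced = bool(np.allclose(steps, steps[0]))

print("one y per x:", one_y_per_x)
print("equally spaced:", equally_spaced)
```

The 50% row of the summary is the median, which is what gets compared with the mean next.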

The mean (-0.019) and the median (-0.010) are close together, which suggests that our distribution is symmetrical and might be normally distributed. However, a quick glance at a histogram tells a different story: the highest counts are found at the extremities, with a roughly homogeneous distribution across the other bins. We therefore have data that are symmetrically distributed around zero and oscillate between -0.5 and 0.5.
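The reasoning above (compare mean and median, then look at the histogram before assuming normality) can be sketched as follows. The data is once more a hypothetical noisy sine wave standing in for the real series, and the choice of 10 bins is an assumption, so the counts are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the article's data (assumption): a noisy sine wave
# sampled at 400 equally spaced points between -20 and 20.
rng = np.random.default_rng(seed=0)
x = np.linspace(-20, 20, 400)
y = 0.4 * np.sin(x) + rng.normal(scale=0.05, size=x.size)
df = pd.DataFrame({"x": x, "y": y})

# A mean close to the median hints at symmetry, but says nothing about shape.
mean = df["y"].mean()
median = df["y"].median()
print(f"mean = {mean:.3f}, median = {median:.3f}")

# Bin the values to see where they actually concentrate.
# bins=10 is an arbitrary choice for this sketch.
counts, edges = np.histogram(df["y"], bins=10)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:+.2f}, {right:+.2f}): {count}")
```

For a bell-shaped distribution the largest counts would sit in the middle bins; an oscillating series tends to spend more time near its peaks and troughs, which pushes the counts towards the outer bins instead.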

However, the line graph makes it clear that the data is noisy. To understand the series further, I need to remove this noise by filtering it. That is what I will do in the next article. Be sure to follow to get the next parts, where we will focus on filtering, analysis and prediction!

Thank you for reading!
