A comparison of different ways to analyse the time series of sunspots

Tracyrenee
Nov 13, 2020 · 6 min read

When performing an analysis it is important to try a variety of different models to determine the one that will afford the best accuracy. I therefore conducted a time series analysis using models from statsmodels, Facebook Prophet and scikit-learn, all libraries compatible with Python, the programming language I used to conduct the analysis.

I decided to use a csv file of sunspot observations collected from 1749 until 1983 because it contains an extensive amount of data, and a plot of the series makes it apparent that sunspots occur in a cyclic fashion, so any trends would be easy to detect.

The first thing I did was to load the libraries that I would need to carry out the analysis. I then loaded and read the csv file that I obtained from the following site:- https://machinelearningmastery.com/time-series-datasets-for-machine-learning/
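The loading step appears only as a screenshot, so here is a minimal sketch of reading the file with pandas. The column names "Month" and "Sunspots" match the dataset linked above; the inline sample rows are stand-ins for the real file, which the notebook reads from disk or a URL:

```python
import io
import pandas as pd

# A few rows in the same two-column format as the sunspots csv file
# ("Month" and "Sunspots"); the notebook reads the full file instead.
sample = io.StringIO(
    "Month,Sunspots\n"
    "1749-01,58.0\n"
    "1749-02,62.6\n"
    "1749-03,70.0\n"
)
df = pd.read_csv(sample)
print(df.head())
```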


I then decided to put the sunspot data on a graph because it is always a good idea to visualise time series data:-
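The plotting cell was a screenshot; a sketch of the same idea with pandas and matplotlib is below. The synthetic series here is a stand-in for the loaded frame, built with the real sunspot cycle length of roughly 132 months:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in Colab plots render inline
import matplotlib.pyplot as plt

# Synthetic stand-in for the loaded frame: a roughly 11-year (132-month) cycle.
months = pd.date_range("1749-01-01", periods=240, freq="MS")
df = pd.DataFrame({"Sunspots": 60 + 50 * np.sin(np.arange(240) * 2 * np.pi / 132)},
                  index=months)

ax = df.plot(y="Sunspots", figsize=(10, 4), title="Monthly sunspots", legend=False)
ax.set_ylabel("Sunspot count")
plt.savefig("sunspots.png")
```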


I created a column called “Timestamp” from the “month” column, and set this timestamp as the index of the dataframe as well.
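This step can be sketched with pandas as follows; the exact cell isn't shown, so this is an assumed reconstruction of the "Timestamp" conversion:

```python
import pandas as pd

df = pd.DataFrame({"Month": ["1749-01", "1749-02", "1749-03"],
                   "Sunspots": [58.0, 62.6, 70.0]})

# Build a proper datetime column from the "Month" strings, then use it
# as the frame's index (the column itself is kept as well).
df["Timestamp"] = pd.to_datetime(df["Month"], format="%Y-%m")
df = df.set_index(df["Timestamp"])
print(df.index[0])
```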


I created training and validation sets from the timestamp. Rather than splitting the train set up by percentages, I split it up based on times:-
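With a datetime index, a time-based split is just label slicing. The exact cutoff date used in the notebook isn't shown, so the 1920 boundary below is an assumption:

```python
import numpy as np
import pandas as pd

# Stand-in frame spanning the full 1749-1983 range of the dataset.
idx = pd.date_range("1749-01-01", "1983-12-01", freq="MS")
df = pd.DataFrame({"Sunspots": np.zeros(len(idx))}, index=idx)

# Split by date rather than by a percentage; label slices on a
# DatetimeIndex are inclusive at both ends.
train = df.loc[:"1919-12-01"]
valid = df.loc["1920-01-01":]
print(len(train), len(valid))
```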


I then plotted the newly created training and validation sets on a graph:-


Autoregression (AR)

I decided to use a variety of models from the statsmodels library, and Autoregression (AR) was just one of them. Autoregression is a time series model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step. It is a very simple idea that can result in accurate forecasts on a range of time series problems.

I used the AR model; although there is a more recent revision of the model in statsmodels, AutoReg, it could not be imported into Google Colab, the Jupyter notebook environment I use:-
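The statsmodels call itself is in the screenshot, not text; as a library-free sketch of what autoregression does, an AR(p) model can be fit by ordinary least squares on lagged copies of the series:

```python
import numpy as np

def fit_ar(series, p):
    """Fit an AR(p) model by ordinary least squares.

    Returns (intercept, coefs) so that y[t] is approximated by
    intercept + sum(coefs[i] * y[t - 1 - i]).
    """
    y = np.asarray(series, dtype=float)
    # Design matrix of lagged values, most recent lag first, plus a
    # constant column for the intercept.
    X = np.column_stack([y[p - 1 - i:len(y) - 1 - i] for i in range(p)])
    X = np.column_stack([np.ones(len(X)), X])
    target = y[p:]
    coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coefs[0], coefs[1:]

# A noiseless process y[t] = 1.0 + 0.9 * y[t-1] is recovered almost exactly.
y = [1.0]
for _ in range(200):
    y.append(1.0 + 0.9 * y[-1])
intercept, coefs = fit_ar(y, p=1)
print(round(coefs[0], 3))  # recovers ≈ 0.9
```

For multi-step forecasting, predictions are fed back in as inputs, which is why long-horizon AR forecasts tend to decay toward a flat mean line.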


I then plotted the AR results on a graph. The predicted sunspot activity for the validation set started out in a cyclic fashion, but eventually petered out to form a straight line:-


The root mean square (RMS) error for AR was 61.57, which was lower than the errors of the other statsmodels models that I experimented with:-
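The error computation isn't shown in text; RMS error is simply the square root of the mean squared difference between actual and predicted values. With illustrative numbers:

```python
import numpy as np

# RMS error between (made-up) actual and predicted validation values.
actual = np.array([40.0, 70.0, 110.0, 90.0])
predicted = np.array([50.0, 60.0, 100.0, 100.0])
rmse = np.sqrt(np.mean((actual - predicted) ** 2))
print(rmse)  # 10.0
```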


Facebook Prophet

After experimenting with analysing time series in statsmodels, I decided to pursue another library: Facebook Prophet. Prophet is a procedure for forecasting time series data based on an additive model in which non-linear trends are fit with yearly, weekly and daily seasonality, plus holiday effects. At its core, Prophet is an additive regression model with four main components: a piecewise linear or logistic growth curve trend, a yearly seasonal component modelled with Fourier series to provide a malleable model of periodic effects, a weekly seasonal component, and a user-supplied list of holidays.

The first thing I did was to break down the timestamp into date, month and year to create the features_and_targets dataset:-
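Breaking a timestamp into calendar features is a one-liner per column with the pandas `.dt` accessor; the exact column names used in the notebook aren't shown, so those below are assumptions:

```python
import pandas as pd

df = pd.DataFrame({"Timestamp": pd.to_datetime(["1749-01-01", "1749-02-01"]),
                   "Sunspots": [58.0, 62.6]})

# Derive calendar features from the timestamp column.
df["Year"] = df["Timestamp"].dt.year
df["Month_Num"] = df["Timestamp"].dt.month
df["Date"] = df["Timestamp"].dt.day
print(df.columns.tolist())
```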


I then had to break down the train file to form separate datasets. Because this is a time series analysis, the train file was split by date to create a training and a validation dataset:-


Facebook Prophet only wants to see two columns of data, so I had to drop several columns before I could proceed. The columns I dropped were ‘Month’, ‘Year’, ‘Month_Num’, and ‘Timestamp’:-


After preparing the dataset, I plotted a graph of both the training and validation datasets:-


Facebook Prophet wants the column names clearly defined as ‘ds’ and ‘y’. I therefore renamed the ‘Datetime’ and ‘Sunspots’ columns in the training dataset to ‘ds’ and ‘y’ respectively:-
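The rename is plain pandas; a minimal sketch, assuming the training frame holds 'Datetime' and 'Sunspots' columns as described:

```python
import pandas as pd

train = pd.DataFrame({"Datetime": pd.to_datetime(["1749-01-01", "1749-02-01"]),
                      "Sunspots": [58.0, 62.6]})

# Prophet expects exactly two columns: 'ds' (dates) and 'y' (values).
train = train.rename(columns={"Datetime": "ds", "Sunspots": "y"})
print(train.columns.tolist())  # ['ds', 'y']
```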


Facebook Prophet follows the sklearn model API: an instance of the Prophet class is created, and then its fit() and predict() methods are called. The model is fit on the training dataset and predictions are made for the validation set:-


Once the model has been fitted and predictions made, a graph is created to show the predicted values for the validation dates:-


A comparative graph is also created to overlay the predicted values over the actual values in the validation dataset.

The mean absolute error of FB Prophet is 49.65:-


Random Forest

Random Forest is widely used for classification and regression predictive modeling problems with structured datasets. According to the sklearn website, Random Forest is a meta estimator that fits a number of classifying decision trees on various subsamples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

Having read that Random Forest can be used as a time series analysis tool, I decided to give this model a try.

I first had to define my X and y variables. Because the independent values are based on time, they had to be factorized and then reshaped.

I then split the train dataset for training and validating, deciding to use 15% of the dataset for validation purposes.
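The steps above can be sketched as follows. The notebook cell isn't shown, so the shuffled split (sklearn's `train_test_split` default) and the synthetic stand-in series are assumptions; only the 15% validation share comes from the text:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Stand-in series with a sunspot-like 132-month cycle.
idx = pd.date_range("1749-01-01", periods=400, freq="MS")
y = 60 + 50 * np.sin(np.arange(400) * 2 * np.pi / 132)

# Factorize the dates into integer codes and reshape into a column so
# sklearn accepts them as a 2-D feature matrix.
X = pd.factorize(idx)[0].reshape(-1, 1)

# Hold out 15% of the data for validation.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.15, random_state=1)

model = RandomForestRegressor(n_estimators=100, random_state=1)
model.fit(X_train, y_train)
preds = model.predict(X_valid)
rmse = np.sqrt(np.mean((preds - y_valid) ** 2))
print(model.score(X_train, y_train), rmse)
```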

With Random Forest I achieved 98.43% accuracy on the training set and 90.23% accuracy on the validation set. In addition, an RMS error of 14.32 was achieved:-


I also decided to graph the results I achieved in the modelling, fitting and prediction process to compare the predicted values against the actual values of the validation set:-


Conclusion

In conclusion, I believe Random Forest achieved the best error rate, followed by Facebook Prophet. Autoregression (AR) unfortunately performed the worst in this instance. It is therefore important to test every dataset on a variety of models to see which algorithm performs the best.

The computer program for this blog post can be found in its entirety in my personal Github account, with the link being found below:- https://github.com/TracyRenee61/Misc-Predictions/blob/main/TS_Sunspots_Statmodels_%26_Prophet.ipynb

Tracyrenee

Written by

I have over 45 years experience in the world of work, being in fast food, the military, business, non-profits, and the healthcare sector.

Python In Plain English

New Python + Programming articles every day.
