Bitcoin Price Analysis — Part I: Classical Decomposition Tools

Ishan Nangia
9 min read · Oct 30, 2019


A Bitcoin-style coin turning into digits

Hello there! This is a project that my classmate Nilakshi Mondal and I worked on for our 5th-semester Time Series paper as part of our B.Sc. Statistics degree at Delhi University. The project will be explained in two posts. This one covers the tools used to understand trend, seasonality and the cyclic component; the next will be on forecasting, where we used Simple Exponential Smoothing, Holt's Linear Trend and ARIMA. Since this is our first time working on a pure univariate time series dataset, we would love to hear about any mistakes we made or any improvements that could be suggested. The GitHub repository for this project can be found here.

The Dataset

We used the CryptoCompare API to obtain the dataset. We won't go into the details of the API; it was fairly easy to use, and the code can be found in the repository. The dataset contained hourly Bitcoin data from 2010-07 to 2019-09, which we truncated to keep only the 2019 data.
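For the curious, here is a rough sketch of the kind of request involved. The endpoint and field names follow CryptoCompare's public documentation; the repository has the actual (paginated) code we used.

```python
import requests
import pandas as pd

# One page of CryptoCompare's hourly history endpoint (per its v2 docs);
# the repository has the pagination loop we actually used.
URL = "https://min-api.cryptocompare.com/data/v2/histohour"
params = {"fsym": "BTC", "tsym": "USD", "limit": 2000}  # max 2000 rows per call

rows = requests.get(URL, params=params).json()["Data"]["Data"]

df = pd.DataFrame(rows)
df["time"] = pd.to_datetime(df["time"], unit="s")  # UNIX seconds -> timestamps
df = df.set_index("time")
df = df.loc["2019":]  # keep only the 2019 data, as in the post
```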

A look at the Hourly Bitcoin Data

We also split the data into train and test sets with a 0.90 train-test split ratio. The test data stays hidden from us until forecasting.
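As a minimal sketch, assuming the hourly data sits in a pandas DataFrame df with a DatetimeIndex as above, the split looks like this:

```python
# Chronological 90/10 split: the last 10% of hours is held out
# until the forecasting post.
split = int(len(df) * 0.90)
train, test = df.iloc[:split], df.iloc[split:]

y = train["open"]  # the series analysed throughout this post
```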

We will be concerned only with the 'open' column, which gives the opening price of Bitcoin for each hour.

Graph for open column over time

As we can clearly see, the data is not stationary: it has a clear trend (or rather two trends, one from January till July and another from July onwards), may or may not have seasonality, and has a lot of random fluctuation.

We shall start by analyzing the trend, move onto the seasonality and then the cyclic component.

Trend Analysis

The trend indicates the long-term movement of the data. Usually a general increasing or decreasing tendency can be found, which tells us whether the data will rise or fall in the long term. This helps with long-term planning and gives a rough estimate of future values.

For analyzing the trend of the data, the moving average method is usually preferred. We have implemented a number of trend curves from scratch and will fit all of them, comparing them on the basis of their MSE, or Mean Squared Error.

https://www.freecodecamp.org/news/machine-learning-mean-squared-error-regression-line-c7dde9a26b93/

We fitted the following curves:

Different Trend Curves
  1. Straight Line
  2. Exponential
  3. Parabolic
  4. Second Degree Curve Fitted to Logarithms
  5. Logistic
  6. Moving Average
  7. Modified Exponential
  8. Gompertz

We have attached, as hyperlinks above, resources one can consult to understand the curves and their fitting. They won't be exactly the same as what we have coded: functions and models differ in their representation from book to book, so there will almost always be slight differences. Please message or comment if you want the exact form of a model and the theoretical solution used.
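As an illustration of the fitting procedure, here is a minimal sketch of two of the simpler curves: the straight line, and the exponential fitted by log-linear least squares. It also defines the MSE helper used later. This is not necessarily the exact formulation in our repository.

```python
import numpy as np

t = np.arange(len(y), dtype=float)  # time index 0, 1, 2, ...

# Straight line: y_t ~ a + b*t, by ordinary least squares.
b, a = np.polyfit(t, y.values, 1)   # polyfit returns highest degree first
straight = a + b * t

# Exponential: y_t ~ A * B**t, fitted as a straight line to log(y)
# (valid here since prices are strictly positive).
c1, c0 = np.polyfit(t, np.log(y.values), 1)
exponential = np.exp(c0 + c1 * t)

def mse(actual, fitted):
    """Mean squared error between the series and a fitted trend."""
    return np.mean((np.asarray(actual) - np.asarray(fitted)) ** 2)
```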

We have the curves fitted to the data as below:

Straight Line and Exponential fitted to Opening Price
Parabolic and 2nd Degree Curve fitted to Logarithm with Opening Price
Logistic and Moving Average with Opening Price. Note that a moving average of extent (order) 169 (24*7 + 1) has been fitted here. We will select the appropriate order for the moving average shortly.
Modified Exponential and Gompertz fitted to Opening Price

Upon visual examination, the moving average fits the data best. However, to determine the best trend we perform the following two steps:

  1. Determine an appropriate moving average extent/order: one that neither overfits nor underfits the data.
  2. Calculate the MSE (Mean Squared Error) for all the trends to see which one fits best.

To determine the order of the moving average, we plot it against the opening price with the order iterating over the range 51 to 1500 in steps of 100. That is, we compute the moving average for extents 51, 151, 251, …, 1351, 1451 and then select the best one visually: one which neither underfits nor overfits the data.
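In code, this sweep is just a loop of centred rolling means (a sketch, with y being the training series of opening prices):

```python
import matplotlib.pyplot as plt

for k in range(51, 1500, 100):  # extents 51, 151, ..., 1451
    # Centred moving average of extent k; k is odd, so each average
    # lines up with the middle observation of its window.
    plt.plot(y.rolling(window=k, center=True).mean(), label=f"{k}-MA")

plt.plot(y, alpha=0.3, label="open")
plt.legend()
plt.show()
```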

The moving average is highly merged with the actual data and doesn't really represent a trend
The moving average curve relies less on individual fluctuations and is more representative of the general trend
These three are very similar, but note that as we keep increasing the order, we lose more and more values at the beginning and end
The curves are much smoother now but have also lost a lot of values at the start and end
This is the last set of curves we graphed, and a lot of values have been lost towards the start and end. They have also become a little jagged compared to the previous set, since they extend over a very long period and are therefore averages of very different values.

On examining the graphs, we note the following:

  • 51-MA to 351-MA more or less overfit the data and catch even very small fluctuations
  • 451-MA to 751-MA are smoother and neither overfit nor underfit
  • From 851-MA onwards we lose quite a lot of values at the start and end of the series, and the trend line keeps getting straighter as the order increases. From 1051-MA onwards too many values are lost, so we won't consider those orders.

Thus, purely on the basis of visual examination, the 651-MA is a really good fit to the data: it doesn't overfit, and it doesn't lose too many values.

Final Moving Average Model

Now, to choose between the moving average and the other trend curves, we could either examine them visually or select the one with the least MSE (Mean Squared Error). Visually, it is clear that the moving average fits the data best. We calculate the MSE only on the training dataset and get the following result:

MSE for the different curves
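A sketch of how this comparison can be computed: the MSE for each curve is taken only over the points where the fit is defined, since a k-term centred moving average loses (k − 1)/2 values at each end.

```python
import pandas as pd

fits = {
    "straight": straight,
    "exponential": exponential,
    "651-MA": y.rolling(window=651, center=True).mean(),
    # ...parabolic, logistic, Gompertz, etc. would be added here
}

for name, fitted in fits.items():
    fitted = pd.Series(np.asarray(fitted), index=y.index)
    valid = fitted.notna()  # skip the ends the moving average can't cover
    print(f"{name:12s} MSE = {mse(y[valid], fitted[valid]):,.2f}")
```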

Thus, we get the two best trends for our training data as:

  1. 651-MA trend curve
  2. Parabolic trend curve
Parabolic trend curve for our data

Seasonality

We will be implementing some seasonality techniques to see if there is hourly seasonality in our data. Then we will be using seasonal_decompose from statsmodels to study seasonality further and decompose our data.

Below we show the graphs obtained from each seasonality method. The x-axis has the hours: we are studying whether each hour shows some seasonality, i.e. there will be 24 distinct seasonal units. In other words, we are checking whether the values depend, to some extent, on the hour of the day. Maybe the price increases in the morning and goes down by midnight, or maybe demand decreases around lunch, and so on.

The y-axis has the seasonal indices, with 100 meaning no seasonality at all, i.e. the overall average/center.

  1. Ratio to Moving Averages
Steps to calculate seasonal indices using ratio-to-ma
There doesn't seem to be much seasonality: the indices show a maximum increase of around 0.15% and a maximum decrease of about 0.15%
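A minimal sketch of the ratio-to-moving-average computation. The 25-hour centred MA follows the odd-extent (period + 1) convention used above; the details may differ from our repository code.

```python
# Detrend with a short centred moving average, then average the
# percentage ratios by hour of day.
trend_24h = y.rolling(window=25, center=True).mean()
ratios = 100 * y / trend_24h

hourly_indices = ratios.groupby(ratios.index.hour).mean()
hourly_indices *= 100 / hourly_indices.mean()  # rescale so the indices centre on 100
```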

2. Link Relative Method

Again the maximum increase here is about 0.09% and the maximum decrease about 0.06%. Note that the y-axis shows the same indices as above: the data is centered around 100, not 0, as the +1e2 offset above the graph indicates.

3. Simple Averages

Steps to calculate seasonal indices via Method of Simple Averages
This is the most naive method of all, and again shows a maximum increase of about 0.25% and a maximum decrease of approximately 0.15%
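This method is short enough to sketch in full:

```python
# Index for hour h = mean of all observations at hour h,
# expressed as a percentage of the grand mean.
simple_indices = 100 * y.groupby(y.index.hour).mean() / y.mean()
```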

4. Ratio to Trend

Implementing the method of ratio-to-trend
This one uses the parabolic curve as the trend estimate and again shows negligible seasonality

For all the methods above, hourly seasonality is negligible: the indices peak at about 0.25%, not even 1%.

We now decompose the data using the seasonal_decompose function from statsmodels, which lets us specify the type of model (additive/multiplicative) along with the frequency at which to compute seasonality. This function implements classical decomposition, although X11, SEATS or STL decomposition is usually preferred.

This is an instance of the classical decomposition with the model being multiplicative
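The call itself is a one-liner; note that recent statsmodels releases spell the freq argument as period.

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# Classical multiplicative decomposition, treating one week
# (24*7 hours) as the season.
result = seasonal_decompose(y, model="multiplicative", freq=24 * 7)
result.plot()
```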

We make the following observations :

  1. There is no seasonality or cyclic component in our data; only the random component and the trend make it up. (We say no cyclic component because we haven't used data extending beyond a year, so we can't tell whether one exists. The cyclic component is discussed below.)
  2. As we see below, the variance of the residual component keeps increasing with time and then starts decreasing from around July onwards. This is very similar to how the trend behaved.
  3. We also tried an additive model, and it gave almost the same results as the multiplicative one.
  4. Note the freq = 24 * 7 argument, which means we consider each week to be a season and assume all weeks have the same seasonality. That gives 24*7 (hours × days) individual units, each assumed to have its own seasonal index.
  5. Note that the seasonal component increases/decreases by at most about 1%, which is negligible. Thus the data can be said to have no significant seasonality, just as the methods above showed.
The variance of the residual component increases with time

Lastly, the remainder is heteroskedastic: its variance increases with time and then decreases. This remainder is made up of the cyclic element along with the error/random component, so we should examine the cyclic nature of the data. But since we only have data covering under a year, we can't really explore cyclic variations, and any conclusions we draw now should be checked against more past data.

Cyclic Nature

As we aren't using data extending beyond a year, we can't really study the cyclic nature of the data. But we still implement the method, keeping in mind that we only have hourly data for about 7 months.

We will use a crude but fairly common technique: residual analysis. This involves dividing the actual data by the trend values and the seasonal indices, and multiplying by 100**2 (both the detrended ratio and the indices are expressed as percentages), to obtain the combined cyclic and random component. We then apply a moving average to smooth away the random component, leaving the final cyclic component.

Cyclic component using Residual Analysis
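A sketch of this residual analysis, reusing the 651-MA trend and the hourly indices from the ratio-to-moving-average step above; the final smoothing window here is an illustrative choice, not necessarily the one we used.

```python
# Remove trend (651-MA) and seasonality (hourly indices, in percent),
# then smooth with a moving average to damp the random component.
# The 100**2 factor appears because both ratios are percentages.
trend_651 = y.rolling(window=651, center=True).mean()
seasonal = np.asarray(y.index.hour.map(hourly_indices))

cyclic_random = y / (trend_651 * seasonal) * 100**2
cyclic = cyclic_random.rolling(window=25, center=True).mean()
```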

Here we see a picture very similar to the one from the seasonal_decompose function in Python: the fluctuations increase with time and with the months. This makes sense, as our data has a lot of variation towards July and August.

It almost looks as if there is some seasonality, where two peaks are followed by a huge fall, and this cycle keeps repeating in our data.

Now, to explore this component a little more, consider the data from January 2017 till the end of our training dataset.

New dataset to check cyclic nature

To calculate the cyclic component for this data we need:

  1. Moving average : Trend
  2. Ratio to moving average : Seasonality
  3. Residual analysis : Cyclic component
Opening prices and the 651-MA fitted to the new data. It overfits a little but clearly shows an increasing trend followed by a downward one, which rises again before declining at the very end.
Seasonality for the new data. Still doesn’t exist.
Cyclic nature of the data

From the given graph we concluded that it is difficult to tell whether a cyclic component exists, as prices fluctuate too much. There was a period of really high prices around Nov–Dec 2017, followed by a huge dip around Feb 2018; this is probably the main defining period. After it, fluctuations decrease and then increase a little around Oct–Dec 2018, where we see another major fall in prices. Fluctuations then reduced for some time but started increasing again with time.

There is a strong random component in our data whose variance keeps changing. Bitcoin, and cryptocurrencies in general, went through a period of enormous hype, so naturally there were periods of strong price inflation driven by market sentiment. As the hype faded and the bubble burst, so did the price of Bitcoin, which explains the huge peaks followed by dips.

Several governments have also banned cryptocurrency trading, which obviously affects market sentiment a great deal. A time will probably come when these fluctuations settle around a price and then increase or decrease steadily as cryptocurrencies gain wider acceptance or rejection.

That's it for this article. You can check out the code here. The second article, with all the forecasting, will be uploaded soon!
