Bitcoin Price Prediction with Random Forest and Technical Indicators

Robs
Published in Analytics Vidhya
3 min read · Dec 28, 2020

After Bitcoin reached an all-time high with a market capitalization of more than 500 billion dollars, many people, including me, want to know whether its value will keep rising or whether this is a bubble like the one in 2017. That is my motivation for writing this post.

First, I explain how I add features to the source dataset, which consists of price history and trading volume. The dataset, enriched with technical indicators, is then used to train a random forest classification model that predicts whether the value of Bitcoin will increase or decrease in the future.

All libraries used are listed in the Pipfile of the corresponding GitHub repository. The repository also contains a Jupyter Notebook with the complete code.

Create Dataset

Historical price and volume data can be obtained from Quandl, which provides a Python library. It is pretty straightforward, as the library loads the data directly into a pandas DataFrame.

We could already use this DataFrame to train a classification model, but I decided to add more features, which often makes a model more accurate.

I found a library called ta, which creates many technical indicators, such as moving averages and Bollinger Bands. It was originally designed for analyzing stock data, but it should also work for Bitcoin. With it, more than 80 features are added to the dataset. Check out the notebook on GitHub to see the code and its output.
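To make the idea of a technical indicator concrete, here is a minimal sketch that computes two of the indicators mentioned above, a simple moving average and Bollinger Bands, by hand with pandas. It is not the notebook's code: the ta library offers `add_all_ta_features` to generate the full feature set in one call, while the function name `add_basic_indicators`, the column name `Close`, the 20-day window, the two-standard-deviation band width, and the synthetic prices below are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def add_basic_indicators(df: pd.DataFrame, window: int = 20, n_std: float = 2.0) -> pd.DataFrame:
    """Return a copy of df with a moving average and Bollinger Bands on 'Close'."""
    out = df.copy()
    rolling = out["Close"].rolling(window)
    # Simple moving average over the last `window` rows (NaN until enough rows exist).
    out["sma"] = rolling.mean()
    # Rolling population standard deviation (ddof=0 is an assumption of this sketch).
    std = rolling.std(ddof=0)
    # Bands sit n_std standard deviations above and below the moving average.
    out["bb_high"] = out["sma"] + n_std * std
    out["bb_low"] = out["sma"] - n_std * std
    return out


# Synthetic random-walk prices stand in for the real price history.
rng = np.random.default_rng(0)
prices = pd.DataFrame({"Close": 100 + rng.normal(0, 1, 60).cumsum()})
enriched = add_basic_indicators(prices)
print(enriched.tail())
```

Each indicator becomes one more column, so the model sees not only the raw price but also how the price relates to its recent history.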

The next step is to create the classes we want to predict. To keep the problem from becoming too complex, I decided to predict only whether the value of a bitcoin will rise or fall over the next 7 days. Predicting binary classes is much easier than predicting concrete prices.
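One way to build such a binary target is to compare each day's price with the price 7 days later. The sketch below assumes a `Close` column and uses the hypothetical names `add_target` and `target`; the notebook may name things differently. The last 7 rows are dropped because their future price is not known yet.

```python
import pandas as pd

HORIZON = 7  # look-ahead in days, as described in the text


def add_target(df: pd.DataFrame, horizon: int = HORIZON) -> pd.DataFrame:
    """Label each row 1 if 'Close' is higher `horizon` rows later, else 0."""
    out = df.copy()
    # shift(-horizon) aligns the price `horizon` rows ahead with the current row.
    future_close = out["Close"].shift(-horizon)
    out["target"] = (future_close > out["Close"]).astype(int)
    # The final `horizon` rows have no future price, so they cannot be labelled.
    return out.iloc[:-horizon]


# Tiny hand-made series so the labels can be checked by eye.
prices = pd.DataFrame({"Close": [10, 11, 12, 13, 14, 15, 16, 9, 30, 5]})
labelled = add_target(prices)
print(labelled)
```

In this toy series only the second day is labelled 1, because 30 (seven days later) is higher than 11, while the first and third days are followed by lower prices.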

Train the classification model

Next, we need to divide the dataset into training and test data: the first 2000 days for training and the remaining 491 days for testing.
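Because the data is a time series, the split is made by position rather than by random shuffling, so the model is only tested on days that come after everything it was trained on. A minimal sketch with a plain list standing in for the dataset (the helper name `chronological_split` is hypothetical; with a DataFrame the same slicing can be done via `.iloc`):

```python
def chronological_split(rows, n_train):
    """Split an ordered sequence into the first n_train items and the rest."""
    if not 0 < n_train < len(rows):
        raise ValueError("n_train must be between 1 and len(rows) - 1")
    # No shuffling: the order of the rows is the order of time.
    return rows[:n_train], rows[n_train:]


# 2000 training days plus 491 test days, the sizes given in the text.
days = list(range(2491))
train_days, test_days = chronological_split(days, 2000)
print(len(train_days), len(test_days))
```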

Now comes the fun part: creating the random forest classifier and fitting it to the training data. Other classification models would also work; I just chose random forest because I personally like it.
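The fitting step can be sketched with scikit-learn's `RandomForestClassifier`. The data here is synthetic (five random features, one of which carries a noisy signal), and the hyperparameters `n_estimators=100` and `random_state=0` are assumptions for this example, not necessarily the settings used in the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the indicator features and the rise/fall target.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

# Chronological split: first 200 rows for training, the rest for testing.
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# Fit the forest on the training rows only.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# predict() returns one class label (0 or 1) per test row;
# score() returns the fraction of correct predictions.
predictions = model.predict(X_test)
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```

Fixing `random_state` makes the run repeatable, which helps when comparing feature sets.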

Evaluate the model

Accuracy on the test data: 0.5804480651731161

The prediction is correct in 58% of the cases. Evaluating the model on the training dataset results in an accuracy of 72%, which is not surprising, since such models tend to overfit.
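Accuracy is simply the share of predictions that match the true class. The sketch below computes it by hand and, as an optional sanity check that is not part of the original analysis, compares it with a baseline that always predicts the most frequent training class. All labels are made up for illustration, and the helper names `accuracy` and `majority_baseline` are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [most_common_label] * len(y_test))


# Made-up labels: 1 = price rose, 0 = price fell.
y_train = [1, 1, 0, 1, 0, 1]
y_test = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 1]

model_acc = accuracy(y_test, y_pred)
baseline_acc = majority_baseline(y_train, y_test)
print(model_acc, baseline_acc)
```

A model is only informative to the extent that it beats such a baseline on data it has not seen, which is also why the test accuracy matters more than the training accuracy.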

Interpretation

The following plot visualizes the result. The red and green dots indicate whether the prediction was correct at the given time or not.

Bitcoin Price Prediction with Random Forest Evaluation
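A plot of this kind can be sketched with matplotlib as follows. The prices and the correct/wrong flags are synthetic, and the mapping of green to correct and red to wrong predictions is an assumption of this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic test-period prices and hypothetical per-day correctness flags.
rng = np.random.default_rng(1)
days = np.arange(120)
prices = 100 + rng.normal(0, 1, 120).cumsum()
correct = rng.random(120) < 0.58

fig, ax = plt.subplots(figsize=(10, 4))
# Grey line: the price itself.
ax.plot(days, prices, color="grey", linewidth=1, label="price")
# Dots on the line: green where the prediction was right, red where it was wrong.
ax.scatter(days[correct], prices[correct], color="green", s=12, label="correct")
ax.scatter(days[~correct], prices[~correct], color="red", s=12, label="wrong")
ax.set_xlabel("day")
ax.set_ylabel("price")
ax.legend()
```

Placing the dots on the price line makes it easy to see in which market phases the model tends to fail.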

The random forest model correctly forecast the decline in March 2020, at the beginning of the corona crisis. However, the rise at the end of 2020 was not predicted correctly.

The model relies entirely on technical data derived from price history, which many believe is mostly random. I am therefore satisfied with the result, but I would expect significant improvements if fundamentals were also included. Examples could be the Google search volume for the term Bitcoin or an analysis of the blockchain. The blockchain in particular could be very valuable, as it contains all transactions since the beginning in 2009.

I am a software developer interested in data engineering and data science