How I used ML to predict Bitcoin Prices

Shameless plug: We are a machine learning data annotation platform that makes it easy for you to build ML datasets. Just upload your data, invite your team and build datasets quickly.

We used the ARIMA time series model and trained a recurrent neural network (RNN) to predict future Bitcoin prices from previous values and trends. The ARIMA model, trained on around 70 data points, achieved an average accuracy of 75–80%, and the RNN model achieved an accuracy of almost 95%. I built this project mainly because Bitcoin is the longest-running and best-known cryptocurrency and is said to have a great future. What I wanted to see was whether I could quickly train a deep learning model, or use standard time series models, to predict Bitcoin prices and their future trends.

Why Bitcoin?

Bitcoin is more accessible, with more exchanges, merchants, software and hardware supporting it. Two things help it significantly in this respect: stability and entrepreneurship. It has the most entrepreneurs creating companies around it, with a lot of intellect, dedication and creativity going toward making it more useful. As Bitcoin evolves, we can expect it to grow in unexpected ways as new uses are found. Bitcoin owners can expect its usefulness to only increase over time, which creates a huge opportunity to invest and make large profits.

But when to invest and how much to invest are open questions, so we built this model to help predict the best time to invest.

Just like most currencies, the price of Bitcoin changes every day. The only difference is that the price of Bitcoin changes on a much greater scale than local currencies.

ML model

The model predicts the Bitcoin price for any date given as a standard Unix timestamp. These predictions could be used as the foundation of a Bitcoin trading strategy. People who bought when prices were high lost most of their money, which is why it is important not to invest more than you can afford to lose. As with stock market analysis, investors can use the model to judge the best time to invest. Many other factors affect the value of Bitcoin, such as supply and demand and other cryptocurrencies, and most of them are unpredictable, so this can serve as a basic model while the remaining factors are studied manually. It can be used to get a fair idea of prices and of where investments can be made. Bitcoin is still young, and many sources say it is here to stay, so it could be a good idea to invest in it.
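Since the model takes dates as Unix timestamps, here is a small sketch of converting a calendar date to that format and back. The helper names `to_unix` and `from_unix` are made up for this example, and treating every date as midnight UTC is an assumption, not something taken from the notebooks.

```python
from datetime import datetime, timezone


def to_unix(date_string: str) -> int:
    """Convert a 'YYYY-MM-DD' date (assumed to mean midnight UTC) to Unix seconds."""
    day = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(day.timestamp())


def from_unix(seconds: int) -> str:
    """Convert Unix seconds back to a 'YYYY-MM-DD' date in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")


stamp = to_unix("2018-01-01")
print(stamp, from_unix(stamp))
```

Any timestamp produced this way can be passed to the model as the date to predict for.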

We used Jupyter Notebook in Anaconda 3. The three .ipynb files can be downloaded and run as they are to see the results.

Dataset

Before we build the model, we need to obtain some data for it. The dataset contains minute-by-minute Bitcoin prices for the last few years. Over this timescale, noise could overwhelm the signal, so we’ll opt for daily prices. The dataset file is bitstampUSD_1-min_data_2012-01-01_to_2017-01-08.csv, a CSV file for select Bitcoin exchanges covering Jan 2012 to Jan 2018, with approximately 3,161,057 instances. It has minute-to-minute updates of OHLC (Open, High, Low, Close), volume in BTC and in the indicated currency, and the weighted Bitcoin price.

To make these predictions, one first has to become familiar with the machine learning techniques ARMA, ARIMA and recurrent neural networks (RNN), with prediction and time series analysis as our main objectives. The RNN model requires the Keras library to be installed.

ARIMA is a class of statistical models for analyzing and forecasting time series data.

It explicitly caters to a suite of standard structures in time series data, and as such provides a simple yet powerful method for making skillful time series forecasts.
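To make the idea concrete, here is a dependency-free sketch of a stripped-down ARIMA(1,1,0): difference the series once (the "I" part), fit a one-lag autoregression on the differences by least squares (the "AR" part), then add the forecast changes back onto the last price. This is only an illustration. A real ARIMA implementation, such as the one in statsmodels, also handles moving-average terms and estimates parameters differently, and the prices below are made-up numbers, not values from the dataset.

```python
def difference(series):
    """The 'I' step: replace price levels with period-to-period changes."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(values):
    """The 'AR' step: least-squares fit of values[t] = c + phi * values[t-1]."""
    x, y = values[:-1], values[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    var_x = sum((a - mean_x) ** 2 for a in x)
    if var_x == 0:
        # Constant changes: no lag effect can be estimated, so predict the mean change.
        return mean_y, 0.0
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = cov_xy / var_x
    return mean_y - phi * mean_x, phi


def forecast(series, steps):
    """Forecast `steps` future values with the simplified ARIMA(1,1,0) above."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    predictions = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # predicted next change
        level += last_diff               # undo the differencing
        predictions.append(level)
    return predictions


# Illustrative monthly mean prices (made up for this example).
monthly_prices = [430.0, 410.0, 450.0, 520.0, 610.0, 580.0]
print(forecast(monthly_prices, steps=3))
```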

Unlike regression predictive modeling, time series also adds the complexity of a sequence dependence among the input variables.

A powerful type of neural network designed to handle sequence dependence is the recurrent neural network (RNN). The Long Short-Term Memory (LSTM) network is a type of recurrent neural network used in deep learning because very large architectures can be successfully trained. It is trained using Backpropagation Through Time and overcomes the vanishing gradient problem.

As such, it can be used to create large recurrent networks that in turn can be used to address difficult sequence problems in machine learning and achieve state-of-the-art results.
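Before a recurrent network can learn from a price series, the series has to be turned into training samples in which each input is a run of past values and the target is the value that follows. A minimal sketch of that step is below. The function name `make_windows` and the window length of 3 are assumptions made for this example, and the notebooks may prepare the data differently.

```python
def make_windows(series, window):
    """Pair each run of `window` consecutive values with the value that follows it."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


# Illustrative daily mean prices (made up for this example).
daily_prices = [100.0, 102.0, 101.0, 105.0, 107.0, 110.0]
inputs, targets = make_windows(daily_prices, window=3)
for past, nxt in zip(inputs, targets):
    print(past, "->", nxt)
```

For a Keras LSTM layer, the inputs would then be reshaped into a 3D array of shape (samples, timesteps, features), with one feature per timestep in this case.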

Code: Github

For any further needs, please send me your details at devika.mishra@dataturks.com and I will provide you with the required resources.

The timestamps in the data were converted to standard Unix timestamps. For ARIMA, the data was grouped by month, taking the mean value for each month. For the RNN, the data was grouped by day, again taking the mean value for each day.
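The grouping step can be sketched in a few lines of standard-library Python. The function name `mean_by_period` and the sample rows are made up for illustration, and using UTC for the calendar boundaries is an assumption. With pandas, the same result would typically come from resampling a datetime-indexed frame and taking the mean.

```python
from collections import defaultdict
from datetime import datetime, timezone


def mean_by_period(rows, fmt):
    """Average prices per calendar period.

    rows: iterable of (unix_seconds, price) pairs.
    fmt:  strftime pattern naming the period, e.g. "%Y-%m-%d" or "%Y-%m".
    """
    buckets = defaultdict(list)
    for seconds, price in rows:
        key = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)
        buckets[key].append(price)
    return {key: sum(prices) / len(prices) for key, prices in sorted(buckets.items())}


# Made-up minute-level rows: (Unix timestamp, price).
rows = [
    (1514764800, 100.0),  # 2018-01-01 00:00 UTC
    (1514764860, 102.0),  # 2018-01-01 00:01 UTC
    (1514851200, 110.0),  # 2018-01-02 00:00 UTC
    (1517443200, 90.0),   # 2018-02-01 00:00 UTC
]
daily = mean_by_period(rows, "%Y-%m-%d")   # input for the RNN
monthly = mean_by_period(rows, "%Y-%m")    # input for ARIMA
print(daily)
print(monthly)
```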

Results:

ARIMA :

RNN:

When predictions were made on a set of around 100 dates, the number of predictions close to the actual value was as shown in the table. The table can be read as follows: around 66 predictions made by the ARIMA model were within 90–100% of the actual value.
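One way such a table could be produced is sketched below. The closeness formula, 100 × (1 − |predicted − actual| / actual), is an assumption made for this example, since the exact definition used for the table is not stated here. The function names and sample numbers are also made up.

```python
from collections import Counter


def closeness(predicted, actual):
    """Percentage closeness: 100 is an exact match, 90 means 10% off (floored at 0)."""
    return max(0.0, 100.0 * (1.0 - abs(predicted - actual) / abs(actual)))


def bucket_counts(predictions, actuals, width=10):
    """Count how many predictions fall into each closeness band of `width` percent."""
    counts = Counter()
    for predicted, actual in zip(predictions, actuals):
        score = closeness(predicted, actual)
        # An exact match (100) goes into the top band rather than a band of its own.
        lower = min(int(score // width) * width, 100 - width)
        counts[f"{lower}-{lower + width}%"] += 1
    return dict(counts)


# Made-up predictions against made-up actual prices.
predicted = [96.0, 100.0, 125.0, 55.0]
actual = [100.0, 100.0, 100.0, 100.0]
print(bucket_counts(predicted, actual))
```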

It was observed that the ARMA model failed to give a good prediction, whereas the ARIMA model, which was trained on monthly data, gave quite accurate predictions.

Because the amount of data was huge, the neural network model also performed really well and gave good predictions. But it was observed that if the dataset is small, the RNN model does not train well and gives poor predictions.

I would love to hear any suggestions or queries. Please write to me at devika.mishra@dataturks.com.


This story is published in The Startup, Medium’s largest entrepreneurship publication followed by 331,853+ people.


DataTurks: Data Annotations Made Super Easy
The Startup

Data Annotation Platform. Image Bounding, Document Annotation, NLP and Text Annotations. #HumanInTheLoop #AI, #TrainingData for #MachineLearning.