Predicting Bitcoin price using historical price, volume and social hype

21 Jan., 2018 — A couple of days ago I added a whole range of new words and emojis to the sentiment analysis algorithm. Now I have plotted price, volume and sentiment at 10-minute intervals for the past 24 hours:
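The scoring itself can be as simple as a lexicon lookup over words and emojis. Below is a minimal sketch of that idea; the lexicon entries and their weights are illustrative assumptions, since the post doesn't show its actual word/emoji list:

```python
# Minimal sketch of lexicon-based sentiment scoring.
# The entries and weights below are made-up examples, not the post's real lexicon.
LEXICON = {
    "moon": 1.0, "bullish": 1.0, "buy": 0.5, "🚀": 1.0, "😀": 0.5,
    "crash": -1.0, "bearish": -1.0, "sell": -0.5, "😱": -1.0, "📉": -0.5,
}

def sentiment(text: str) -> float:
    """Sum the lexicon scores of all known words/emojis in a message;
    unknown tokens contribute nothing."""
    return sum(LEXICON.get(tok, 0.0) for tok in text.lower().split())
```

Averaging these per-message scores over each 10-minute bucket would then give the green sentiment curve.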

It’s quite stunning to me how closely sentiment (green) and volume (purple) track each other. I could go over the entire plot and discuss every hour, but I’ll limit myself to the blue rectangle.
At some point the price dropped drastically (from 11.8k to 11.4k) and simultaneously sentiment dropped (more negativity) on social media. But at the lowest point, trade volume peaked and sentiment started climbing (people started buying at the lower price). Sentiment is now steadily increasing, and I wonder whether the price will go up or down in the next few hours.
PS: Once I have more volume data I will add it as a new parameter to the machine learning algorithm (& make some predictions).

In the evening of 22 Jan., 2018 — I just made a few predictions using three parameters: price, absolute hype figures and volume24h:
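For anyone curious how three aligned hourly series can be turned into supervised training samples for a sequence model, here is a rough sketch; the sequence length, column order and helper name are my assumptions, as the post doesn't show its preprocessing:

```python
import numpy as np

def make_windows(price, hype, volume24h, seq_len=4):
    """Stack the three series into (time, feature) rows, then cut
    overlapping windows of seq_len steps each; the price immediately
    after each window is its prediction target (next-hour price)."""
    data = np.column_stack([price, hype, volume24h])
    X = np.array([data[i:i + seq_len] for i in range(len(data) - seq_len)])
    y = np.array(price[seq_len:])  # next price after each window
    return X, y
```

Each row of `X` then has shape `(seq_len, 3)`, which is the `(timesteps, features)` input an LSTM expects.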

These were made on 22 Jan. 2018 at 21:30 GMT+1. I will compare them against the real values tomorrow.

23 Jan. 2018 — Let’s have a look at our predictions for BTC-USD price.
Below is a graph of the real data. Everything before the red line was used to make yesterday’s predictions; everything after it is newly accumulated data (the red line is drawn at exactly 22 Jan. 2018, 21:30 GMT+1).

Below is a list of images where I’ve overlaid each prediction on the real data.
IMHO this one is the most accurate prediction:

It did not predict the valley at 23:00 (the second prediction point), but for the next 7 hours/points it was a pretty accurate prediction (until the price dropped at 07:00). A few notes for the geeks:

  • Each point represents an hourly interval.
  • The predictions were made using 3 features: price, volume and hype.
  • All predictions used 2 neurons in the LSTM.
  • The only variations were: the LSTM batch sizes and input sequence lengths.
  • The training set was 144 hours (= 6 days). However, for volume I only had 2 days’ worth of data. Where data was missing, I let the model ignore zero values by using a masking input layer.
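To illustrate the last point: a masking input layer flags timesteps whose features are all equal to the mask value (here, hours with no volume data yet) so the network skips them. The sketch below shows that rule in plain numpy; the example values are made up, and if the model is built in Keras, its `Masking(mask_value=0.0)` layer applies the same per-timestep test inside the network:

```python
import numpy as np

def timestep_mask(window, mask_value=0.0):
    """True for timesteps carrying real data, False for padded ones
    (a timestep is padded when ALL of its features equal mask_value)."""
    return ~np.all(window == mask_value, axis=-1)

# One input window of 3 hourly timesteps x 3 features (price, hype, volume);
# the first hour is zero-padded because no volume data existed yet.
window = np.array([
    [0.0,  0.0, 0.0],   # padded hour -> masked out
    [11.8, 0.4, 2.1],   # real observation
    [11.4, -0.2, 3.0],  # real observation
])
mask = timestep_mask(window)
```

A recurrent layer downstream then only updates its state on the `True` timesteps, so the zero-padding doesn’t bias training.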

For additional content and feedback from other forum members, visit my official forum thread here:

Have a good day all! :)
- Ilya Nevolin

This story is published in The Startup, Medium’s largest entrepreneurship publication followed by 289,682+ people.
