Improving our autopilot crypto predictions system

I hope you are all doing very well today!

Until now I had to spend a lot of time in Photoshop overlaying the actual data on the predictions. To be a lot more productive, I have extended the predictions tool with an “extend actual data” option. It keeps as many known prices on the plot as are available (the maximum is the number of predictions, 20 in our case):

You can simply enable that checkbox and hit the “prev” button to check how accurate the predictions were. The screenshot above shows how it works: we have a datetime set to “… 19:00”, so at that interval our server made predictions for 20 intervals, starting at “… 19:30” (indicated by the red circle).
 
You will notice that some predictions are a far cry from the actual data (a.k.a. reality), while others are more or less “okay”. When you enable this checkbox it will not use caching, so it may take up to a second (depending on your connection speed) to return the data (instead of 100–200ms).

Over the past two days I have been working on an improved version of the prediction chart. The previous version generated predictions based on one set of pre-defined parameters, which is not a good system at all. We should instead (A) generate many predictions using different parameters and settings, (B) compare them and (C) use common sense to make trading decisions.
 
(A): The previous version of the prediction tool lacked “multiple predictions”. So I spent many hours tweaking and expanding it.
(B): Right now comparing the “multiple predictions” is a manual process, but in the near future I want to use AI/data mining to detect patterns in the multiple predictions.
(C): Same as (B): manual for now, to be automated later.
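
As a rough illustration of step (A), the sketch below enumerates every combination of the settings whose values are listed later in this post (feature type, batch size and epochs). The function name parameter_grid is hypothetical, and neurons and sequence size are left out because the post does not list their candidate values. This is only a sketch of the idea, not the tool's actual code.

```python
from itertools import product

# Values taken from the option list in this post. Neurons and sequence size
# are omitted because their candidate values are not listed.
FEATURE_TYPES = ["hopkins", "leonard", "maggy", "davinci", "jack", "zeus"]
BATCH_SIZES = [100, 200, 500]
EPOCHS = [500, 1000]


def parameter_grid():
    """Return every combination of the known settings as a list of dicts."""
    return [
        {"features": features, "batch_size": batch_size, "epochs": epochs}
        for features, batch_size, epochs in product(
            FEATURE_TYPES, BATCH_SIZES, EPOCHS
        )
    ]


grid = parameter_grid()
print(len(grid))  # 6 * 3 * 2 = 36 combinations
print(grid[0])    # first combination: hopkins, batch size 100, 500 epochs
```

Each dict would describe one training run, and the resulting predictions can then be compared side by side as in step (B).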
 
The redesigned prediction tool’s page looks like the screenshot below.

Here is a detailed description of every new option/setting:

  • Features type:
     The old version of the tool used a single combination of 5 features to make predictions (price, volume, social hype, social & news sentiments).
     In the new version you can choose from 6 different feature combinations:
     hopkins: price and volume24h
     leonard: price, social and news hype
     maggy: price, social and news sentiments
     davinci: price, volume24h, social and news hype, social and news sentiments
     jack: price, volume24h, social and news sentiments
     zeus: price, volume24h, social and news hype
     
     * You will notice that some feature types yield more accurate results than others, depending on the interval size. Sometimes price and volume alone appear to be quite accurate, but in other cases the network needs hype and/or sentiment data.
  • Batch size:
     Choose from [100, 200, 500]. The batch size indicates how many data points are used in a single training step. It is often said that a higher batch size results in less accurate predictions, so 100 or 200 is recommended. I added 500 a few minutes ago to analyze its predictions.
  • Neurons:
     The number of neurons used in the LSTM network. At this stage I will not make any claims on which value you should use.
  • Sequence size:
     Making a prediction is all about taking a sequence of historical values at positions [x, x+1, x+2, …, x+n-1] and teaching the neural network that the outcome is the value at position [x+n] (where n is the sequence size). In some cases we use small sequences to predict the next outcome and in other cases we use larger sequences. I believe a sequence size of 20 is the best value right now (but I may be wrong, and need to investigate it further).
  • Epochs:
     This is the last learning parameter: the number of epochs, where one epoch is a full forward+backward pass over the training data. I am testing 500 vs 1000 epochs at this stage.
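
The sequence size described above can be illustrated with a small sliding-window sketch. It assumes the training data is a plain list of prices. The helper name make_windows is hypothetical and not part of the tool. Each window of n values is paired with the value that directly follows it.

```python
def make_windows(values, sequence_size):
    """Split a series into (input sequence, next value) training pairs."""
    if sequence_size < 1:
        raise ValueError("sequence_size must be at least 1")
    pairs = []
    # The last usable window ends one position before the final value,
    # because that final value is needed as the target.
    for start in range(len(values) - sequence_size):
        window = values[start:start + sequence_size]
        target = values[start + sequence_size]
        pairs.append((window, target))
    return pairs


prices = [10.0, 11.0, 12.0, 13.0, 14.0]
for window, target in make_windows(prices, sequence_size=3):
    print(window, "->", target)
```

With five prices and a sequence size of 3 this yields two training pairs. A series that is not longer than the sequence size yields none.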

You also see that the plot has three graphs:

  1. The dark-yellowish line is the actual (average) price of the selected crypto at every interval.
  2. The green line is something I explained a few weeks ago, but let me rephrase it. We use the actual/historical data to train the neural network, and then feed it the same training data and ask it to make predictions. If the network is more or less well-trained, it should return a graph that resembles the actual/training data. You may toggle this graph off, but for now I have included it to check for (A) over-fitting and (B) under-fitting.
  3. The red line is the predicted future (20 intervals).
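
One way to put a number on how closely two of these lines match is the mean absolute error. The sketch below is only an assumption about how such a check could look. All prices in it are made up for illustration and the function name is hypothetical. A low error on the training data (green vs. yellow) combined with a much higher error on the future intervals (red vs. the actual prices that arrive later) would hint at over-fitting, while a high error on the training data itself would hint at under-fitting.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up numbers, purely for illustration.
train_actual = [100.0, 102.0, 101.0, 105.0]   # yellow line (training part)
train_fit = [101.0, 101.0, 102.0, 104.0]      # green line
future_actual = [106.0, 108.0]                # actual prices, known later
future_predicted = [110.0, 102.0]             # red line

in_sample_error = mean_absolute_error(train_actual, train_fit)
forecast_error = mean_absolute_error(future_actual, future_predicted)
print(in_sample_error)  # 1.0
print(forecast_error)   # 5.0
```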

If you are going to use this 2nd version of the prediction tool, keep in mind that it has not made many predictions yet. It predicts for 3 interval sizes: 1hr, 12hrs and 24hrs. Each interval size has its own schedule, so for example a new prediction for the 1hr interval is made every hour.

  • 1hr: the prediction starts at the start of every hour, and usually takes 15–30 minutes to complete.
  • 12hrs: this starts at 00:00 and 12:00 (GMT) and takes 15–30 minutes.
  • 24hrs: this starts at 23:00 (GMT) and takes 15–30 minutes to complete.

In v2.0 only Bitcoin (BTC) is enabled, because the system makes 216 predictions per interval. For the hourly interval that comes down to 216 x 24 = 5184 predictions per day. The 12hrs runs add 2 x 216 = 432 and the 24hrs run adds 216, which is another 648 predictions per day. In total the system generates 5832 predictions in a single day, which is quite a lot for just one coin. Every prediction also takes several seconds to complete, so it is quite expensive. The plan is to find a small set of useful parameters that we can generalize to other cryptocurrencies, and then drastically reduce the CPU time.
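
The daily totals can be double-checked by walking through the schedule hour by hour. The sketch below assumes the start times listed above and the figure of 216 predictions per run. The function names are hypothetical.

```python
PREDICTIONS_PER_RUN = 216  # figure quoted in this post


def runs_starting_at(hour):
    """Return the interval sizes whose prediction run starts at this GMT hour."""
    runs = ["1hr"]          # a 1hr run starts every hour
    if hour in (0, 12):     # 12hrs runs start at 00:00 and 12:00
        runs.append("12hrs")
    if hour == 23:          # the 24hrs run starts at 23:00
        runs.append("24hrs")
    return runs


def predictions_per_day():
    """Count the predictions made per interval size over one full day."""
    counts = {"1hr": 0, "12hrs": 0, "24hrs": 0}
    for hour in range(24):
        for name in runs_starting_at(hour):
            counts[name] += PREDICTIONS_PER_RUN
    return counts


counts = predictions_per_day()
print(counts)                # {'1hr': 5184, '12hrs': 432, '24hrs': 216}
print(sum(counts.values()))  # 5832
```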

One more screenshot (12h intervals):

I have spent most of today tweaking the predictions, analyzing results and more… I am making good progress and have found some interesting parameters that yield good predictions.
 
I have also developed some extras. I have added a candlestick chart which shows the open/close and low/high per interval. I also want to add more graphs (e.g. the price trend and traded volume), but there is a bug in Plotly JS which I have reported: clicking on a label does not hide the graph. It should be taken care of in the coming few days.

Screenshot of the Candlestick chart:

Last but definitely not least, the intervals on the x-axis no longer show the results of the preceding period.
 Here’s what I mean by that: previously a label such as “19:00” covered all results from 18:00 up to and including 18:59.
 Now “19:00” covers 19:00 up to and including 19:59. This is the correct way and much more intuitive; I am not sure why I didn’t do it like this from the start.
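
The two labelling conventions can be sketched as follows for hourly intervals. The function name and the old_convention flag are hypothetical, and the date in the example is arbitrary.

```python
from datetime import datetime, timedelta


def interval_label(timestamp, old_convention=False):
    """Return the hourly x-axis label that a timestamp falls under."""
    # New convention: label an interval by the hour in which it starts.
    start = timestamp.replace(minute=0, second=0, microsecond=0)
    if old_convention:
        # Old convention: label an interval by the hour in which it ends.
        start += timedelta(hours=1)
    return start.strftime("%H:%M")


trade_time = datetime(2000, 1, 1, 18, 30)  # arbitrary example timestamp
print(interval_label(trade_time))                       # 18:00
print(interval_label(trade_time, old_convention=True))  # 19:00
```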

Cheers all!
- Ilya Nevolin