An exploratory data analysis on bitcoin price swings
There is an abundance of articles on using complex neural networks to predict the price of bitcoin, but there are few attempts at exploratory analysis of the data itself to understand the underlying market behaviour.
Thanks to services like Coinograph, cryptocurrency trading data is becoming widely available, which in turn allows more enthusiasts to start experimenting with it.
In this post I will go through how to load Coinograph’s historical data and use pandas and scikit-learn to do a simple analysis of bitcoin pricing data.
Data:
Pair: BTC/USD
Type: 1 hour OHLC candles
Exchange: Bitfinex
Duration: 5 months
Format: csv
Sample:
Analysis:
First, let’s load this data and visually inspect the open, close, low, high and volume attributes. We use scatter_matrix from pandas:
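As a minimal sketch of this step, the snippet below loads hourly candles and draws the scatter matrix. The column names (timestamp, open, high, low, close, volume) are my assumption about the csv layout, and the six rows are made-up values that only stand in for the real export so the example runs on its own.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up rows for illustration only; the real file would come from the data export.
SAMPLE_CSV = """timestamp,open,high,low,close,volume
2018-03-01 00:00:00,10300.0,10420.0,10250.0,10390.0,1210.4
2018-03-01 01:00:00,10390.0,10480.0,10310.0,10340.0,980.7
2018-03-01 02:00:00,10340.0,10900.0,10120.0,10810.0,6150.2
2018-03-01 03:00:00,10810.0,10870.0,10700.0,10760.0,2040.9
2018-03-01 04:00:00,10760.0,10790.0,10600.0,10650.0,1575.3
2018-03-01 05:00:00,10650.0,10720.0,10580.0,10700.0,1130.0
"""


def load_candles(source):
    """Read OHLC candles from a csv path or file-like object, indexed by time."""
    df = pd.read_csv(source, parse_dates=["timestamp"], index_col="timestamp")
    return df.sort_index()


candles = load_candles(io.StringIO(SAMPLE_CSV))

# One scatter plot per pair of columns, with histograms on the diagonal.
axes = scatter_matrix(
    candles[["open", "close", "low", "high", "volume"]],
    alpha=0.5,
    figsize=(10, 10),
    diagonal="hist",
)
```

With a real file, `load_candles("path/to/your_file.csv")` would replace the `io.StringIO` call; the path here is a placeholder.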
A scatter matrix shows a pairwise scatter plot for each pair of the chosen columns above. You can read more about the scatter matrix here: Pandas.Visualisation.Scatter_Matrix
The most interesting parts of the above matrix are the columns involving volume. Let’s zoom in a bit:
The densest areas in the above plots correspond to typical bitcoin trading volumes. What we are interested in are the less dense areas: those with very high volumes (which usually correspond to large price swings within one candle). If we visually draw a line to separate the dense areas from the less dense ones, it falls at roughly 5000 on the volume axis, as below:
Price swings within one candle are measured by low and high values in that candle. Therefore we will be focusing on these two values in our analysis.
Assumption: By visually inspecting the data, we assume that the datapoints above the red line (with volumes higher than 5000) share certain characteristics and can be grouped together.
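The assumption can be written down as a simple filter. This is only a sketch: the threshold of 5000 is the value read off the plot by eye, and the four candles are made-up numbers used for illustration.

```python
import pandas as pd

VOLUME_THRESHOLD = 5000  # chosen visually from the volume plots, not derived

# Made-up candles for illustration only.
candles = pd.DataFrame(
    {
        "low": [10250.0, 10120.0, 10700.0, 9800.0],
        "high": [10420.0, 10900.0, 10870.0, 10650.0],
        "volume": [1210.4, 6150.2, 2040.9, 8420.0],
    }
)

# Flag the candles that sit above the red line.
candles["high_volume"] = candles["volume"] > VOLUME_THRESHOLD
high_volume = candles[candles["high_volume"]]
print(high_volume)
```

A hard cut-off like this is easy to reason about, but it depends entirely on where the line is drawn, which is why it is worth checking against a clustering algorithm.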
To validate this assumption, let’s feed this data to a clustering algorithm called Gaussian Mixtures. (I tried a few clustering algorithms, and Gaussian Mixtures with 5 mixture components turned out to be the best fit for our experiment.) We use scikit-learn as follows:
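A minimal sketch of the clustering step with scikit-learn's `GaussianMixture` might look like the following. The choice of low, high and volume as input features is an assumption on my part, and the candles are randomly generated stand-ins, so the resulting clusters will not match the ones in the plots.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

# Synthetic candles for illustration only (not real market data).
rng = np.random.default_rng(0)
n = 500
low = rng.uniform(6000, 12000, n)
high = low * (1 + rng.uniform(0.002, 0.06, n))
volume = rng.lognormal(mean=7.0, sigma=0.8, size=n)
candles = pd.DataFrame({"low": low, "high": high, "volume": volume})

# Assumed feature set: the two price extremes plus volume.
features = candles[["low", "high", "volume"]].to_numpy()

# Five mixture components, as in the article; random_state makes runs repeatable.
gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
candles["cluster"] = gmm.fit_predict(features)

# Pick the cluster whose members have the highest mean volume.
mean_volume = candles.groupby("cluster")["volume"].mean()
busiest = mean_volume.idxmax()
high_volume_cluster = candles[candles["cluster"] == busiest]
print(mean_volume)
```

Cluster labels from a mixture model are arbitrary integers, so selecting the cluster by its mean volume is more robust than hard-coding a label number.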
And the visualisation:
It looks like the datapoints that have a volume close to 5000 form a cluster.
Let’s fetch these datapoints and have a closer look at them:
As can be seen from the plots, these datapoints are the ones which have the highest price swings in one candle (difference between high and low). Let’s calculate the price swings and draw a histogram to see what the distribution looks like. Since the price of bitcoin will change over time (hopefully higher!), it is better to calculate the swings as percentages:
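Here is a sketch of the percentage calculation. Dividing by the candle's low is an assumption (the open or the high would also be defensible denominators), and the candles are again synthetic stand-ins for the cluster's datapoints.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the high-volume cluster (illustration only).
rng = np.random.default_rng(1)
low = rng.uniform(6000, 12000, 200)
high = low * (1 + rng.uniform(0.01, 0.10, 200))
cluster = pd.DataFrame({"low": low, "high": high})

# Swing within one candle, as a percentage of the candle's low (assumed base).
cluster["swing_pct"] = (cluster["high"] - cluster["low"]) / cluster["low"] * 100

# Histogram as raw counts per bin; 20 bins is an arbitrary choice.
counts, bin_edges = np.histogram(cluster["swing_pct"], bins=20)
print(counts)
```

To draw the histogram instead of printing the counts, `cluster["swing_pct"].hist(bins=20)` does the same binning and plots it with matplotlib.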
Let’s find the median of these values:
This results in 4.35%, which gives us an interesting observation: in this dataset, whenever the trading volume was above 5000, the median price swing within the candle was 4.35% (either an increase or a decrease).
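To show why the median is a sensible summary here, this small sketch compares it with the mean on five hand-made candles. The numbers are invented and do not reproduce the 4.35% figure; the point is only that one extreme candle pulls the mean up while the median barely moves.

```python
import pandas as pd

# Hand-made candles for illustration; the last one has an unusually large swing.
cluster = pd.DataFrame(
    {
        "low": [100.0, 100.0, 100.0, 100.0, 100.0],
        "high": [102.0, 103.5, 104.0, 105.0, 112.0],
    }
)

# Percentage swing relative to the low (assumed base, as in the sketch above).
swing_pct = (cluster["high"] - cluster["low"]) / cluster["low"] * 100

median_swing = swing_pct.median()
mean_swing = swing_pct.mean()
print(f"median: {median_swing:.2f}%  mean: {mean_swing:.2f}%")
```

Here the median is 4.00% and the mean is 5.30%, so the single 12% candle shifts the mean noticeably.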
If this proves to be general enough, a technical indicator for trading (e.g. in TradingView) can be built (of course it won’t be as easy as it sounds). To properly evaluate this idea, I have written a few lines of code to monitor Bitcoin market movements. If successful, we will be adding this indicator to our famous trading bot: https://t.me/coinographbot .
As you can see, through exploratory data analysis with a few lines of code, we can discover potentially useful information that can be beneficial for our trading endeavours.
In the next posts I will be writing about more of these experiments for the public. Follow me here on Medium or on twitter: https://twitter.com/hossein761 to stay up to date. If you are looking for historical data you can always contact us at: https://coinograph.io/.
DISCLAIMER: The sole purpose of this article is education. Nothing in the above article is investment advice or trading advice. If you trade based on the findings of the above article, you alone are responsible for any loss or damage.