Bitcoin Daily Return Distribution
A probabilistic approach to Bitcoin trading
I present a distribution that can be used to model the percent daily returns of Bitcoin. Using this model, I find that Bitcoin has a mildly positive median daily return, and I give a formula to estimate the probability of return events. The model can be used to identify high-risk days (i.e. high-return days), to quantify that risk, and to spot good buying opportunities (days with an unusually low return).
WARNING: This is not financial advice. Trading can get you REKT very quickly. Only do it if you know what you are doing and have fully comprehended all of the tax and legal obligations and sought the advice of a licensed financial advisor.
This Tweet from Marcel Burger started me thinking about the distribution of the daily returns. Marcel identified in that tweet that the Normal distribution didn’t fit the data very well, and a few other distributions fit better.
Here I’ve assumed that Bitcoin price cannot be predicted, since human action is unpredictable and the price is the result of human action. Thus this model is not a model of price, but rather a model of the return in percentage.
Enter the Cauchy distribution. A distribution that is typically given as a “pathological” example. It has neither a defined variance nor a defined expected value (mean), and it happens to fit the daily return data to a T.
The Cauchy distribution is defined by its cumulative distribution function, which has two parameters:
F(x) = (1/pi)*atan((x - x0)/gamma) + 0.5
The two parameters are scale (gamma) and location (x0). Location is also the median of the distribution.
Using the R function fitdist from the fitdistrplus package, I estimated the scale and location parameters of the Cauchy distribution. I also fitted a number of other distributions, including the Normal. All of them had a larger AIC than the Cauchy model, which indicates that the Cauchy model is the best of those tried. The fit yields the following model parameters.
Thus, the model is %DailyReturns ~ Cauchy(gamma=0.016, x0=0.0017). We can therefore estimate the probability of a daily drop of 45% or more as (1/pi)*atan((-0.45 - 0.0017)/0.016) + 0.5 = 0.011, or roughly 1%. This is a relatively rare event, but it shouldn’t be dismissed as impossible.
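The tail probability above is easy to reproduce. Here is a minimal Python sketch using only the standard library. The function and variable names are mine, and the parameters are simply the fitted values quoted in this article.

```python
import math

def cauchy_cdf(x, x0, gamma):
    """P(X <= x) for a Cauchy distribution with location x0 and scale gamma."""
    return math.atan((x - x0) / gamma) / math.pi + 0.5

X0, GAMMA = 0.0017, 0.016  # fitted location and scale quoted in the article

p_drop = cauchy_cdf(-0.45, X0, GAMMA)        # daily return of -45% or worse
p_gain = 1.0 - cauchy_cdf(0.45, X0, GAMMA)   # daily return of +45% or better

print(f"P(return <= -45%) = {p_drop:.4f}")
print(f"P(return >= +45%) = {p_gain:.4f}")
```

Because the location parameter is slightly positive, the upper tail comes out marginally heavier than the lower tail, although both are close to 1%.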
To show the fit of this, I have plotted the density of the data and the Cauchy model below.
One concern I have is the well-known early period of very high volatility in returns. This can be seen in the time series graph of returns.
We might ask: are the returns stationary? They certainly appear to be. The DFGLS and KPSS tests agree with this assessment: the DFGLS test rejects the null of non-stationarity at all lags, and the KPSS test cannot reject the null of stationarity at any lag.
Nonetheless, I want to be sure that this Cauchy distribution model hasn’t changed with time, so I will now focus only on the last two years of data.
The model fits pretty well.
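To illustrate the idea of checking a recent window against the full sample, here is a rough Python sketch. It is not the fitdist maximum-likelihood fit used in this article. It swaps in a simpler quantile-based estimator instead: for a Cauchy distribution the median equals x0 and the quartiles sit at x0 ± gamma, so half the interquartile range estimates gamma. The data are synthetic draws from the fitted model rather than real prices, and the 3000-day length, 730-day window and function name are my own assumptions.

```python
import math
import random
import statistics

def fit_cauchy_quantiles(returns):
    """Quantile-based estimates: location = median, scale = half the IQR."""
    q1, q2, q3 = statistics.quantiles(returns, n=4)
    return q2, (q3 - q1) / 2.0

# Synthetic stand-in for a daily return series (an assumption, not real data),
# drawn from Cauchy(x0=0.0017, gamma=0.016) by inverse-CDF sampling.
rng = random.Random(42)
returns = [0.0017 + 0.016 * math.tan(math.pi * (rng.random() - 0.5))
           for _ in range(3000)]

WINDOW = 730  # roughly the last two years of daily observations
x0_all, gamma_all = fit_cauchy_quantiles(returns)
x0_recent, gamma_recent = fit_cauchy_quantiles(returns[-WINDOW:])

print(f"full sample   : location {x0_all:.4f}, scale {gamma_all:.4f}")
print(f"last {WINDOW} days : location {x0_recent:.4f}, scale {gamma_recent:.4f}")
```

With real data, a large gap between the full-sample and recent-window estimates would suggest that the parameters have drifted over time.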
What is great about the Cauchy distribution is that whilst the mean and variance are undefined, the quantiles are defined.
Thus not only can we estimate the median (simply the location parameter to the model), but we can also estimate the other quantiles.
For example, using this model we can determine that there is a 5% chance of a daily return below 0.0017+0.016*tan(pi*(0.05-0.5)) = -0.099, i.e. a loss greater than 9.9%. Similarly, there is a 5% chance of a daily return above 0.0017+0.016*tan(pi*(0.95-0.5)) = 0.103, i.e. a gain greater than 10.3%.
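The quantile calculation can be wrapped in a small helper. This is a minimal Python sketch using the fitted parameters quoted in this article, and the function name is my own.

```python
import math

def cauchy_quantile(p, x0, gamma):
    """Return x such that P(X <= x) = p, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return x0 + gamma * math.tan(math.pi * (p - 0.5))

X0, GAMMA = 0.0017, 0.016  # fitted location and scale quoted in the article

q05 = cauchy_quantile(0.05, X0, GAMMA)  # 5th percentile of daily return
q50 = cauchy_quantile(0.50, X0, GAMMA)  # median, equal to the location
q95 = cauchy_quantile(0.95, X0, GAMMA)  # 95th percentile of daily return

for label, q in [("5%", q05), ("50%", q50), ("95%", q95)]:
    print(f"{label:>3} quantile: {q:+.2%}")
```

The same helper gives any other threshold, for instance the 1% and 99% quantiles for a more conservative definition of an unlikely day.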
The positive location parameter indicates that we have an “edge” over the 50/50 odds normally given on a daily return. We can plug this into the Kelly Criterion and use it as a way to maximise our bankroll (but I will save that for another article).
For now, this suggests that Dollar Cost Averaging (aka DCA) should get a positive return in the long run, provided the location parameter stays positive. Using the model presented here, one may be able to identify unlikely return events and act accordingly to improve on the DCA outcome.
Thanks for reading this far!
Catch me on twitter for more interesting discussions: https://twitter.com/btconometrics
Initially, I estimated the percent returns as the log difference. However, this approximation only holds for small changes, so there were slight differences between the larger values in the real distribution and those in the log difference distribution. Consequently, I have re-estimated everything using the real percent difference (price-lag(price))/lag(price). This led to a very small change in the location parameter (a shift from 0.0018 to 0.0017) and no change to the scale parameter.
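The gap between the two return definitions is easy to see numerically. In the Python sketch below the prices are hypothetical and chosen only to show a small move, a moderate move and a large drop. The function names are mine.

```python
import math

def simple_return(prev_price, price):
    """Real percent difference: (price - lag(price)) / lag(price)."""
    return (price - prev_price) / prev_price

def log_return(prev_price, price):
    """Log difference: log(price) - log(lag(price))."""
    return math.log(price / prev_price)

# Hypothetical price pairs (previous close, current close), for illustration only.
examples = [(100.0, 101.0), (100.0, 110.0), (100.0, 55.0)]

for prev, cur in examples:
    s = simple_return(prev, cur)
    l = log_return(prev, cur)
    print(f"{prev:.0f} -> {cur:.0f}: simple {s:+.4f}, log {l:+.4f}, gap {s - l:+.4f}")
```

For a 1% move the two measures are nearly identical, while for a 45% drop the log difference overstates the loss considerably. This is why the larger values in the two distributions differed.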