Quantitative Trading Series Part 2: Further Insight to Trading Digital Assets Utilising Machine Learning

Moses Dada CQF
Digital Alpha Research
Jan 1, 2020

This is the second part of The Quantitative Trading Series, in which we build on the insights from Part 1 and introduce additional quantitative methods that can be applied in the world of digital assets.

In this article, we will continue to demonstrate just how well various machine learning algorithms perform when trading digital assets and compare them in terms of prediction accuracy and ultimately, returns.

Specifically, we will compute and investigate 1) Cross Validation / K-Fold Cross Validation, 2) Confusion Matrices, 3) Support Vector Machines and 4) Random Forests and Ensemble Methods.

We will go on to state simple definitions of these techniques without stating the mathematical properties and solving from first principles.

Definitions

Definition 1: Cross Validation

Cross Validation, sometimes called out-of-sample testing, is a model validation technique implemented to assess how the results of statistical analysis will generalise to an independent data set.

Definition 2: Confusion Matrix

Confusion Matrix, also known as an error matrix, is a specific table layout that allows visualisation of the performance of an algorithm.

Definition 3: Support Vector Machines

Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. More formally, a support-vector machine constructs a hyperplane or set of hyperplanes in a high or infinite-dimensional space, which can be used for classification, regression, or other tasks like outlier detection.

Definition 4: Random Forests and Ensemble Methods

Random Forests or Random Decision Forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees’ habit of overfitting to their training set.

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.

Implementation

We will continue to use the same technical indicators, or features, that we used in Part 1 for predictions:

1) Momentum and 2) Moving Averages (MA) with 9 Periods (MA9), 200 Periods (MA200) and Exponential Moving Average with 20 Periods (EMA20).
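As a rough illustration, the indicators above could be computed with pandas along the following lines. This is a minimal sketch under stated assumptions: the prices are synthetic, `build_features` and `momentum_lag` are hypothetical names, and Momentum is taken here as the price change over a fixed number of days, which may differ from the definition used in Part 1.

```python
import numpy as np
import pandas as pd

# Synthetic daily closes stand in for the real price data (assumption).
rng = np.random.default_rng(0)
dates = pd.date_range("2016-01-01", periods=400, freq="D")
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.02, len(dates)))), index=dates)


def build_features(close: pd.Series, momentum_lag: int = 5) -> pd.DataFrame:
    """Return Momentum, MA9, MA200 and EMA20 columns for a close-price series."""
    features = pd.DataFrame(index=close.index)
    # Momentum as the price change over `momentum_lag` days (one common definition).
    features["Momentum"] = close - close.shift(momentum_lag)
    # Simple moving averages over 9 and 200 periods.
    features["MA9"] = close.rolling(window=9).mean()
    features["MA200"] = close.rolling(window=200).mean()
    # Exponential moving average with a 20-period span.
    features["EMA20"] = close.ewm(span=20, adjust=False).mean()
    return features


features = build_features(close)
# Dependent variable: sign of the next day's return; rows with missing values are dropped.
data = features.assign(Sign=np.sign(close.pct_change().shift(-1))).dropna()
print(data.tail())
```

Note that the 200-period moving average is undefined for the first 199 rows, so the usable sample is shorter than the raw price history.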

The data consists of 356 entries of daily closing prices, used as predictors, ranging from 2016-01-01 to 2017-01-01.

Cross Validation

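In K-Fold Cross Validation, the original sample is randomly partitioned into k equal sized sub-samples. Each sub-sample is held out once for validation while the remaining k-1 sub-samples are used for training, so every observation is used for testing exactly once.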

Using ‘sklearn.model_selection’ for an automated K-Fold Cross Validation, we compute scores from reshuffled samples. It is important to note that, as an out-of-sample test, the Cross Validation scheme represents an extension of back testing.
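A K-Fold run of this kind might look like the sketch below. The data are synthetic stand-ins for the indicator features, and the choice of five folds and of a logistic regression classifier are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic features and {-1, 1} labels stand in for the indicator data (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(356, 4))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=356) > 0, 1.0, -1.0)

# shuffle=True reshuffles the rows before they are partitioned into k folds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv, scoring="accuracy")

print(scores)         # one accuracy score per held-out fold
print(scores.mean())  # average out-of-sample accuracy
```

Fixing `random_state` makes the shuffled folds reproducible between runs.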

To separate training and testing data sets we use:

‘sklearn.model_selection.train_test_split’

To split the data set 50/50 with reshuffling, we set the parameters:

test_size=0.5, shuffle=True
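Put together, the split and fit could be sketched as follows. The features are synthetic, the classifier is assumed to be a logistic regression as in Part 1, and `random_state=42` is an addition made here only so that the shuffle is repeatable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic features and {-1, 1} labels stand in for the indicator data (assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(356, 4))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=356) > 0, 1.0, -1.0)

# 50/50 split with reshuffling; random_state pins the shuffle for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, shuffle=True, random_state=42
)

model = LogisticRegression().fit(X_train, y_train)
print(model.coef_)                  # coefficients estimated on the training half
print(model.score(X_test, y_test))  # classification accuracy on the held-out half
```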

Now we observe the coefficients and the classification accuracy for a train/test data set split.

The dates in the sample below show that the time series has been reshuffled.

Shuffled 50/50 Output Sample from Data Entries

Even though the coefficients are estimated with only half of the data rather than the full population, we get a fairly good accuracy score above 70%.

Output from Data Entries: Accuracy Scores

Taking a deeper look into the classifier and prediction accuracy, we can compute a classification report as below:

              precision    recall  f1-score   support

        -1.0       0.68      0.57      0.62        69
         1.0       0.75      0.84      0.79       110

 avg / total       0.73      0.73      0.73       179

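NB: The accuracy changed from 76.5% to 73.0% because the data are reshuffled each time the kernel is restarted, so results vary from run to run. We can also see that we obtain better predictions for positive returns (precision 0.75 and recall 0.84, against 0.68 and 0.57 for negative returns).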
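For reference, a report in this layout comes from ‘sklearn.metrics.classification_report’. The sketch below uses ten toy labels rather than our data, purely to show the call and how the per-class figures can also be retrieved as arrays.

```python
import numpy as np
from sklearn.metrics import classification_report, precision_recall_fscore_support

# Toy actual and predicted return signs (assumption, not the article's data).
y_true = np.array([1, 1, 1, -1, -1, 1, -1, 1, -1, 1], dtype=float)
y_pred = np.array([1, 1, -1, -1, 1, 1, -1, 1, 1, 1], dtype=float)

# Text report: precision, recall, f1-score and support for each class.
print(classification_report(y_true, y_pred))

# The same per-class numbers as arrays, ordered as given in `labels`.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[-1.0, 1.0]
)
print(precision, recall, support)
```

Precision answers "of the days predicted as a given sign, how many were right", while recall answers "of the days that actually had that sign, how many were found".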

Confusion Matrix

As stated above, the Confusion Matrix allows us to look into the accuracy of prediction within each class. From the ‘sklearn.metrics’ library we can import the confusion_matrix function, which produces the following results:

[[175 118]
[ 96 235]]

From the results we observe that the predicted positive returns were actually positive 175 times and, similarly, the predicted negative returns were actually negative 235 times. This is a fairly good result, with only 118 negative predictions and 96 positive predictions wrong; that is, 410 of 624 predictions (about 66%) fall on the diagonal.

A good confusion matrix has most of its counts on the diagonal, shown as green squares in the visualisation below.

Confusion Matrix — Logistic Regression
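The sketch below shows how such a matrix is produced and read. The first part uses toy labels to illustrate the layout; the second part takes the matrix reported above and derives the overall accuracy and the error rate per row. One assumption to flag: scikit-learn sorts the labels by default (so -1 would come first), and the reading given above corresponds to the positive class being listed first, which is what `labels=[1, -1]` does here.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels to show the layout: rows are actual classes, columns are predicted.
y_true = np.array([1, 1, 1, -1, -1, 1, -1, 1, -1, 1])
y_pred = np.array([1, 1, -1, -1, 1, 1, -1, 1, 1, 1])
# `labels` fixes the row/column order; without it sklearn sorts the labels (-1 first).
toy_cm = confusion_matrix(y_true, y_pred, labels=[1, -1])
print(toy_cm)

# The matrix reported in the article.
cm = np.array([[175, 118], [96, 235]])
accuracy = np.trace(cm) / cm.sum()            # share of counts on the diagonal
row_error = 1 - np.diag(cm) / cm.sum(axis=1)  # misclassification rate per row
print(round(accuracy, 3), row_error.round(3))
```

On the reported matrix this gives an overall accuracy of 410/624, with the first row misclassified 118 times out of 293 and the second row 96 times out of 331.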

Support Vector Machines

SVM classifiers provide an end result similar to the Logistic Regression we covered in the previous part of the series. Whilst SVMs do not have regression coefficients, a fitted classifier exposes its support vectors through the ‘SVM_SVC.support_vectors_’ attribute.

We can soften the margin by modifying the parameter C. For large C, the margin is hard, and points cannot lie in it. For smaller C, the margin is softer, and can grow to encompass some points.

The allowance of softness in margins (i.e. a low cost setting) allows for errors to be made while fitting the model (support vectors) to the training/discovery data set. Conversely, hard margins will result in fitting of a model that allows zero errors. Sometimes it can be helpful to allow for errors in the training set, because it may produce a more generalizable model when applied to new datasets.

We now restrict the SVM to only two features: Momentum and EMA20. For a soft margin we set C=1e1 (which is C=10), and for the hard margin C=1e6 (1000000). We get very similar results.
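A comparison of the two settings could be sketched as follows. The two columns are synthetic stand-ins for Momentum and EMA20, the kernel is left at scikit-learn's default (RBF), and `max_iter` is an addition made here so that the hard-margin fit always terminates on overlapping data.

```python
import numpy as np
from sklearn.svm import SVC

# Two synthetic columns stand in for Momentum and EMA20 (assumption).
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 2))
y = np.where(X[:, 0] + 0.3 * X[:, 1] + 0.5 * rng.normal(size=120) > 0, 1.0, -1.0)

results = {}
for C in (1e1, 1e6):  # soft margin, then (near) hard margin
    # max_iter caps the solver so the large-C fit cannot run indefinitely.
    svm = SVC(C=C, max_iter=200_000).fit(X, y)
    results[C] = {
        "n_support_vectors": int(svm.n_support_.sum()),
        "train_accuracy": svm.score(X, y),
    }

print(results)
```

Comparing the number of support vectors and the in-sample accuracy for each C is one simple way to see how much the margin setting actually changes the fitted model.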

SVM with Restrictions Momentum and EMA20

The plot shows the decision boundary (the red and yellow plots) and the support vectors (the blue plot). In regions without data we do not have contour levels (specified as a scalar whole number or a vector).

Transition Probabilities from Support Vector Classifier

SVM_SVC.predict_proba(X_Features)

From computing the above, we get the transition probabilities below, with one row per observation and one column per class (-1 first, then 1):

array([[0.25753004, 0.74246996],
[0.61488705, 0.38511295],
[0.61487879, 0.38512121],
...,
[0.61491708, 0.38508292],
[0.25754539, 0.74245461],
[0.25758191, 0.74241809]])
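For completeness, here is a minimal sketch of how such probabilities are obtained. In scikit-learn, ‘predict_proba’ is only available on an SVC created with probability=True; the features below are synthetic and the value of C is an assumption.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic two-feature data with {-1, 1} labels (assumption).
rng = np.random.default_rng(3)
X_Features = rng.normal(size=(200, 2))
y = np.where(X_Features[:, 0] + 0.5 * rng.normal(size=200) > 0, 1.0, -1.0)

# probability=True enables predict_proba (calibrated internally by cross-validation).
SVM_SVC = SVC(C=1e1, probability=True, random_state=0).fit(X_Features, y)
proba = SVM_SVC.predict_proba(X_Features)

print(SVM_SVC.classes_)  # column order of the probabilities
print(proba[:3])         # each row sums to 1
```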

Random Forests and Ensemble Methods (AdaBoost)

To further elaborate on the above definition, Random Forests operate by constructing a multitude of decision trees and, by combining their predictions, correct for decision trees’ habit of overfitting to their training set.

In boosting, a first model is fitted to the training data, then a second model is created that attempts to correct the errors of the first. We can visualise both the Decision Tree Regressor and Random Forest using Graphviz.

Decision Tree Regressor
Random Forest
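Diagrams like these can be generated from the DOT source that scikit-learn exports. The sketch below fits a small tree and a small forest on synthetic data (the feature names are reused for illustration; the depths and the number of trees are assumptions) and exports one tree from each.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, export_graphviz

# Synthetic data in which the first feature drives the target (assumption).
rng = np.random.default_rng(4)
feature_names = ["Momentum", "MA9", "MA200", "EMA20"]
X = rng.normal(size=(200, 4))
y = 0.8 * X[:, 0] + 0.1 * rng.normal(size=200)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
# out_file=None returns the Graphviz DOT source as a string.
tree_dot = export_graphviz(
    tree, out_file=None, feature_names=feature_names, filled=True, rounded=True
)

forest = RandomForestRegressor(n_estimators=10, max_depth=3, random_state=0).fit(X, y)
# A forest is a list of fitted trees; each can be exported individually.
forest_dot = export_graphviz(
    forest.estimators_[0], out_file=None, feature_names=feature_names
)

print(tree_dot[:200])
```

The resulting string can be saved to a file and rendered with the Graphviz command line tool, for example `dot -Tpng tree.dot -o tree.png`.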

The Decision Tree classifier builds a very elaborate decision tree and so predicts the data points on which it was trained very closely.

The diagram below explains the Decision Tree Regression (green plot) and Boosted Decision Tree (red plot). We can observe how the Decision Tree Regression overlapped the training samples (black plots).

The real test, and the open question that remains, is whether or not the fitted decision tree will be as effective on ‘Out of sample’ data.

Decision Tree Regression

It is also worth noting that the two models are mixed on scatterplots such as these. In fact, there are three datasets: ‘one actual’ and ‘two predicted’ from our models:

regr_1 = DecisionTreeRegressor(max_depth=4)
regr_2 = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4), n_estimators=300, random_state=rng)
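To make the two definitions above runnable end to end, here is a self-contained sketch. The one-dimensional synthetic target and the seed behind `rng` are assumptions made for illustration; only the two estimator definitions are taken from the text.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy target on a single feature (assumption).
rng = np.random.RandomState(1)
X = np.linspace(0, 6, 100)[:, np.newaxis]
y = np.sin(X).ravel() + np.sin(6 * X).ravel() + rng.normal(0, 0.1, X.shape[0])

# A single tree, and the same tree boosted with AdaBoost.
regr_1 = DecisionTreeRegressor(max_depth=4)
regr_2 = AdaBoostRegressor(
    DecisionTreeRegressor(max_depth=4), n_estimators=300, random_state=rng
)
regr_1.fit(X, y)
regr_2.fit(X, y)

# The 'two predicted' datasets, to be plotted against the 'one actual' (y).
y_1 = regr_1.predict(X)
y_2 = regr_2.predict(X)

# In-sample errors only; they say nothing about out-of-sample performance.
print(mean_squared_error(y, y_1), mean_squared_error(y, y_2))
```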

The nature of the AdaBoost classifier is to fit successive copies of the tree, each one reweighted towards the errors of those before it, and below we see an example produced by a random forest.

AdaBoost Classifier

We can see the AdaBoostRegressor clearly created random estimators. The Decision Tree classifier has such an elaborate fitted Decision Tree (above) that it predicts every in-sample point closely.

Now we plot the training samples and two-class decision scores which can be shown here:

We observe that the Decision Tree Regression classifier limits itself to the set class labels. The predicted class label for each sample is determined by the sign of the decision score. The magnitude of a decision score determines the degree of likeness with the predicted class label.

RadViz from Pandas is a way of visualising multivariate data, based on spring tension minimisation. It works well with class labels {-1,1} of the dependent variable.

Below we used Radviz to plot the graph. The graph shows Momentum is very significant compared to Moving Averages and Exponential Moving Average.
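A RadViz plot of this kind can be produced with ‘pandas.plotting.radviz’, as sketched below. The frame is synthetic, and the column name ‘Sign’ for the {-1, 1} class label is a hypothetical choice; the headless Matplotlib backend is set only so that the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import radviz

# Synthetic features with a {-1, 1} class label driven mostly by Momentum (assumption).
rng = np.random.default_rng(5)
n = 200
frame = pd.DataFrame(
    rng.normal(size=(n, 4)), columns=["Momentum", "MA9", "MA200", "EMA20"]
)
frame["Sign"] = np.where(frame["Momentum"] + 0.5 * rng.normal(size=n) > 0, 1, -1)

# Each feature becomes an anchor on the unit circle; each row is pulled towards
# the anchors in proportion to its (normalised) feature values.
ax = radviz(frame, "Sign")
ax.set_title("RadViz of indicator features by return sign")
plt.close(ax.figure)
```

Points that cluster near one anchor are dominated by that feature, which is how a plot like this can suggest that one indicator separates the classes better than the others.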

Conclusion

In comparing these machine learning methodologies, we have seen how various machine learning techniques can be applied to technical indicator features. Judging from our analysis, we learn:

  1. SVMs provide a powerful method for supervised learning. In this task, the hyperplane separating positive from negative returns might not be of high quality or well-defined; however, the transition probabilities output is credible.
  2. However, in predicting return signs, the non-linear kernel increased computational time and made visualisation more challenging without producing much different output. It took over 3 hours to compute on a quad-core Intel i5 system with 8GB RAM.
  3. The most significant feature is Momentum.
  4. Confusion Matrices helped us look into prediction accuracy within each class. Once we separated the dataset into train/test halves, the logistic regression misclassified 96 out of 331 negative returns and 118 out of 293 positive returns.
  5. The visualisation of the Decision Tree Regressor revealed how it starts building the tree by splitting on the feature Momentum.
  6. It is important to use the Decision Tree Regressor because the Decision Tree classifier creates a very elaborate tree that predicts the data points on which it was trained close to perfectly.

In the next part of the series, we will analyse a pairs trading cryptocurrency strategy using mathematical models.

Given how nascent the asset class is, very few dedicated digital asset research providers currently exist. If you are interested in bespoke quantitative research, please visit https://digitalalpharesearch.com to get in touch with our team.
