Algorithmic Trading 101 — Lesson 4: Portfolio Management and Machine Learning in Python
We’ve been tinkering away at our platform in preparation for launch (get ready for some exciting news on that front!) and hosted a Cryptocurrencies 101 webinar with Quantopian, but now we’re back. 💪
Before we start… You’ll see that we posted some simple arbitrage code between The Ocean and Binance to our GitHub as an ‘answer’ to our last lesson. As with all code snippets, we don’t recommend live-trading before understanding and tailoring the strategy to your specific use case and risk tolerance.
So, why Python? It’s the language used by many algorithmic traders today for its (relative) ease-of-use and nice applications like iPython Notebook for sharing analyses. Most importantly, it also has a rich set of libraries for all types of data science and machine learning applications, like scrapy. These libraries make it easy to visualize your model and run a variety of tests out-of-the-box, without building the functions yourself. Furthermore, it’s a convenient way to parameterize and compare machine learning models.
Portfolio Management 101
However, before we delve into machine learning, it’s important to talk about portfolio management strategy, an essential component of long-term trading success.
Portfolio management isn’t just about which types of assets to hold, but also:
- What types of strategies to run
- How much of my net worth to allocate to each
- How often to adjust
We’ve covered several different strategies so far throughout the course, and (if you watched our Cryptocurrencies 101 webinar) you know how diverse and ever-growing the crypto asset marketplace is. Picking your risk-return profile, your strategies, and the assets you want to hold is key. Here are some points that you should note when building your portfolio:
Active vs Passive (‘Buy and Hold’) Strategies
Active portfolio management refers to when individuals consistently watch market trends and attempt to generate alpha (What’s the difference between alpha and beta? — Murphy) by updating their positions frequently. In other words, you’re trying to beat the market. Passive portfolio management (typically but not always ‘buy and hold’ strategies) refers to making bets on particular markets or indices. Both can be advantageous when executed correctly. This often depends on the robustness of your model and your individual preferences.
Generally, you shouldn’t put all your eggs in one basket. Choosing assets across a variety of products can reduce downside risk. If you’re well diversified, a fall in one asset’s value should be at least partly offset by other assets in your portfolio holding steady or rising. There are all types of ‘diversification’: across geographies, economic sectors, types of stocks and bonds, etc. Such measures can get quite technical, but you can read more in-depth about measuring your portfolio here (Measuring Portfolio Diversification, by Kirchner and Zunckel).
At some point, you’ll need to decide how much money to allocate to each of your strategies across each of your asset types. Finding the right balance between quality and quantity of assets takes work, so doing your homework is key! For example, two stocks in two different market sectors may seem unrelated, but they might be highly correlated, in which case you get little ‘diversification benefit.’ This is especially true in crypto-markets, where assets today are still highly correlated with one another and valuations are still up for debate. You can read some good general capital allocation principles here (A Modern Approach to Asset Allocation and Portfolio Construction, by Davidow and Peterson), or you can refer to an earlier piece we wrote on basic crypto valuations to start your research.
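To make the ‘diversification benefit’ point concrete, here is a minimal sketch that measures how correlated the returns of a few assets are. The asset names and prices are made up purely for illustration, and the helper names (`returns`, `pearson`) are our own, not from any library.

```python
from math import sqrt

def returns(prices):
    """Simple period-over-period returns from a list of prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical daily closing prices (illustrative numbers only).
prices = {
    "ASSET_A": [100, 102, 101, 105, 107, 104],
    "ASSET_B": [50, 51.2, 50.4, 52.8, 53.6, 52.1],   # tends to move with A
    "ASSET_C": [20, 19.8, 20.1, 19.7, 19.9, 20.2],   # tends to move against A
}
rets = {name: returns(series) for name, series in prices.items()}

corr_ab = pearson(rets["ASSET_A"], rets["ASSET_B"])
corr_ac = pearson(rets["ASSET_A"], rets["ASSET_C"])
print(f"A vs B: {corr_ab:.2f}")
print(f"A vs C: {corr_ac:.2f}")
```

A correlation near +1 (A vs B here) means holding both adds little diversification, while a low or negative value (A vs C) means the two assets tend to offset each other. With real data you would want far more than a handful of observations.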
Suppose you have the best, most well diversified portfolio imaginable. Markets move, prices change, and suddenly, your portfolio is no longer ‘optimal’ according to your strategy. It’s time to rebalance! Rebalancing means adjusting your portfolio weights, and it goes beyond hitting specific asset allocation targets. Strategies may need to shift, since their profitability can change over time. Furthermore, your own risk-return mindset might also change, e.g. investing in riskier assets at age 30 vs. age 70. Ultimately, you’ll need to find the right balance (pun intended), or sweet spot, for how and when you adjust your portfolio allocation. You’ll find some good food for thought on these trade-offs in this Vanguard piece (Best practices for portfolio rebalancing, by Jaconetti, Kinniry, and Zilbering).
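As a rough illustration of the mechanics, the sketch below computes how much of each asset you would need to buy or sell to get back to a set of target weights after prices have moved. The holdings, prices, and target weights are invented for the example, and `rebalance_trades` is a hypothetical helper. Fees, slippage, and taxes are ignored.

```python
def rebalance_trades(holdings, prices, target_weights):
    """Value to buy (+) or sell (-) per asset to restore the target weights."""
    values = {asset: holdings[asset] * prices[asset] for asset in holdings}
    total = sum(values.values())
    return {asset: target_weights[asset] * total - values[asset] for asset in holdings}

# Hypothetical portfolio: units held, current prices in USD, and target weights.
holdings = {"BTC": 0.5, "ETH": 10.0, "USD": 3000.0}
prices = {"BTC": 10000.0, "ETH": 200.0, "USD": 1.0}
target = {"BTC": 0.4, "ETH": 0.3, "USD": 0.3}

trades = rebalance_trades(holdings, prices, target)
for asset, value in trades.items():
    action = "buy" if value > 0 else "sell"
    print(f"{asset}: {action} {abs(value):.2f} USD worth")
```

In this made-up case BTC has drifted to 50% of a 10,000 USD portfolio against a 40% target, so the sketch suggests selling 1,000 USD of BTC and buying 1,000 USD of ETH. In practice, you would also weigh trading costs against how far you have drifted before acting.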
An important consideration is the liquidity and availability of shares within a certain product. If certain products have low liquidity, they will be harder to buy and sell at the volume and price you desire. In fact, you might actually move the market with your trade. And if others are buying or selling in the same direction, you might find yourself with no easy way to get out of your position without heavy losses. This is incredibly important considering the volatile and sometimes illiquid crypto-market. For those curious, you can apply some of the liquidity metrics here to your portfolio (Do liquidity measures measure liquidity?, by Goyenko, Holden, and Trzcinka).
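One simple way to put a number on this is Amihud’s (2002) illiquidity ratio: the average of the absolute return divided by the dollar volume traded in the same period. We picked it only as one example of a low-frequency liquidity proxy. The prices and volumes below are invented, and the function name is ours.

```python
def amihud_illiquidity(prices, dollar_volumes):
    """Average of |return| / dollar volume; higher means less liquid."""
    ratios = []
    for i in range(1, len(prices)):
        ret = (prices[i] - prices[i - 1]) / prices[i - 1]
        ratios.append(abs(ret) / dollar_volumes[i])
    return sum(ratios) / len(ratios)

# Same hypothetical price path, traded on a deep market and on a thin one.
prices = [100.0, 101.0, 99.0, 102.0, 100.0]
deep_volume = [1_000_000.0] * len(prices)   # USD traded per period
thin_volume = [10_000.0] * len(prices)

illiq_deep = amihud_illiquidity(prices, deep_volume)
illiq_thin = amihud_illiquidity(prices, thin_volume)
print(f"deep market: {illiq_deep:.3e}")
print(f"thin market: {illiq_thin:.3e}")
```

The thin market scores 100 times higher because the same price moves happened on 100 times less volume, which is the ‘you might move the market with your trade’ intuition in numeric form.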
Intro to Machine Learning
Perhaps the hottest engineering paradigm (besides crypto and blockchain, of course!) out there today is incorporating machine learning and artificial intelligence into everyday life. This is especially relevant in financial markets, where machines can execute trades at lightning speeds and there’s a tremendous amount of varied data to learn from. ML techniques can improve even ‘simple’ trading models; e.g. in mean reversion models, position bands and stop loss limits could be chosen by a machine via backtesting, as opposed to being arbitrarily chosen by a person. This is the essence of machine learning in finance: machines are trained to find patterns or trends in the given state and then to find the best action for the next state. Eventually, a profitable forward strategy could be created largely by the machine, with far less influence from human emotion or biases.
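Here is a toy sketch of what ‘chosen by a machine via backtesting’ can look like: we try several entry bands for a z-score mean reversion rule and keep the one with the best backtested P&L. Everything here is an assumption for illustration: the synthetic price series, the 20-period window, the grid of bands, and the trading rule itself.

```python
import random
from statistics import mean, pstdev

def backtest(prices, window, entry_z):
    """P&L of a toy rule: long 1 unit if z < -entry_z, short if z > entry_z, else flat."""
    pnl, position = 0.0, 0
    for t in range(window, len(prices)):
        # Mark the position chosen at the previous step to market.
        pnl += position * (prices[t] - prices[t - 1])
        history = prices[t - window:t]
        sd = pstdev(history)
        z = (prices[t] - mean(history)) / sd if sd > 0 else 0.0
        if z > entry_z:
            position = -1
        elif z < -entry_z:
            position = 1
        else:
            position = 0
    return pnl

# Synthetic mean-reverting prices around 100 (seeded so the run is repeatable).
rng = random.Random(42)
prices = [100.0]
for _ in range(500):
    prices.append(prices[-1] + 0.2 * (100.0 - prices[-1]) + rng.gauss(0, 1))

# Let the machine pick the band instead of a person guessing one.
grid = [0.5, 1.0, 1.5, 2.0]
results = {z: backtest(prices, window=20, entry_z=z) for z in grid}
best = max(results, key=results.get)
print("P&L by band:", {z: round(p, 2) for z, p in results.items()})
print("best band:", best)
```

Keep in mind that a band chosen this way is fitted to the same data it was tested on. Before trusting it, you would want to check it on data the search never saw.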
Today, machine learning algorithms are in a primitive state, so there is a great deal of opportunity. Machine learning algorithms fall into three general categories:
Supervised learning is the most common and largest portion of the machine learning universe. An algorithm is built such that the independent (regressor) variables x are mapped to a dependent variable Y, i.e. Y = f(x1, x2, … xn). The goal is to optimize this mapping function so that we obtain the best predictions for Y. In essence, the algorithm repeatedly makes predictions on the training set and compares them to the true, known output values. The term ‘supervised’ refers to the fact that we are teaching the learning algorithm by checking against the correct answer and updating the model as needed. The model stops learning once it reaches an acceptable level of performance. A supervised problem is either a classification problem, where the output variable is categorical, or a regression problem, where the output variable is a quantitative number.
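The predict, compare, update loop described above can be sketched in a few lines. This example fits a straight line Y = w·x + b by gradient descent on the mean squared error. The data, learning rate, and number of passes are all chosen just for illustration.

```python
def fit_linear(xs, ys, lr=0.05, epochs=1000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # 1) Predict on the training set and compare with the known answers.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # 2) Update the model in the direction that reduces the error.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy labeled data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_linear(xs, ys)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Because the output is a number, this is a regression problem. Swap the target for a category (say, ‘up’ or ‘down’) and you have a classification problem instead.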
An example of a supervised machine learning algorithm is a Random Forest, which is an ensemble of decision trees. At each node of a tree, a question is asked about the data, and the answer (binary or one of several options) determines which branch to follow to the next node. The forest is built from many such trees, each trained on a random sample of the data and features, and their predictions are combined by majority vote (classification) or averaging (regression). These models can have low overhead (from a computational perspective) and good accuracy, but as the number of possible answers at each node increases, their usefulness can decrease. You can explore the Random Forest model here (The Random Forest Algorithm, by Donges).
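To show the moving parts without any dependencies, here is a deliberately tiny forest in which every ‘tree’ is a single split (a decision stump). Each stump is trained on a bootstrap sample and one randomly chosen feature, and the forest predicts by majority vote. This is a simplification for teaching purposes: real implementations, such as scikit-learn’s `RandomForestClassifier`, grow much deeper trees. The features and labels are invented.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample rows with replacement
        feature = rng.randrange(len(rows[0]))            # random feature for this tree
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Majority vote across all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical features: (yesterday's return, change in volume) -> next move.
rows = [(0.02, 0.10), (0.03, 0.30), (0.01, 0.20), (0.04, 0.15),
        (-0.02, -0.10), (-0.01, -0.30), (-0.03, -0.20), (-0.04, -0.05)]
labels = ["up"] * 4 + ["down"] * 4

forest = train_forest(rows, labels)
print(predict(forest, (0.05, 0.25)))
print(predict(forest, (-0.05, -0.35)))
```

The randomness in rows and features is what makes the trees differ from each other, and the vote is what smooths out their individual mistakes.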
Unsupervised learning uses no dependent output variable Y. Instead, the algorithm tries to identify patterns and underlying structure in the given independent variables x. As there are no correct answers to compare the model against, the algorithm has to find structure in the training set on its own. Clustering models are a common type of unsupervised learning algorithm, where data points are grouped into similar buckets based on rules defined in the algorithm. In K-means clustering, for example, the value K determines the number of groups that will be created by the algorithm.
K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity.
You can read more about K-means clustering here (Introduction to K-means Clustering — Trevino).
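The iterative assign-and-update loop of K-means is short enough to write out by hand. The sketch below is a bare-bones version that uses the first K points as starting centroids (a naive choice made here so the example is repeatable; production libraries use smarter seeding such as k-means++). The data points are made up.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Plain K-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = list(points[:k])  # naive initialisation: first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the points assigned to it (kept as-is if empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Unlabeled toy data with two visible groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.8, 1.1), (4.8, 5.1)]

centroids, clusters = kmeans(points, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

Note that nobody told the algorithm which point belongs where. The only input besides the data is K, the number of groups to look for.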
Semi-supervised learning combines the two cases above. It’s applied when there is a large set of independent variables x, but only a small portion has associated dependent output variables Y. This type of model is often used in image processing, as images can have multiple objects within the picture but only a few things in the picture are labeled. Models can be combined, such that we use 1) an unsupervised learning algorithm for the background of the picture, and 2) a supervised model for the known things in the picture. Choosing between the two types of models depends on the availability of the data and how much confidence your predictive model is able to provide. You can read about an application of semi-supervised learning here.
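As a small illustration of learning from mostly unlabeled data, here is a sketch of self-training with a nearest-neighbour rule. This is one simple semi-supervised approach that we have swapped in for the image example above, not the method from the linked application. Starting from two labeled points, it repeatedly labels whichever unlabeled point sits closest to an already-labeled one. The points and labels are invented.

```python
def self_train(labeled, unlabeled):
    """Spread labels outward: the unlabeled point nearest to any labeled point
    takes that neighbour's label, then the process repeats."""
    labeled = dict(labeled)          # point -> label (copied so the input is untouched)
    remaining = list(unlabeled)
    while remaining:
        _, point, label = min(
            ((sum((a - b) ** 2 for a, b in zip(u, p)), u, lab)
             for u in remaining
             for p, lab in labeled.items()),
            key=lambda candidate: candidate[0],
        )
        labeled[point] = label
        remaining.remove(point)
    return labeled

# Only two points come with labels; the rest are unlabeled.
labeled = {(0.0, 0.0): "A", (10.0, 0.0): "B"}
unlabeled = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (7.0, 0.0), (8.0, 0.0), (9.0, 0.0)]

result = self_train(labeled, unlabeled)
for point in sorted(result):
    print(point, result[point])
```

The point (3.0, 0.0) ends up labeled ‘A’ even though it is far from the original ‘A’ example, because the labels propagate through its unlabeled neighbours. That is the benefit of using the structure of the unlabeled data.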
Next time, we’ll look at some specific ML models and their applicability to crypto-markets. In preparation for that, we suggest you start to take a look at:
- Andrew Ng’s excellent Machine Learning course on Coursera
- A ‘basic’ Introduction to Machine Learning (Smola & Vishwanathan)
- And a few Python tips for ML (Brownlee)
Challenge #4: Portfolio Management
It’s a simple one this week! Create your own crypto asset portfolio and explain your rationale.
Bonus: Measure the correlation of the assets in your portfolio and examine their liquidity.
*Remember, anyone that participates on Telegram or sends us a solution anytime during the course of our Algorithmic Trading 101 series is eligible to receive part of $5000 in cryptocurrency prizes.
🤖 Links to Lessons 🤖
The Syllabus & How to Win
Lesson 1: Time Series Analysis
Lesson 2: Data, Strategy Design, and Mean Reversion
Lesson 3: Intro to Arbitrage Strategies
Lesson 4: Portfolio Management and Machine Learning in Python
Lesson 5: More Machine Learning
Remember to join our Telegram if you have any questions!