Stock Portfolio Construction: a proof of concept using Apache Spark
Recently I came across a conference paper by Joglekar (2014), which uses a two-stage approach to construct low-risk, stable-return stock portfolios. The idea is simple:
Step 1: Perform correlation-based clustering on a set of financial instruments.
Step 2: Use a genetic algorithm to build an optimal portfolio.
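To make the two-stage idea concrete, here is a minimal, self-contained sketch in plain NumPy. Everything in it is an assumption of mine, not the paper's (or the post's) actual implementation: the returns are synthetic, the clustering is a simple greedy correlation-threshold grouping, and the genetic algorithm is a toy one that evolves portfolio weights to minimise variance with a small penalty for concentrating weight inside one cluster.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily returns for illustration: 8 hypothetical stocks, 250 days.
n_stocks, n_days = 8, 250
returns = rng.normal(0.0, 0.01, size=(n_stocks, n_days))

# Step 1: correlation-based clustering (greedy threshold grouping, an
# assumption -- the paper may use a different clustering method).
corr = np.corrcoef(returns)
threshold = 0.5
clusters = []
unassigned = set(range(n_stocks))
while unassigned:
    seed = unassigned.pop()
    members = [seed] + [j for j in list(unassigned) if corr[seed, j] > threshold]
    for j in members[1:]:
        unassigned.discard(j)
    clusters.append(members)

# Step 2: toy genetic algorithm over portfolio weight vectors.
cov = np.cov(returns)

def fitness(w):
    variance = w @ cov @ w
    concentration = max(w[c].sum() for c in clusters)  # diversification penalty
    return -(variance + 0.01 * concentration)          # lower is better, so negate

pop = rng.dirichlet(np.ones(n_stocks), size=30)  # 30 random weight vectors
for _ in range(50):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-10:]]      # keep the 10 fittest
    children = []
    for _ in range(20):
        a, b = parents[rng.integers(10, size=2)]
        child = (a + b) / 2                      # crossover: average the weights
        child += rng.normal(0, 0.01, n_stocks)   # mutation: small noise
        child = np.clip(child, 0, None)          # no short positions
        children.append(child / child.sum())     # renormalise to sum to 1
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(w) for w in pop])]
print(best.round(3))
```

On a real dataset the correlation matrix and fitness evaluations are exactly the parts that parallelise well, which is where Spark comes in.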
Why not implement this on a massive scale using Apache Spark? In this post I will explain how (and why) to do so based on ~5 years of daily closing price histories of 2,000 stocks (NASDAQ constituents; the dataset from Chapter 9 of Advanced Analytics with Spark).
What has clustering got to do with this?
A widely used risk management technique is portfolio diversification. This basically means that you want the stocks in your portfolio to be “different”. From a statistical point of view, one of the ways to measure this difference is correlation. Take a moment and think about the following (simplified) scenarios:
- Most stocks in a portfolio are (highly) positively correlated.
In such situations stock prices are expected to move in the same direction — so if your forecast is correct…
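As a toy illustration of correlation as a measure of this "difference" (synthetic data of my own, not real prices): two stocks driven by a shared factor end up highly positively correlated, while an independent one does not.

```python
import numpy as np

rng = np.random.default_rng(42)
market = rng.normal(0, 0.01, 250)             # a shared "market" factor
stock_a = market + rng.normal(0, 0.003, 250)  # both track the market...
stock_b = market + rng.normal(0, 0.003, 250)  # ...so they move together
stock_c = rng.normal(0, 0.01, 250)            # independent of the market

print(np.corrcoef(stock_a, stock_b)[0, 1])  # high, close to 1
print(np.corrcoef(stock_a, stock_c)[0, 1])  # close to 0
```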
Originally published at www.datareply.co.uk.