Machine Learning for Stock Trading: Natural Language Processing
Introduction
Pairs trading is perhaps the earliest form of relative value quantitative trading in equities. By applying some modern Machine Learning tools to the pairs trading investment process, we will show how to create sensible pairs without using any price data.
Certain stocks have highly related price series because they:
- operate in similar business lines
- have similar economic exposures
- have similar regulatory burdens
- have a coincident set of homogeneous investors
- operate in the same geographic markets
Therefore, if we could read about and understand the business of each company and then link up companies based on this understanding, we should have a robust set of eligible candidate pairs. This is a perfect task for Machine Learning and, specifically, for its sub-field of Natural Language Processing.
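One simple way to make "link up companies" concrete, sketched here as an assumption rather than as the method this analysis goes on to use, is to turn each description into a TF-IDF vector and connect any two companies whose vectors have a high cosine similarity. The tickers, descriptions, and the 0.25 threshold below are hypothetical and chosen only for illustration.

```python
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical tickers and business descriptions, for illustration only.
profiles = {
    "XA": "Regional airline flying passengers between domestic airports.",
    "XB": "Low cost airline carrying passengers to domestic and international airports.",
    "XC": "Semiconductor maker designing memory chips for data centers.",
    "XD": "Designs and manufactures memory chips and semiconductor devices.",
}
tickers = list(profiles)

# TfidfVectorizer does the term counting and the TF-IDF weighting in one step,
# dropping common English stop words such as "and" or "between".
vectors = TfidfVectorizer(stop_words="english").fit_transform(list(profiles.values()))

# Cosine similarity between every two descriptions (0.0 = no shared terms).
similarity = cosine_similarity(vectors)

# Link two companies when their similarity clears an arbitrary, untuned threshold.
THRESHOLD = 0.25
pairs = [
    (tickers[i], tickers[j])
    for i, j in combinations(range(len(tickers)), 2)
    if similarity[i, j] >= THRESHOLD
]
print(pairs)  # [('XA', 'XB'), ('XC', 'XD')]
```

A single global threshold is the crudest possible linking rule; it is shown only to make the idea of "similar descriptions imply candidate pairs" tangible.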
In this analysis, we will:
- gather business profiles on stocks from the AlphaWave Data Stock Analysis API using the Company Description endpoint.
- utilize the scikit-learn natural language processing functionality CountVectorizer and TfidfTransformer to "read" these descriptions and extract important and novel concept features across all companies.
- cluster stocks with DBSCAN to find stocks that have similar profiles.
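The steps above can be sketched end to end with scikit-learn. This is a minimal sketch under stated assumptions: the five descriptions are made-up stand-ins for text returned by the Company Description endpoint (whose response format is not shown here), and eps=0.5 and min_samples=2 are illustrative values, not tuned ones. Cosine distance is used for DBSCAN because TF-IDF rows are best compared by direction rather than by length.

```python
from itertools import combinations

from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical descriptions standing in for Company Description endpoint output.
descriptions = {
    "AAA": "Bank offering retail banking, loans, mortgages and credit cards.",
    "BBB": "Retail bank providing mortgages, consumer loans and credit cards.",
    "CCC": "Oil and gas exploration and production company drilling wells.",
    "DDD": "Company engaged in oil and gas exploration, drilling and production.",
    "EEE": "Designs video game consoles and sells games online.",
}
tickers = list(descriptions)

# Step 1: count how often each non-stop-word term appears in each description.
counts = CountVectorizer(stop_words="english").fit_transform(list(descriptions.values()))

# Step 2: reweight the counts with TF-IDF so terms used by many companies matter
# less than distinctive ones; rows are L2-normalised by default.
tfidf = TfidfTransformer().fit_transform(counts)

# Step 3: cluster the TF-IDF rows with DBSCAN using cosine distance.
# eps and min_samples are illustrative assumptions, not tuned values.
labels = DBSCAN(eps=0.5, min_samples=2, metric="cosine").fit_predict(tfidf.toarray())

# Step 4: any two tickers in the same cluster form a candidate pair;
# label -1 marks DBSCAN noise points, which are left unpaired.
pairs = []
for label in set(labels) - {-1}:
    members = [t for t, lab in zip(tickers, labels) if lab == label]
    pairs.extend(combinations(sorted(members), 2))
pairs = sorted(pairs)

print(pairs)  # [('AAA', 'BBB'), ('CCC', 'DDD')]
```

DBSCAN does not need the number of clusters in advance and labels points that have no close neighbours as noise, so a company with no textually similar peer (EEE here) is simply left out rather than forced into a pair.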