Machine Learning for Stock Trading: Natural Language Processing

Published in Analytics Vidhya

18 min read · Jul 20, 2021

Introduction

Pairs trading is perhaps the earliest form of relative-value quantitative trading in equities. By applying modern machine learning tools to the pairs trading investment process, we will show how to create sensible pairs without using any price data.

Certain stocks have highly related price series because they:

operate in similar business lines
have similar economic exposures
have similar regulatory burdens
share an overlapping, homogeneous investor base
operate in the same geographic markets

Therefore, if we could read about and understand the business of each company, and then link companies based on that understanding, we should end up with a robust set of candidate pairs. This is a natural task for machine learning and, specifically, the sub-field of natural language processing.

In this analysis, we will:

gather business profiles on stocks from the AlphaWave Data Stock Analysis API using the Company Description endpoint.
use scikit-learn's CountVectorizer and TfidfTransformer to "read" these descriptions and extract the important and distinctive features across all companies.
cluster stocks with DBSCAN to find stocks that have similar profiles.
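
The three steps above can be sketched end to end on toy data. This is a minimal illustration, not the exact code used later in the analysis: the tickers and descriptions are hand-written placeholders standing in for what the Company Description endpoint would return, and the eps and min_samples values are assumptions chosen so the toy example separates cleanly.

```python
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical tickers and hand-written descriptions (placeholders for API output).
descriptions = {
    "BANKA": "retail and commercial banking, loans, deposits, and mortgage lending",
    "BANKB": "commercial banking, consumer loans, deposits, and mortgage services",
    "OILA": "exploration and production of crude oil and natural gas",
    "OILB": "production and refining of crude oil and natural gas",
    "SOFT": "cloud software subscriptions for enterprise customers",
}


def cluster_descriptions(descriptions, eps=0.5, min_samples=2):
    """Map each ticker to a DBSCAN cluster id based on its description text."""
    tickers = list(descriptions)
    texts = list(descriptions.values())

    # Step 1: turn each description into a vector of word counts,
    # dropping common English stop words such as "and" and "of".
    counts = CountVectorizer(stop_words="english").fit_transform(texts)

    # Step 2: reweight the counts so that words shared by many companies
    # matter less than words that are distinctive to a few.
    tfidf = TfidfTransformer().fit_transform(counts)

    # Step 3: group companies whose TF-IDF vectors are close in cosine distance.
    # eps and min_samples are assumed values for this toy data only.
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine")
    cluster_ids = model.fit_predict(tfidf)

    return {ticker: int(cid) for ticker, cid in zip(tickers, cluster_ids)}


labels = cluster_descriptions(descriptions)
for ticker, cluster_id in labels.items():
    print(ticker, cluster_id)
```

DBSCAN assigns the label -1 to points it treats as noise, so a company whose description resembles no other company's is simply left out of every cluster rather than forced into one. Stocks that share a non-negative label are the candidate pairs.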

Written by Hugh Donnelly