Machine Learning for Stock Trading: Natural Language Processing

Hugh Donnelly
Published in Analytics Vidhya
18 min read · Jul 20, 2021


Introduction

Pairs trading is perhaps the earliest form of relative-value quantitative trading in equities. By applying modern machine learning tools to the pairs trading investment process, we will show how to create sensible pairs without using any price data.

Certain stocks have highly related price series because they:

  • operate in similar business lines
  • have similar economic exposures
  • have similar regulatory burdens
  • share an overlapping, homogeneous investor base
  • operate in the same geographic markets

Therefore, if we could read about and understand each company's business and then link companies based on that understanding, we should have a robust set of candidate pairs. This task is well suited to Machine Learning and, specifically, to the sub-field of Natural Language Processing.

In this analysis, we will:

  1. gather business profiles on stocks from the AlphaWave Data Stock Analysis API using the Company Description endpoint.
  2. use scikit-learn's CountVectorizer and TfidfTransformer to "read" these descriptions and extract the terms that are important and distinctive across all companies.
  3. cluster stocks with DBSCAN to find stocks that have similar profiles.
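
As a rough preview of steps 2 and 3, the sketch below runs CountVectorizer, TfidfTransformer, and DBSCAN on a handful of company descriptions. It is only an illustration under stated assumptions: the tickers and descriptions are invented stand-ins for what the Company Description endpoint would return, and the DBSCAN settings (cosine distance, eps=0.5, min_samples=2) are arbitrary choices for this toy data, not tuned values.

```python
from collections import defaultdict
from itertools import combinations

from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical tickers and descriptions, invented for illustration only.
descriptions = {
    "BANK_A": "regional bank offering consumer loans, mortgages and deposit accounts",
    "BANK_B": "commercial bank offering mortgages, consumer loans and deposit accounts",
    "OIL_A": "oil and gas exploration and production company operating offshore wells",
    "OIL_B": "offshore oil and gas exploration company operating drilling rigs and production wells",
    "RETAIL_A": "online retailer selling apparel and footwear to consumers worldwide",
}
tickers = list(descriptions)
texts = [descriptions[t] for t in tickers]

# Step 2: raw term counts, then TF-IDF weights so that terms shared by
# every company count for less than terms that distinguish a few of them.
counts = CountVectorizer(stop_words="english").fit_transform(texts)
tfidf = TfidfTransformer().fit_transform(counts)

# Step 3: density-based clustering on cosine distance between descriptions.
# eps and min_samples are assumptions chosen for this toy data set.
labels = DBSCAN(eps=0.5, min_samples=2, metric="cosine").fit_predict(tfidf)

# Group tickers by cluster label; DBSCAN marks unclustered points with -1.
clusters = defaultdict(list)
for ticker, label in zip(tickers, labels):
    if label != -1:
        clusters[int(label)].append(ticker)

# Every pair of stocks inside a cluster is a candidate pair.
candidate_pairs = [
    pair for members in clusters.values() for pair in combinations(members, 2)
]
print(candidate_pairs)
```

Note that no price data enters this sketch: the pairs come purely from the text. On real descriptions, eps in particular would need tuning, since it decides how similar two profiles must be before they land in the same cluster.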

Hugh founded AlphaWave Data in 2020 and is responsible for risk, attribution, portfolio construction, and investment solutions.