Detect Signals to Generate Trade Opportunities

Paul Lashmet
Published in Product AI
Apr 15, 2021 · 2 min read

Challenge:

Modern trading systems consume massive amounts of data from a variety of standard and alternative sources. The challenge is to identify, in real time, a pattern, an anomaly, a signal, or a unique causal relationship within this torrent of structured and unstructured information that a trader or system can act on to generate a trade, using both traditional statistical models and AI models.

Solution:

Models are trained on a cluster of cloud-based Linux servers with NVIDIA GPUs using historical market, trading, and other relevant data that provides deep market context. This might include, for instance, news articles or earnings call transcripts that have been ingested and interpreted using NLP.

Apache Spark provides the architecture to orchestrate complex pipelines over large datasets. These pipelines handle ingestion, transformation, and statistical analysis of the data in real time, applying several different approaches with SparkML. NLTK and PySpark are used to process the text data, including tokenization, parsing, classification, named entity recognition (NER), and sentiment scoring. Sentiment analysis employs a number of approaches built on a custom, finance-specific dictionary. Documents published in a foreign language are first translated into English using cloud-based translation services; the sentiment and other text metrics are then calculated, persisted, and used to train the AI models.
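As a rough illustration, the sketch below applies a finance-aware sentiment score to a Spark DataFrame of documents using NLTK's VADER analyzer inside a PySpark UDF. The custom lexicon terms, file paths, and column names are assumptions for the example, not the actual pipeline.

```python
# Minimal sketch: finance-aware sentiment scoring in PySpark with NLTK's VADER.
# Assumes nltk.download('vader_lexicon') has been run on every worker node.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
from nltk.sentiment.vader import SentimentIntensityAnalyzer

spark = SparkSession.builder.appName("news-sentiment").getOrCreate()

# Hypothetical finance-specific adjustments layered on top of VADER's lexicon.
FINANCE_LEXICON = {"downgrade": -2.5, "upgrade": 2.0, "writedown": -2.0,
                   "beat": 1.5, "miss": -1.5, "shortfall": -2.0}

def score_text(text):
    # Build the analyzer inside the UDF so each Spark executor gets its own copy.
    analyzer = SentimentIntensityAnalyzer()
    analyzer.lexicon.update(FINANCE_LEXICON)
    return float(analyzer.polarity_scores(text or "")["compound"])

sentiment_udf = udf(score_text, DoubleType())

# Illustrative input: documents already translated to English, with columns doc_id, ts, body.
news = spark.read.parquet("s3://bucket/news_articles/")
scored = news.withColumn("sentiment", sentiment_udf("body"))
scored.write.mode("append").parquet("s3://bucket/news_sentiment/")
```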

A statistical approach called “Granger causality” is used to determine whether one time series helps predict another, in this case to find such relationships between disparate data points. Causal relationships are potential trading opportunities.
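As a minimal sketch, a Granger-causality screen of this kind can be run with statsmodels; the column names, file path, lag count, and 5% significance threshold below are illustrative assumptions.

```python
# Minimal sketch: test whether lagged sentiment Granger-causes price returns.
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

# Hypothetical feature file with one row per time bucket; both series should be
# stationary (e.g. differenced returns) before testing.
df = pd.read_parquet("features.parquet")[["returns", "sentiment"]].dropna()

# For grangercausalitytests, the test asks whether the second column helps
# predict the first, here up to 5 lags.
results = grangercausalitytests(df[["returns", "sentiment"]], maxlag=5)

for lag, res in results.items():
    p_value = res[0]["ssr_ftest"][1]
    if p_value < 0.05:
        print(f"lag={lag}: sentiment Granger-causes returns (p={p_value:.4f})")
```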

Apache Spark is also used to process incoming data, perform feature engineering, and make the data available for training the TensorFlow AI models. The same data used for the traditional statistical methods feeds the model training. Given the large volume of fast-moving time-series data, a large cluster of NVIDIA GPU servers is essential for training and retraining the models. The trained models are then used to generate trading ideas for execution.
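A minimal sketch of the training step is shown below, assuming the Spark pipeline has already exported windowed features and labels to files; the feature shapes and network architecture are illustrative assumptions, not the production models.

```python
# Minimal sketch: train a small TensorFlow model on engineered time-series features.
import numpy as np
import tensorflow as tf

# Hypothetical exports from the Spark feature pipeline.
# X shape: (samples, timesteps, features); y: next-period direction (0/1).
X = np.load("features.npy")
y = np.load("labels.npy")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=X.shape[1:]),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# TensorFlow uses available NVIDIA GPUs automatically; retraining re-runs fit()
# on refreshed data.
model.fit(X, y, validation_split=0.2, epochs=10, batch_size=256)
```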

Taken together, this complex set of servers and algorithms enables the trader to identify subtle interconnections in market data and, from them, potentially profitable trading opportunities.

Technologies Utilized:

TensorFlow, Apache Spark, Python, NLP, Causal Analysis, NVIDIA GPUs


Paul Lashmet is a business integration architect and financial services subject matter expert.