Part 2: Building an Analysis Platform

Corey Auger
May 4, 2018

In the previous post (Machine Learning Applied to Intraday Trading), we discussed the premise behind searching for day trade patterns in the stock market.

In this section I will go over some of the technical considerations involved in building an analysis platform. We will discuss getting data, cleaning data, and how to store your data and provide fast access to it. In addition, we will discuss the tools I used to build a scalable and fast data platform at daytrader.ai.

Sign up for the Beta at http://daytrader.ai

Data Requirements

For our system we are going to need stock market data at a resolution of 1 minute.

For each minute we want to know:

  • The price at the start of the minute, the “open price”
  • The highest price a buyer paid for that minute which is the “high price”
  • The lowest price a buyer paid for that minute which is the “low price”
  • The price at the end of the minute “close price”
  • The volume of shares traded during that minute.

** note: at minimum we could work with just a close price
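As a minimal sketch, the per-minute record described above could be modeled like this in Python. The class name `Bar`, its field names, and the `is_consistent` check are illustrative assumptions, not part of the daytrader.ai code base, and the sample values are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Bar:
    """One minute of market data for a single symbol (hypothetical layout)."""
    symbol: str
    start: datetime   # start of the minute, stored in UTC
    open: float       # price at the start of the minute
    high: float       # highest price paid during the minute
    low: float        # lowest price paid during the minute
    close: float      # price at the end of the minute
    volume: int       # shares traded during the minute

    def is_consistent(self) -> bool:
        """Basic sanity check: low and high must bracket open and close."""
        return (
            self.low <= min(self.open, self.close)
            and self.high >= max(self.open, self.close)
            and self.volume >= 0
        )


# Made-up values, for illustration only.
bar = Bar("AAPL", datetime(2018, 5, 4, 13, 30, tzinfo=timezone.utc),
          176.0, 176.4, 175.9, 176.2, 12500)
print(bar.is_consistent())
```

A check like `is_consistent` is one cheap way to flag erroneous rows before they reach storage.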

In most cases the “close” price of one minute should be the “open” price of the next minute. The system will need as much historical data as we can find for past prices, as well as a source to query every minute for new data. We will want a system that can react to new data the instant it shows up, as well as a way to simulate or backtest ideas with the historical data that we store. From this we can see that our data can roughly be divided into 2 categories:

  • Past or historical data
  • “Real time” data that we grab every minute

Getting Data

Data feeds

Grabbing “real time” data can be achieved in a number of ways. There are a few good free and open data sources:

IEX: https://iextrading.com/

  • Pros: free
  • Cons: data is limited to only a trade view

Alpha Vantage: https://www.alphavantage.co/

  • Pros: free, lots of stock, crypto, forex
  • Cons: They have rate limits, and you will miss data if you hit them (which is easy to do)
  • NOTE: “real time” data for Alpha Vantage is just pulled from IEX
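One way to stay under a provider's rate limit is to space requests out on the client side. The following is a minimal sketch of such a throttle. The `Throttle` class is hypothetical, and the limit of 5 calls per 60 seconds is only an assumed example value, so check your provider's current terms. The clock and sleep functions are injectable so that the demo runs instantly against a simulated clock.

```python
import time


class Throttle:
    """Spaces calls evenly so at most `max_calls` happen per `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may go out

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


# Demo with a simulated clock so the example does not really sleep.
fake_now = [0.0]


def fake_sleep(seconds):
    fake_now[0] += seconds


throttle = Throttle(max_calls=5, period=60, clock=lambda: fake_now[0], sleep=fake_sleep)
for _ in range(3):
    throttle.wait()   # a real client would issue its HTTP request here
print(fake_now[0])    # simulated seconds spent waiting
```

In real use you would construct `Throttle(max_calls, period)` with the default clock and call `wait()` before each request.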

Questrade: http://www.questrade.com/

  • Pros: websocket api for realtime ticks
  • Cons: paid, and also rate limited

I have also created a number of fast, streaming, non-blocking API wrappers in Scala. They are freely available on GitHub (MIT license).

There are many more paid data sources that vary in quality. If you are serious about having a reliable and fast source for potentially hundreds of stocks, you will need to find a paid source. My goal is to have daytrader.ai be a future source where data is available at a minimal cost.

Historical Data

Getting historical financial data is a pain. Why is it that we don’t have a 1 minute resolution history of the market freely available to anyone on the net?

Rich and accurate historical data is elusive on the web, so I’ll save you time and resources by directing you straight to the “Caltech Quantitative Finance Group” page:

http://quant.caltech.edu/historical-stock-data.html

I purchased the data from both Kibot and QuantQuote. In the end I did a comparison and threw out the Kibot data, so the historical data source for daytrader.ai is, for the most part, QuantQuote.

The program I wrote to compare the 2 data sources is also located on my github here: https://github.com/coreyauger/spatula-city
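To give a feel for what such a comparison involves, here is a minimal Python sketch. It is not the program from the repository above: the function `compare_sources`, the dict-based input format, and the 0.1% tolerance are all assumptions made for illustration, and the sample prices are made up.

```python
def compare_sources(a, b, tolerance=0.001):
    """Compare two vendors' close prices keyed by timestamp.

    a, b: dicts mapping timestamp -> close price.
    tolerance: relative difference above which two prices count as a mismatch.
    """
    common = a.keys() & b.keys()
    mismatched = sorted(
        ts for ts in common
        if abs(a[ts] - b[ts]) > tolerance * max(abs(a[ts]), abs(b[ts]))
    )
    return {
        "only_in_a": len(a.keys() - b.keys()),   # minutes missing from b
        "only_in_b": len(b.keys() - a.keys()),   # minutes missing from a
        "common": len(common),
        "mismatched": mismatched,                # timestamps where prices disagree
    }


# Made-up data: integer keys stand in for minute timestamps.
vendor_a = {1: 100.0, 2: 101.0, 3: 102.0}
vendor_b = {2: 101.0, 3: 103.0, 4: 104.0}
print(compare_sources(vendor_a, vendor_b))
```

Counting missing minutes and price disagreements per symbol gives a rough basis for deciding which vendor to trust.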

I will caution you: in both cases (Kibot and QuantQuote), you purchase the data and then wait upwards of a week to hear back from the vendor, who finally provides you with a link to download your data.

Cleaning Data

As with most data, you will want to think about what to do with missing or erroneous data points. The strategy that I used to fill small gaps in the data is the “forward fill”: each missing minute takes the last known value before the gap. I have heard of some people doing a “back fill”, but this is ill advised. You would be filling in data from the future, which would not have been available in “real time” use of the system. This leaks future information into any machine learning models that you create, so don’t do this! Instead we forward fill small gaps with past data.
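A forward fill over 1 minute data can be sketched as follows. This is an illustrative Python version, not the daytrader.ai implementation: the function name `forward_fill`, the `(timestamp, close)` tuple format, and the `max_gap` threshold of 5 minutes are assumptions. Gaps longer than `max_gap` are left alone, since carrying a stale price across a long gap would invent data.

```python
from datetime import datetime, timedelta


def forward_fill(bars, max_gap=5):
    """Fill small gaps in 1 minute data with the last known close.

    bars: list of (timestamp, close) tuples sorted by timestamp.
    max_gap: largest number of consecutive missing minutes to fill.
    """
    step = timedelta(minutes=1)
    filled = []
    for ts, close in bars:
        if filled:
            prev_ts, prev_close = filled[-1]
            missing = int((ts - prev_ts) / step) - 1
            if 0 < missing <= max_gap:
                # Only past data is used: repeat the close from before the gap.
                for i in range(1, missing + 1):
                    filled.append((prev_ts + i * step, prev_close))
        filled.append((ts, close))
    return filled


# Made-up prices: 09:32 and 09:33 are missing from the input.
raw = [
    (datetime(2018, 5, 4, 9, 30), 100.0),
    (datetime(2018, 5, 4, 9, 31), 100.5),
    (datetime(2018, 5, 4, 9, 34), 101.0),
]
for ts, close in forward_fill(raw):
    print(ts.strftime("%H:%M"), close)
```

The two missing minutes both take the 09:31 close of 100.5, never the later 09:34 value.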

Loading Data

At daytrader.ai I am storing my data in Cassandra. Cassandra is built to scale and to provide high availability and very fast reads to many clients. If you are building a personal project it is NOT the correct database to use. Instead use your favourite relational database (e.g. Postgres). Make sure you design your tables in such a way that you can efficiently access data for N symbols over some time range T.
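For a personal project, a table layout along these lines would support that access pattern. This is a sketch under assumptions: it uses SQLite through Python's standard `sqlite3` module so that it runs without a server, and the table name `bars`, its columns, and the helper `load_bars` are hypothetical. The same idea, a composite primary key on symbol and timestamp, carries over to Postgres.

```python
import sqlite3
from datetime import datetime, timezone


def minute_ts(year, month, day, hour, minute):
    """Start of a minute as Unix epoch seconds (UTC)."""
    return int(datetime(year, month, day, hour, minute, tzinfo=timezone.utc).timestamp())


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bars (
        symbol TEXT    NOT NULL,
        ts     INTEGER NOT NULL,  -- start of minute, epoch seconds (UTC)
        open   REAL, high REAL, low REAL, close REAL,
        volume INTEGER,
        PRIMARY KEY (symbol, ts)  -- index that serves symbol + time range lookups
    )
""")


def load_bars(conn, symbols, start_ts, end_ts):
    """Return bars for the given symbols with start_ts <= ts < end_ts."""
    placeholders = ",".join("?" for _ in symbols)
    sql = (
        "SELECT symbol, ts, open, high, low, close, volume FROM bars "
        f"WHERE symbol IN ({placeholders}) AND ts >= ? AND ts < ? "
        "ORDER BY symbol, ts"
    )
    return conn.execute(sql, [*symbols, start_ts, end_ts]).fetchall()


# Made-up rows, for illustration only.
rows = [
    ("AAPL", minute_ts(2018, 5, 4, 13, 30), 176.0, 176.4, 175.9, 176.2, 12500),
    ("AAPL", minute_ts(2018, 5, 4, 13, 31), 176.2, 176.5, 176.1, 176.3, 9800),
    ("MSFT", minute_ts(2018, 5, 4, 13, 30), 94.0, 94.2, 93.9, 94.1, 7300),
    ("GOOG", minute_ts(2018, 5, 4, 13, 30), 1020.0, 1021.0, 1019.5, 1020.5, 1500),
]
conn.executemany("INSERT INTO bars VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

result = load_bars(conn, ["AAPL", "MSFT"],
                   minute_ts(2018, 5, 4, 13, 30), minute_ts(2018, 5, 4, 13, 31))
for row in result:
    print(row)
```

The primary key also rejects duplicate bars for the same symbol and minute, which helps when reloading overlapping vendor files.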

System Design

Daytrader is designed so that multiple “clients” can attach and stream data for whatever their needs may be. These types of clients include but are not limited to:

  • Websocket web clients
  • TCP clients
  • Data visualization workflows
  • Flink jobs that perform pattern discovery
  • Machine Learning Jobs

The platform architecture.

Gateway:

  • Highly scalable
  • Handle Authentication
  • Websocket connections
  • TCP sockets
  • HTTP requests

Kafka:

  • Message bus
  • High throughput
  • Highly configurable

Routing Modules:

Akka User Actor

  • Manage all connected clients for a single authed user

Akka Ticker Actor

  • Manage caching on a per symbol basis
  • Responsible for pushing real time data through the system as it arrives

Akka Trader Actor

  • Handle pattern triggers for buy and sell
  • Types of traders include: mock trader, training data creator, Questrade trader (can be extended…)

Store:

  • Api wrapper to data source.
  • Highly scalable (create any number of instances in the cluster)

Flink jobs:

  • Custom Flink Source connector (pulls from Ticker actors)
  • Custom Flink Sink that pushes data to Trader actors

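To make the Ticker Actor's role more concrete, here is a small Python sketch of per-symbol caching combined with pushing new data to subscribers as it arrives. It is a plain, single-threaded stand-in and not the Akka implementation used by daytrader.ai. The class `TickerHub`, its methods, and the cache size of 390 bars (the number of minutes in a regular US trading session) are assumptions made for illustration.

```python
from collections import defaultdict, deque


class TickerHub:
    """Keeps a bounded cache of recent bars per symbol and fans out new ones."""

    def __init__(self, cache_size=390):
        self.cache = defaultdict(lambda: deque(maxlen=cache_size))
        self.subscribers = defaultdict(list)

    def subscribe(self, symbol, callback, replay=True):
        """Register a client; optionally replay cached bars to it first."""
        if replay:
            for bar in self.cache[symbol]:
                callback(bar)
        self.subscribers[symbol].append(callback)

    def publish(self, symbol, bar):
        """Cache a newly arrived bar and push it to every subscriber."""
        self.cache[symbol].append(bar)
        for callback in self.subscribers[symbol]:
            callback(bar)


# Made-up close prices stand in for full bars.
hub = TickerHub(cache_size=2)
hub.publish("AAPL", 176.2)
hub.publish("AAPL", 176.3)
hub.publish("AAPL", 176.1)   # the oldest cached bar (176.2) is evicted

received = []
hub.subscribe("AAPL", received.append)  # a late client first gets the cached bars
hub.publish("AAPL", 176.4)              # then live data as it arrives
print(received)
```

In an actor system each symbol would get its own actor with its own mailbox. The sketch only shows the cache-then-stream behaviour that a late-joining client sees.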
Summary

In this post we talked about some of the challenges associated with getting and cleaning data, both real time and historical. We then reviewed some of the architecture that I have built for daytrader.ai. In the next post we will start to look at how we can use Apache Flink to mine our data for day trade patterns and how we can transform this data into training examples for machine learning. I will also provide links to real data!

Previous post: Machine Learning Applied to Intraday Trading

Next post: Generating Training Data
