PART EIGHT: SQL Schemas — Building a Production-Ready Algorithmic Trading Framework in Python
Hello there,
We continue the series on Building a Production-Ready Algorithmic Trading Framework in Python. This article is going to focus on our database schema. The prerequisites for this article can be found below. They contain the classes you will need to get up and running and, if you missed them, the inside knowledge to understand what is happening.
· Logging
· SQLite
If you have any experience building dashboards or front-end stuff, a well-designed backend will make developing the front end a piece of cake.
First, sitting down with a pen and paper and planning this out is always a great start. If you are new-school and prefer to see some visuals immediately, here is a great tool that allows you to input your SQL queries to create your schema. It will create the schema maps you see in this article.
The Design of Our Schema
When designing a database, you need to think about what columns can be reused repeatedly (Facts) and what columns can be added to categorise the data (Dimensions) so it doesn’t get lost or confusing in the sea of table rows.
For example, suppose you have SPY500, NASDAQ, XAU_USD, and EUR_GBP data, each with a data frame containing Date, Open, High, Low, Close and Volume columns (Facts), respectively. How can you efficiently store all that information? You would combine the data frames into one and add a column (Dimension) called InstrumentName or InstrumentKey, the latter being preferred as
- It is an integer, and they are more efficient to store than strings.
- This can link to a separate dimension table using an SQL feature known as Foreign and Primary Keys — Google that if this is the first time hearing it, it’s essential to understand for the rest of the article.
TL;DR: The mega query to create this schema is at the bottom of the page.
Date, Time and Frequency
Let’s create the SQL schema to see the tables we will need. The finance world is all about time series. And time series comes in several frequencies and granularities. My first table is designed to capture that and categorise my facts so I know which data to pull in later stages, i.e. (Yearly, Monthly, Daily, Hourly).
The following tables are dimension tables with date and time features. Data that is time series naturally has a lot of features just in the temporal space. This type of table is essential to categorise things by month, quarter or whatever your heart desires.
Instrument Data
The next natural choice is the target instrument tables. As you can see, I have added some more columns to enrich my data. The first two columns are apparent, an index value InstrumentKey and the human-friendly name InstrumentName.
The next column is the type of instrument which links to another Dimension table; examples of instrument types can be Forex, Crypto, Stocks, Funds, or Futures.
Finally, we have the LineTypeKey, which links to another Dimension table. Sometimes the rows of OHLC data are not the same or, in other words, single-line data. This column will help me ID the instruments unsuitable for a candle chart. I can enrich other datasets using this too. For example, is it range bound between 0–100? or is it an overlay instead of being in a different figure? these are some small examples, but I hope you get the point about planning your backend to make plotting a piece of cake in the future.
As you can see, I will need two more Dimension tables with a foreign key relationship to my Dimension_Instruments.
Finally, we will need a facts table to store the actual data, with our InstrumentKey, GranularityKey, DateKey, and TimeKey having a foreign-key relationship to their relevant dimension tables.
Indicator Data
The next stage is to generate the indicator schema. Typically, you need instruments to make indicators. Some examples are RSI’s, MACD’s, Moving Averages. You will start to notice a design pattern with these types of tables. This table will hold the information surrounding our indicators, with foreign keys to IndicatorTypeKey, IndicatorCategoryKey and LineTypeKey.
Finally, like our instruments, we will have a Facts table for our Indicators; this will be enriched with InstrumentKey, IndicatorKey, Granularity DateKey and TimeKey using foreign keys to our Dimension tables.
Overview
You should have something that looks like the image below. As you can see, a set of core tables is being enriched from other dimension tables.
If you want to replicate this or add to it (which I encourage you to do!!), here are the SQL scripts to get you going.
If you use the tool I recommended above, click on Import/Import from SQL Server and copy and paste the below in; it should work a treat.
CREATE TABLE Dimension_Granularity (
GranularityKey INTEGER PRIMARY KEY,
Name TEXT NOT NULL,
Alias TEXT NOT NULL,
OandaAlias TEXT NOT NULL,
TimeDelta TEXT NOT NULL)
CREATE TABLE Dimension_InstrumentType (
InstrumentTypeKey INTEGER PRIMARY KEY,
InstrumentTypeName TEXT NOT NULL)
CREATE TABLE Dimension_IndicatorCategory (
IndicatorCategoryKey INTEGER PRIMARY KEY,
IndicatorCategoryName TEXT NOT NULL)
CREATE TABLE Dimension_IndicatorType (
IndicatorTypeKey INTEGER PRIMARY KEY,
IndicatorTypeName TEXT NOT NULL)
CREATE TABLE Dimension_LineType (
LineTypeKey INTEGER PRIMARY KEY,
LineName TEXT NOT NULL)
CREATE TABLE Dimension_Instruments (
InstrumentKey INTEGER PRIMARY KEY,
InstrumentName TEXT NOT NULL,
InstrumentTypeKey INTEGER NOT NULL,
LineTypeKey INTEGER NOT NULL,
FOREIGN KEY (InstrumentTypeKey) REFERENCES Dimension_InstrumentType (InstrumentTypeKey),
FOREIGN KEY (LineTypeKey) REFERENCES Dimension_LineType (LineTypeKey))
CREATE TABLE Dimension_Indicators (
IndicatorKey INTEGER PRIMARY KEY,
IndicatorName TEXT NOT NULL,
IndicatorTypeKey INTEGER NOT NULL,
IndicatorCategoryKey INTEGER NOT NULL,
LineTypeKey INTEGER NOT NULL,
FOREIGN KEY (IndicatorTypeKey) REFERENCES Dimension_IndicatorType (IndicatorTypeKey),
FOREIGN KEY (IndicatorCategoryKey) REFERENCES Dimension_IndicatorCategory (IndicatorCategoryKey),
FOREIGN KEY (LineTypeKey) REFERENCES Dimension_LineType (LineTypeKey))
CREATE TABLE Dimension_Date (
DateKey INTEGER PRIMARY KEY,
DateString STRING NOT NULL,
IsYearStart INTEGER NOT NULL,
IsYearEnd INTEGER NOT NULL,
Year INTEGER NOT NULL,
IsQuarterStart INTEGER NOT NULL,
IsQuarterEnd INTEGER NOT NULL,
Quarter INTEGER NOT NULL,
Week INTEGER NOT NULL,
IsMonthStart INTEGER NOT NULL,
IsMonthEnd INTEGER NOT NULL,
Month INTEGER NOT NULL,
IsWeekend INTEGER NOT NULL)
CREATE TABLE Dimension_Time (
TimeKey INTEGER PRIMARY KEY,
TimeString STRING NOT NULL,
Hour INTEGER NOT NULL,
Minute INTEGER NOT NULL,
Second INTEGER NOT NULL,
IsMorning INTEGER NOT NULL,
IsAfternoon INTEGER NOT NULL,
IsEvening INTEGER NOT NULL,
IsLondonSession INTEGER NOT NULL,
IsNewYorkSession INTEGER NOT NULL,
IsSydneySession INTEGER NOT NULL,
IsTokoyoSession INTEGER NOT NULL,
IsDayLightSavings INTEGER NOT NULL)
CREATE TABLE Facts_InstrumentsData (
DateKey INTEGER NOT NULL,
TimeKey INTEGER,
GranularityKey INTEGER NOT NULL,
Open REAL,
High REAL,
Low REAL,
Close REAL,
Volume INTEGER,
InstrumentKey INTEGER NOT NULL,
FOREIGN KEY (InstrumentKey) REFERENCES Dimension_Instruments (InstrumentKey),
FOREIGN KEY (DateKey) REFERENCES Dimension_Date (DateKey),
FOREIGN KEY (TimeKey) REFERENCES Dimension_Time (TimeKey),
FOREIGN KEY (GranularityKey) REFERENCES Dimension_Granularity (GranularityKey)
)
CREATE TABLE Facts_IndicatorsData (
DateKey INTEGER,
TimeKey INTEGER,
GranularityKey INTEGER NOT NULL,
Open REAL,
High REAL,
Low REAL,
Close REAL,
Volume INTEGER,
IndicatorKey INTEGER NOT NULL,
InstrumentKey INTEGER NOT NULL,
FOREIGN KEY (IndicatorKey) REFERENCES Dimension_Indicators (IndicatorKey),
FOREIGN KEY (InstrumentKey) REFERENCES Dimension_Instruments (InstrumentKey),
FOREIGN KEY (DateKey) REFERENCES Dimension_Date (DateKey),
FOREIGN KEY (TimeKey) REFERENCES Dimension_Time (TimeKey),
FOREIGN KEY (GranularityKey) REFERENCES Dimension_Granularity (GranularityKey))
Code
If you have been following along with my series, this next bit will be a breeze, and you will have a schema ready to enrich with your data and ideas; I wanted to give you a cookie cutter that you can take away and use yourself. Remember to change the db_string to point towards the path of your database.
As a bonus, I also included some base metadata you can use to start categorising our information coming in, that is, under populate_database_schema_base().
import traceback
from Logger import log_maker
from SQLDB Import SQLDB
logger = log_maker('Schema')
class Schema(SQLDB):
def __init__(self):
super().__init__()
self.db_string = str()
def __call__(self, *args, **kwargs):
self.generate_database_schema()
self.populate_database_schema_base()
"""
Functions to generate base DOL
"""
def generate_database_schema(self):
try:
db_schema = """CREATE TABLE Dimension_Granularity (
GranularityKey INTEGER PRIMARY KEY,
Name TEXT NOT NULL,
Alias TEXT NOT NULL,
OandaAlias TEXT NOT NULL,
TimeDelta TEXT NOT NULL)
CREATE TABLE Dimension_InstrumentType (
InstrumentTypeKey INTEGER PRIMARY KEY,
InstrumentTypeName TEXT NOT NULL)
CREATE TABLE Dimension_IndicatorCategory (
IndicatorCategoryKey INTEGER PRIMARY KEY,
IndicatorCategoryName TEXT NOT NULL)
CREATE TABLE Dimension_IndicatorType (
IndicatorTypeKey INTEGER PRIMARY KEY,
IndicatorTypeName TEXT NOT NULL)
CREATE TABLE Dimension_LineType (
LineTypeKey INTEGER PRIMARY KEY,
LineName TEXT NOT NULL)
CREATE TABLE Dimension_Instruments (
InstrumentKey INTEGER PRIMARY KEY,
InstrumentName TEXT NOT NULL,
InstrumentTypeKey INTEGER NOT NULL,
LineTypeKey INTEGER NOT NULL,
FOREIGN KEY (InstrumentTypeKey) REFERENCES Dimension_InstrumentType (InstrumentTypeKey),
FOREIGN KEY (LineTypeKey) REFERENCES Dimension_LineType (LineTypeKey))
CREATE TABLE Dimension_Indicators (
IndicatorKey INTEGER PRIMARY KEY,
IndicatorName TEXT NOT NULL,
IndicatorTypeKey INTEGER NOT NULL,
IndicatorCategoryKey INTEGER NOT NULL,
LineTypeKey INTEGER NOT NULL,
FOREIGN KEY (IndicatorTypeKey) REFERENCES Dimension_IndicatorType (IndicatorTypeKey),
FOREIGN KEY (IndicatorCategoryKey) REFERENCES Dimension_IndicatorCategory (IndicatorCategoryKey),
FOREIGN KEY (LineTypeKey) REFERENCES Dimension_LineType (LineTypeKey))
CREATE TABLE Dimension_Date (
DateKey INTEGER PRIMARY KEY,
DateString STRING NOT NULL,
IsYearStart INTEGER NOT NULL,
IsYearEnd INTEGER NOT NULL,
Year INTEGER NOT NULL,
IsQuarterStart INTEGER NOT NULL,
IsQuarterEnd INTEGER NOT NULL,
Quarter INTEGER NOT NULL,
Week INTEGER NOT NULL,
IsMonthStart INTEGER NOT NULL,
IsMonthEnd INTEGER NOT NULL,
Month INTEGER NOT NULL,
IsWeekend INTEGER NOT NULL)
CREATE TABLE Dimension_Time (
TimeKey INTEGER PRIMARY KEY,
TimeString STRING NOT NULL,
Hour INTEGER NOT NULL,
Minute INTEGER NOT NULL,
Second INTEGER NOT NULL,
IsMorning INTEGER NOT NULL,
IsAfternoon INTEGER NOT NULL,
IsEvening INTEGER NOT NULL,
IsLondonSession INTEGER NOT NULL,
IsNewYorkSession INTEGER NOT NULL,
IsSydneySession INTEGER NOT NULL,
IsTokoyoSession INTEGER NOT NULL,
IsDayLightSavings INTEGER NOT NULL)
CREATE TABLE Facts_InstrumentsData (
DateKey INTEGER NOT NULL,
TimeKey INTEGER,
GranularityKey INTEGER NOT NULL,
Open REAL,
High REAL,
Low REAL,
Close REAL,
Volume INTEGER,
InstrumentKey INTEGER NOT NULL,
FOREIGN KEY (InstrumentKey) REFERENCES Dimension_Instruments (InstrumentKey),
FOREIGN KEY (DateKey) REFERENCES Dimension_Date (DateKey),
FOREIGN KEY (TimeKey) REFERENCES Dimension_Time (TimeKey),
FOREIGN KEY (GranularityKey) REFERENCES Dimension_Granularity (GranularityKey)
)
CREATE TABLE Facts_IndicatorsData (
DateKey INTEGER,
TimeKey INTEGER,
GranularityKey INTEGER NOT NULL,
Open REAL,
High REAL,
Low REAL,
Close REAL,
Volume INTEGER,
IndicatorKey INTEGER NOT NULL,
InstrumentKey INTEGER NOT NULL,
FOREIGN KEY (IndicatorKey) REFERENCES Dimension_Indicators (IndicatorKey),
FOREIGN KEY (InstrumentKey) REFERENCES Dimension_Instruments (InstrumentKey),
FOREIGN KEY (DateKey) REFERENCES Dimension_Date (DateKey),
FOREIGN KEY (TimeKey) REFERENCES Dimension_Time (TimeKey),
FOREIGN KEY (GranularityKey) REFERENCES Dimension_Granularity (GranularityKey))
"""
self.execute_query(self.db_string, db_schema)
except Exception as err_:
self.ErrorList.append(self.ErrorDetail(f"{__class__}: generate_database_schema -> {err_}\n{traceback.format_exc()}"))
def populate_database_schema_base(self):
"""
Creates the base information that is unlikely to change in the database.
- Line Types
- Instrument Types
- Indicator Types
:return:
"""
try:
granularity = """INSERT INTO Dimension_Granularity
(GranularityKey, GranularityName, GranularityAlias, GranularityTimeDelta) VALUES
(1, 'Year', '%Y', '365 days, 0:00:00'),
(3, 'Month', '%m', ''),
(4, 'Week', '%u', '7 days, 0:00:00'),
(5, 'Day', '%d', '1 day, 0:00:00'),
(6, 'Hour', '%H', '1:00:00'),
(7, 'Minute', '%M', '0:01:00'),
(8, 'Second', '%S', '0:00:10')"""
self.execute_query(self.db_string, granularity)
line_types = """INSERT INTO Dimension_LineType
(LineTypeKey, LineName) VALUES
(1, 'Line'),
(2, 'Candle'),
(3, 'Overlay'),
(4, 'Subplot') """
self.execute_query(self.db_string, line_types)
instrument_types = """INSERT INTO Dimension_InstrumentType (InstrumentTypeKey, InstrumentTypeName) VALUES
(1, 'Stock'),
(2, 'Forex'),
(3, 'Crypto'),
(4, 'Index'),
(5, 'Fund') """
self.execute_query(self.db_string, instrument_types)
indicator_categories = """INSERT INTO Dimension_IndicatorCategory
(IndicatorCategoryKey, IndicatorCategoryName) VALUES
(1, 'Leading'),
(2, 'Lagging') """
self.execute_query(self.db_string, indicator_categories)
indicator_types = """INSERT INTO Dimension_IndicatorType
(IndicatorTypeKey, IndicatorTypeName) VALUES
(1, 'Trend'),
(2, 'Momentum'),
(3, 'Volatility'),
(4, 'Volume') """
self.execute_query(self.db_string, indicator_types)
except Exception as err_:
self.ErrorList.append(self.ErrorDetail(f"{__class__}: populate_database_schema_base -> {err_}\n{traceback.format_exc()}"))
That’s all she wrote, folks. I hope you learnt some things and will use my tools; hopefully, it will help you along the way. Peace.