How to create candlestick charts with PostgreSQL
I’ve been exploring databases for my whole career, and yet every day I find something that amazes me. PostgreSQL introduced window functions in version 8.4 (six years ago!), but I still think it’s a feature not many people are aware of.
Let’s see how window functions work in a real-life example!
The Problem: OHLC Stock Charts
You have to generate data for a stock charting site. The charting library expects data aggregated into the OHLC format, and you need to support multiple intervals (for example 1min, 5min, 15min, 1h) over a given time period.
The Data
You have a table containing one-minute OHLC bars for stocks going back 30 days. The table schema looks like this:
CREATE TABLE minute_bars
(
date timestamp with time zone,
open numeric,
low numeric,
high numeric,
close numeric,
volume bigint,
key character varying(255),
id serial NOT NULL,
CONSTRAINT minute_bars_pkey PRIMARY KEY (id)
);

CREATE UNIQUE INDEX index_minute_bars_on_key_and_date
  ON minute_bars
  USING btree
  (key COLLATE pg_catalog."default", date);
and it currently contains around 7 million rows of data:
          date          |  open  |  low   |  high  | close  | volume | key  |   id
------------------------+--------+--------+--------+--------+--------+------+---------
 2015-06-24 21:50:00+02 | 128.06 | 128.06 | 128.13 | 128.12 |   1241 | AAPL | 7812127
 2015-06-24 21:51:00+02 | 128.09 | 128.05 | 128.09 | 128.05 |   1244 | AAPL | 7810150
 2015-06-24 21:52:00+02 | 128.04 | 128.04 | 128.08 | 128.07 |   1221 | AAPL | 7808375
 2015-06-24 21:53:00+02 | 128.00 | 128.00 | 128.05 | 128.05 |    800 | AAPL | 7806107
 2015-06-24 21:54:00+02 | 128.09 | 127.98 | 128.09 | 127.98 |   1000 | AAPL | 7804135
Aggregating with window functions
First you need to somehow establish the boundaries of the interval periods in the given time range. You can use the awesome generate_series function that's built into PostgreSQL.
For example:
with intervals as (
  select start, start + interval '5min' as end
  from generate_series('2015-06-11 9:30', '2015-06-11 16:00', interval '5min') as start
)
select * from intervals;
will return what we want:
         start          |          end
------------------------+------------------------
 2015-06-11 09:30:00+02 | 2015-06-11 09:35:00+02
 2015-06-11 09:35:00+02 | 2015-06-11 09:40:00+02
 2015-06-11 09:40:00+02 | 2015-06-11 09:45:00+02
 2015-06-11 09:45:00+02 | 2015-06-11 09:50:00+02
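Since the boundaries and the step are just arguments to generate_series, the same construction covers any of the required granularities. For example, hourly boundaries:

with intervals as (
  select start, start + interval '1h' as end
  from generate_series('2015-06-11 9:30', '2015-06-11 16:00', interval '1h') as start
)
select * from intervals;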
Now that you have boundaries that establish a set of rows, you can dive into window functions and aggregates.
with intervals as (
  select start, start + interval '5min' as end
  from generate_series('2015-06-11 9:30', '2015-06-11 16:00', interval '5min') as start
)
select distinct
  intervals.start as date,
  min(low) over w as low,
  max(high) over w as high,
  first_value(open) over w as open,
  last_value(close) over w as close,
  sum(volume) over w as volume
from
  intervals
  join minute_bars mb on
    mb.key = 'AAPL' and
    mb.date >= intervals.start and
    mb.date < intervals.end
window w as (
  partition by intervals.start
  order by mb.date asc
  rows between unbounded preceding and unbounded following
)
order by intervals.start;
The aggregate calculations are performed over a window of data that is determined by the partition by intervals.start part of the query. In this case the tricky thing is getting correct values for the open and close attributes. By default, when a window definition has an order by, first_value and last_value operate on a frame that runs from the start of the partition through the current row's last peer, so last_value would return the close of the current row rather than the last row of the interval. Fortunately you can change this behavior with the rows between unbounded preceding and unbounded following modifier, which tells PostgreSQL to use all the rows in the partition when calculating the open and close values.
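You can see the difference with a minimal, self-contained sketch that uses a toy series instead of the minute_bars table:

select x,
       last_value(x) over (order by x) as last_default,
       last_value(x) over (order by x
                           rows between unbounded preceding
                                    and unbounded following) as last_full
from generate_series(1, 3) as x;

With the default frame, last_default simply echoes the current row (1, 2, 3), while last_full returns 3 on every row. This is also why the query above uses select distinct: once every row in a 5-minute partition carries identical aggregate values, distinct collapses each partition to a single row.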
In the end you can query for different intervals and time ranges to get results fast!
        date         |  low  |  high  |  open  | close  | volume
---------------------+-------+--------+--------+--------+--------
 2015-05-29 09:30:00 | 131.1 | 131.44 | 131.26 | 131.17 |  63132
 2015-05-29 09:35:00 | 131.1 | 131.38 | 131.18 | 131.25 |  69550

Time: 75.079 ms
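If you need the same aggregation for arbitrary intervals and time ranges, one convenient option is to wrap the query in a SQL function. The sketch below assumes PostgreSQL 9.2 or later (where SQL functions can reference parameters by name); the function name ohlc and the p_* parameter names are illustrative, not part of the original setup:

-- A sketch, not the original code: parameterizes the key, range and step.
create function ohlc(
  p_key   text,
  p_start timestamptz,
  p_end   timestamptz,
  p_step  interval
) returns table (
  date timestamptz, low numeric, high numeric,
  open numeric, close numeric, volume bigint
)
language sql stable as $$
  with intervals as (
    select start, start + p_step as end
    from generate_series(p_start, p_end, p_step) as start
  )
  select distinct
    intervals.start,
    min(mb.low) over w,
    max(mb.high) over w,
    first_value(mb.open) over w,
    last_value(mb.close) over w,
    -- sum(bigint) yields numeric, so cast back to the declared bigint
    (sum(mb.volume) over w)::bigint
  from
    intervals
    join minute_bars mb on
      mb.key = p_key and
      mb.date >= intervals.start and
      mb.date < intervals.end
  window w as (
    partition by intervals.start
    order by mb.date asc
    rows between unbounded preceding and unbounded following
  )
  order by intervals.start
$$;

-- For example, 15-minute bars for one trading day:
select * from ohlc('AAPL', '2015-06-11 9:30', '2015-06-11 16:00', interval '15min');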
Do you have other ideas about where window functions can be applied? The comments are open.