Containerizing kdb+ with Docker

Aidan O'Gorman
5 min read · Nov 14, 2021


Introduction

In this post, I create a container image for a simple kdb+ process and deploy it using Docker, then deploy replicated instances using Docker swarm. It’s the first in a series of posts about how open-source tools can be used to bring kdb+ in line with modern standards of software design and architecture.

kdb+ is a column-oriented relational time-series database with in-memory capabilities, used widely in the fintech and banking sectors for its real-time data processing. It is programmed and queried in ‘q’, a concise, expressive language which includes a SQL-like query dialect called ‘qSQL’. q is interpreted, dynamically typed, table (and column) oriented, and expressions are evaluated in right-to-left order.
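For a flavour of the syntax, here is a short illustrative q session (the table and its values are invented for this example):

q)2*3+4                             / right-to-left: 3+4 is evaluated first, then 2*7
14
q)t:([]sym:`AAPL`MSFT`AAPL;price:101.5 45.2 102.0)
q)select avg price by sym from t    / qSQL: average price grouped by sym
sym | price
----| ------
AAPL| 101.75
MSFT| 45.2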

The initial learning curve is steep, but once you get your head around the terse syntax, concise error messages and overloaded glyphs, you can do some powerful things very quickly. One of the downsides is the decision by the creators of the language to make it proprietary. Commercial licenses are prohibitively expensive for many and this has probably contributed to a lack of adoption in industries outside fintech and investment banking. As a result, there are very few frameworks or universal kdb+ tick solutions, and many kdb+ stacks are built bespoke and maintained by dedicated in-house teams.

I developed an interest in how open-source tools could be leveraged alongside a bespoke application to achieve some of the features that often ship as standard with enterprise-grade software solutions written in other languages:

  • Fault tolerance/resiliency
  • Disaster recovery
  • Scalability in response to variable workloads
  • Zero-downtime upgrades
  • Infrastructure and operations as code

For the last few months, I’ve been focusing on a couple of technologies which can help solve some of the above issues:

  1. Docker for containerization of application code
  2. Kubernetes for container orchestration, automated deployment, and application scaling

It should be noted at this point that Docker and Kubernetes are two distinct but complementary technologies, and there are plenty of articles devoted to the differences between the two. In this post, I demonstrate how I used them to deploy a highly available (HA), scalable instance of a basic kdb+ process.

Containerizing kdb+

A container is a standard unit of software that packages all the code and dependencies required to run reliably in any computing environment. Containers are built from container images: lightweight, standalone, executable packages which are OS agnostic and can therefore be deployed anywhere and will behave uniformly.

A vanilla kdb+ process requires three files:

  • q.k: contains functions which are loaded as part of the ‘bootstrap’ of kdb+
  • kc.lic: license file for kdb+ (I’m using the non-commercial 64-bit license)
  • q: the q binary file
aidanog: ~/kdb_container/kdb $ ls -lrth
total 832K
-rw-r--r-- 1 aidanog aidanog 24K Nov 13 17:34 q.k
-rwxr-xr-x 1 aidanog aidanog 797K Nov 13 17:34 q
-rw-r--r-- 1 aidanog aidanog 363 Nov 13 17:34 kc.lic
-rw-r--r-- 1 aidanog aidanog 35 Nov 13 18:01 example.q

I then added a fourth file, example.q, which contains a simple ‘hello’ function.
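The exact definition isn’t important; a one-liner along the lines of this sketch (written to be consistent with the output shown later in the post) would do:

/ example.q: a trivial function that greets the caller by name
hello:{"Hello \"",x,"\"!"}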

Now that we have everything required to run q locally, we need a way to recreate these conditions procedurally; this is where the ‘Dockerfile’ comes in. A ‘Dockerfile’ contains a list of instructions which are executed in sequence to build our container image:

  1. Define a base image on which to build
  2. Set some environment variables required to run ‘q’
  3. Install some useful packages like ‘rlwrap’, which adds line editing and history to the ‘q’ REPL
  4. Copy our kdb+ files in and set the working directory
  5. Define the command which starts a q process and loads ‘example.q’ when the container runs
FROM debian:9 AS base
MAINTAINER Aidan O'Gorman

# Do not clean here, it's cleaned later!
RUN apt-get update \
 && apt-get -yy --option=Dpkg::options::=--force-unsafe-io upgrade

# Set env variables for q
ENV QHOME /kdb
ENV PATH ${PATH}:${QHOME}
# This should point to the license file location
ENV QLIC /kdb

# Install some useful packages (ca-certificates, curl, rlwrap, runit, unzip),
# then clean up the apt cache to keep the image small
RUN apt-get -yy --option=Dpkg::options::=--force-unsafe-io --no-install-recommends install \
    ca-certificates \
    curl \
    rlwrap \
    runit \
    unzip \
 && apt-get clean \
 && find /var/lib/apt/lists -type f -delete

# Copy our kdb+ files in and set the working directory
COPY kdb /kdb
WORKDIR /kdb

# Start a q process which loads example.q and listens on port 1234
CMD ["q", "example.q", "-p", "1234"]

Deploying kdb+ from a Container Image

Now that’s done, we simply build our container image and create our container in Docker:

aidanog: ~/kdb_container $ docker build -t kdb-in-container/1.0 .
...
[+] Building 35.2s (11/11) FINISHED
aidanog: ~/kdb_container $ docker image ls
REPOSITORY             TAG      IMAGE ID       CREATED              SIZE
kdb-in-container/1.0   latest   536358b12c76   About a minute ago   183MB
aidanog: ~/kdb_container $ docker create kdb-in-container/1.0
e7f268521248e8078ecd8c73c74aa9dffc953b3df3726e1fe227e1210bdbb5d8
aidanog: ~/kdb_container $ docker start e7f268521248e8078ecd8c73c74aa9dffc953b3df3726e1fe227e1210bdbb5d8
e7f268521248e8078ecd8c73c74aa9dffc953b3df3726e1fe227e1210bdbb5d8
aidanog: ~/kdb_container $ docker ps -a
CONTAINER ID   IMAGE                  COMMAND                 CREATED         STATUS        PORTS   NAMES
e7f268521248   kdb-in-container/1.0   "q example.q -p 1234"   6 seconds ago   Up 1 second           magical_cerf
aidanog: ~/kdb_container $ docker exec -it e7f268521248 sh
# q
KDB+ 4.0 2021.04.26 Copyright (C) 1993-2021 Kx Systems
l64/ 4(16)core 6251MB root e7f268521248 172.17.0.2 EXPIRE 2022.08.06 aidan.ogorman@outlook.com KOD #4177517
q)
q)conn:hopen`::1234
q)conn(`hello;"Aidan")
"Hello \"Aidan\"!"

Horizontal Scaling in Docker

Now that we have an image which contains everything required to run our q process, we can easily create replicas of our process using a service and Docker swarm (a container orchestration tool). I use Docker swarm here to keep things within Docker, but I prefer Kubernetes as a container orchestrator and will introduce it in future posts.

aidanog: ~/kdb_container $ docker swarm init
...
aidanog: ~/kdb_container $ docker service create kdb-in-container/1.0
loj5lp019g8haom00foxydpr2
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
aidanog: ~/kdb_container $ docker service ls
ID             NAME            MODE         REPLICAS   IMAGE                         PORTS
loj5lp019g8h   adoring_yalow   replicated   1/1        kdb-in-container/1.0:latest
aidanog: ~/kdb_container $ docker scale service adoring_yalow=5
docker: 'scale' is not a docker command.
See 'docker --help'
aidanog: ~/kdb_container $ docker service scale adoring_yalow=5
adoring_yalow scaled to 5
overall progress: 5 out of 5 tasks
1/5: running [==================================================>]
2/5: running [==================================================>]
3/5: running [==================================================>]
4/5: running [==================================================>]
5/5: running [==================================================>]
verify: Service converged
aidanog: ~/kdb_container $ docker service ls
ID             NAME            MODE         REPLICAS   IMAGE                         PORTS
loj5lp019g8h   adoring_yalow   replicated   5/5        kdb-in-container/1.0:latest

Et voilà, we now have 5 replicas of our process running on our node. If we were running a typical kdb+ tick stack with historical data services, we could easily scale the number of available services in response to varying query load throughout the day.

This is a trivial example to show how Docker can help solve issues around scalability and availability of services. There are a couple of things to note:

  • These processes are not attached to a network and expose no ports, so they are not reachable by ingress traffic (a sketch of how a port could be published follows this list)
  • All processes are deployed on a single Docker swarm node; if that node goes down, all of the processes go down with it
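For reference, exposing the replicas would mean attaching the service to a network and publishing a port when creating it; a minimal sketch (the service name ‘kdb-svc’ is illustrative) might look like this:

docker service create --name kdb-svc --replicas 5 \
  --publish published=1234,target=1234 kdb-in-container/1.0

Each replica would then be reachable through the swarm’s ingress routing mesh on port 1234, which also load-balances incoming connections across the replicas.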

In the next article, I will show how we can leverage Kubernetes to automate the deployment and orchestration of containers.
