Case Study: custom scaling a kdb+ application on AWS Elastic Kubernetes Service
Introduction
Over the past 10 years, technology stacks have seen a dramatic increase in the use of cloud-based services as part of their architectures, and kdb+ systems are no different. Two current offerings are kdb Insights and Amazon FinSpace, both of which support deploying kdb+ in the cloud. kdb Insights allows users to deploy cloud-based kdb+ solutions and integrates with several common programming languages. Amazon FinSpace allows for easy migration of existing kdb+ applications to a fully managed AWS service that will run your standard kdb+ application. Whilst Amazon FinSpace allows for some customisation beyond a standard tick setup, kdb Insights allows for a much wider range of processes and custom scripts. But what if you want full control of your application and a highly customisable, cloud-based kdb+ solution?
This is what prompted me to start working on this POC. I wanted to see how easily a developer could deploy a scalable and highly available kdb+ tick setup in the cloud. For this reason, I decided to use kdb+ with Kubernetes.
What is Kubernetes and what are its advantages?
Kubernetes is an orchestration tool for managing container-based applications. It is open source, widely used across the industry, and can be run on all major cloud service providers. Kubernetes is a natural fit within cloud environments because one of its core features is the ability to scale out horizontally during periods of high demand and to scale in during periods of low demand. This improves application efficiency and greatly optimises costs. Kubernetes also offers the following advantages:
· Self-healing applications that restart crashed containers, ensuring higher availability.
· Automated rollouts and rollbacks, allowing for process upgrades without affecting the user experience.
· High availability: containers are easy to replicate, and this can be used to provide redundancy within the application and improve availability.
· Microservices architecture: Kubernetes is well suited to a microservices architecture, which reduces failure points in the application. A typical kdb+ architecture also follows microservices principles, making the two an ideal match.
I had prior experience with Kubernetes and AWS, so I decided to use the AWS Elastic Kubernetes Service (EKS). This allows for easier creation of Kubernetes clusters in AWS, as there is no need to manage your own control plane, and it integrates very nicely with other existing AWS services. The cluster was created within one region and across two availability zones.
In addition to this, I wanted to explore how to scale the application in a way that would be useful to a production kdb+ application. Kubernetes has in-built scaling based on memory and CPU, but these aren’t always the most useful metrics. More typically, bottlenecks occur in a kdb+ application when clients query a gateway for historical data and too many queries hit the same HDBs at the same time. Ideally these HDBs could be scaled out based on the size of the query backlog. To do this, a custom metric would be needed to scale the number of HDBs, as Kubernetes does not provide such a metric out of the box. This was the second half of the project: implementing a kdb+-specific scaling solution.
The aims of this project were therefore as follows:
· Implement a kdb+ tick setup in AWS.
· Architect it with Kubernetes to ensure it is highly scalable, available and uses microservices.
· Have custom scaling to allow the application to quickly adapt to user queries without using excess resources.
Architecture Overview
This section covers the project architecture and the rationale behind the design choices.
The core of the project is a kdb+ tick setup, with the following components:
· Feedhandler (this process simply generates the dummy data in this POC)
· Tickerplant (TP)
· Real-time database (RDB)
· Historical database (HDB)
· Gateway
· Load balancer
The feedhandler, TP, RDB and HDB processes above perform the typical functions of these components within a kdb+ stack (more information can be found at https://code.kx.com/q/architecture/).
The gateway handles client queries and identifies which processes each query should target. Each RDB and HDB process registers with the load balancer on startup and is removed from the list of servers when its connection to the load balancer closes. This means that the load balancer has an up-to-date record of the database processes running. This is particularly useful when scaling up HDB processes, as the load balancer automatically adds them to the list of available database processes and they very quickly become available for pending queries. A minimal sketch of this registration pattern is shown below.
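As an illustration, here is a sketch of the registration pattern described above. The table, function and process names, hostnames and ports are illustrative, not the exact code used in the POC.

```q
/ load balancer: keep a registry of the available database processes
servers:([handle:`int$()] procType:`symbol$(); address:`symbol$())

/ called remotely by an RDB/HDB when it starts up
register:{[procType;address]
  `servers upsert (.z.w; procType; address);   / .z.w is the handle of the calling process
  }

/ drop a process from the registry when its connection closes
.z.pc:{[h] delete from `servers where handle=h; }

/ RDB/HDB side: connect to the load balancer on startup and announce itself
lb:hopen `:loadbalancer:5000;          / service name resolved via Kubernetes DNS
neg[lb] (`register; `hdb; `:hdb:5012); / async call registering this process
```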
Supporting Kubernetes components
Below is a summary of the main Kubernetes components used to run and support this application; a minimal example manifest follows the list:
· Pods: groups of one or more containers that are deployed and scheduled together.
· Deployments: a self-healing grouping of pods in which you can specify how many replicas are desired (if the number of running pods falls below the specified count, the deployment will try to spin up more).
· Services: give groups of pods (including deployments) a stable IP address, enabling easier communication between pods.
· ConfigMaps: used to inject configuration data, such as environment variables, into pods.
· Horizontal Pod Autoscaler: allows a group of pods (a deployment in our case) to be automatically scaled out or in based on built-in Kubernetes, custom or external metrics.
· Ingress: used to allow external traffic to access the Grafana dashboard service on port 80, the default port for HTTP traffic. This means the user can access the dashboard via a browser using the custom URL generated by the ingress object.
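To make this concrete, below is a minimal sketch of a deployment and matching service for the RDB. The names, image and ports are illustrative rather than the exact manifests used in the POC.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rdb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rdb
  template:
    metadata:
      labels:
        app: rdb
    spec:
      containers:
      - name: rdb
        image: myregistry/kdb-rdb:latest   # hypothetical image name
        ports:
        - containerPort: 5011
---
apiVersion: v1
kind: Service
metadata:
  name: rdb        # gives the RDB pods a stable address, e.g. rdb:5011
spec:
  selector:
    app: rdb
  ports:
  - port: 5011
    targetPort: 5011
```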
Storage
Pods are normally ephemeral, so files saved on them do not persist between pod restarts. As a result, some form of persistent storage was needed to store the HDB data and other files between pod restarts. A PersistentVolume object was created to provision Amazon Elastic Block Store (EBS) storage for the application, and this volume was mounted into the pods via a PersistentVolumeClaim. EBS was chosen as it integrates easily with kdb+ applications; however, it is more expensive than some alternatives. For this POC that was acceptable due to the small amount of storage used (10 GB). A sketch of these storage objects is shown below.
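The sketch below assumes the EBS CSI driver and a pre-created EBS volume; the volume ID, names and sizes are placeholders, not the exact objects used in the POC.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hdb-data-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce                         # an EBS volume attaches to one node at a time
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0   # hypothetical EBS volume ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hdb-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ""                    # bind to the statically provisioned volume above
  volumeName: hdb-data-pv
  resources:
    requests:
      storage: 10Gi
```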
Simple Storage Service (S3) is perhaps better known than EBS and is cheaper; however, S3 alone is not a good solution for an HDB. KX go into more detail on why at https://code.kx.com/q/cloud/aws/migration/#storing-your-hdb-in-s3, but simply put, S3 is an object store rather than a POSIX filesystem, so it uses a different structure when storing data. This results in HDB data being stored differently in S3 from how it would be stored on disk. Various other issues result in slow performance and low read throughput, making it somewhat limited for kdb+ applications.
There are alternative solutions for storing large quantities of historical data in the cloud whilst remaining performant. Amazon FSx for Lustre is one such solution, and it can be integrated with S3, although the performance of such a solution is out of the scope of this project. More information can be found at https://code.kx.com/q/cloud/aws/lustre/, but it is unclear whether the performance results on that page relate to Amazon FSx for Lustre with or without S3.
kdb+ version compatibility
kdb+ 4.0 was used throughout this project. The application was not tested with earlier versions of kdb+ due to difficulties acquiring licensed copies of those versions.
Core kdb+ functionality summary
A simple kdb+ application was successfully created in AWS EKS. It required more work to set up than a standard kdb+ application because of the additional Kubernetes objects such as services, persistent volumes and config maps. However, the application is now self-healing, and the number of each type of process can easily be adjusted by scaling the deployments up or down. Additionally, automated rollouts and rollbacks can be performed much more easily. Notably, however, this implementation does not really make use of Kubernetes’ ability to scale applications based on user demand. The second half of this article focuses on an implementation that does exactly that, to make better use of Kubernetes.
Custom Metric Scaling
The aim of the custom metric scaling was to take the number of pending historical-data queries in the gateway and use this to scale the number of HDB pods in the application. This mimics a situation where a gateway receives multiple queries for the same HDB processes at once, but each HDB can only service one query at a time, so a backlog of pending queries builds up.
Additional components required for custom scaling
Various components were required to ensure that the HDB deployment scaled based on a custom metric within a kdb+ process.
Prometheus pod: Prometheus is a monitoring application that collects and stores time series of metrics for visualisation. In this case it stores kdb+-specific metrics, e.g. the gateway’s query_queue_length, as well as Kubernetes pod metrics such as memory and CPU usage. The scrape targets were specified via a config file, sketched below.
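For illustration, the relevant part of such a Prometheus config might look like the following; the job name, target and port are assumptions rather than the exact configuration used.

```yaml
scrape_configs:
  - job_name: gateway-kdb-exporter
    scrape_interval: 5s
    static_configs:
      - targets: ['gateway:8080']   # exporter container running alongside the gateway
```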
Prometheus kdb+ exporter (https://github.com/KxSystems/prometheus-kdb-exporter): a process that acts as a connector between the gateway process and the Prometheus pod. The pod in the gateway deployment ran both a kdb+ gateway container and a prometheus-kdb-exporter container. The metric query_queue_length was defined in the gateway process as the count of historical client queries that had yet to have their results returned to the client. The value of this variable was published to the exporter each time a query was received or serviced by the gateway, as sketched below.
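A minimal sketch of the gateway-side logic follows. The exporter-side function name (updateMetric), the port and the query-routing details are assumptions for illustration; they are not the exporter's documented API.

```q
pending:()!();                               / query id -> client handle
exp:hopen `:localhost:8081;                  / exporter container in the same pod

publishMetric:{ neg[exp] (`updateMetric; `query_queue_length; count pending); }

/ on receiving a historical query from a client
onQuery:{[query]
  id:`$string .z.p;                          / crude unique id for the sketch
  pending[id]:.z.w;                          / remember which client is waiting
  / ... forward the query asynchronously to a free HDB ...
  publishMetric[];
  }

/ on receiving a result back from an HDB
onResult:{[id;res]
  neg[pending id] res;                       / return the result to the waiting client
  pending::(enlist id) _ pending;
  publishMetric[];
  }
```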
Grafana pod: Grafana is a visualisation application that integrates easily with Prometheus to display the metrics stored there. Within Grafana you can easily create dashboards, graphs and other visual tools to track and display metrics for error detection, resource monitoring, cost monitoring and so on. In this application it connects to the Prometheus pod and can display any metric from that process. The dashboard is accessible from the ingress URL, as the ingress uses the Grafana service as its backend.
Prometheus adapter (https://github.com/kubernetes-sigs/prometheus-adapter): allows metrics stored in the Prometheus pod to be accessed via the Kubernetes API. The Kubernetes API allows for querying and management of Kubernetes resources, such as pods, deployments and horizontal pod autoscalers. As a result, the custom query_queue_length metric generated by the gateway and stored in the Prometheus pod can be used to scale the number of HDB pods running. A Helm chart was used to spin up the various resources required to support this adapter. Helm is a tool to manage Kubernetes resources and is particularly useful when many complex configurations and linked resources are needed. Helm charts are standard templates for Kubernetes resources; in this application the prometheus-adapter Helm chart (https://artifacthub.io/packages/helm/prometheus-community/prometheus-adapter) was used to deploy the RBAC (Role-Based Access Control) roles, ClusterRoles and the other Kubernetes resources needed. Further configuration ensured it could target the Prometheus pod and extract the query_queue_length metric, along the lines of the rule sketched below.
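The values passed to the Helm chart might look something like the sketch below. The Prometheus URL, the series labels and the aggregation used are assumptions and depend on how Prometheus labels the scraped metric.

```yaml
# values.yaml fragment for the prometheus-adapter Helm chart (illustrative)
prometheus:
  url: http://prometheus.default.svc
  port: 9090
rules:
  custom:
  - seriesQuery: 'query_queue_length{namespace!="",service!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        service: {resource: "service"}
    name:
      matches: "query_queue_length"
      as: "query_queue_length"
    metricsQuery: 'max(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```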
HDB horizontal pod autoscaler: this controls the size of the HDB deployment, scaling it out or in based on whether query_queue_length is above or below a target value. A threshold of three pending queries was set, so that if this threshold was exceeded, additional HDBs were spun up to reduce the backlog. Various configuration options exist to limit or delay the rate at which pods are added or removed, but these were not used in the POC. This is the last step in the chain of components that allows the number of HDB processes to adapt dynamically to the gateway query load. A sketch of the autoscaler manifest is shown below.
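In this sketch an Object metric tied to the gateway service is one way to wire this up; that choice, the API version and the replica limits are assumptions, while the target of three pending queries matches the POC.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hdb
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hdb                      # the HDB deployment being scaled
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      describedObject:
        apiVersion: v1
        kind: Service
        name: gateway              # the metric is reported by the gateway
      metric:
        name: query_queue_length
      target:
        type: Value
        value: "3"                 # scale out once more than three queries are pending
```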
Demo of HDB Autoscaling
Figure 2 is taken from the Grafana dashboard. To test the benefits of this setup, 25 identical historical (HDB-only) queries were sent in parallel to the gateway, mimicking 25 different users all trying to query one HDB process at once. Each query included a 10-second sleep on the HDB process. This mimics a large query; otherwise the queries would have returned almost instantly due to the small size of the data set. The test harness is sketched below.
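A sketch of how such a test might be driven from a q client follows; the gateway hostname, port, query function name and table are illustrative.

```q
/ simulate 25 users querying the gateway at once
handles:hopen each 25#`:gateway:5010;        / one connection per simulated user
{neg[x] (`queryHDB; "select from trade where date=2023.01.01"); neg[x][]} each handles;

/ on the HDB, the query handler included an artificial delay to mimic a large query
serveQuery:{[q] system "sleep 10"; value q}
```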
First, this was run without any autoscaling enabled. The red line indicates the point at which the gateway received the 25 queries, and the first peak in the query queue length graph shows what happened. The queries were returned one at a time and at a near-constant rate, as can be seen from the slope of the graph: there was only one HDB and each query was identical, so each took approximately the same amount of time to resolve. Clearly this result would be unacceptable for users who require fast access to their query results.
Next, autoscaling was enabled and the 25 identical queries were resent to the gateway. The orange line in figure 2 shows when these queries were sent. This time, as can be seen, the number of HDB pods increased to 4 rather than staying at 1. The time for the 25 queries to be returned to the client dropped drastically once the number of HDB processes increased, as the queries were routed across the different HDB processes instead of just the one. Clearly this setup can significantly reduce client wait times during periods of high demand by spinning up additional HDBs when required. Note that the number of HDB pods scaled back down to 1 once query_queue_length returned to 0, meaning that the additional pods were only running when required, making this both a scalable and efficient solution.
Project Evaluation
HDB spin-up speed
In figure 2 there is a clear delay before the additional HDBs spin up to process the extra queries. Whilst better than the scenario with only one HDB, there is definitely room to improve the time taken by the horizontal pod autoscaler to spin up additional HDB processes in response to a surge in queries.
The delay is due to the frequency at which the horizontal pod autoscaler samples the custom query_queue_length metric. The default sampling interval set in the kube-controller-manager process for a horizontal pod autoscaler is 15 seconds. Normally this could be reduced, but the value cannot be customised when using AWS EKS. As a result, the response speed for new HDBs spinning up is limited in this setup, as the horizontal pod autoscaler can take up to 15 seconds to recognise that the query_queue_length metric has changed. Testing the time to spin up a new HDB pod gave varying results, typically around or just under 15 seconds. Clearly this delays the creation of new HDB pods and cannot provide additional capacity as quickly as may be needed.
This issue highlights a disadvantage of using EKS or other managed Kubernetes services: the ease of setting up a cluster comes at the cost of full control over its configuration. Had this cluster been fully self-managed, the sampling interval could have been reduced significantly. It is possible to specify the interval in milliseconds; however, the latency of the surrounding processes would need to be tested to verify the practical lower limit of this delay.
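For reference, on a self-managed control plane the relevant setting is a kube-controller-manager flag (shown below with an illustrative value); it accepts standard duration strings.

```
--horizontal-pod-autoscaler-sync-period=5s   # default is 15s; finer-grained durations also parse
```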
Security
The cluster made use of various security measures to secure the data and application. These included but were not limited to:
· use of AWS Identity and Access Management (IAM) to limit access to the cluster, resources and processes running.
· authentication on the Grafana dashboard to prevent unauthorised access.
· encryption of the EBS volumes.
· role-based access controls within the cluster.
However, a production stack would require more stringent security measures than those implemented in this application.
Latency
The setup developed here is clearly not suitable for applications where additional HDBs would be needed within nanoseconds. The most obvious limitation is that the horizontal pod autoscaler only samples the custom query_queue_length metric every 15 seconds. Whilst this could certainly be reduced, potentially down to the order of milliseconds, there are other bottlenecks which increase the scaling delay. When tested, the Prometheus pod could sample the metric from the q process at most every 250 milliseconds. It may be possible to improve this by changing the pod resources; however, it will clearly be difficult to scale deployments on a sub-millisecond timescale.
Adapting for more mature kdb+ systems
The core of this application was a minimal vanilla tick setup, with a few additional processes added to allow for query routing via a gateway. Of course, kdb+ systems tend to be much more complex and make use of multiple gateways, Write-only Databases (WDBs), Intraday Databases (IDBs) and much more. These would clearly add a layer of complexity; however, this POC could be extended and adapted to fit a more complex application, as the core building blocks are present and the most intricate part, the custom metric collection and subsequent scaling, could remain largely unchanged.
Conclusion
This POC has demonstrated not only how kdb+ can be deployed on Kubernetes to create a self-healing and scalable application, but also how the application can be customised to scale on kdb+-generated metrics. Whilst this project focused on using the length of the gateway’s query queue to scale the number of HDBs in a basic kdb+ framework, it could easily be expanded and adapted to provide a bespoke kdb+ solution.
Whilst the custom scaling was effective in reducing query backlogs, the delay before additional HDB pods spun up was far from ideal. Implementing this solution without EKS or another managed Kubernetes service would allow this delay to be greatly reduced and would improve the viability of the implementation. Even so, using it for very low-latency applications might prove difficult. Instead, it may be better suited to applications where users expect large queries to be resolved quickly, but where nanosecond or millisecond latency is not critical.
Other areas for improvement, such as security and alternative storage solutions, were also discussed and potential issues raised. Nevertheless, this project provides a comprehensive approach to implementing kdb+ in the cloud and customising the application to provide scaling functionality not normally available on a typical kdb+ stack.
Acknowledgements
KxSystems/kdb-tick (latest source files for kdb+tick): https://github.com/KxSystems/kdb-tick
kdb+ architecture documentation: https://code.kx.com/q/architecture/
About the Author:
Eman Santiano is a kdb+ Engineer at Version 1.