CubeJS on Kubernetes using HELM

Deployment of cube on Kubernetes using HELM charts

4 min read · Feb 18, 2024

Before getting into the deployment strategy for cubejs (a data semantic layer) on Kubernetes using HELM charts, have a look at my Medium blog for fundamental insights on the data semantic layer.

Fundamentals on Data Semantic Layer | by Opstimize Icarus

Cubejs Data Modeling with Example: 😬coming soon….!

🤝Pre-requisites:
— Fundamental concepts of Semantic Layer
— Kubernetes and HELM charts
— sample DB instance to deploy the helm charts on Kubernetes environment.

Let’s kick-off.

🪄🪄🪄Please find the HELM charts for deploying the cubejs components in a Kubernetes environment: OpstimizeIcarus/cubejs-helm-charts-kubernetes

Components of cubejs:

A typical cubejs deployment consists of 3 components.

1️⃣One or multiple API instances — Stateless
2️⃣A Refresh Worker — Stateless
3️⃣A Cube Store cluster [Router + Workers] — Stateful

💡Note: For detailed information on each of the above components, please have a look at the documentation here: 🔗 https://cube.dev/docs/product/deployment

Cubejs Docker Images

Cubejs components can be deployed as Docker containers. cubejs has 2 Docker images, which are publicly available on Docker Hub:

⚡cubejs/cube: used for deploying cube-api and cube-refresh-worker
⚡cubejs/cubestore: used for deploying the cubestore-router and cubestore-worker

Note: In development mode, the single ⚡cubejs/cube image can run all 4 components in one container, provided CUBEJS_DEV_MODE is set to true.
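To make the image-to-component mapping concrete, here is a minimal Python sketch. The function name `containers_to_run` and the layout of the returned plan are illustrative assumptions, not part of cubejs or of the HELM charts; only the image names and CUBEJS_DEV_MODE come from the text above.

```python
from typing import Dict

# Component -> Docker Hub image, as listed above.
PRODUCTION_IMAGES: Dict[str, str] = {
    "cube-api": "cubejs/cube",
    "cube-refresh-worker": "cubejs/cube",
    "cubestore-router": "cubejs/cubestore",
    "cubestore-worker": "cubejs/cubestore",
}


def containers_to_run(dev_mode: bool) -> Dict[str, Dict[str, object]]:
    """Return a hypothetical plan: container name -> image and extra env."""
    if dev_mode:
        # In development mode one cubejs/cube container covers every role.
        return {
            "cube": {"image": "cubejs/cube", "env": {"CUBEJS_DEV_MODE": "true"}}
        }
    # Otherwise each component gets its own container from one of two images.
    return {
        name: {"image": image, "env": {}}
        for name, image in PRODUCTION_IMAGES.items()
    }


dev_plan = containers_to_run(dev_mode=True)
prod_plan = containers_to_run(dev_mode=False)
```

The production plan still lacks the environment variables that tell each container which role to play; those are covered in the HELM charts section.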

Workflow Explained

Below is a short description of how cubejs actually works.

🐳cube-api
cube-api is the instance that processes incoming requests from the client and forwards them to the Cube Store Router.
It communicates only with the Router and does not interact with the Workers directly.
Since all incoming load is handled by cube-api, we can 🔀scale out cube-api to handle the load in production-ready environments.

Applications always send their queries to cubejs via cube-api and never interact with any other component directly.

cubestore:
Cube Store is the purpose-built pre-aggregation storage for Cube. It consists of two components:
🐳Cube Store Router
🐳Cube Store Worker

❗❗❗ Videos worth watching ❗❗❗

▶https://youtu.be/dRpJObfNFUA?si=60bhzVR8LCb-AF2S

▶https://www.youtube.com/live/JxedV_zI7W4?si=YyhNCUiUsSyDQRdS

🐳Cube Store Router
It accepts requests from cube-api, manages database metadata, builds query plans (SQL queries), and distributes the queries to multiple Cube Store Workers.

🐳Cube Store Worker
Workers in a Cube Store cluster receive and execute subqueries from the Router. They interact directly with the underlying storage, which can be local or cloud-based, to insert and read pre-aggregated data.
Worker nodes do not interact with each other and rely only on the Router to distribute the queries.

🐳Cube Refresh worker
It acts as a “cron” job that runs in the background at a time interval the developer can define. It keeps pre-aggregations up to date and invalidates the in-memory cache, but it does not populate that cache.
The invalidated in-memory cache is lazily populated again by subsequent queries sent to cubejs.

👀Example: Assume there is a pre-aggregation for weekly reports of the quantity of shoes in a warehouse, and the warehouse is refilled every week.
The refresh worker can be configured to refresh that pre-aggregated data once every week.
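The "invalidate now, repopulate lazily on the next query" behaviour of a cache can be illustrated with a toy model. This is not how cubejs is implemented; `LazyCache`, `load_from_source` and the `stock` dictionary are hypothetical names used only to show the idea with the warehouse example.

```python
from typing import Callable, Dict


class LazyCache:
    """Toy cache: entries are filled on first query, never on invalidation."""

    def __init__(self, load_from_source: Callable[[str], int]) -> None:
        self._load = load_from_source
        self._entries: Dict[str, int] = {}
        self.source_hits = 0  # how many times the source had to be read

    def invalidate(self) -> None:
        # What the background job does: drop cached data, load nothing.
        self._entries.clear()

    def query(self, key: str) -> int:
        # Lazy population: only a cache miss reads from the source.
        if key not in self._entries:
            self.source_hits += 1
            self._entries[key] = self._load(key)
        return self._entries[key]


stock = {"shoes": 120}                 # stand-in for the warehouse database
cache = LazyCache(lambda key: stock[key])

first = cache.query("shoes")           # miss: reads 120 from the source
stock["shoes"] = 200                   # the warehouse is refilled
stale = cache.query("shoes")           # hit: still the cached 120
cache.invalidate()                     # weekly background invalidation
fresh = cache.query("shoes")           # miss again: reads the new 200
```

Until the invalidation runs, queries keep returning the old value, which is why the interval should match how often the underlying data changes.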

CubeJS Helm Charts

🚀🚀🚀Here is the link to my cubejs HELM charts GitHub repo 🚀🚀🚀OpstimizeIcarus/cubejs-helm-charts-kubernetes

As mentioned earlier, there are only 2 Docker images, so you might be wondering how we are going to deploy cube-api, cube-refresh-worker, cubestore-router and cubestore-worker in Kubernetes...! Well, environment variables are what we use to tune each Docker image so that it runs as the desired component.

Cubejs provides a lot of environment variables, and each one has its own purpose. The HELM charts use only the minimal environment configuration needed to deploy all the components of cubejs in Kubernetes.

For more information about all the available environment variables, have a look at: 🔗 https://cube.dev/docs/reference/configuration/environment-variables
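As a rough sketch of how environment variables turn two images into four roles, the Python function below builds a per-component environment. The variable names are the ones I understand the Cube documentation to describe for production deployments; the function name `env_for`, the ports 9999 and 10001, the path /cube/data and the host names are assumptions for illustration and may differ from what the HELM charts actually set.

```python
from typing import Dict, List

ROUTER_META_PORT = 9999      # assumed metadata port on the router
WORKER_PORT = 10001          # assumed port each worker listens on
REMOTE_DIR = "/cube/data"    # assumed mount path of the shared volume


def env_for(component: str, router_host: str, worker_hosts: List[str],
            worker_index: int = 0) -> Dict[str, str]:
    """Return a minimal, illustrative environment for one component."""
    workers = ",".join(f"{host}:{WORKER_PORT}" for host in worker_hosts)
    if component == "cube-api":
        return {"CUBEJS_CUBESTORE_HOST": router_host}
    if component == "cube-refresh-worker":
        # Same image as cube-api; this flag switches on the refresh role.
        return {"CUBEJS_CUBESTORE_HOST": router_host,
                "CUBEJS_REFRESH_WORKER": "true"}
    if component == "cubestore-router":
        return {"CUBESTORE_SERVER_NAME": f"{router_host}:{ROUTER_META_PORT}",
                "CUBESTORE_META_PORT": str(ROUTER_META_PORT),
                "CUBESTORE_WORKERS": workers,
                "CUBESTORE_REMOTE_DIR": REMOTE_DIR}
    if component == "cubestore-worker":
        host = worker_hosts[worker_index]
        return {"CUBESTORE_SERVER_NAME": f"{host}:{WORKER_PORT}",
                "CUBESTORE_WORKER_PORT": str(WORKER_PORT),
                "CUBESTORE_META_ADDR": f"{router_host}:{ROUTER_META_PORT}",
                "CUBESTORE_WORKERS": workers,
                "CUBESTORE_REMOTE_DIR": REMOTE_DIR}
    raise ValueError(f"unknown component: {component}")


hosts = ["cubestore-worker-0", "cubestore-worker-1"]  # hypothetical names
refresh_env = env_for("cube-refresh-worker", "cubestore-router", hosts)
worker_env = env_for("cubestore-worker", "cubestore-router", hosts, 1)
```

Database connection settings and the API secret are left out on purpose; they depend on the data source you connect to.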

WHAT and HOW?

In the Kubernetes cluster, we deploy:
🚀cube-api and cube-refresh-worker as Deployments with one common PVC and a Service of type ClusterIP.
🚀cubestore-router and cubestore-worker as StatefulSets with one common PVC and a headless Service.

🚀We are going to use the two Docker images from cubejs to deploy all four components of cubejs.
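One practical reason for the StatefulSet plus headless Service combination is that each pod gets a stable, predictable DNS name, so the Router can be given a fixed list of worker addresses. The sketch below builds that list in Python. The StatefulSet name, Service name, namespace and port are hypothetical, and `cluster.local` is assumed to be the cluster domain; check the chart values for the real ones.

```python
from typing import List


def statefulset_pod_addresses(statefulset: str, headless_service: str,
                              namespace: str, replicas: int,
                              port: int) -> List[str]:
    """Build the stable DNS address of every pod in a StatefulSet.

    Pods are named <statefulset>-<ordinal>, and a headless Service gives
    each one the record <pod>.<service>.<namespace>.svc.cluster.local.
    """
    return [
        f"{statefulset}-{ordinal}.{headless_service}.{namespace}"
        f".svc.cluster.local:{port}"
        for ordinal in range(replicas)
    ]


# Hypothetical names and port, for illustration only.
workers = statefulset_pod_addresses(
    "cubestore-worker", "cubestore-headless", "default", 2, 10001
)
# A comma-separated list like this is the shape CUBESTORE_WORKERS expects.
cubestore_workers_value = ",".join(workers)
```

Because the names follow from the ordinal, scaling the workers means changing the replica count and regenerating this list for the Router and every worker.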

Note: Check out the README file in my GitHub repo linked above for a detailed explanation of the configuration and deployment.

🤠Stay tuned for upcoming releases of helm charts with improvements…!