Setting up your RudderStack, the Docker way

Vishwas Damle
Published in WISE Tech
Dec 28, 2021

Why RudderStack?

RudderStack is one of the best platforms for setting up a data pipeline for your engineering and data needs. It provides an easy way to produce, transform & consume events. You can gather events from various sources such as backend systems, click-stream SDKs and event logs, and then push them to numerous prebuilt destinations such as S3, data lakes, Kafka, CleverTap, Mixpanel, Google Analytics and many more with only basic configuration.

RudderStack also lets you host its open-source services on your own infrastructure.

It works this way:

  • Sign up on RudderStack and set up a “control plane”. The control plane holds the source, transformation and destination configs. Use its workspace token to set up rudder-server and rudder-transformer: these services identify your project by the workspace token and use the control-plane configs for processing.
  • rudder-server, a Go microservice that accepts events over HTTP, is the primary API for your backend systems and click-stream SDKs.
  • rudder-transformer, a Node.js microservice, receives requests from rudder-server to perform the “transformations” configured through the RudderStack dashboard.
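
To make the flow concrete, here is a minimal sketch of a backend system sending one event to a self-hosted rudder-server over HTTP. It assumes RudderStack's HTTP API track endpoint (POST /v1/track) with a source's write key as the Basic-auth username and an empty password; DATA_PLANE_URL, WRITE_KEY and the function names are illustrative and not from the article.

```shell
#!/bin/sh
# Sketch: send one "track" event to a self-hosted rudder-server.
# Assumptions (not from the article): DATA_PLANE_URL is where rudder-server
# listens (e.g. http://localhost:8080) and WRITE_KEY is the write key of a
# source defined in your control plane. Function names are hypothetical.

# track_payload USER_ID EVENT_NAME: prints a minimal track-event JSON body.
# Values are inserted verbatim, so they must not contain quotes or backslashes.
track_payload() {
  printf '{"userId":"%s","event":"%s","properties":{}}\n' "$1" "$2"
}

# send_track USER_ID EVENT_NAME: POSTs the event; -f makes curl fail on HTTP errors.
send_track() {
  curl -fsS -X POST "${DATA_PLANE_URL:?set DATA_PLANE_URL}/v1/track" \
    -u "${WRITE_KEY:?set WRITE_KEY}:" \
    -H 'Content-Type: application/json' \
    -d "$(track_payload "$1" "$2")"
}
```

After sourcing the script and exporting both variables, a call such as send_track user-42 "Order Completed" posts a single event; rudder-server then applies whatever transformations and destinations the control plane defines for that source.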

While the RudderStack documentation covers setting up the services on Kubernetes and via docker-compose in detail, it says very little about building rudder-server and rudder-transformer images from a Dockerfile and deploying them on container orchestration services such as AWS ECS.

This article tries to address that.

Setup

Setting up Rudder Server

  • Just like a single rudder-server instance, each Docker container needs its own Postgres instance. Setting up one common Postgres server for all containers (which would be the natural decision) does not work: each container maintains its own event sequence numbers, and with a shared DB instance those numbers clash, which puts rudder-server into an inconsistent state.
  • So, in our Dockerfile, we create an in-container Postgres server and initialize it, setting up the DB, user and password. rudder-server later connects to it using credentials injected as environment variables.
  • After the DB setup, the Dockerfile builds the rudder-server Go app and exposes it on port 8080.
  • Finally, you need to inject a set of environment variables into the container so that it uses the right DB credentials, talks to the correct rudder-transformer instance and points to the right control plane (via the workspace token).
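
As a hedged sketch (not the author's list), the check below could run in the container's entrypoint before rudder-server starts, so a task with missing configuration fails fast. The variable names are an assumption based on config/sample.env in the rudder-server repository (JOBS_DB_* for the database, DEST_TRANSFORM_URL for the transformer, WORKSPACE_TOKEN for the control plane); verify them against the version you build.

```shell
#!/bin/sh
# Sketch of a pre-start check for the rudder-server container.
# ASSUMPTION: variable names follow rudder-server's config/sample.env;
# they are not taken from the article.
REQUIRED_VARS="JOBS_DB_HOST JOBS_DB_PORT JOBS_DB_USER JOBS_DB_PASSWORD JOBS_DB_DB_NAME DEST_TRANSFORM_URL WORKSPACE_TOKEN"

# missing_vars: prints the name of every required variable that is unset or empty.
missing_vars() {
  for name in $REQUIRED_VARS; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# check_env: returns 0 only if every required variable is set,
# otherwise lists the missing ones on stderr and returns 1.
check_env() {
  missing=$(missing_vars)
  if [ -n "$missing" ]; then
    printf 'missing required env vars:\n%s\n' "$missing" >&2
    return 1
  fi
}
```

An entrypoint would call check_env || exit 1 first, then start the in-container Postgres and rudder-server.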

With this Dockerfile and env config, you build your image and then deploy it to your cluster. Here at Wise we use cloudlift to deploy to AWS ECS, so for us the steps are:

  1. Clone the rudder-server repository, copy the Dockerfile described above into it and commit the code.
  2. Build the Docker image with the correct tag:
docker build . -t rudder-server:<commit_hash>

  3. Upload and deploy to ECS using cloudlift (this step will differ depending on your tooling and cloud provider):

cloudlift upload_to_ecr --local_tag <commit_hash> # optional
cloudlift deploy_service -e <environment>
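
Steps 1 and 2 can be scripted roughly as follows. This is a sketch, not Wise's tooling: the function names are hypothetical, and it assumes you run it from the root of your rudder-server clone and use the short commit hash as the tag. It refuses to build from a dirty working tree, since the commit hash in the tag would otherwise not describe what went into the image.

```shell
#!/bin/sh
# Sketch of steps 1-2: tag the rudder-server image with the commit it was built from.
# Function names are hypothetical; run from the root of your rudder-server clone.

# image_ref NAME COMMIT: prints the reference passed to `docker build -t`.
image_ref() {
  printf '%s:%s\n' "$1" "$2"
}

# build_rudder_server: builds and prints the image reference, or fails if there
# are uncommitted changes (the hash would then not match the image contents).
build_rudder_server() {
  if [ -n "$(git status --porcelain)" ]; then
    echo "working tree is dirty: commit the Dockerfile first" >&2
    return 1
  fi
  commit=$(git rev-parse --short HEAD) || return 1
  ref=$(image_ref rudder-server "$commit")
  docker build . -t "$ref" && printf '%s\n' "$ref"
}
```

The hash in the printed reference is the <commit_hash> you would then pass to the upload step.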

Setting up Rudder Transformer

This part is easy. The rudder-transformer repository already contains a Dockerfile and has no DB dependency to take care of, so you just need to follow the regular Docker build-upload-deploy routine.

We recommend not exposing rudder-transformer to the outside world, because the transformer and the server talk to each other without any authentication.
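
One way to follow that advice when running the containers yourself is sketched below: both services join a private Docker network, and only rudder-server publishes a port. The network name rudder-net, the image tags, the hypothetical rudder-server.env file and the transformer's port 9090 (its usual default) are assumptions; on ECS the equivalent would typically be a security group that only admits traffic from rudder-server.

```shell
#!/bin/sh
# Sketch: keep rudder-transformer private on a user-defined Docker network.
# Assumptions: network name "rudder-net", image names rudder-server:<tag> and
# rudder-transformer:<tag>, transformer on port 9090, and a hypothetical
# rudder-server.env file holding the JOBS_DB_* and WORKSPACE_TOKEN variables.

# run: executes its arguments, or only prints them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# start_stack TAG: starts both containers from images tagged TAG.
start_stack() {
  tag=$1
  run docker network create rudder-net
  # No -p flag: the transformer is reachable only from inside rudder-net.
  run docker run -d --name rudder-transformer --network rudder-net \
    "rudder-transformer:$tag"
  # Only rudder-server publishes a port; it reaches the transformer by container name.
  run docker run -d --name rudder-server --network rudder-net -p 8080:8080 \
    --env-file rudder-server.env \
    -e DEST_TRANSFORM_URL=http://rudder-transformer:9090 \
    "rudder-server:$tag"
}
```

Running DRY_RUN=1 start_stack <tag> prints the three commands without executing them, which is a cheap way to review what would be exposed.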

Hope this helps you get your RudderStack setup up and running quickly!


Software Engineering | Building tech @wiseapplive | Previously @getsimpl & @ThoughtWorks | Coding. Memes. Politics. History. Philosophy