Multi-Node Setup using Citus

Atharv Bhadange
3 min readJan 15, 2024

--

This is part 2 of my blog series; check out the first part here.

Prerequisites

  1. Understanding of AWS EC2 instances
  2. SSH protocol
  3. Docker

Start by launching two EC2 instances, designating one as the master node and the other as the worker node.

Opt for the Ubuntu image instead of Amazon Linux, and maintain the free tier defaults for instance types.

Create a key pair and download the certificate, which will be instrumental for SSH access. The security group required will be automatically generated.

Upon the successful launch of the master node, similarly, initiate another instance for the worker node. Add an inbound rule for PostgreSQL traffic to the newly created security group.

To establish a connection with the instances, utilize an SSH client, employing the downloaded key for authentication.

Next, install Docker on the instances to facilitate the execution of a Citus container. Update the system and install Docker using the following commands:

sudo apt-get update
sudo snap install docker

Start the Docker service with:

sudo snap start docker

Now, run the Citus container on the master node using:

sudo docker run -d --name master -p 5432:5432 -e POSTGRES_PASSWORD=your_password citusdata/citus:alpine

Verify the successful launch of the container by checking its status:

sudo docker ps

Repeat the same procedure on the worker node, naming the container running there as “worker.”

Adjust your application’s code to connect to the master node by modifying the host in the code snippet provided and running the server:

dsn := "host=<your-instance-ip> user=postgres password=your_password port=5432 sslmode=disable timezone=Asia/Kolkata"

Now, if you inspect the schema migrations by connecting to the running container on the master node by:

sudo docker exec -it master psql -U postgres

\dt+

You will see logs table on the master node, but not on the worker node.

Introduce some data to the logs table on the worker node using a Python script. This process may take a while due to network latency.

Proceed to shard the table by executing the following SQL query:

SELECT create_distributed_table('logs', 'id');

Designate the master node as the coordinator:

SELECT citus_set_coordinator_host('<hostname or your ip address>', 5432);

Add the Citus-worker instance as a worker node:

SELECT citus_add_node('<hostname or your ip address>', 5432);

Upon inspecting the tables on the worker instance, you will witness the successful completion of the Citus multi-node setup.

PS: I have added 2 worker instances, hence it is showing 3 nodes in above image.

Find full implementation here: https://github.com/atharv-bhadange/log-ingestor

PPS: Your corrections/improvements are welcome!

--

--