Data Replication in ClickHouse (Docker-Based Setup)


Data replication is the process of storing multiple copies of the same data to improve system reliability and data availability. ClickHouse supports multi-primary, asynchronous (eventually consistent) replication. Every MergeTree-family table engine has a replicated counterpart; engines outside the MergeTree family do not support replication.

Replicated tables require Apache ZooKeeper or ClickHouse Keeper for coordination. Replicated tables must be created (and dropped) on each node of the cluster individually, while data can be inserted into and read from any node. Let us look at the steps to create a replicated table in a ClickHouse cluster. I am using Docker and Apache ZooKeeper to demonstrate the steps involved. Ensure Docker Engine and Docker Compose are installed and working.

Docker-based ClickHouse cluster

Let us set up a three-node ClickHouse cluster and a single-node ZooKeeper instance. I have used the docker-compose file below for this setup.

version: '3'

services:
  zookeeper:
    image: 'bitnami/zookeeper:latest'
    networks:
      - ch_replicated
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
      - ZOOKEEPER_CLIENT_PORT=2181
    ports:
      - "2182:2181"
      - "2888:2888"
      - "3888:3888"

  clickhouse1:
    image: clickhouse/clickhouse-server
    ports:
      - "8002:9000"
      - "9123:8123"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    networks:
      - ch_replicated
    depends_on:
      - zookeeper

  clickhouse2:
    image: clickhouse/clickhouse-server
    ports:
      - "8003:9000"
      - "9124:8123"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    networks:
      - ch_replicated
    depends_on:
      - zookeeper

  clickhouse3:
    image: clickhouse/clickhouse-server
    ports:
      - "8004:9000"
      - "9125:8123"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    networks:
      - ch_replicated
    depends_on:
      - zookeeper

networks:
  ch_replicated:
    driver: bridge

We create a bridge network (ch_replicated) so the containers can reach each other by service name. Bring up the containers with the docker-compose up command and verify that all the containers are up and running.

vijayanand@chistadata:/media/vijayanand/Data/ClickHouse/ClickHouse_Replication$ sudo docker container ls
[sudo] password for vijayanand:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f60b856f1fb5 clickhouse/clickhouse-server "/entrypoint.sh" 5 days ago Up 58 seconds 9009/tcp, 0.0.0.0:9124->8123/tcp, :::9124->8123/tcp, 0.0.0.0:8003->9000/tcp, :::8003->9000/tcp clickhouse_replication_clickhouse2_1
a04d155475e4 clickhouse/clickhouse-server "/entrypoint.sh" 5 days ago Up 58 seconds 9009/tcp, 0.0.0.0:9123->8123/tcp, :::9123->8123/tcp, 0.0.0.0:8002->9000/tcp, :::8002->9000/tcp clickhouse_replication_clickhouse1_1
b222f8b44db8 clickhouse/clickhouse-server "/entrypoint.sh" 5 days ago Up 58 seconds 9009/tcp, 0.0.0.0:9125->8123/tcp, :::9125->8123/tcp, 0.0.0.0:8004->9000/tcp, :::8004->9000/tcp clickhouse_replication_clickhouse3_1
69527fd60266 bitnami/zookeeper:latest "/opt/bitnami/script…" 5 days ago Up 58 seconds 0.0.0.0:2888->2888/tcp, :::2888->2888/tcp, 0.0.0.0:3888->3888/tcp, :::3888->3888/tcp, 8080/tcp, 0.0.0.0:2182->2181/tcp, :::2182->2181/tcp clickhouse_replication_zookeeper_1

Configuring the ClickHouse cluster for replicated tables

Configure the following in config.xml:

  1. Cluster config
  2. ZooKeeper config
  3. Macros (optional)

Cluster Config

Add the following to the config.xml file on every node to configure the multi-node ClickHouse cluster.

<remote_servers>
    <replicated_cluster>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>clickhouse1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>clickhouse2</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>clickhouse3</host>
                <port>9000</port>
            </replica>
        </shard>
    </replicated_cluster>
</remote_servers>

This config tells each ClickHouse server about the other nodes in the cluster. The hostnames and ports match the service definitions in the docker-compose file.
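To verify that the cluster definition has been picked up, you can query the system.clusters table from any node. A minimal check, using the replicated_cluster name from the config above, might look like this:

-- Run on any node: lists the replicas ClickHouse knows about for this cluster
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'replicated_cluster';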

ZooKeeper Config

Add the ZooKeeper connection details to every node's config file.

<zookeeper>
    <node index="1">
        <host>zookeeper</host>
        <port>2181</port>
    </node>
</zookeeper>
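Once this configuration is in place, a quick way to confirm that ClickHouse can actually reach ZooKeeper is to query the system.zookeeper table (it requires a path filter in the WHERE clause). For example:

-- Run on any node: returns the top-level znodes if the ZooKeeper connection works
SELECT name
FROM system.zookeeper
WHERE path = '/';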

Macros

Macro substitutions are commonly used when configuring the ZooKeeper path in replicated tables.

<macros>
    <shard>01</shard>
    <replica>ch1</replica>
</macros>

The macro values must differ on every node; for example, the replica macro could be ch1, ch2, and ch3 on the three nodes.
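To double-check which substitutions a node has loaded, you can query the system.macros table; on the node configured above it should report shard = 01 and replica = ch1:

-- Run on each node: shows the macro substitutions loaded from config
SELECT macro, substitution
FROM system.macros;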

Creating and testing a replicated table

Let us create a replicated table with just two columns. Repeat the table creation on every configured node.

CREATE TABLE replicated_example
(
    ID UInt32,
    Name String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/replicated_example_1', '{replica}')
ORDER BY (ID);

The table creation syntax is similar to its non-replicated MergeTree counterpart. Along with the regular parameters, we have to provide the ZooKeeper path (the znode path where the replication metadata will be stored) and the replica name. The {replica} placeholder is substituted from the macro we configured earlier.
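If you are curious what gets stored in ZooKeeper, the znode path can be browsed from ClickHouse itself via the system.zookeeper table. A small sketch, using the /clickhouse/tables/replicated_example_1 path from the CREATE TABLE statement above:

-- Run on any node: lists the znodes ClickHouse created for this table
SELECT name
FROM system.zookeeper
WHERE path = '/clickhouse/tables/replicated_example_1';

The listing typically includes entries such as replicas and metadata once the replicas have registered.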

Once the table has been created, try inserting some data into the table from one node. The data will automatically become available on the rest of the configured nodes.
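As a rough end-to-end test (the row values here are just illustrative), you could insert on one node, read from another, and check replication status via system.replicas:

-- On clickhouse1: insert a couple of sample rows
INSERT INTO replicated_example VALUES (1, 'alpha'), (2, 'beta');

-- On clickhouse2 or clickhouse3: the same rows should appear once replication catches up
SELECT * FROM replicated_example ORDER BY ID;

-- On any node: basic replication health for this table
SELECT database, table, replica_name, total_replicas, active_replicas
FROM system.replicas
WHERE table = 'replicated_example';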

Conclusion

  • Replicated versions of all the MergeTree-family engines are available in ClickHouse.
  • Replication requires ZooKeeper or ClickHouse Keeper.
  • DDL commands (CREATE, DROP) need to be run on every node.
  • DML and DQL statements (ALTER, INSERT, and SELECT) can be run on any one of the nodes.

