☁️Introduction to ClickHouse and Huawei Cloud MRS ClickHouse Solution

Elif Meriç
Published in Huawei Developers
Nov 27, 2023
ClickHouse and Huawei Cloud ClickHouse Solution

📜Introduction

Hi everyone! 😊 In this article, we are going to learn about ClickHouse and the Huawei Cloud ClickHouse solution. ClickHouse is a fast and scalable column-oriented database for online analytical processing (OLAP), and it supports an SQL-based query language. Huawei Cloud MapReduce Service (MRS) allows users to quickly create ClickHouse clusters, so let’s see what ClickHouse is and how to use it easily on Huawei Cloud! Enjoy reading! ☕😊

📊 What is ClickHouse?

ClickHouse was originally developed at Yandex for its Yandex.Metrica web analytics service. Today, data is increasing rapidly every second. The amount of data generated by billions of people and devices from many different sources is enormous, and it is of great importance that this data is stored, processed, and analyzed well. At this point, ClickHouse offers a fast and scalable database solution. It is a column-oriented database management system designed for online analytical processing (OLAP) on large data sets, and it supports an SQL-based query language.

Because data is stored by column, only the columns a query needs are read and processed, which improves query performance. Data of the same type is also stored together in the same column, which brings a higher compression ratio (it can reach 10:1), significantly reducing storage costs and read overhead. With features such as parallel query processing, a distributed architecture, and caching mechanisms, ClickHouse enables high-speed data analysis.
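To get a feeling for why storing values of the same type together compresses well, here is a minimal Python sketch. It is only an illustration: the sample columns are made up, and `zlib` from the standard library stands in for the LZ4 and ZSTD codecs that ClickHouse actually supports. The ratios you get depend entirely on the data.

```python
import zlib

def column_compression_ratio(values):
    """Return raw_size / compressed_size for one column stored contiguously."""
    raw = "\n".join(str(v) for v in values).encode("utf-8")
    compressed = zlib.compress(raw)
    # The codec is lossless: decompressing gives back the original bytes.
    assert zlib.decompress(compressed) == raw
    return len(raw) / len(compressed)

# Hypothetical sample table: three columns with 10,000 rows each.
countries = ["TR", "DE", "US", "FR"] * 2500  # low-cardinality string column
statuses = ["ok"] * 10000                    # constant column
user_ids = list(range(10000))                # increasing integer column

for name, column in [("country", countries), ("status", statuses), ("user_id", user_ids)]:
    print(f"{name}: ratio {column_compression_ratio(column):.1f}:1")
```

Columns with few distinct values compress much better than columns with many distinct values, which is why the overall ratio of a real table varies from case to case.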

Features of ClickHouse

Key Features of ClickHouse

1. Comprehensive DBMS Functions

Abundant functions such as DDL, DML, database/table permission control, and distributed management.

2. Column-oriented storage and data compression

High data compression ratio (support for LZ4 and ZSTD compression algorithms), significantly saving I/O bandwidth

3. Vectorized execution engine

Excellent performance with multi-core parallel computing, vectorized execution, and SIMD

4. Support for SQL statements

Support for standard SQL syntax and built-in analysis and statistics functions

Support for multiple indexes, such as primary key indexes and sparse indexes

5. Independent data storage

Self-managed data storage, independent from other components

Replica Mechanism

ClickHouse uses ZooKeeper and the ReplicatedMergeTree engine (of the Replicated series) to implement replication. When creating a table, you can specify a storage engine and determine whether to replicate the table.
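As a rough illustration of specifying the engine at table creation time, the Python helper below renders a `CREATE TABLE` statement that uses `ReplicatedMergeTree`. The database, table, cluster, and column names are hypothetical, and the ZooKeeper path layout is just a common convention, so treat this as a sketch and not as the exact statement for your cluster. The `{shard}` and `{replica}` placeholders are macros that ClickHouse fills in from each server's configuration.

```python
def replicated_table_ddl(database: str, table: str, cluster: str) -> str:
    """Render a CREATE TABLE statement for a replicated local table."""
    # {shard} and {replica} stay as literal text: ClickHouse substitutes
    # them from the macros defined in each server's configuration.
    zk_path = f"/clickhouse/tables/{{shard}}/{database}/{table}"
    return (
        f"CREATE TABLE {database}.{table} ON CLUSTER {cluster}\n"
        "(\n"
        "    event_date Date,\n"
        "    user_id UInt64,\n"
        "    event_type String\n"
        ")\n"
        f"ENGINE = ReplicatedMergeTree('{zk_path}', '{{replica}}')\n"
        "PARTITION BY toYYYYMM(event_date)\n"
        "ORDER BY (event_date, user_id)"
    )

# Hypothetical names, used only for illustration.
print(replicated_table_ddl("demo", "events_local", "default_cluster"))
```

Choosing plain `MergeTree` instead of `ReplicatedMergeTree` in the `ENGINE` clause would create the same table without replication.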

The ClickHouse replica mechanism minimizes network data transmission and synchronizes data between different data centers. Thus, it can be used to build clusters with the remote multi-active multi-DC architecture.

Note: ZooKeeper is a coordination service that provides distributed data consistency and synchronization. It is used by a variety of applications, including ClickHouse, to ensure that multiple servers are working together in a coordinated manner.

For replicated tables, ClickHouse uses ZooKeeper to store a variety of information, including:

· The replicas that exist for each table, including the addresses of their servers

· The state of each replica, including whether it is currently active

· The latest metadata for the replicated tables in the cluster

ZooKeeper provides a number of features that make it well-suited for use with ClickHouse, including:

· High availability: ZooKeeper is designed to be highly available, so it can continue to operate even if some of its servers fail.

· Scalability: ZooKeeper can be easily scaled to handle large clusters of servers.

· Performance: ZooKeeper is a high-performance service that can handle a large number of requests per second.

ClickHouse uses ZooKeeper in a number of ways, including:

· Leader election: For replicated tables, ClickHouse has used ZooKeeper to elect a leader replica, which is responsible for scheduling background merges. Recent ClickHouse versions allow several replicas to act as leaders at the same time.

· Metadata management: ClickHouse uses ZooKeeper to store the latest metadata for replicated tables. This information is used by the other servers in the cluster to ensure that they have a consistent view of the data.

· Failover: ClickHouse uses ZooKeeper to detect when a server has failed. When a server fails, a new leader can be elected among the remaining replicas and the cluster continues to operate.

Sharding and Distributed Table

ClickHouse uses sharding to split the data of a table across multiple nodes. Each shard holds a different part of the data, and you can query all shards through a distributed table. The distributed table automatically routes the query request to each shard node and aggregates the results.
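The routing and aggregation idea can be sketched in a few lines of Python. This is a toy model and not how ClickHouse is implemented: the class name, the `user_id` sharding key, and the CRC32-modulo routing rule are assumptions chosen for illustration.

```python
import zlib

class DistributedTable:
    """Toy model of a distributed table on top of per-shard local tables."""

    def __init__(self, shard_count: int):
        # Each shard holds its own local table (here: a list of rows).
        self.shards = [[] for _ in range(shard_count)]

    def _shard_for(self, sharding_key: str) -> int:
        # Deterministic hash of the sharding key, modulo the shard count.
        return zlib.crc32(sharding_key.encode("utf-8")) % len(self.shards)

    def insert(self, row: dict) -> None:
        # A row is routed to exactly one shard, so shards never overlap.
        self.shards[self._shard_for(row["user_id"])].append(row)

    def total_amount(self) -> int:
        # Scatter: each shard aggregates only its own rows.
        partial_sums = [sum(r["amount"] for r in shard) for shard in self.shards]
        # Gather: the node that received the query merges the partial results.
        return sum(partial_sums)

table = DistributedTable(shard_count=3)
for i in range(1000):
    table.insert({"user_id": f"user-{i % 50}", "amount": 1})

print([len(shard) for shard in table.shards], table.total_amount())
```

Because the shard is derived from the sharding key, all rows of one user end up on the same shard, while the query result is the same as if the data lived in a single table.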

ClickHouse can be used in many different applications, as shown in the figure below.

Application Scenarios of ClickHouse

Why Is ClickHouse So Fast?

ClickHouse is a column-oriented database management system (DBMS) designed for online analytical processing (OLAP). It is known for its high performance and its ability to handle large datasets efficiently. Several factors contribute to ClickHouse’s speed, including:

1. Columnar storage architecture: ClickHouse stores data in columns rather than rows, which makes it more efficient for analytical queries that only need to access a subset of the data. For example, if you are running a query that only needs to select the customer_id and order_total columns, ClickHouse does not need to read the entire row for each customer; it can simply read the two relevant columns.
Column-oriented storage

2. Data compression and encoding: ClickHouse uses a variety of compression and encoding techniques to reduce the amount of storage space required for data. This can significantly improve query performance, especially for large datasets.

3. Data sharding and distributed query execution: ClickHouse can distribute queries across multiple servers, which can further improve performance for large datasets. A ClickHouse cluster consists of one or more shards, and each shard corresponds to one ClickHouse service node, so the maximum number of shards depends on the number of nodes.

ClickHouse Cluster Structure

4. Materialized views and aggregations: ClickHouse can precompute and store materialized views and aggregations, which can significantly improve the performance of frequently used queries.

5. Vectorized query execution: ClickHouse uses vectorized execution, which means that it operates on entire vectors of data rather than individual rows. This can significantly improve the performance of arithmetic and logical operations.

6. High-performance query engine: ClickHouse’s query engine is designed for high performance. It uses a variety of optimization techniques to improve the speed of queries.
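To make the first point in the list more concrete, here is a small Python sketch of the customer_id and order_total example. The table contents and the extra `address` and `note` columns are made up, and counting "values read" is only a stand-in for real disk I/O.

```python
# Hypothetical orders table with four columns.
rows = [
    {"customer_id": 1, "order_total": 20, "address": "A St", "note": "gift"},
    {"customer_id": 2, "order_total": 35, "address": "B St", "note": ""},
    {"customer_id": 1, "order_total": 15, "address": "A St", "note": "rush"},
]

# Column-oriented layout of the same data: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def total_per_customer_row_store(rows):
    """Row layout: every field of every row is loaded, even unused ones."""
    values_read, totals = 0, {}
    for row in rows:
        values_read += len(row)
        totals[row["customer_id"]] = totals.get(row["customer_id"], 0) + row["order_total"]
    return totals, values_read

def total_per_customer_column_store(columns):
    """Column layout: only the two referenced columns are loaded."""
    ids, amounts = columns["customer_id"], columns["order_total"]
    values_read, totals = len(ids) + len(amounts), {}
    for customer_id, amount in zip(ids, amounts):
        totals[customer_id] = totals.get(customer_id, 0) + amount
    return totals, values_read

print(total_per_customer_row_store(rows))        # ({1: 35, 2: 35}, 12)
print(total_per_customer_column_store(columns))  # ({1: 35, 2: 35}, 6)
```

Both layouts return the same totals, but the column layout touches half as many values here, and the gap grows with every extra column the query does not need.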

In addition to these factors, ClickHouse is also designed to be scalable and fault-tolerant. This means that it can handle large workloads and can continue to operate even if some of its servers fail. Overall, ClickHouse is a powerful and versatile DBMS that is well-suited for OLAP workloads. Its high performance and scalability make it a popular choice for large enterprises and data-driven organizations.

How to Create a ClickHouse Cluster with HA Deployment Architecture?

In Huawei Cloud MapReduce Service, ClickHouse clusters can be easily created with a highly available architecture, because Huawei Cloud MRS uses an ELB-based high availability (HA) architecture. As can be seen in the figure below, when a client application sends requests to a cluster, Elastic Load Balance (ELB) is used to distribute the traffic. With the ELB polling mechanism, data is written to local tables and read from distributed tables on different nodes. This balances the read/write load and keeps application access highly available.

HA Deployment Architecture with ELB
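The write and read paths described above can be sketched as follows. This is a simplified model under a few assumptions: "polling" is taken to mean plain round-robin rotation, the node names are hypothetical, and the read path simply merges all local tables the way a distributed table would.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Toy stand-in for the load balancer: hands out backend nodes in rotation."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)

    def next_node(self) -> str:
        return next(self._nodes)

# Hypothetical node names; each node has a local table (a list of rows).
local_tables = {"node-1": [], "node-2": [], "node-3": []}
balancer = RoundRobinBalancer(list(local_tables))

# Writes: each batch goes to the local table of whichever node is picked.
for batch_id in range(9):
    local_tables[balancer.next_node()].append(batch_id)

def read_all(entry_node: str) -> list:
    """Reads: any node can serve the query by merging every local table."""
    assert entry_node in local_tables
    return sorted(row for table in local_tables.values() for row in table)

print(local_tables)
print(read_all(balancer.next_node()))
```

Each node receives the same share of the writes, and a read returns the complete data set no matter which node the balancer picks as the entry point.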

How to Quickly Create a ClickHouse Cluster on Huawei Cloud MapReduce Service?

Step 1: Select MapReduce Service from the Service List in the Huawei Cloud console.

Step 2: Click the “Buy Cluster” button in the MRS console.

Step 3: Select the region, billing mode, and cluster version and set the cluster name on the next page.

Step 4: Select “ClickHouse Cluster” in the Component section. This cluster type includes ZooKeeper and ClickHouse. After that, choose the Availability Zone, Enterprise Project, VPC, and Subnet.

Step 5: Complete cluster node configurations and click on the Buy Now button to create the cluster.

Step 6: Define the password, and choose whether to enable Kerberos Authentication. If you enable it, common users cannot use the file management and job management functions of the MRS cluster, and Kerberos authentication cannot be disabled after the cluster has been created.
