What’s ClickHouse anyway ?

Bharat
4 min readAug 14, 2020

--

I’m writing this short blog post in which I want to very briefly talk about ClickHouse and explain what it is. I aim to keep this post short and potentially discuss use cases in which it shines and where one should use it or not use it. For all those who already know what ClickHouse is, you can probably skip this blog post.

If you haven’t already noticed the section heading, ClickHouse is a columnar analytics database by Yandex and is blazing fast and also is distributed. If you are interested in a few benchmarks comparing other analytics databases that are also columnar, you can check them out.

Columnar database

ClickHouse’s introduction page has really good content on what a columnar database is and how it’s different from a row based database. Nevertheless I would like to very briefly explain what it’s by using a very simple example.

Let’s consider a simple employee database which contains these fields employee_name, employee_join_date, employee_dob and employee_yoe.

Row based store

Column based store

As you can see, the row style arrangement is similar to how relational databases like Postgres, MySQL, etc organize data. Every row has an id and a set of fields. In this case, these are the employee details which are all stored next to each other.

However, in the case of columnar style of arrangement, the various fields are actually stored separately and the data from the fields are stored together as columns. For instance employee_name and employee-yoe are stored separately and the values of these fields are stored together.

What’s ClickHouse good for ?

  • When you need really high insertion rates.
  • When data written doesn’t need to be modified.
  • Data consistency is not a hard requirement.
  • Large number of rows for only a small sub-set of columns.

So, generally, ClickHouse suits all kinds of analytical uses cases. These are use cases that involve ingesting a large amount of data, processing and retrieving a small subset of fields from a larger set of fields.

What’s ClickHouse not so good for ?

  • High requirement for data consistency.
  • If data that’s inserted needs to be updated.
  • When database transactions are necessary.

Therefore, ClickHouse won’t suit if the requirement is that data that’s read should always be consistent. It also wont’ work out well if queries are of transactional nature and involve deleting or mutating the state of data. Databases like Postgres and MySQL are better suited for such scenarios.

So how Columnar data arrangement makes ClickHouse faster for analytics use-case ?

The ClickHouse introduction doc provides a really good visualization of why Columnar stores are faster for retrieving data. Nevertheless, I’ll try to explain.

Retrieval

For analytics use-case, when only a small subset of columns are needed from a larger number of columns, these columns can be read separately in chunks and this is exactly what ClickHouse does.

Let’s imagine using row based databases for analytics use-cases where you only need a few fields from a larger list of fields. In this case, all N rows and M fields in the database need to be read to return only a few fields. This is expensive and this is why ClickHouse shines when it comes to processing a large number of columns as data can be read in packets.

Insertion through MergeTree Engine

ClickHouse uses the MergeTree Database Engine (one of the main engines that it supports) which is responsible for inserting data into the table in chunks and later merges them in the background. I’ll attempt to explain MergeTree Engine in detail in another post, but basically, this is what really makes insertions so fast.

That sums up this post. You might have noticed a few links while reading through this blog post. Those links will bring you to ClickHouse’s official documentation on the topics that we just discussed. If you are interested in exploring ClickHouse for your projects, that’s a good place to start learning more about it.

You can view the original article at my blog too!

--

--