Scaling Clickhouse using Amazon S3 as storage
Growing data can become a problem when you run out of disk space. Clickhouse has powerful horizontal scaling capabilities, but you don’t always need to scale processing. Sometimes you just need a lot of space for your table(s) while your CPU and RAM are still sufficient. Instead of buying more disks, you can consider using S3 to store all or part of your data.
Working with text files on S3
First of all, I should mention the s3 table function of Clickhouse, which allows reading data from CSV/TSV/Parquet (and other formats) files located directly on S3. The S3 engine additionally allows both reading and writing data to S3 files.
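As a minimal sketch of the s3 function (the file path and column schema here are hypothetical, and the key/secret are placeholders):

SELECT *
FROM s3(
    'https://clickhousetests.s3.eu-central-1.amazonaws.com/data/events.csv',
    'YOUR_KEY', 'YOUR_SECRET',
    'CSV',
    'id UInt32, msg String'
)
LIMIT 10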
This might fit your needs in trivial cases, but if you want to leverage the full power of MergeTree tables, you should use S3 disks.
Storing MergeTree tables on S3
Clickhouse supports the S3 protocol at all levels, including storage. So if you want a table stored in an S3 bucket, you start by declaring a disk and a policy in the Clickhouse config file (I recommend adding a new s3.xml file under /etc/clickhouse-server/config.d instead of changing the default config):
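A sketch of such an s3.xml, using the bucket, region and policy names from this article (the key/secret are placeholders, and the exact cache settings vary between Clickhouse versions):

<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>s3</type>
                <endpoint>https://clickhousetests.s3.eu-central-1.amazonaws.com/db/</endpoint>
                <access_key_id>YOUR_KEY</access_key_id>
                <secret_access_key>YOUR_SECRET</secret_access_key>
                <cache_enabled>true</cache_enabled>
                <data_cache_enabled>true</data_cache_enabled>
            </s3>
        </disks>
        <policies>
            <s3_main>
                <volumes>
                    <main>
                        <disk>s3</disk>
                    </main>
                </volumes>
            </s3_main>
        </policies>
    </storage_configuration>
</clickhouse>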
I have already created a clickhousetests bucket in the eu-central-1 region, accessible via the given key/secret. You should pick the region closest to your Clickhouse instance for best performance. Note that you need to specify some folder on your bucket (db in the example), because Clickhouse will not start if you point it at the bucket root.
You should also enable caching when possible via the cache_enabled, data_cache_enabled and enable_filesystem_cache settings, as this can give a dramatic improvement in query performance.
We also need to declare the policies section (the s3_main policy above) to make everything work. Restart Clickhouse to make sure everything is good:
sudo clickhouse restart
Creating S3 MergeTree table
Now we can create a MergeTree table that uses S3 as its storage:
CREATE TABLE test ( `id` UInt32, `msg` String, `ts` DateTime )
ENGINE = MergeTree ORDER BY id
SETTINGS storage_policy = 's3_main'
We have used the s3_main policy to let Clickhouse know we want the data stored on S3. That’s it: we can now work with this table as a standard Clickhouse table.
Inserting data
Let’s insert 10 million rows and do some tests:
INSERT INTO test
SELECT number, randomPrintableASCII(10), now() - rand()
FROM numbers(10000000)
It took about 16 seconds to store that amount of data:
0 rows in set. Elapsed: 16.603 sec. Processed 10.49 million rows
An S3 browser shows that the db folder (named by us, created automatically by Clickhouse) takes something like 300 MB of space. The folder itself contains internal Clickhouse files, which we should not touch.
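You can also check the table size from inside Clickhouse; a sketch using the system.parts system table (bytes_on_disk reports the compressed size on the storage device):

SELECT formatReadableSize(sum(bytes_on_disk))
FROM system.parts
WHERE table = 'test' AND active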
Reading data
When the cache is enabled, reading data can be quick, since Clickhouse is able to reuse cached data. Otherwise queries can take longer to process, because data parts have to be fetched from S3 over the network.
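For example, a hypothetical query over our test table (timings depend heavily on cache state and network):

SELECT count()
FROM test
WHERE msg LIKE '%abc%'

On the first run Clickhouse has to download the msg column data from S3; repeated runs can be served from the filesystem cache.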
Storage strategy in real life
In production you might want to maintain Clickhouse performance, which obviously degrades when working over S3. A common strategy is the hot-cold storage approach, where the recent (hot) part of your table is stored on a high-speed device (local SSD or NVMe) and the historical (cold) part is stored on a slow device (HDD or S3).
The great thing here is that Clickhouse manages this tiered storage automatically. In order to use it, we just have to specify storage tiers in the policies block:
<hot_cold>
    <volumes>
        <hot>
            <disk>default</disk>
        </hot>
        <cold>
            <disk>s3</disk>
        </cold>
    </volumes>
    <move_factor>0.1</move_factor>
</hot_cold>
This asks Clickhouse to save table data on the local disk first (the hot volume). When free local disk space drops below move_factor * local_disk_size, table data is automatically relocated to S3 storage (the cold volume). We’ve used a move factor of 0.1, which results in data relocation once less than 10% of free space is left on the local device (e.g. below 10 GB free on a 100 GB disk).
Now (after a Clickhouse server restart) we can create tables with this policy:
CREATE TABLE hot_cold_table ( `id` UInt32, ... )
ENGINE = MergeTree ORDER BY id
SETTINGS storage_policy = 'hot_cold'
And we just use it in the standard way; Clickhouse will manage the storage automatically.
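To see which disk each part of the table currently lives on, you can query system.parts (the table name follows the example above):

SELECT name, disk_name
FROM system.parts
WHERE table = 'hot_cold_table' AND active

Recently written parts should report the local disk (default), while older relocated parts report s3.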
Summary
Clickhouse allows using S3 as a storage device, giving us a native way to work with large MergeTree tables stored on S3. For real-life cases a hot-cold storage strategy can be used, where part of the data (usually recent) is stored locally for performance, while the other part (usually historical) is moved to S3 for storage scale.
Clickhouse storage policies also allow using multiple local disks and putting tables on different devices, which can help scale storage based on local hardware.