Clickhouse DB in Python

Harris Minhas
Topick
Published in
3 min readJul 26, 2023

Introduction:

In this article, you are going to learn:

  1. What is clickhouse DB?
  2. How to set up clickhouse server locally and query using various clients.
  3. How to connect to a clickhouse DB using python and make queries from it.

Bonus

  1. Useful VSCode extensions to help you work with clickhouse and python.

What is Clickhouse?

Clickhouse is a high performance, column oriented SQL DBMS for OLAP workflows. A data-engineer’s dream come true.

What is OLAP workflow?

OLAP stands for online analytical processing. OLAP workflows are those that require real-time responses on top of large datasets for analytical purposes.

Lets set it up:

Clickhouse runs on macOS, FreeBSD and Linux while it can also be supported on Windows using WSL.

Installation:

You can install clickhouse server on your system locally using the following command on terminal:

curl https://clickhouse.com/ | sh

Setting up clients:

There are a number of ways to set up clients to run queries to your clickhouse DB. We will discuss the top three here:

  1. Clickhouse server provides an http based client for users to run queries from an easy to use UI based setup. It can be accessed by http://localhost:8123/play. It looks like the following

2. We can also set up our own SQL based client by running the following command on terminal:

./clickhouse client

3. If you are a fan of VSCode, there is a very useful extension to get you started with clickhouse. SQLTools can be used to establish connections to the clickhouse server and run SQL queries on them.

SQLTools extension for clickhouse

4. You can also use tabix for the very same purpose if you are a fan of dockerized services.

docker build -t tabix .
docker run -d -p 8080:80 tabix

Lets connect to Clickhouse using python:

Python has a number of libraries that can be used to connect to clickhouse. The most convenient of them to use is an orm by the name of infi.clickhouse-orm. You can install it using the following command:

pip3 install infi.clickhouse-orm

After installation, its time to connect to the clickhouse server and start querying.

from infi.clickhouse_orm import Database, Model, DateTimeField, UInt16Field, Float32Field, Memory, F

#Create table in clickhouse
class CPUStats(Model):
timestamp = DateTimeField()
cpu_id = UInt16Field()
cpu_percent = Float32Field()
engine = Memory()

db = Database('demo')
db.create_table(CPUStats)


# Calculate what percentage of the time CPU 1 was over 95% busy
queryset = CPUStats.objects_in(db)
total = queryset.filter(CPUStats.cpu_id == 1).count()
busy = queryset.filter(CPUStats.cpu_id == 1, CPUStats.cpu_percent > 95).count()
print('CPU 1 was busy {:.2f}% of the time'.format(busy * 100.0 / total))

# Calculate the average usage per CPU
for row in queryset.aggregate(CPUStats.cpu_id, average=F.avg(CPUStats.cpu_percent)):
print('CPU {row.cpu_id}: {row.average:.2f}%'.format(row=row))

Software Engineers usually prefer ORMs while dealing with databases as they provide them with an easy way to develop maintainable code. You can also run individual queries on clickhouse using python with clickhouse-driver, but that is a topic for another day.

Conclusion:

This article has walked you through the practical steps by which and through which you can set up and work with clickhouse db with and without python. In the next article, we will discuss the differences of syntax between normal SQL and clickhouse based SQL. Stay tuned.

References:

Official documentation for clickhouse: https://clickhouse.com/docs/en/intro

Official documentation for infi orm:https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/toc.md

--

--