How to generate random data for Clickhouse tables

Denys Golotiuk
DataDenys

--

When you want to do some tests you might to generate a lot of test data for your tables to understand its performance under load. Clickhouse has built-in generateRandom() function that helps quickly generate and populate data into tables of any structure.

As said in documentation, data generating is as simple as specifying list of fields you would like to generate data for:

SELECT * FROM generateRandom('uuid UUID, context Text, dt Datetime') LIMIT 100

This will generate 100 random records for a table with 3 columns:

  • UUID column
  • text column
  • datetime column

Virtual table will be printed as a result:

You can also specify maximum string length to generate for Text columns:

SELECT * FROM generateRandom('uuid UUID, context Text, dt Datetime', NULL, 100) LIMIT 100

This will limit Text values to 100 symbols:

Populate table with random data

That’s an easy one, just use INSERT...SELECT for your table and structure:

INSERT INTO sparse_8k
SELECT * FROM generateRandom('id Int64, str_1024 Text', 1, 1024) LIMIT 10000000;

This will insert 10m rows into sparse_8k table of the given structure.

--

--

Denys Golotiuk
DataDenys

Data-intensive apps engineer, tech writer, opensource contributor @ github.com/mrcrypster