PostgreSQL — Accurately Benchmarking Features Which Take Very Little Time

Eric Hosick
The Opinionated Software Architect
5 min read · Sep 20, 2022

What does it take to measure the difference between two functions in PostgreSQL? Can we get accurate execution times within hundreds of nanoseconds?

The Opinionated TLDR

The Opinionated Architect’s TLDR for benchmarking specific PostgreSQL features*:

  • Run benchmarks on a dedicated PostgreSQL server to minimize standard deviation. When benchmarking locally, running PostgreSQL in Docker reduces variance (see findings below).
  • Using generate_series(1,3e6) (generating three million rows) leads to the lowest standard deviation (see findings below).
  • Try it yourself! See benchmark gist for the benchmarking code.
  • For schema-specific benchmarking, try a tool like pgbench.

* Our current findings on minimizing standard deviation are for SELECT statements. We will look into INSERTs and UPDATEs at a later date.

Introduction

Measuring the performance of a PostgreSQL function that takes a tiny amount of time to execute (hundreds of nanoseconds) can be difficult. Even on a dedicated system, we can see significant variation in execution time.

This article covers how we measure the overhead of features like:

  • defaults vs. no defaults on columns
  • different uuid defaults
  • bigint vs. int as a primary key column

NOTE:

  • All examples in this article use Idempotent SQL DDL (a minimal sketch follows this list).
  • All values come from running on specific hardware. We recommend running benchmarks on your particular hardware before making any design decisions.
  • Consider using a tool like pgbench to benchmark a specific schema design.
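
By idempotent DDL, we mean statements that are safe to run repeatedly. A minimal sketch (the extension and table here are illustrative, not part of the benchmark gist):

-- Safe to re-run: IF NOT EXISTS makes the DDL idempotent.
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE IF NOT EXISTS example_uuid (
  id uuid PRIMARY KEY DEFAULT uuid_generate_v4()
);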

The Approach

The overhead of executing a SQL statement is significant relative to the time a given feature may take (hundreds of nanoseconds). We use the generate_series() function to maximize the number of runs of a given feature within a single statement.

For example, to test the difference between generating a v1 UUID and a v4 UUID, we run the following SQL statements and compare how long each takes.

-- Assumes the uuid-ossp extension is installed:
-- CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
SELECT uuid_generate_v1()
FROM generate_series(1,3e6) AS series;

SELECT uuid_generate_v4()
FROM generate_series(1,3e6) AS series;

Running each SQL statement causes uuid_generate_*() to execute three million times. Along with uuid_generate_*(), we also incur the overhead of calling generate_series().

We are not trying to determine the absolute time a given feature takes to execute (that is heavily affected by the compute). We want to determine the relative overhead between two different features. Because generate_series() appears in both SQL statements, its overhead is common to both and cancels out of the comparison.
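
For a quick manual comparison without any benchmark schema, psql's \timing meta-command is enough (the benchmark gist referenced below automates the statistics):

-- Rough manual timing from psql; \timing prints each statement's
-- elapsed time.
\timing on
SELECT uuid_generate_v1() FROM generate_series(1,3e6) AS series;
SELECT uuid_generate_v4() FROM generate_series(1,3e6) AS series;
\timing off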

Benchmarking and PostgreSQL

The data in this article came from running PostgreSQL 14.5 (Debian 14.5-1) in a Docker container, with deploy.resources.cpus set to 3 and deploy.resources.memory set to 10G, on a MacBook Pro M1 Max with 64GB of RAM.

Environment

We want to use an environment with negligible standard deviation. We tested in three different environments.

  • Hosted — hosted service free tier (PostgreSQL 14.1)
  • Docker — Docker on Mac M1 Max 64GB (PostgreSQL Debian 14.5-1)
  • Native — native on Mac M1 Max 64GB (PostgreSQL 14.5 via Homebrew)

The values below come from running SELECT series FROM generate_series(1, 5e6) in 200 separate executions (not all at once), for a total of 1e9 rows selected.
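
As an illustrative sketch (not the gist's actual benchmark.execute implementation), a single run can be timed with clock_timestamp():

-- Illustrative only: time one run and report nanoseconds per row.
DO $$
DECLARE
  d_start timestamptz;
  d_stop  timestamptz;
BEGIN
  d_start := clock_timestamp();
  PERFORM series FROM generate_series(1, 5e6) AS series;
  d_stop := clock_timestamp();
  RAISE NOTICE 'time per row: % ns',
    round((EXTRACT(epoch FROM (d_stop - d_start)) * 1e9 / 5e6)::numeric, 1);
END;
$$;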

Measurement         |  Hosted  |  Docker  |  Native
--------------------------------------------------------------------
STDDEV Time Per Row | 9.107 ns | 1.870 ns | 2.225 ns
AVG Time Per Row    | 585.5 ns | 108.2 ns | 92.4 ns
MIN Time Per Row    | 572.9 ns | 102.0 ns | 90.9 ns
MAX Time Per Row    | 616.9 ns | 117.0 ns | 105.6 ns
Benchmarks Truncated (upper/lower 10th percentile)
--------------------------------------------------------------------
STDDEV Time Per Row | 8.683 ns | 0.308 ns | 0.399 ns
AVG Time Per Row    | 585.4 ns | 107.7 ns | 91.8 ns
MIN Time Per Row    | 574.9 ns | 107.2 ns | 91.0 ns
MAX Time Per Row    | 611.7 ns | 109.1 ns | 93.9 ns

Note: The nanosecond times provided are for information purposes only. Our intent isn't to compare different services, nor does our benchmark SQL represent real-world usage. Our goal is to minimize the standard deviation.
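
The truncation step itself can be expressed in SQL. A minimal sketch, assuming a hypothetical benchmark_runs(ns_per_row) table rather than the gist's actual schema:

-- Hypothetical table and column names; keeps only the runs between
-- the 10th and 90th percentiles before aggregating.
WITH bounds AS (
  SELECT percentile_cont(0.1) WITHIN GROUP (ORDER BY ns_per_row) AS lo,
         percentile_cont(0.9) WITHIN GROUP (ORDER BY ns_per_row) AS hi
  FROM benchmark_runs
)
SELECT stddev(ns_per_row) AS stddev_ns,
       avg(ns_per_row)    AS avg_ns,
       min(ns_per_row)    AS min_ns,
       max(ns_per_row)    AS max_ns
FROM benchmark_runs, bounds
WHERE ns_per_row BETWEEN bounds.lo AND bounds.hi;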

Environment Findings

Some findings while running PostgreSQL 14.5 (Debian 14.5-1):

  • Running user applications while also running benchmarking scripts leads to a more significant standard deviation (duh).
  • Running in a Docker container, as opposed to natively on the Mac, was slower (108.2 ns vs. 92.4 ns) but had a lower standard deviation (1.870 vs. 2.225).
  • Running in the hosted environment had the most significant deviation in data (most likely not a dedicated instance).

Generate Series Findings

The following findings came from running generate_series() on PostgreSQL in Docker on a Mac M1 Max 64GB (PostgreSQL Debian 14.5-1). Benchmark values in the table below are truncated (upper and lower 10th percentile).

Stop Row | STDDEV   | AVG (Per Row)
--------------------------------------------------------------------
1e1 rows | 1423 ns  | 4123 ns
1e2 rows | 163 ns   | 468.5 ns
1e3 rows | 35 ns    | 189.9 ns
1e4 rows | 19 ns    | 123.0 ns
1e5 rows | 14.8 ns  | 87.6 ns
1e6 rows | 0.482 ns | 114.9 ns
2e6 rows | 0.423 ns | 110.4 ns
3e6 rows | 0.288 ns | 108.0 ns
4e6 rows | 0.291 ns | 106.8 ns
5e6 rows | 0.301 ns | 107.5 ns
6e6 rows | 0.309 ns | 107.1 ns
1e7 rows | 0.332 ns | 105.7 ns

Running generate_series() with three million rows gives us the lowest standard deviation. Most likely, the factors affecting the results are the number of rows per commit and the amount of memory available (10GB of RAM for PostgreSQL). Having less RAM may result in a different outcome.

Reproducing The Findings

The benchmark gist contains the benchmark schema used to run the following SQL:

-- For each row of data in the table above, we changed
-- d_row_count and manually ran the following SQL 10 times.
DO $$
DECLARE
  d_exec_count int := 20;
  d_row_count  int := 3e6;
BEGIN
  CALL benchmark.execute(
    FORMAT('series - %s, row %s', d_exec_count, d_row_count),
    '',
    d_row_count,
    FORMAT(
      'SELECT series
       FROM generate_series(%s,%s) AS series;',
      1,
      d_row_count
    ),
    '',
    d_exec_count
  );
  COMMIT;
END;
$$;

The results are available by running:

SELECT *
FROM benchmark.compare_outcome
UNION ALL
SELECT *
FROM benchmark.compare_outcome_threashold;

Conclusion

It is possible to get reliable measurements within tens of nanoseconds in PostgreSQL by following the process described in this article. We found that the best outcome comes from running generate_series() with three million rows against PostgreSQL running in a Docker container.


Eric Hosick
The Opinionated Software Architect

Creator, entrepreneur, software architect, software engineer, lecturer, and technologist.