Query Processing in PostgreSQL

Published in

Opensource Relational Database with Graph Analytics

5 min readDec 15, 2023

Introduction

As a database enthusiast and software developer, I’m captivated by PostgreSQL’s inner workings. The way it efficiently handles all client queries through a single backend process and its complex subsystems for query processing truly amazes me. I wanted to search how queries worked in PostgreSQL. I did some research and found what goes on inside the PostgreSQL code. This brief overview explains the process of query execution and how PostgreSQL do its best to ensure you get the best performance out of it. Of course, performance is greatly influenced by the way your database is designed, so consider checking these tips from AGEDB on the basics of database modeling.

Overview

PostgreSQL assigns a dedicated backend process to handle queries from each client connection, enhancing its robustness and stability. From version 9.6 onwards, it also employs multiple background workers to facilitate parallel processing of queries, leveraging the power of modern multi-core processors to improve performance. The system supports a broad range of features in compliance with the SQL:2011 standard, making its query processing subsystem one of the most sophisticated in the industry. This subsystem is methodically organized into five key areas:

Parser: The initial phase where queries are checked for syntactic validity.
Analyzer: This stage examines the query to ensure all referenced objects exist and are accessible, and it also infers the data types of all expressions.
Rewriter: At this point, any views referred to in the query are expanded and any rule-based substitutions are applied.
Planner: The planner evaluates various strategies for executing the query based on the available indexes and the estimated cost of different approaches.
Executor: Finally, the executor carries out the chosen plan, actually retrieving or modifying data.

Parser

The first subsystem in the backend process of PostgreSQL is the parser, which converts plain text SQL statements into a parse tree. The parse tree can then be processed by the other subsystems of the backend. An example of this process is provided
The parse tree will look like this for the query
testdb=# SELECT id, data FROM tbl_a WHERE id < 300 ORDER BY data;

In response, the parser generates a parse tree that organizes the elements of this query (e.g., SELECT clause, FROM clause, WHERE condition, ORDER BY clause) into a hierarchical structure that the database can efficiently analyze and execute.
It’s important to note that the parser strictly checks for syntax correctness. It does not go into the semantics of the SQL statements; this responsibility is reserved for the next step in the process, handled by the analyzer.

Analyzer

The analyzer is responsible for performing a semantic analysis of the parse tree that is generated by the parser. The output of this process is a query tree. The root of this query tree is a Query structure defined in parsenodes.h, which contains metadata about the corresponding query, such as the type of command (SELECT, INSERT, or other). Each leaf of the query tree forms a list or a tree and contains data related to a particular clause of the query.

A simple example for the query tree. The query tree generated by the analyzer in PostgreSQL contains several elements. The targetList contains the columns that will be included in the query result. In cases where the input query tree includes an asterisk, representing all columns, the analyzer replaces it with the full list of columns.

The range table holds information about the relations used in the query, such as the table OID and name. The join tree stores information about the FROM and WHERE clauses, and the SORT clause is a list of SortGroupClause.

Overall, the query tree provides metadata about the query being executed, including its type and specific clauses, which is used by the subsequent subsystems in the backend process to plan and execute the query.

Rewriter

The PostgreSQL rewriter is responsible for implementing the rule system and modifying the query tree based on the rules stored in the pg_rules system catalog. This process is carried out if it is necessary for query optimization or to enforce data integrity constraints.
A simple example of the rewriter stage:

Planner and Executor

The planner receives a query tree from the rewriter and generates a plan tree that can be processed by the executor most effectively. The plan tree is composed of plan nodes that contain information required for processing by the executor. The executor reads and writes tables and indexes in the database cluster via the buffer manager, uses some memory areas, and creates temporary files if necessary. It also maintains consistency and isolation of the running transactions using a concurrency control mechanism.

A simple plan tree and the relationship between the plan tree and the result.

Conclusion
Learning PostgreSQL’s subsystems provides a clear understanding of its query processing pipeline, essential for anyone involved in database management or application development. The system’s architecture, featuring distinct phases like parsing, analyzing, rewriting, planning, and executing, ensures efficient query handling and robust performance.

This article’s exploration of each subsystem highlights the complexity of PostgreSQL’s design, which enables advanced data management and optimization capabilities. For developers and database administrators, a deep familiarity with these processes is essential for optimizing application performance and achieving efficient data operations.

References

The Internals of PostgreSQL