How does Postgres work?

Divya Nagar
4 min readSep 22, 2017

If you are reading this article, I am assuming that you have used PostgreSQL and you want more details about it. So what is PostgreSQL? How it works and how does it manage to understand our simple English queries? Let’s first understand what is PostgreSQL and then we will see how it works.

In simple words, PostgreSQL is a system which helps you in managing your huge databases by hiding the underlying complicated details. Let’s understand it with an analogy of a car. When you drive a car, if you press the accelerator and change the gear, the speed changes. You need not bother what is happening to the engine and other systems of the car. When it comes to databases Postgres does same for us. You just write a simple English query and it gives you the desired results while hiding all the tortuous processes. It does the optimisation for you which makes your work fast.

So how does Postgres do it? Let’s take a simple example and understand it in more details. This is a simple INSERT query.

INSERT INTO table_name (col_1, col_2) VALUES(val_1, val_2)

So these are the things which happen when above query goes to Postgres.

  1. Parse
  2. Analyse & Rewrite
  3. Plan
  4. Execute

Let’s understand them one by one. To understand the entire flow, from parsing to execution, you can go through the source code links given with functions. First function which PostgreSQL executes is exec_simple_query. Inside this you can find all the functions which are being explained below.

Parse (pg_parse_query)

Initially this query goes to parser, which converts string query to a list of parse tree (RawStmt nodes). Lexical scanner breaks the query into tokens. Parser uses the grammar from gram.y and tokens generated by scanner to get the query type. It finally loads the proper structure for the query and generates the raw tree. The output for the above query will be something like this.

[ { InsertStmt: { relation: [Object], cols: [Array], selectStmt: [Object] } } ]

Analyse & Rewrite (pg_analyze_and_rewrite)

Once the raw tree is generated, if we see in exec_simple_query the next function which we encounter is pg_analyze_and_rewrite. It does some analysis on raw tree and tries to optimise it. To analyse the query it applies some transform operations on different clauses present in our query. For example if we have a WHERE clause, it calls TransformWhereClause and checks if the output value is boolean. It avails different transformations for different clauses like transformLimitClause, transformGroupClauseExpr etc.

INSERT query modifies some columns of the database. In this step Postgres creates a target list in which it stores what columns will get modified when this query executes. After the analysis and rewrite, the next thing is optimisation. Postgres has one optimiser module to optimise the queries. It uses query structure to determine the best table join order i.e. in what order should the join operations happen to make the execution faster.

Plan (pg_plan_queries)

This module generates a plan tree headed by PlannedStmt node. PlannedStmt contains the information which execution module needs to execute the query. Plan tree contains plan nodes. Plan nodes have the information about their tasks. Each node can be considered as a task which has some cost and other factors like plan_rows, plan_node_id. These factors are necessary to execute that node. Each node in plan tree has one branch. It takes input from it’s child and passes the output to it’s parent. Based on the cost in plan nodes, Postgres estimates the total query execution time. To get more insights on this, try EXPLAIN in Postgres. As you can see in the output, there are some tasks with their costs. This is the output of plan module which goes to execution module. It can also help you in profiling and understanding which part of the query is taking more time and hence optimising it.

EXPLAIN on insert-select query

Execution

After so much of work, Postgres finally has a plan to execute. Plan tree, generated by planner and optimiser goes to Postgres executor. Executor has a demand-pull mechanism. Every plan node will demand rows from its child node. Executer will keep doing this recursively until it gets the desired results.

Demand-pull Plan tree

This image denotes the plan tree which we generated in plan section. Limit plan node requires at least one row to execute. It demands that row from Sort plan node which recursively demands row from Seq Scan. While executing, Sort plan node will repetitively call Seq Scan plan node to get the rows to be sorted. When the input is exhausted, Sort plan node will keep the sorted rows so that it can deliver to Limit plan node when demanded.

For our previous insertion query, each returned row is inserted in the specified table. EXPLAIN on that insert query will give trivial results with single Result plan node.

EXPLAIN on simple insert query

So this is how Postgres executer works. For more details on update and delete query execution you can look at Postgres official manual. Queries might get complicated, parse and plan trees might get larger but process stays the same. I hope this post gave you some insights on Postgres working process. Though you might not need all these details while writing the query but it is always good to know your tools.

Resources:

  1. Following a Select Statement Through Postgres Internals
  2. Postgres manuals

--

--