PG Internals: Functions, Storage Mechanism, and Background Processes

Published in

Bitnine Global

4 min readDec 28, 2023

PostgreSQL is a powerful and extensible open-source relational database management system (RDBMS) that offers various features, including functions and a flexible storage mechanism.

PostgreSQL functions:

User-Defined Functions (UDFs): Using SQL or procedural languages like PL/pgSQL, PL/Python, PL/Tcl, etc., PostgreSQL enables users to define their unique functions. These functions, also known as User-Defined Functions (UDFs), can conduct calculations, manipulate data, and encapsulate complicated logic.

Aggregate Functions: To perform computations on groups of rows, PostgreSQL has a variety of built-in aggregate functions, including SUM, AVG, COUNT, MAX, MIN, and others.

Window Functions: Window functions in PostgreSQL let you do calculations across a group of rows connected to the current row. For activities like computing running totals or ranking results, these routines are helpful.

Full-Text Search Functions: PostgreSQL has robust full-text search features, including the tsvector and tsquery functions for doing intricate text searches.

Geospatial Functions: For storing and retrieving geographic data, PostgreSQL extensions like PostGIS offer geospatial functions.

PostgreSQL Storage Mechanism:

Tables and Rows: PostgreSQL stores information in tables, which are made up of rows and columns. In a table, each row denotes a record, and each column denotes a property or field of that record.

Storage Models: PostgreSQL uses a variety of storage models for distinct types of data. For instance, it uses B-trees, which are excellent for searching and range queries, for indexing data. Furthermore, it allows heap storage, which is used by default for tables without any indexes.

Indexes: PostgreSQL employs indexes to create data structures that facilitate quick searching and sorting, which speeds up data retrieval. It also supports several different index types besides B-tree, including hash indexes, GiST (Generalized Search Tree) indexes, GIN (Generalized Inverted Index) indexes, and SP-GiST (Space-Partitioned Generalized Search Tree) indexes.

Tablespaces: PostgreSQL has tablespaces, which let you choose where database items are stored on disk physically. Performance may be increased, and storage can be managed with the help of this function.

MVCC (Multi-Version Concurrency Control): PostgreSQL employs MVCC (Multi-Version Concurrency Control) to efficiently manage concurrent transactions. It enables simultaneous access to data by many transactions without interfering with one another.

TOAST (The Oversized-Attribute Storage Technique): PostgreSQL automatically employs TOAST (The Oversized-Attribute Storage Technique) to store huge data values (such as lengthy texts and binary data) effectively. To optimize storage and retrieval, TOAST ensures that huge data are stored independently from the primary table.

Write-Ahead Logging (WAL): PostgreSQL uses Write-Ahead Logging (WAL) to allow point-in-time recovery and maintain data durability. Before applying changes to the real data files, WAL logs database changes, which aids in data recovery in the event of crashes [1].

PostgreSQL’s background processes:

Fig 1: Process architecture in PostgreSQL

PostgreSQL, as a relational database management system, runs several background processes to manage various activities and protect the database’s integrity and performance. To enable proper data management, transactions, and concurrent access, these background processes cooperate. Fig.1 [2] represents the key background processes in PostgreSQL.

Autovacuum (worker processes and auto vacuum launcher): The task of automatically recovering space and improving database performance falls to Autovacuum. The process that launches the autovacuum oversees starting those operations regularly. On individual tables, autovacuum worker processes handle the actual vacuuming and upkeep.

WAL Writer (walwriter): The Write-Ahead Log (WAL) data is written to disk by the WAL Writer process. For replication and crash recovery, WAL is employed.

Checkpointer (checkpointer): The task of writing dirty data (modified data) from shared buffers to disk is carried out by the checkpointer process. It aids in minimizing the amount of time required for database recovery following a crash.

Background Writer (bgwriter): When the system is idle, the Background Writer processes overseas writing data from shared buffers to disk. Performance is enhanced and I/O load is decreased.

WalReceiver (walreceiver): The streaming replication capability includes the WalReceiver procedure. In a streaming replication system, it receives and applies Write-Ahead Log (WAL) records from the primary server.

WalSender (walsender): The Archiver process must archive the Write-Ahead Log (WAL) data to a specific archive location. Point-in-time recovery can benefit from archiving.

Archiver (archiver): The Write-Ahead Log (WAL) data must be archived in a specific archive location by the Archiver process. Point-in-time recovery can benefit from archiving.

Stats Collector (stats collector): Statistics regarding database activity, including the number of queries, index usage, and connections, are gathered by the Stats Collector process. Tools for monitoring and query optimization use this data.

These background processes work together to make sure that PostgreSQL runs correctly, effectively manages resources, and offers the functionality required to guarantee data consistency, durability, and performance. Each process has a particular function and is essential to the overall performance of the database system.

References:

[1] https://www.postgresql.org

[2] https://www.interdb.jp/pg/pgsql02.html

PG Internals: Functions, Storage Mechanism, and Background Processes

Written by Bitnine Global