Mastering PostgreSQL Indexes for Optimal Performance

Published in

Autodesk TLV

8 min readOct 4, 2023

In the dynamic realm of relational databases, PostgreSQL stands tall as a versatile powerhouse. When it comes to optimizing database performance, there is a crucial tool that every developer and database admin relies on — indexes. In this blog post, we’ll take a closer look at PostgreSQL indexes — a handy tool for improving database performance. From demystifying what they are to mastering their creation, understanding when to use them, and ensuring they’re finely tuned, we’ll cover it all, peppered with real code snippets and examples.

Decoding PostgreSQL Indexes

Let’s start with the basics. An index in PostgreSQL is like a map that turbocharges your queries, enabling swift data retrieval. Picture an index as the table of contents in a book — it’s your shortcut to finding exactly what you need, without flipping through every page. Without indexes, databases would slog through entire tables to fetch data, leading to inefficient query performance, especially for large datasets.

Crafting Indexes in PostgreSQL

CREATE INDEX index_name ON table_name (column_name);

Indexes can be sprinkled over one or multiple columns in a table. But, hold on! The columns you pick for indexing and the type of index you choose are pivotal.

The Index Selection Dilemma: When to Index

Indexes are strong tools, but they’re not without their trade-offs. While they boost read performance, they can slow down write operations due to maintenance overhead. The key is to choose your battles wisely. Here are scenarios where indexes shine:

Frequent Searches: If your application heavily relies on frequent searches within specific columns, indexing is the optimal choice.
Joining: Columns used in joining tables are prime candidates for indexing, ensuring faster data retrieval during joins.
Sorting & Ordering: For columns playing sorting and ordering roles, indexing steps up the game of ORDER BY and GROUP BY clauses.

Navigating Index Types

PostgreSQL spoils you with index types, each tailored to specific scenarios. Let’s explore a couple:

B-tree Index: Your go-to choice. Great for range queries and equality comparisons.

-- Create a B-tree index on a single column
CREATE INDEX idx_column_name ON table_name (column_name);

-- Create a B-tree composite index on multiple columns
CREATE INDEX idx_multi_column ON table_name (column1, column2);

Hash Index: A heavyweight for equality checks, particularly when dealing with limited distinct values.

-- Create a hash index on a single column
CREATE INDEX idx_column_name ON table_name USING hash (column_name);

Since it uses equality checks(=) If we specify a separate operator (such as pattern matching), it does not use the hash index.

GIN (Generalized Inverted Index): A power player for array data queries and full-text search.

-- Create a GIN index on an array column
CREATE INDEX idx_array_column ON table_name USING gin (array_column);

-- Create a GIN index for full-text search (requires the "pg_trgm" extension)
-- Enable the extension if not already enabled
CREATE EXTENSION IF NOT EXISTS pg_trgm; 
CREATE INDEX idx_full_text_search 
ON table_name 
USING gin (text_column gin_trgm_ops);

GiST (Generalized Search Tree): Versatile and valuable for various data types, including spatial data.

-- Create a GiST index on a geometry column (requires the "postgis" extension)
-- Enable the extension if not already enabled
CREATE EXTENSION IF NOT EXISTS postgis; 
CREATE INDEX idx_spatial_data ON table_name USING gist (geometry_column);

Please note that for some index types (e.g., GiST and GIN), you may need to enable specific extensions (like “postgis” for spatial data or “pg_trgm” for text search) before you can create indexes of that type.

Harnessing the Power of ANALYZE

The ANALYZE command is your backstage pass to index insights. It offers statistics about table distribution and index usage, helping you evaluate their effectiveness.

The ANALYZE Command:

The ANALYZE command is used to collect statistics about a table or an index. It provides the PostgreSQL query planner with information about the distribution of data within a table or index.
When you perform an ANALYZE operation on a table or an index, PostgreSQL takes a representative sample of the data and collects information about how the data is distributed.
ANALYZE is employed as part of regular database upkeep to ensure that the statistics remain current and relevant.

ANALYZE table_name;

EXPLAIN ANALYZE:

The EXPLAIN ANALYZE command is used to obtain the execution plan and performance statistics for a query without actually executing the query. It's a tool for understanding how PostgreSQL plans to execute a query and how long each part of the query takes to run.
When you run EXPLAIN ANALYZE followed by a query, PostgreSQL generates the query execution plan, executes the query, and then provides a detailed breakdown of how the query was executed, including the time spent on each step and the number of rows processed at each step. This information is valuable for query optimization and performance tuning.

EXPLAIN ANALYZE SELECT * FROM table_name WHERE column_name = 'value';

Indexing Strategies in Action

Scenario 1: Basic Indexing

Imagine you’re overseeing a table named customers, which stores valuable information about your clients. You often need to search for customers using their email addresses. Here's how it looks:

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100)
);

INSERT INTO customers (first_name, last_name, email)
VALUES
    ('Arik', 'Jordan', 'Arik@example.com'),
    ('Jane', 'Smith', 'jane@example.com'),
    ('Michael', 'Johnson', 'michael@example.com');

Without an index in place, when you want to find a specific email address, it’s like searching for a needle in a haystack. Here’s what the query would look like:

EXPLAIN ANALYZE SELECT * FROM customers WHERE email = 'Arik@example.com';

EXPLAIN ANALYZE comman’s output — without index

Now, let’s create an index on the email column to speed up these queries, and see the output of the EXPLAIN ANALYZE command:

CREATE INDEX idx_email ON customers (email);
EXPLAIN ANALYZE SELECT * FROM customers WHERE email = 'Arik@example.com';

EXPLAIN ANALYZE comman’s output — with index

Now the query plan using an index scan instead of a regular seq scan.

Wait..! but you said ‘speed up’, it is taking longer now, how is that possible?we will figure it out in last section: When to Avoid Indexing: Situations to Consider.

Scenario 2: Multi-Column Index

In a slightly different scenario, you’re tasked with managing an orders table that keeps track of customer orders. You frequently need to find orders based on two things: the customer's ID and the order date. Here's how the orders table looks:

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10, 2)
);

INSERT INTO orders (customer_id, order_date, total_amount)
VALUES
    (1, '2023-01-15', 150.00),
    (2, '2023-02-10', 200.00),
    (1, '2023-02-20', 120.00);

To make these order-based queries faster, we can create a multi-column index on both customer_id and order_date. It's like giving your database a roadmap for finding orders efficiently:

CREATE INDEX idx_customer_order_date ON orders (customer_id, order_date);

Now, your queries involving filtering by both criteria will be optimized for speed.

Scenario 3: Index Order and Included Columns

Imagine managing a products table with detailed information about various products. You frequently search for products based on their category and manufacturer. To create a covering index, including the product_name column in the index might enhance performance.

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    manufacturer VARCHAR(50),
    price DECIMAL(8, 2)
);

INSERT INTO products (product_name, category, manufacturer, price)
VALUES
    ('Laptop', 'Electronics', 'ABC Electronics', 999.99),
    ('Smartphone', 'Electronics', 'XYZ Mobile', 599.99),
    ('Chair', 'Furniture', 'Furniture Co.', 149.99);

Create an index with columns ordered by category and manufacturer, and include the product_name:

CREATE INDEX idx_product_search 
ON products (category, manufacturer) 
INCLUDE (product_name);

Multi-Column Indexes Order

Consider a scenario where you manage a database of products. The table includes several columns: product_id, category, manufacturer, price, and availability. You frequently run queries to filter products based on their category and manufacturer.

Column Order Matters:

When creating a multi-column index, the order of the columns is crucial. The leftmost column in the index plays a pivotal role. It should align with the most frequently filtered or sorted criteria in your queries.

In our case, since we often filter by category first and then by manufacturer, the optimal index order should have category as the leftmost column.

-- Correct Index Order (category, manufacturer)
CREATE INDEX idx_product_search ON products (category, manufacturer);

Now, when you run queries filtering by category and manufacturer, the index's hierarchy aligns perfectly with your query patterns, leading to efficient data retrieval.

Incorrect Index Order:

Conversely, if you create the index with an incorrect order, say, starting with manufacturer, it might not yield the same performance benefits:

-- Incorrect Index Order (manufacturer, category)
CREATE INDEX idx_incorrect_order ON products (manufacturer, category);

Queries filtering by category first would not benefit from this index as much, as it doesn't match the leftmost prefix of the index.

Querying with the Correct Index

Now, let’s see how this well-ordered index improves query performance:

SELECT * FROM products 
WHERE category = 'Electronics' AND manufacturer = 'ABC Electronics';

This query efficiently utilizes the multi-column index, benefiting from the leftmost category column.

Querying without the Correct Index

Contrast this with a query that doesn’t align with the index order:

SELECT * FROM products 
WHERE manufacturer = 'ABC Electronics' AND category = 'Electronics';

This query might not fully leverage the index’s potential, resulting in potentially slower performance.

When to Avoid Indexing: Situations to Consider

Highly Volatile Tables: Tables frequently changing with new data or updates may not benefit from indexes due to potential write performance impacts.
Small Tables: For very small tables with few rows, adding indexes may not significantly boost performance and could be unnecessary.
Low Selectivity Columns: Columns with limited distinct values might not gain much from indexing; focus on columns with higher variety.
Frequent Full Table Scans: When queries scan the entire table without leveraging indexes, the overhead of maintaining them may not be justified.
Batch Insert Operations: During large data imports, temporarily removing and then rebuilding indexes can expedite the process.
Temporary Tables: For short-lived tables rarely queried, adding indexes might not be worth the effort.
Queries with Low Complexity: Simple queries with minimal filtering may not see significant performance gains from indexes.
Low Latency Tolerance: In real-time applications, even minor indexing delays may not be acceptable, necessitating other optimization methods.

Keep in mind that whether to use or skip indexes should be based on a thorough analysis of your specific database and your performance needs. It can be beneficial to closely examine your queries, both with and without indexes, and employ the EXPLAIN ANALYZE command to determine the actual impact on performance.

Conclusion: Mastering the Art of Indexing

Becoming professional in indexing is a must skill when working with the PostgreSQL database management system. By knowing which columns to index and understanding how indexing really works, you’ll have the tools to make your queries faster and more efficient. Don’t hesitate to experiment with various types of indexes, monitor their performance, and leverage included columns to create efficient covering indexes. Armed with these practical examples, you’ll be well-prepared to enhance the efficiency of your PostgreSQL database, ensuring that your application remains responsive and fast, even as your data continues to evolve and expand.