Types of indexes and optimizing queries with indexes in PostgreSQL
PostgreSQL is a scalable database management system that can handle complex queries and large databases. In this article, we’ll explore using indexes in PostgreSQL to execute queries faster.
Indexes are data structures that improve the speed of data retrieval from databases. PostgreSQL uses a tree-based data structure called B-tree (balanced tree) to organize the index. Each index entry contains a key value and a pointer to the corresponding table row. When a query is executed that involves a condition on an indexed column, the PostgreSQL query planner can use the B-tree index to search for matching rows more efficiently than scanning the entire table.
There are different types of indexes in PostgreSQL that can be used depending on the type of data being indexed and the type of query being executed.
It’s important to choose the right type of index for your specific use case to get the best performance benefits. The most common types are B-tree indexes, hash indexes, and GiST indexes.
B-tree indexes are the default type of index used in PostgreSQL. They work well for indexing sorted data or data that can be sorted in a particular order such as dates, numbers, and text.
Hash indexes create a hash table that maps values to their corresponding rows. Hash indexes are based on the result of running each key through a hash function. The hash function calculates a fixed-length value, called a hash code, for each key. The hash code is used to locate the actual data in the index. Because hash functions are designed to be fast, hash indexes can be very efficient for certain types of queries. However, hash indexes are not useful for range queries, since they don’t maintain any particular order of the indexed values.
GiST indexes, which stands for GiST stands for Generalized Search Tree indexes, are a more complex type of index that can be used to index spatial data, full-text search, and other types of complex data. It can handle a wide variety of data types by using specialized search algorithms that are tailored to the data type being indexed. This makes it a very powerful tool for optimizing complex queries that involve non-standard data types.
How to Create Indexes in PostgreSQL
The syntax to create an index in PostgreSQL is as follows:
CREATE INDEX index_name
ON tablename (columnname);
For example, to create a B-tree index on the “created_at” column of the “orders” table, you can use the following statement:
CREATE INDEX ordersid ON orders (orders_id)
This will create an index named “ordersid” on the “orders_id” column of the “orders” table.
It is also possible to create indexes on multiple columns by listing them in parentheses separated by commas as demonstrated in the following statement:
CREATE INDEX orders_customer_id_ordered_at ON orders (customer_id, ordered_at)
This will create an index named “orders_customer_id_ordered_at” on the “customer_id” and “ordered_at” columns of the “orders” table.
How to Use Indexes in PostgreSQL
Once an index has been created, it can be used to speed up queries that reference the indexed columns. The query optimizer uses the index to quickly locate the required rows by scanning the index instead of scanning the entire table. To do this, the query should include the indexed columns in the WHERE clause, JOIN conditions, and ORDER BY clauses. It’s also important to avoid using functions or calculations on indexed columns, as this can prevent the optimizer from using the index.
For example, to query the “orders” table using the “created_at” column, you can use the following statement:
SELECT *
FROM orders
WHERE orders_id = ‘20101’ AND orders_id = ‘10131’
This will return all details of orders numbered 10131 and 20101. If there’s an index on the “orders_id “ column, PostgreSQL will use it to speed up the query by scanning only the relevant index entries instead of scanning the entire table.
In some cases, PostgreSQL may not be able to use an index even if one exists. This can happen if the query involves complex expressions or functions that cannot be matched to the index’s key columns. In such cases, you may need to rewrite the query or create a new index that matches the query’s requirements.
Tips for Optimizing Queries with Different Types of Indexes
Here are some tips to optimize queries with indexes in PostgreSQL:
1. Index the columns used in WHERE, JOIN, and ORDER BY clauses: Indexing the columns used in WHERE, JOIN, and ORDER BY clauses is vital for query performance. This is because these clauses are used to filter, sort, and join the data, and without proper indexing, the database will have to perform a full table scan, which can be annoyingly slow for large tables. Creating indexes on these columns enables the database to quickly locate relevant data and reduce the amount of time needed to execute the query.
2. Use multi-column indexes for composite queries: Composite queries are those with multiple columns in the WHERE or JOIN clauses. In such cases, creating an index on each individual column may not be sufficient to optimize the query. In these cases, it is more effective to create a multi-column index that includes all the columns used in the query. This allows the database to quickly locate the relevant rows by scanning the index, rather than performing a full table scan. This can significantly improve query performance, especially for queries that involve complex filtering or sorting conditions.
3. Avoid using indexes on small tables: Indexes can be highly useful for large tables, but on small tables, they can actually bog down query performance. In some cases, the cost of using an index may outweigh the benefits, especially for small tables. This is because indexes require additional disk space and maintenance overhead, and may not provide a significant improvement in query performance on small tables. Therefore, it is generally recommended to avoid creating indexes on small tables unless they are frequently used in join operations or other complex queries.
4. Use partial indexes for queries that return a subset of rows: A partial index is an index that is created on a subset of the rows in a table that satisfies a certain condition. By creating a partial index, you can reduce the size of the index and improve query performance, especially if the condition used to create the partial index is selective and matches a small subset of the rows in the table. For example, if you have a large table with a column called “status” that contains either “active” or “inactive” values, you can create a partial index on the “active” rows only. This index will be smaller than a full index on the column and can improve query performance for queries that only search for “active” rows.
5. Consider using expression indexes for complex queries: If your query involves complex expressions or functions, you can create an expression index to speed up the query. An expression index is an index that is based on the result of a function or expression, rather than a column value. By creating an index on the result of the expression or function, the database can quickly locate the relevant rows, rather than having to evaluate the expression or function for each row in the table. This can be very useful for queries that involve complex calculations or string manipulations.
6. Regularly analyze and vacuum your indexes: Over time, indexes can become fragmented or outdated, which can slow down query performance. To prevent this, you should regularly analyze and vacuum your indexes. Analyzing an index updates statistics about the distribution of data in the index, while vacuuming removes dead rows and frees up space in the index.
The suitable types of indices that can be used on the given table depend on the query patterns and the size of the table. Here are some examples:
- B-tree index and partial B-tree index on “final_date” timestamp without time zone column for queries that filter or sort based on this column and use only recent dates.
CREATE INDEX final_dt ON m_court(final_date);
Example query:
SELECT * FROM m_court WHERE final_date > ‘2022–01–01’;
2. Hash indexes store a 32-bit hash code derived from the value of the indexed column. Hence, such indexes can only handle simple equality comparisons. The query planner will consider using a hash index whenever an indexed column is involved in a comparison using the equal operator. Hash index on “file_no” character varying column if this column is frequently used in equality queries and has a low number of distinct values.
Example queries:
CREATE INDEX
name
ON m_court USING HASH(file_no);SELECT * FROM m_court WHERE file_no = ‘20220101’;
3. GIN index on “gist_appeal” character varying column for queries that use full-text search or search for specific keywords.
Example query:
SELECT * FROM m_court WHERE to_tsvector(‘english’, gist_appeal) @@ to_tsquery(‘english’, ‘some_keyword’);
4. Multi-column index on “petn_name” and “petn_addr” character varying columns for queries that filter or sort based on both columns. Example query:
SELECT * FROM m_court WHERE petn_name = ‘smith’ AND petn_addr = ‘downingstreet’;
When the above select queries are run after indexing, the performance of the query improves multifold. It can be seen that the query “SELECT * FROM m_court WHERE final_date > ‘2022–01–01’;” takes 0.347 seconds to retrieve the data of 370 rows from a table size of 20000+ rows.
Now we add the index for the column ‘final_date’ using the command ‘CREATE INDEX final_dt ON m_court(final_date);’ and then we run the select command which we ran before indexing.
It can be observed that the same 370 rows of data have been retrieved in 0.082 seconds which equates to a performance improvement of more than 4 times. This 4x performance improvement translates to a lot in terms of time saved when the table is huge and the data to be retrieved is spread over a large number of rows.
Conclusion
Optimizing queries with indexes is a crucial aspect of PostgreSQL performance tuning. By creating indexes on the right columns, using multi-column and partial indexes where appropriate, and regularly analyzing and vacuuming your indexes, you can significantly improve query performance and reduce the time needed to execute complex queries. As with any optimization, it is important to test your queries before and after making changes to ensure that they are actually improving performance.
Additionally, it’s worth noting that indexes can also have a downside, as they require additional storage space and can slow down write operations, so it’s important to use them judiciously.