Optimisation is Key: Track Those Slow Running Queries (PostgreSQL)

Agbojo Raphael
6 min read · Apr 29, 2019


As a database administrator, one of the many things to prioritize is ensuring your queries are optimized. You wouldn’t want to spend long minutes or hours waiting on a query, especially one that runs frequently.

This is why optimization matters: it is the key to making your life as a Database Administrator much easier. If your job is converting data into useful information for users, you don’t want to spend a long time producing results that could be delivered faster.

The questions that might concern you are: How do I track these queries? Where are they logged? How can I fix them taking too long to run?

This article attempts to answer all of these questions.

What is DB Optimization?

DB Optimization is making the best use of available resources to detect slow running queries and taking the necessary steps to make them run faster.

As a Database Administrator, optimization means maximizing the speed and efficiency with which data is retrieved. After requesting information from the database (a query), you expect to get your desired output quickly, without a long wait.

The ultimate goal of optimization is to reduce the system resources required to fulfil a query.

Query Optimization — PostgreSQL

PostgreSQL has provided ways to track your queries. These methods will be used to check for slow running queries, and find best ways to optimize them.

N.B. PostgreSQL 9.2 or later is recommended.

In this article, I’ll be talking about two (2) ways to track these queries using:

1. PG_STAT_ACTIVITY

2. PG_STAT_STATEMENTS

PG_STAT_ACTIVITY

The pg_stat_activity is a view in the pg_catalog schema showing information related to the current activity of a process, such as the state, the current query, and the start time of the query, among others.

Furthermore, in this view, you see other descriptions like the database name, the username of the current user, and the process ID. The pg_stat_activity can be linked with other tables and views such as pg_class for table names, and pg_namespace for schema names.

Depending on your preference, you decide how long is acceptable for your queries to run before returning your desired output. In this case, we will assume a query is running for too long if it exceeds 1 minute (60 seconds).

To get the currently running queries exceeding one minute from pg_stat_activity, your query will look like:
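A minimal sketch of such a query, using the column names the pg_stat_activity view exposes in PostgreSQL 9.2 and above:

```sql
-- All details of queries that have been running for more than 1 minute
SELECT *
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '1 minute';
```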

To retrieve the important information from the view, you can request only the specific columns needed for analysis. In this case, your query will look like:
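One possible selection of columns (the exact set is up to you; pid, datname, usename, state, query_start, and query are all standard pg_stat_activity columns in 9.2+):

```sql
-- Key details of queries running longer than 1 minute, slowest first
SELECT pid,
       datname,
       usename,
       state,
       now() - query_start AS duration,
       query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '1 minute'
ORDER BY duration DESC;
```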

With the above query you can get the currently running queries that exceed 1 minute.

As mentioned earlier, how long a query is allowed to run is up to you as a Database Administrator; you can alter the interval to 2 minutes, 3 minutes, or whatever you see fit.

PG_STAT_STATEMENTS

The pg_stat_statements is an extension that, unlike pg_stat_activity, doesn’t come enabled in PostgreSQL by default. The extension has to be created to install it in the database. The pg_stat_statements has an advantage over pg_stat_activity in this case, because pg_stat_activity only shows currently running processes. After a query runs to completion and provides its result, it stays in pg_stat_activity for only a few seconds before disappearing.

In pg_stat_statements, however, the query is logged for as long as you need it to be. This makes it easy to track slow running queries, and it also provides information that is useful for analysis.

If, as a Database Administrator, you find that pg_stat_statements has accumulated a lot of data after a while (it stores every query run in the database), you can decide to clear it; we will get to this later on. Clearing pg_stat_statements can happen once a week, once every two weeks, or however often you deem fit. To avoid doing it manually, you can set this up to run automatically.

Let’s Talk About How to Create the pg_stat_statements Extension

Locate your postgresql.conf file in your program files.

HINT: "C:\Program Files\PostgreSQL\9.4\data\postgresql.conf"

Open the file, search for shared_preload_libraries, and in the empty string provided, put pg_stat_statements. It should look like this:

shared_preload_libraries = 'pg_stat_statements'

Upon completion, close the postgresql.conf file and restart your PostgreSQL service. Restarting briefly interrupts active connections but does not affect or tamper with your data, so it’s safe to go ahead and restart.

HINT: Go to services.msc, search for PostgreSQL, click on restart.

After all of the above has been done, open the tool you use for querying the database. In this case, we are making use of the PGAdmin SQL Editor/PSQL Console.

N.B: There are other tools that can be used to query the database using PostgreSQL such as DbVisualiser, DBeaver, DbWatch etc. This article is paying attention to PGAdmin SQL Editor/PSQL Console because they come by default when you install PostgreSQL.

Run CREATE EXTENSION pg_stat_statements in the SQL Editor. With that, you have successfully created the extension in the database.
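In full, with an optional check against the pg_extension catalog to confirm the extension is now installed:

```sql
CREATE EXTENSION pg_stat_statements;

-- Optional: confirm the extension is installed
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_stat_statements';
```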

Now you can access the pg_stat_statements with your query looking like this:
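The simplest form just selects everything the view tracks:

```sql
-- All statistics recorded so far for every statement executed
SELECT * FROM pg_stat_statements;
```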

To retrieve the important information from the view, you can request only the specific columns needed for analysis. In this case, your query will look like:
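A sketch using the columns pg_stat_statements provides in the 9.x series, where total_time is recorded in milliseconds (note that PostgreSQL 13 and later renamed this column to total_exec_time):

```sql
-- Statements whose average run time exceeds 60 seconds, costliest first
SELECT query,
       calls,
       total_time / 1000 AS total_seconds,
       (total_time / calls) / 1000 AS avg_seconds,
       rows
FROM pg_stat_statements
WHERE (total_time / calls) / 1000 > 60
ORDER BY total_time DESC;
```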

With the above query, you can retrieve the queries that run for more than 60 seconds, the equivalent of 1 minute. As stated earlier, how long is too long is determined by you as the Database Administrator; you can alter the query to 120 or 180 seconds (2 or 3 minutes respectively), depending on your preference.

I mentioned earlier clearing the pg_stat_statements after a while of storing queries. To do this, run the query below in your SQL Editor:
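The extension ships a function for exactly this purpose:

```sql
-- Discard all statistics gathered so far by pg_stat_statements
SELECT pg_stat_statements_reset();
```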

The above query clears the records in the pg_stat_statements view. You can automate this with a batch script scheduled to run weekly, monthly, etc.
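On Windows, one way to sketch this is a small batch file scheduled with Task Scheduler; the psql path, user, and database name below are assumptions you would replace with your own:

```bat
@echo off
REM Hypothetical weekly reset of pg_stat_statements.
REM Adjust the psql path, user, and database to match your installation.
"C:\Program Files\PostgreSQL\9.4\bin\psql.exe" -U postgres -d your_database ^
    -c "SELECT pg_stat_statements_reset();"
```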

So far we have looked at:

- How to track slow running queries,

- The different places they are logged and stored.

The final focus will be on how to solve the issue of these queries taking too long to run.

To achieve this, I’ll be highlighting certain measures that can help speed up your queries.

1. Proper Indexing: An index is a data structure that helps speed up the data retrieval process. Proper indexing ensures quicker access to the data. Don’t forget to drop unused indexes, as too many indexes also use up space and slow down writes.

2. Retrieve only the relevant data: Selecting only the columns and rows you actually need reduces the amount of data scanned and transferred.

3. Create a View: If your queries join too many tables, it is advisable to create a view.
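The first and third measures can be sketched as follows; the table and column names here are hypothetical, chosen only for illustration:

```sql
-- 1. Index a column that slow queries frequently filter or join on
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- ...and drop it again if it turns out to go unused
DROP INDEX idx_orders_customer_id;

-- 3. Wrap a frequently repeated multi-table join in a view
CREATE VIEW customer_orders AS
SELECT c.customer_name,
       o.order_date,
       o.total
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id;
```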

All of these aid in tuning the database to achieve optimum performance.

REFERENCES:

https://www.geeksforgeeks.org/query-optimization/

https://jaxenter.com/6-ways-optimize-sql-database-136448.html

https://www.postgresql.org/docs/9.2/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW

https://www.postgresql.org/docs/9.4/pgstatstatements.html
