[SQL Performance Killers] — Stale Statistics

An important part of my daily work is helping developers, database administrators, DevOps engineers, sysadmins, and project managers identify (using Nazar.io) and fix bad SQL code. The idea behind the [SQL Performance Killers] series is to share practical SQL tuning examples.

Continuing the [SQL Performance Killers] series (posts #1, #2, #3, and #4), in this fifth post I'll explain why you should keep database statistics up to date.

When a valid SQL statement is sent to the database server for the first time, the query optimizer chooses the best execution plan for it. There are basically two types of optimizer: the Rule-Based Optimizer (RBO) and the Cost-Based Optimizer (CBO).

The RBO was the first optimization method and, as the name suggests, is essentially a fixed list of rules the database follows to generate an execution plan.

The CBO is an evolution of the RBO: it examines all of the possible plans for a SQL statement and picks the one with the lowest cost, where cost represents the estimated resource usage of a given plan. To estimate the resources needed by each execution plan, the CBO must have information about all the objects accessed in the SQL statement. That information comes from the database statistics.
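In Oracle you can see the plan the CBO chose, including its cost estimates, with `EXPLAIN PLAN` and `DBMS_XPLAN`. A minimal sketch (the `orders` table and `:cust_id` bind variable are hypothetical placeholders):

```sql
-- Ask the optimizer for its plan without executing the statement
EXPLAIN PLAN FOR
SELECT o.order_id, o.total
FROM   orders o
WHERE  o.customer_id = :cust_id;

-- Display the chosen plan; the Cost column reflects the CBO's
-- estimates, which are derived from the object statistics
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```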

Last week we started a consulting project where the challenge was to identify the root cause of poor performance and instability of an Oracle RDS instance backing a very critical application. The symptoms were high memory consumption, increased writes to disk, an increased number of database connections, and several hung sessions. Every time the problem occurred, the team had a few minutes to start killing the hung sessions, otherwise the database instance would be "abnormally terminated".

Database instance abnormally terminated.

By monitoring with Nazar, we identified that queries from a single application feature were responsible for 63% of the total time spent in the database, so we started analyzing those queries.

Top 10 most time consuming commands.

By monitoring the problem as it occurred, we could see that all the hung sessions were executing the same queries, and all of them were waiting on the "direct path write temp" event with "HASH" segments in the TEMP tablespace.

“HASH” segments on TEMP tablespace.
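This kind of check can be run against Oracle's dynamic performance views. A sketch of the idea (the exact queries we used may have differed):

```sql
-- Active sessions currently waiting on "direct path write temp"
SELECT s.sid, s.serial#, s.sql_id, s.event
FROM   v$session s
WHERE  s.event  = 'direct path write temp'
AND    s.status = 'ACTIVE';

-- Temporary segments of type HASH, i.e. hash joins spilling to TEMP
SELECT u.sql_id, u.segtype, u.tablespace, u.blocks
FROM   v$tempseg_usage u
WHERE  u.segtype = 'HASH';
```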

Analyzing the query plans, we confirmed that they were using HASH JOINs, a memory-based join method that is very effective when the data set fits in memory. When the data set does not fit, the join spills to disk, consuming a large amount of sort-area memory and generating heavy I/O against the temporary tablespace.

Digging deeper in our analysis, we found that the statistics on the database were stale. One of the main tables had around 1.5 M rows, while the statistics showed only 55 K rows. We then gathered statistics for the main tables used by the queries, and all the HASH JOINs were replaced by NESTED LOOPS.
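A minimal sketch of how such staleness can be spotted and fixed, assuming a placeholder table called `MAIN_TABLE`:

```sql
-- What the optimizer believes about the table
SELECT table_name, num_rows, last_analyzed
FROM   user_tab_statistics
WHERE  table_name = 'MAIN_TABLE';

-- The actual row count (on our system: ~1.5 M vs. the 55 K
-- recorded in the statistics)
SELECT COUNT(*) FROM main_table;

-- Refresh the statistics with DBMS_STATS
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'MAIN_TABLE',
    cascade => TRUE);  -- also gather statistics on the table's indexes
END;
/
```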

The optimizer was making wrong decisions based on stale statistics: it assumed the data sets used by the HASH JOINs would fit in memory, but they did not, causing heavy use of sort-area memory and the temporary tablespace.

Keeping statistics accurate ensures that the optimizer has the best possible source of information to determine the best execution plan.
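Oracle itself tracks which objects it considers stale (by default, after roughly 10% of a table's rows have changed). A quick way to review them, sketched against the standard dictionary view:

```sql
-- Tables Oracle currently flags as having stale statistics
SELECT owner, table_name, stale_stats
FROM   dba_tab_statistics
WHERE  stale_stats = 'YES';
```

Checking this view periodically, in addition to relying on the automatic statistics-gathering job, helps catch cases like the one above before they hurt production.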

“It is not lack of hardware,
it is not network traffic,
it is not slow front ends:
the main performance problem in the huge majority of database applications is bad SQL code.”
Joe Celko