Improving SQL Queries with DBeaver

A great open-source tool for daily use

Daniel Mira Agudelo
Globant
6 min readJun 26, 2024

--

Workspace for DBMS Project with MySQL — image from Easy-Peasy.AI

You hate it when your query takes too long, or at least I do.

A slow query could lead to serious application performance issues or make you spend additional time analyzing data. Join us, and let’s try to sort this out with some tips and tricks that hopefully will help you.

Now, it’s time to roll up the sleeves and get down to work. Let’s create an example table to work with. Please skip this section if you already have an example or useful database. We will need a MySQL database and DBeaver.

About DBeaver

DBeaver (community edition) is a free and open-source tool that provides support for various databases. While there is a good deal of software available for this matter, DBeaver stands out due to its useful features. One notable feature is the visual highlighting of indexes and primary keys in the columns of any SELECT query result. This is particularly helpful when the database structure is unknown, as it allows for easy identification of indexes and primary keys. You can learn more about DBeaver here.

Example table

We will use an example table named Clients created like this:

-- test_medium.clients definition
CREATE TABLE `clients` (
`Id` varchar(100) NOT NULL,
`Name` varchar(100) DEFAULT NULL,
`Address` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`Telephone` varchar(20) DEFAULT NULL,
`Country` varchar(100) DEFAULT NULL,
`City` varchar(100) DEFAULT NULL,
`ZipCode` varchar(10) DEFAULT NULL,
PRIMARY KEY (`Id`),
KEY `clients_Country_IDX` (`Country`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Example clients table columns and data types — Screen capture

It was populated with 500,000 random registers, using the Faker Python library to create a CSV file and import it to the database.

Identifying Indexes and Primary Keys

There is a way to identify the indexes and primary keys (onwards PKs) from a specific table:

SHOW INDEX FROM clients;

As a result of the previous query, you will see the following:

Show index result

But if you run a simple select query, you also will be able to identify the PKs and indexed columns:

DBeaver identifies and shows PKs and indexes

Note the ID and Country columns:

Dbeaver PK identifier
DBeaver index identifier

Those icons in the column data type mean that they are the PK and an indexed column, respectively.

And What About Performance?

You got it! DBeaver won’t improve the performance itself. Let’s review how performance can be improved or impacted depending on how we’re using some clauses:

Where

Let’s compare some timing/queries:

SELECT * FROM clients;
SELECT * FROM clients c
WHERE Country ='Colombia';
SELECT * FROM clients c
WHERE City ='Madrid';
SHOW PROFILES;

Query result:

Show profiles query result

Returning all the data took 1.779 seconds.

If we run a SELECT query with a WHERE clause filtering by Country (an indexed column), it takes 0.021 seconds to return the data. Now, if we run a query with a WHERE statement filtering by City (a non-indexed column), for example, you will see a longer execution time, in this case, 0.287 seconds, 10 times longer.

Order by

The clause ORDER BY can cause some lack of performance, especially when used with non-indexed columns:

SELECT * FROM clients c
WHERE Country ='Colombia'
ORDER BY Id;
SELECT * FROM clients c
WHERE Country ='Colombia'
ORDER BY City ;
SHOW PROFILES;

Query result:

Show profiles query result

Ordering the results by City lasts 0.031 seconds, 0.01 seconds longer. It appears to be insignificant, but when you’re dealing with millions or billions of records, it can make a difference.

Using JOINS

As we said earlier, identifying the indexes and PKs makes the difference between a fast or slow query; it’s the same here. If we use a join statement, the DBMS needs to filter and compare the data in the tables, so with non-indexed columns it will take longer than with indexed columns.

Normalization

The best practice when working with databases is to follow the principles of normalization. Starting with the third normal form (3NF) helps reduce redundancy in the data stored in the tables, which is the ultimate goal of normalization. By keeping the database simpler and avoiding duplicated information, we can maintain data consistently.

Here, DBeaver has a tool that can show you visually the Entity Relationship diagram. An Entity Relationship Diagram (ERD) visually shows how entities in an application or database are related. It provides a clear overview of their interactions and helps design the system or database structure effectively. For this, we just need to right-click the database and then click on “View Diagram”:

View Entity Relationship diagram feature

Depending on the database, it will take some time to show the diagram. Here is an example of our database:

ER diagram result

Select fewer columns

Selecting all columns in the table may take longer than selecting just the columns you need:

SELECT * FROM clients c
WHERE Country ='Colombia'
ORDER BY City;
SELECT id, Name FROM clients c
WHERE Country ='Colombia'
ORDER BY City;
SHOW PROFILES;

Query result:

Show profiles query result

Selecting all columns took 0.038 seconds, while selecting just two took 0.010 seconds. Please note that this is more relevant when using the JOIN statement because using * will return all the tables’ columns.

Limit

This clause will limit the number of records returned by the select statement. This is useful when we’re sketching complex queries to reduce a little the time spent returning data:

SELECT id, Name FROM clients c
WHERE Country ='Colombia'
ORDER BY City
LIMIT 5000;
SELECT id, Name FROM clients c
WHERE Country ='Colombia'
ORDER BY City
LIMIT 10;
SHOW PROFILES;

Query result:

Show profiles query result

Limiting the number of records to 5000 took 0.005 seconds, while limiting it to 10 took 0.004 seconds.

Conclusion

This article introduces DBeaver as a useful open-source tool for improving SQL queries. It highlights some of DBeaver’s features, such as visually showing indexes and primary keys. The article also explores various performance optimization techniques for different SQL statements and following normalization principles. We also looked at the Faker Python library to help us generate example data for our database. These tips and tricks can help users save time, effort, and cost by optimizing query response times. Ultimately, the article emphasizes the concept of “less is more” when it comes to querying databases.

References

--

--