Practical Tips to improve MySQL query performance

Published in

NexC.co

6 min readFeb 26, 2020

Intro

At nexC we develop a state-of-the-art solution for product understanding and conversational commerce, directly to the consumers. As part of our journey, we will use this blog for sharing our technological and entrepreneurial stories.

MySQL

At NexC we split our main discussion into two parts:

The user engagement — includes the FrontEnd and the Backend for most of the actions the user does on the website.
The Backend part — the behind the scene system, all the collecting, analyzing and predicting the system do in order to prepare the data for the user part.

In this article, I will mainly discuss the user engagement part of the system. In this part we choose MySQL to be our main database management system.

MySQL is one of the most popular relational DB management systems. It runs using Structured Query Language (SQL), and therefore you can find some of the tips relevant for other systems like PostgreSQL, MariaDB, and others.

Why do we need to talk about performance?

Today, most of the startups work with frameworks that contain ORMs in order to build their MVP as fast as possible. The problem with these ORM is that usually, you give up some of your control on the queries, in order to get faster building time ( no one wants to write a query for each simple data extraction or creation).

The problem is that one day you wake up to a million rows in the tables, which directly affects your app performance. At some point, you understand that the usual more service will not result in better performance and you must dive in and solve these performance issues.

Let’s go practical

Using the right tools

There are several ways to debug your queries, the first stop is in choosing the right tools.
slow log and general log
in order to have an easy way to locate your DB issues, it’s recommended to use a query logs table. If you use RDS, or any other platform to manage your MySQL you can just add it in the setting, otherwise, you need to create these tables — you can find the SQL code in this link.
enable query logging
After you found the slow part, it’s time to dive into the specific queries.
you can enable the query log using your framework — usually something like db.enableQueryLog(), or directly in your DB:

SET global general_log = 1;
SET global log_output = 'table';-- to select them
SELECT
    *
FROM
    mysql.general_log;-- to turn off
SET global general_log = 0;

MySQL workbench

It’s an official MySQL tool for managing databases, you can find it here. If you haven’t find the right tool for planning and analyzing queries, I will definitely recommend this one. the main reason I mention this, is for their query explain tool, that can help you understand the query journey, and answer questions like — does it uses the right indexes? what is the subtable it creates, what is this join result, and so on.

And if you prefer working in a different way, you can always add “Explain” before your query to get the SQL server explanation.

example of MySQL workbench query explain — MySQL workbench execution plan

Using index

the most basic and easy solution to most of our slow query issues. before you jump to the other tips, you should definitely start with applying this one.

MySQL offers 5 types of indexes:

primary -a unique index, that no value can be null. It can’t be changed, usually used for the identification column.
unique index — all column values must be unique.
regular index — can be on one or more columns, mainly uses for finding things faster
full-text index — used for text indexing
descending index (MySQL 8+) — a regular index stored in reverse order, useful in case you need to show the most recently added data.

if you are interested in understanding how do these indexes work(usually B-trees) you can read this answer from StackOverflow.

-- regular index
CREATE INDEX index_name
ON table_name (column1, column2, ...);-- unique
CREATE UNIQUE INDEX index_name
ON table_name (column1, column2, ...);--descending index
CREATE INDEX index_name
ON table_name (column1 DESC, column2 ASC, ...);

How to choose the right index?

test your most frequent query logs or slow queries
identify the relevant columns — it can be one or more, that you are using to query by.
choose the right index — notice the little difference between the indexes, for example, if you will use unique instead of regular you will create a limitation on your DB. For those of you that use the MySQL 8+, I highly recommend testing the new index, it’s very effective for query on recent data.

Fine-tuning your queries

If you are using ORM this is the point when you get your hands dirty with some SQL ( or at least closer if you use ORM with good query builder).

select only what you need

ORM and developers usually do “select *”, this query as a price when you need to handle a big table.

let's look on the following queries:

-- select *
SELECT * FROM corpus_project.sentence_words limit 1000000;
execution time - 0.609 sec-- select specific columns
SELECT id, word_id FROM corpus_project.sentence_words limit 1000000;
execution time - 0.313 sec

Even if you read your rows in chunks, in the end, the difference remains closer to these results. So it’s simple, read only what you need, and not the whole table (this will also affect your program memory usage and performance)

join using indexes

table joining is a time-consuming action, and need to be managed carefully. If you use queries that join columns with different types, you send your SQL engine to a full table scan, the slowest option possible. Therefore it is highly recommended to verify that your joins are using indexes, and if not, try to find a way to change the query or define the right index.

Note: In most of the cases full index scan is much faster(we had improvements from 28 sec to 1.9 sec on some queries), but there are cases that the full index scan will be slower since sometimes sequential read(as used in full table scan) can be more efficient than reading the index(which is fast) and then running a lot of random queries to fetch roes by value.

Choosing the right columns

Sometimes, a simple action like choosing the right column or building the right sub-table can make a huge performance difference. Review all the columns you use in the entire query and try to stay as focus and as limited as possible.

let's look on example:

we are running this query on a 2.68M rows table.

SELECT sentence_words.word_id, count(sentence_words.id) as frequency, words.title as word_title 
FROM sentence_words join words on sentence_words.word_id=words.id
group by word_id order by frequency desc

At first look, It seems that this query should be very fast since all the columns that it uses are indexed — id (unique), word_id(index). And yet calculating the word frequency takes 27.865 seconds. We already used all the other tips so what's the next step?
By using the query explain, we discovered that although we are using indexes the query still performs a full table scan, then we make a little change that turns it from a 27-sec query to 2.531-sec query — we used the same index we are querying for — word_id in the count. by doing that the query changed from full scan to full index scan.

Using Cache

Cache is a good solution for performance issues. it can be solved at the program level, using Memcached services like Redis, or at the SQL level.

In the SQL level, you should know that they decided to deprecate it, and if you use 8+ version it is already removed.

If you choose to use a program level cache, the main challenge is finding the right way to manage the consistency between the cache and the database. Caching can be used as a middleware between your code and the DB, or for specific cases. You can also use it to store the data after some manipulations and not as raw queries. In the planning phase, make sure you take under consideration the concurrency, the update and create rate and how much precision you need from your data.

Wrapping up!

This article was about some of the best practices for MySQL performance tuning tips, we think every developer must know. It’s a good place to start for backend devs, who are uncertain about their poor database performance.

If you want to add your thoughts on the topic or want to ask some questions regarding it, feel free to write your comments or contact me.