As a follow-up to my other article, How to Improve Microsoft SQL Server (MSSQL) Database Performance, I will now do the same for MySQL/MariaDB databases.
When we develop a custom web application, we must test all kinds of scenarios, and the scalability of the database is usually a key factor in software quality. In my current job, I inherited a couple of applications that suffer in performance because the previous developer did not do good work at the database level. In the web ecosystem, every millisecond gained in database operations is an essential step toward delivering a better user experience in the software we make.
Let’s start our journey!
The starting point is choosing the best storage engine: MyISAM or InnoDB. In this article I don’t want to focus on the differences between these two engine types, but I leave a link here if you want to learn more about the subject.
In summary, I tend to lean toward InnoDB for all my projects; I only use MyISAM if the table is rarely written to. You can also have a database that mixes InnoDB and MyISAM tables (I don’t use this too often, but it’s a possibility).
After choosing the storage engine, it’s time to talk about indexing the fields of the tables. Of course, when you create a primary key, an index is created automatically… and I see a lot of projects that work only with these indexes. The problem is that when the database grows to a large number of rows, the time to execute SQL queries that filter data is heavily affected by the lack of index planning.
Let’s use a test database I created as a proof of concept, with two tables (authors and posts) full of dummy data. Each table uses the InnoDB storage engine and has a primary key only on the id field, with auto_increment. Below is the structure of the tables:
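The original table structure was shown as a screenshot; here is a plausible reconstruction, with column names taken from the queries later in the article (the exact types and lengths are my assumptions, not the original definitions):

```sql
-- Sketch of the test schema; column types and lengths are assumptions.
CREATE TABLE authors (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    first_name VARCHAR(50)  NOT NULL,
    last_name  VARCHAR(50)  NOT NULL,
    email      VARCHAR(100) NOT NULL,
    birthdate  DATE         NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE posts (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    author_id   INT UNSIGNED NOT NULL,
    title       VARCHAR(255) NOT NULL,
    description VARCHAR(500) NOT NULL,
    content     TEXT         NOT NULL,
    date        DATE         NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;
```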
Now let’s create a simple, possibly real-world SELECT, joining the two tables and filtering by these criteria:
- Join the two tables
- Filter posts by date between the years 1979 and 2021
- Filter authors by last_name starting with ‘St’
- Filter authors by age between 18 and 40 years old
SQL TEST QUERY and the output
SELECT p.id, a.first_name, a.last_name, a.email, a.birthdate,
       p.title, p.description, p.content, p.date
FROM posts p
INNER JOIN authors a ON p.author_id = a.id
WHERE YEAR(p.date) BETWEEN 1979 AND 2021
  AND a.last_name LIKE 'St%'
  AND (YEAR(CURDATE()) - YEAR(a.birthdate)) BETWEEN 18 AND 40;
It took 126 ms to output 2952 rows of data.
Now we have a point to compare the optimization that will be applied in the rest of the article.
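One caveat about this query worth knowing: when a column is wrapped in a function such as YEAR(), MySQL generally cannot use an index on that column for a range scan (the predicate is not “sargable”). A rewrite with the same date logic, comparing the bare column against date literals, gives an index on the date column a chance to be used:

```sql
SELECT p.id, a.first_name, a.last_name, a.email, a.birthdate,
       p.title, p.description, p.content, p.date
FROM posts p
INNER JOIN authors a ON p.author_id = a.id
-- Bare column comparison: the optimizer can range-scan an index on p.date
WHERE p.date BETWEEN '1979-01-01' AND '2021-12-31'
  AND a.last_name LIKE 'St%'
  AND (YEAR(CURDATE()) - YEAR(a.birthdate)) BETWEEN 18 AND 40;
```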
If you analyze the query used, let’s focus on the WHERE clause: there are three fields being filtered (the posts date and the authors last_name and birthdate).

Note: in some projects that I inherit and work on, I still see foreign keys without indexes. This is a bad practice in database relations, causing a decrease in performance on the JOIN between tables.
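As a sketch, if the posts.author_id column used in the JOIN had no index (InnoDB only creates one automatically when a FOREIGN KEY constraint is declared), you could add one like this; the index name is hypothetical:

```sql
-- Hypothetical index on the join column, if it doesn't already exist
CREATE INDEX index_posts_author_id ON posts (author_id);
```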
My rule of thumb is to analyze these fields and choose the best candidates to have an index applied to them. Don’t abuse indexing (especially on string types), because adding a lot of indexes can degrade performance. Besides primary and foreign keys, I try to use one to three indexes per table at most, and only if needed.
The best candidates are fields that are likely to contain repeated data. For example, the posts date field and the authors birthdate could each have many rows sharing the same year (e.g. ‘1999-MM-DD’). The authors last_name field could also have repeated values because of the nature of its content, but its type is a string, and I avoid adding indexes to string fields except as a last resort.
So let’s add indexes to the posts date and authors birthdate fields:
CREATE INDEX index_posts_date ON posts (date);
CREATE INDEX index_authors_birthdate ON authors (birthdate);
After creating the indexes, run a simple SQL command to optimize the tables: OPTIMIZE TABLE table_name. You can also optimize more than one table in a single statement by separating the table names with commas:
OPTIMIZE TABLE authors, posts;
If you want to run this command on all of your database’s tables without manually typing every table name, use the next SQL query, which generates an OPTIMIZE TABLE statement for each table in the desired database (you then execute the generated output); just replace 'your_database' with the name of your database.
SELECT CONCAT('OPTIMIZE TABLE ', TABLE_NAME, ';')
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database';
After running the optimize command, let’s re-run our initial query and measure the results:
With this modification, our query took only 12.8 ms versus the initial 126 ms!

That’s a 90% speed improvement compared with the first query, achieved just by adding indexes on the fields used in the WHERE clause.
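If you want to confirm which indexes the optimizer actually considers and picks, prefix the query with EXPLAIN; the possible_keys and key columns of the output show the candidate and chosen indexes. A sketch with our test query:

```sql
EXPLAIN
SELECT p.id, a.first_name, a.last_name, a.email, a.birthdate,
       p.title, p.description, p.content, p.date
FROM posts p
INNER JOIN authors a ON p.author_id = a.id
WHERE YEAR(p.date) BETWEEN 1979 AND 2021
  AND a.last_name LIKE 'St%'
  AND (YEAR(CURDATE()) - YEAR(a.birthdate)) BETWEEN 18 AND 40;
```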
Now, the downside of this method: the increased size of the tables!

Let’s compare the posts table size before adding the indexes, 113.6 MB, with its size after adding them: 132.6 MB.

That’s an increase of 17.35% in disk space, so be aware that indexes come at a price: extra table size (the exact growth varies with the number of records, the indexes, and the field types/content).
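To measure this on your own tables, you can query information_schema: DATA_LENGTH holds the row data size and INDEX_LENGTH the index size, both in bytes. A sketch (replace 'your_database' as before):

```sql
SELECT TABLE_NAME,
       ROUND(DATA_LENGTH  / 1024 / 1024, 1) AS data_mb,
       ROUND(INDEX_LENGTH / 1024 / 1024, 1) AS index_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database';
```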
In my experience, I’d rather gain performance and accept that the database needs more disk space…
Note: all of the SQL commands shown here are perfectly safe for your data, and rolling back to the previous state is easy: just drop the added indexes and re-run the OPTIMIZE TABLE commands, and your table sizes are back to normal. No data lost… only speed.
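The rollback described above would look like this, using the index names created earlier:

```sql
-- Remove the added indexes and reclaim the space
DROP INDEX index_posts_date ON posts;
DROP INDEX index_authors_birthdate ON authors;
OPTIMIZE TABLE authors, posts;
```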
Well, I hope this was a good read, and I’ll be happy if it helps someone.