At my current company, ABOUT YOU, we deal with huge amounts of data, especially in our databases. We often have to re-check our performance and costs to squeeze out the last bit of improvement or reduction in database size.
In most cases, what does the trick is using indexes correctly and understanding how they work.
In this article, I explain indexes based on MySQL 5.7 (more common) with InnoDB as the storage engine. Most of it also applies to later MySQL versions and to MariaDB 10 and newer.
What is an Index?
An index is a data structure containing pointers (references) that define an order on one or more columns of a database table. Indexes are used to speed up data retrieval operations such as searching or sorting by those columns.
If a query uses an indexed column as a criterion, the Database Management System (DBMS) searches the index instead of the table itself. Indexes generally use balanced trees (B-Trees) for this. Without an index, the rows have to be scanned sequentially, which gets slower as the number of rows in the table grows.
What Structures Do Indexes Use?
MySQL supports a few different index structures. These structures are not specific to MySQL and can also be found in other DBMS such as PostgreSQL, DB2, and Oracle. The most common is the B-Tree.
What is a Balanced Tree?
A balanced tree is usually the default index structure in MySQL thanks to its combination of flexibility, size, and overall good performance. Even its worst-case performance is quite good, which comes in handy for all types of developers.
Unlike binary trees, in which each node has at most two children, balanced trees store many keys per node, so they do not grow “tall” or “deep” as quickly as a binary tree. But let's get back to index creation.
How to Create an Index?
In MySQL, the command to create an index is simple and self-explanatory. The (n) is an optional prefix length for string columns.
CREATE INDEX index_name ON table_name ( column_name(n) );
Usually, indexes are based on one column, but it is also possible to combine multiple columns.
CREATE INDEX user_firstname_lastname_idx ON user ( firstname, lastname );
An index with multiple columns can noticeably improve both query speed and database size compared with several single-column indexes. Let us assume we have 100k products in our DB and the query looks like the following.
SELECT * FROM products WHERE is_deleted = 0 AND status = 'ordered';
One solution would be to create two single indexes on the columns is_deleted and status. This would already make your query faster, but in the long run it also increases your index size, because MySQL has to create and maintain two separate indexes.
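The first solution, sketched for the products table from the query above. The index names are my own and only follow the naming pattern used elsewhere in this article:

```sql
-- Two separate single-column indexes (index names are illustrative).
-- MySQL has to store and maintain both of them on every write.
CREATE INDEX product_is_deleted_idx ON products (is_deleted);
CREATE INDEX product_status_idx ON products (status);
```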
Your database costs increase as the database grows, and the growth will probably affect the performance of the database as well.
This is why the second, and sometimes better, solution is to create a combined index.
CREATE INDEX product_is_deleted_status_idx ON products ( is_deleted, status );
Another benefit of the combined index is that it still matches if you only query for is_deleted. Even if you have three or more columns bound to the index, you are good to go as long as your conditions cover the index columns from left to right without skipping the first one.
Uses the index:
SELECT * FROM products WHERE is_deleted = 0 AND status = 'ordered';
SELECT * FROM products WHERE status = 'ordered' AND is_deleted = 0;
SELECT * FROM products WHERE is_deleted = 0;
Misses the index:
SELECT * FROM products WHERE status = 'ordered';
But as always, it comes with a drawback. The order of the conditions inside the WHERE clause does not matter, because the optimizer reorders them, but the order of the columns in the index does: a query that filters only on status skips the leading column is_deleted and cannot use this index for the lookup. Writing queries in a bigger team therefore has to be organized, and the other team members need to know how the indexes are built. Make sure to share new knowledge to make you and your team even better.
A good place for this is a team code review, where each developer can share knowledge and point out bottlenecks that can be improved together.
Analyze Your Application Queries!
Most development projects focus on delivering features first and only afterward analyze how to make the application faster. Of course, not every performance issue is related to the database, but analyzing the queries often already helps the application gain performance.
This is why I came up with the following checklist for myself:
- Enable the MySQL slow query log to see the queries which are really painful.
- Find recurring parameters within your queries belonging to the same table and cover them with a single-column or combined index in the right column order.
- Rewrite queries to match indexes and change the way the data is retrieved, if possible. (Team discussion)
- Remove unused or redundant indexes to save money; on top of that, the size of the database will be reduced. (see Did You Know?)
- Do not add an index everywhere, as this can lead to new performance issues and also increases the size of the database. (see How Indexes affect Writing Operations)
- Share knowledge within your team to keep performance debugging to a minimum. Try to write performance-optimized queries right away.
- Keep yourself up to date!
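The first and fourth points of the checklist could look roughly like this on MySQL 5.7. This is a sketch under assumptions: the one-second threshold is an arbitrary example value, setting global variables requires sufficient privileges, and the sys schema views rely on the Performance Schema, so they only reflect usage since the last server start.

```sql
-- Turn on the slow query log at runtime (lost on restart unless
-- the same settings are also placed in the server configuration).
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- threshold in seconds; example value
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Find out where the log file is written.
SHOW VARIABLES LIKE 'slow_query_log_file';

-- Candidates for removal: indexes that were never used since startup
-- and indexes that are duplicated by another index.
SELECT * FROM sys.schema_unused_indexes;
SELECT * FROM sys.schema_redundant_indexes;
```

Treat the results of the last two queries as hints rather than as a deletion list, since an index may only be needed by a rarely executed job.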
How to Analyze Queries For a Better Index Performance
MySQL includes a really nice statement that gives you detailed information about your query and, of course, about the indexes that you might hit or miss.
As you probably know, I am talking about EXPLAIN. EXPLAIN also has the aliases DESCRIBE and DESC, which do exactly the same.
The statement is pretty handy because SELECT queries are not the only ones working with indexes. You can use it with SELECT, DELETE, INSERT, REPLACE, and UPDATE statements.
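A minimal example, reusing the products query from above. The value 'shipped' in the UPDATE is made up for illustration:

```sql
-- Show the execution plan for the SELECT from earlier in the article.
EXPLAIN SELECT * FROM products WHERE is_deleted = 0 AND status = 'ordered';

-- EXPLAIN does not execute the UPDATE, it only shows the plan.
-- 'shipped' is a hypothetical status value.
EXPLAIN UPDATE products SET status = 'shipped'
WHERE is_deleted = 0 AND status = 'ordered';
```

In the result, the columns worth checking first are possible_keys (indexes MySQL considered), key (the index it actually chose, NULL if none), type (ALL means a full table scan), and rows (the estimated number of rows to examine).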
It is also possible to simply check the performance by tracking the execution time through user variables in MySQL.
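One way to do this is sketched below. It assumes all three statements run in the same session, and the measured time includes the round trips between client and server, so it is only a rough number:

```sql
-- Remember the start time with microsecond precision.
SET @started_at = NOW(6);

-- The query under test (COUNT(*) keeps the result set small).
SELECT COUNT(*) FROM products WHERE is_deleted = 0 AND status = 'ordered';

-- Elapsed time in milliseconds since @started_at.
SELECT TIMESTAMPDIFF(MICROSECOND, @started_at, NOW(6)) / 1000 AS elapsed_ms;
```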
How Indexes affect Writing Operations
As already mentioned, the usage of indexes also has its drawbacks.
An index can affect various types of operations such as ALTER TABLE, single and multiple INSERTs, UPDATE, and even DELETE operations. This often becomes noticeable if you are dealing with a busy database that needs to handle a huge amount of read and write requests at the same time.
But why does it affect the writing operations? Because every write has to update each affected index in addition to the table data. This is why it can make sense to postpone index generation, for example by dropping a secondary index before a large batch of writes and recreating it afterwards, if you know that a lot of write operations are incoming during one process.
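As far as I know, InnoDB offers no switch to pause the maintenance of an existing secondary index (ALTER TABLE ... DISABLE KEYS only affects MyISAM), so one way to approximate the idea is to drop the index before the batch and rebuild it afterwards. A minimal sketch, reusing the combined index from earlier; whether the rebuild is cheaper than maintaining the index during the writes depends on your data and should be measured first:

```sql
-- Remove the secondary index before the write-heavy process starts.
ALTER TABLE products DROP INDEX product_is_deleted_status_idx;

-- ... run the bulk INSERT / UPDATE / DELETE statements here ...

-- Rebuild the index once, after all rows have been written.
CREATE INDEX product_is_deleted_status_idx ON products (is_deleted, status);
```

Keep in mind that reads which depend on this index will be slow while it is missing.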
Another workaround is to disable autocommit mode, because with autocommit enabled InnoDB flushes the log to disk for every insert.
Besides disabling autocommit, you can also temporarily turn off unique_checks and foreign_key_checks. The downside of the last two is the possibility of creating inconsistent data, so be aware of what you are about to do.
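Put together, a bulk import for the current session could look like the sketch below. The name column and the inserted values are made up for illustration; only is_deleted and status appear earlier in this article:

```sql
-- Session-level switches: one commit at the end instead of one per row,
-- and no uniqueness / foreign key validation while loading.
SET autocommit = 0;
SET unique_checks = 0;
SET foreign_key_checks = 0;

-- Hypothetical rows; the data must already be consistent,
-- because MySQL will not verify it now.
INSERT INTO products (name, status, is_deleted) VALUES
    ('Sneaker', 'ordered', 0),
    ('T-Shirt', 'ordered', 0);

COMMIT;

-- Switch the checks back on for the rest of the session.
SET unique_checks = 1;
SET foreign_key_checks = 1;
SET autocommit = 1;
```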
Did You Know?
That there are also hidden indexes that every developer unintentionally uses? The PRIMARY KEY for a column, the FOREIGN KEY to a related table entry, and the UNIQUE KEY to prevent duplicate values in a table field.
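To see these hidden indexes for yourself, you can create two small tables and list their indexes. The customers and orders tables below are hypothetical and only serve as a demonstration:

```sql
-- Hypothetical tables; no CREATE INDEX statement is used anywhere.
CREATE TABLE customers (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    email VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY customers_email_uq (email)
) ENGINE=InnoDB;

CREATE TABLE orders (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    customer_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (id),
    CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id)
) ENGINE=InnoDB;

-- Lists the indexes MySQL created for the keys above.
SHOW INDEX FROM customers;
SHOW INDEX FROM orders;
```

The second SHOW INDEX should list an index on customer_id as well, because InnoDB creates one for the foreign key column when no suitable index exists yet.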
Indexing is one of the essential features of MySQL for optimizing query performance, but at the same time it requires you to be analytical and to check up front whether the indexes you already have are sufficient or need to change.
I hope you enjoyed reading the article and got some useful information out of it.