MySQL diary #2: indexes good practices

Why are a good table schema and properly defined indexes so important?

Maciej Brencz
Fandom Engineering
6 min read · Mar 4, 2019


Back in 2016 we faced one of the infamous MySQL replication issues. It was triggered by a maintenance script that made one of our database clusters lag like crazy. All because one table lacked a primary key.

The above is yet another reason why a good table schema and properly defined indexes are so important (especially on big tables). In this post, we’re going to describe good practices regarding indexes in MySQL and mention things to avoid.

Are you ready? Let’s BEGIN;)

Indexes

Indexes can improve the performance of SELECT queries, but nothing comes for free. The more indexes a table has, the longer it takes to modify its data (UPDATE, INSERT, DELETE queries) and the more disk space is used. Those are the biggest and easiest-to-remember factors, but remember that indexes are considered by the query planner for every type of query, that indexes and operations on them are concurrency/contention/locking problems waiting to happen, and that, as with everything in IT, they have to be used carefully.

Plan your indexes for SELECT queries that you actually make.

Keep in mind that the order of fields is crucial in cases like the following: SELECT foo FROM bar WHERE c1 > 1 AND c2 = 2 will not use the full index defined on (c1, c2) - put the column(s) compared with equality first in your index.
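
A minimal sketch of that rule, built around the query above (the table definition and the index name idx_c2_c1 are made up for illustration):

    -- hypothetical table matching the query above; the index puts the
    -- equality column (c2) first and the range column (c1) second
    CREATE TABLE bar (
        foo INT,
        c1  INT,
        c2  INT,
        KEY idx_c2_c1 (c2, c1)
    );

    -- both the c2 = 2 equality and the c1 > 1 range can now be resolved from idx_c2_c1
    EXPLAIN SELECT foo FROM bar WHERE c1 > 1 AND c2 = 2;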

If you run lots of different kinds of SELECT queries, it's tempting to create a separate key for every combination of columns used in the WHERE part of those SELECTs. Multi-column indexes are smaller and faster; however, at Wikia's scale it's better to have separate indexes per column to avoid redundant data in indexes, since MySQL will use multiple indexes efficiently enough.
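
A sketch of that alternative on a hypothetical bar_separate table; MySQL's index merge optimization is what lets separate single-column indexes be used together:

    -- hypothetical table with one single-column index per filtered column
    CREATE TABLE bar_separate (
        foo INT,
        c1  INT,
        c2  INT,
        KEY idx_c1 (c1),
        KEY idx_c2 (c2)
    );

    -- with enough rows the optimizer can intersect the two indexes (index merge),
    -- reported as "Using intersect(idx_c1,idx_c2)" in the Extra column
    EXPLAIN SELECT foo FROM bar_separate WHERE c1 = 1 AND c2 = 2;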

Don’t optimise for rare queries where performance is not crucial.

Doing a full table scan once in a while can be a totally fine solution, especially if the alternative would cause extra load on every update (a classic optimization for read-only/read-write situations). Such queries should be made in a maintenance script (i.e. offline) to avoid slowing down end-user responses.

TL;DR — define only the indexes you really need, not more

Use EXPLAIN [EXTENDED] to check how MySQL processes your query. Let’s take a look at real-life examples.

Example #1

EXPLAIN tells us that no index is used for this query: MySQL needs to fetch and examine all the rows in the table. Not the best thing to do, right?
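
A sketch of what such a plan looks like; the table name (query_log), the filtered column (query_hash) and the row count are made up for illustration, only dt_query below is a real column name from this example:

    -- made-up table: a log of queries, filtered by an unindexed column
    CREATE TABLE query_log (
        query_hash VARCHAR(40),
        dt_query   DATETIME
    );

    EXPLAIN SELECT * FROM query_log WHERE query_hash = 'abc123'\G
    /*
               id: 1
      select_type: SIMPLE
            table: query_log
             type: ALL        -- full table scan
    possible_keys: NULL
              key: NULL       -- no index can be used
             rows: 654321     -- (illustrative) every row has to be examined
            Extra: Using where
    */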

So let's add an index on the column that we use in the WHERE clause. We're adding the dt_query column here as well, as the second kind of query against this table filters by both columns.
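
A sketch of such an ALTER on the made-up query_log table from above (the index name is made up as well):

    -- index on the filtered column plus dt_query, so the second kind of query
    -- (filtering by both columns) can use the same index
    ALTER TABLE query_log ADD INDEX idx_hash_dt (query_hash, dt_query);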

And voilà:
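
A sketch of the kind of plan you get now (same made-up names as above):

    EXPLAIN SELECT dt_query FROM query_log WHERE query_hash = 'abc123'\G
    /*
               id: 1
      select_type: SIMPLE
            table: query_log
             type: ref              -- index lookup instead of a full scan
    possible_keys: idx_hash_dt
              key: idx_hash_dt
             rows: 1
            Extra: Using index      -- the covering index mentioned below
    */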

Thanks to the newly created index, MySQL needs to fetch just a single row from the storage engine. MySQL actually optimizes it even better and will get the result directly from the key (it's called a covering index, but that's a topic for another post). And last but not least: the query is simply faster.

Example #2

A SQL query issued by one of our legacy components caused serious DB hiccups in the past. As the EXPLAIN showed, MySQL had to process over 690k rows before returning the results.

A simple ALTER was performed across wikis:
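
A hedged sketch of what such a fix generally looks like; the table, column and index names here are entirely made up:

    -- add an index matching the WHERE clause of the offending legacy query
    ALTER TABLE legacy_stats ADD INDEX idx_wiki_id (wiki_id);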

And the median call time of the PHP function that was performing the query dropped dramatically, from 1102.65 ms to 32.66 ms (yes, that's a 97% drop!).

Keep in mind that we sometimes run dozens of queries per page, for millions of users, so it all adds up.

Replication

The lack of a primary (or unique) key can cause serious replication issues. However, don't overdo it, as a primary key combined with multiple unique keys can break replication as well:

INSERT … ON DUPLICATE KEY UPDATE statements on tables with multiple primary or unique keys. When executed against a table that contains more than one primary or unique key, this statement is considered unsafe, being sensitive to the order in which the storage engine checks the keys, which is not deterministic, and on which the choice of rows updated by the MySQL Server depends.

https://dev.mysql.com/doc/refman/5.6/en/replication-rbr-safe-unsafe.html
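
A minimal sketch of the kind of table this warning is about (made-up names):

    -- hypothetical table with a primary key and an additional unique key
    CREATE TABLE user_emails (
        user_id INT NOT NULL,
        email   VARCHAR(255) NOT NULL,
        notes   VARCHAR(255),
        PRIMARY KEY (user_id),
        UNIQUE KEY idx_email (email)
    );

    -- if the new row conflicts with one existing row on user_id and with a different
    -- existing row on email, the updated row depends on the order in which the keys
    -- are checked, so MySQL marks the statement as unsafe for replication
    INSERT INTO user_emails (user_id, email, notes)
    VALUES (1, 'someone@example.com', 'hello')
    ON DUPLICATE KEY UPDATE notes = VALUES(notes);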

TL;DR — always define, but only a single, primary (or unique) key per table

Automated table schema audit tool

As the number of database performance problems grew over time, it became obvious that we needed to start acting proactively. Hence we decided to automate the audit of table schemas, with a focus on MySQL index design. We called the tool index-digest and described it in detail in one of our previous blog posts:

Indexes good practices

Let’s wrap up all that we mentioned above.

  • Plan indexes for the SELECT queries you actually run; consider field order
  • Columns to consider for SELECTs: WHERE clauses, JOINs, ORDER BYs, GROUP BYs
  • Multi-column indexes are smaller and faster, but if you have a lot of different queries it can be better to have separate indexes per column so you don't keep redundant data in indexes; MySQL will use multiple indexes efficiently enough
  • Updating indexes is expensive (every INSERT/UPDATE/DELETE/ALTER pays for it); keep only as many indexes as needed, not more
  • Try to use short columns with good cardinality for indexes (INTs and CHARs, not BLOBs); you can limit the length of an index when using long columns (see the sketch after this list); a smaller index has a better chance of being cached in memory, which means faster queries
  • Use EXPLAIN [EXTENDED] to verify the execution plan
  • Don't add an index when query performance is not critical
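
The prefix-index trick from the list above, as a sketch with made-up table and column names: indexing only the first characters of a long column keeps the index small while (hopefully) staying selective.

    -- index only the first 32 characters of a long column; pick a prefix length
    -- that still gives good cardinality for your data
    ALTER TABLE pages ADD INDEX idx_url (url(32));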

Indexes and replication:

  • Always have a primary/unique key in every InnoDB table
  • Multiple primary/unique indexes can be problematic (INSERT … ON DUPLICATE KEY UPDATE)

COMMIT /* thank you for your attention :) */;

This post was first published on our internal blog. Written with help from Piotr Drozdowski.


Maciej Brencz
Fandom Engineering

A Poznań native for generations, passionate about his home city and the Far North / Enjoys investigating how software works under the hood