Things To Avoid, Episode 1: INSERT IGNORE

Working with legacy software is a challenge, but also a rewarding experience. You know, I’ve seen things… One of them is The Infamous INSERT IGNORE.

Maciej Brencz
Legacy Systems Diary
Dec 16, 2017


The MySQL documentation for the INSERT statement says:

If you use the IGNORE modifier, errors that occur while executing the INSERT statement are ignored. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row is discarded and no error occurs. Ignored errors generate warnings instead.

Forcefully ignoring errors is never a good idea. Your code may be triggered twice for the same action and that can lead to duplicated INSERT queries being sent. Will you notice that when using INSERT IGNORE?
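A quick session sketch of the difference (the table t is made up for illustration):

mysql> CREATE TABLE t (id INT PRIMARY KEY);
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO t VALUES (1);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t VALUES (1);
ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'

mysql> INSERT IGNORE INTO t VALUES (1);
Query OK, 0 rows affected, 1 warning (0.00 sec)

The plain INSERT fails loudly, while the IGNORE variant reports success; the calling code never learns that the row was dropped.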

But there’s a way more serious threat waiting out there for brave developers who did not manage to reach the next paragraph of the MySQL docs, or who simply ignore what it says:

Data conversions that would trigger errors abort the statement if IGNORE is not specified. With IGNORE, invalid values are adjusted to the closest values and inserted; warnings are produced but the statement does not abort.

Oh, nice. So the database will help me when I try to insert a DATETIME value into an INT column. Quite handy, isn’t it?
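Here is roughly what that “help” looks like (hypothetical table; the exact behaviour depends on MySQL version and sql_mode):

mysql> CREATE TABLE foo (id INT, date DATE);
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT IGNORE INTO foo (id, date) VALUES ('2007-12-09', 2);
Query OK, 1 row affected, 2 warnings (0.00 sec)

mysql> SELECT * FROM foo;
+------+------------+
| id   | date       |
+------+------------+
| 2007 | 0000-00-00 |
+------+------------+
1 row in set (0.00 sec)

The date string was truncated to 2007 and the integer became a zero date: two warnings, zero errors, garbage in the table.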

Legacy code strikes back

Relying on undocumented language features is quite risky. And when that’s mixed with INSERT IGNORE, things get really bad.

During one of the OS upgrades back in early 2016, Perl was upgraded from 5.14 to 5.18. Unfortunately, hash handling changed: key order in one quite specific scenario was no longer maintained, and the code broke. You’re right, Perl never promised any particular order, but our code was relying on implementation specifics.

An excerpt from Perl changelog for v5.18:

By default, two distinct hash variables with identical keys and values may now provide their contents in a different order where it was previously identical.

Why was it an issue? The code was preparing a list of hashes to be inserted into the database. It used the first item to build the list of column names passed to the INSERT queries that followed. However, the values for each row were pulled from that row’s own hash, in that hash’s own key order. This led to the following:

INSERT IGNORE INTO foo (id, date) VALUES (1, '2006-10-11'), ('2007-12-09', 2), …
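For the curious, here is a minimal Perl sketch of the pattern (table and values are made up, this is not the actual code):

use strict;
use warnings;

my @rows = (
    { id => 1, date => '2006-10-11' },
    { id => 2, date => '2007-12-09' },
);

# The column list is built once, from the first row's hash...
my @columns = keys %{ $rows[0] };

# ...but each row's values come from that row's own keys() call.
# Since Perl 5.18, two distinct hashes with identical keys may
# return them in different orders, so columns and values misalign.
my @tuples = map {
    my %row = %$_;
    '(' . join(', ', map { "'$row{$_}'" } keys %row) . ')';
} @rows;

print 'INSERT IGNORE INTO foo (' . join(', ', @columns)
    . ') VALUES ' . join(', ', @tuples) . "\n";

# The fix is a one-liner: pull values in a fixed order,
# i.e. map { "'$row{$_}'" } @columns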

Basically, the order of column names and their respective values was not kept. Of course, MySQL normally raises an error when one tries to insert a date into an integer column. But you do remember the quote from the MySQL documentation, right? These errors are not raised when INSERT IGNORE is used and silent data conversion takes place:

mysql@master.XXX[database]>SHOW WARNINGS;
+---------+------+----------------------------------------------------------------------------+
| Level   | Code | Message                                                                    |
+---------+------+----------------------------------------------------------------------------+
| Warning | 1265 | Data truncated for column 'last_revision' at row 1                         |
| Warning | 1264 | Out of range value for column 'user_is_closed' at row 1                    |
| Warning | 1265 | Data truncated for column 'editdate' at row 1                              |
| Warning | 1366 | Incorrect integer value: 'poweruser' for column 'user_is_blocked' at row 1 |
+---------+------+----------------------------------------------------------------------------+
4 rows in set (0.00 sec)

Data consistently inconsistent

The script was running fine after the Perl upgrade (well, at least its stderr was silent), but after a few days we got a bug report that something was wrong with our internal tools that use the Perl-powered tables. Judging by the report, something was quite fishy. And our worries were confirmed by a quick SELECT query:

[Screenshot of the SELECT results: data inconsistency, powered by INSERT IGNORE]

So what do we have here? Timestamp and integer values converted to strings and inserted as user names, years stored in integer columns (is 2016 an edit count or the year of the last edit?). Data inconsistency at its best.

You do not need to agree, but for me no data is better than “data” like the rows above.

The affected table weighs over 75 GiB and stores 460 million rows. We can regenerate them, but it’s a long process. The script is running as you read this story. Well, it has been running for five weeks already… All thanks to IGNORE in a single query.

Lessons learned

  • INSERT IGNORE is a silent beast just waiting for your mistake to turn the data into a meaningless set of values.
  • Never ignore errors. Make them loud. Fail early.
  • Relying on undocumented language features (Perl’s hash key order is just one of numerous examples) is a risky game.

You’ve been warned. Now check your code or ngrep your traffic for INSERT IGNORE queries.
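For the traffic check, something along these lines should do (assuming MySQL listens on its default port 3306):

ngrep -q -i 'insert ignore' tcp port 3306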
