Why using REPLACE INTO might be a bad idea

TTFB over all application servers (dashed line is previous period)

In our office we have a TV with a performance dashboard, so everyone can see possible problems in the system at any time. Amongst other things we show the TTFB on this dashboard. Those two spikes are not what you want to see, when you are just about to leave the office for lunch.

What happened?

We started looking around what’s going on and found out the following in that order:

  • the number of workers on the app-server rose dramatically
  • but the CPU consumption did not
  • the number of connections to the database reached the upper limit
  • but the CPU consumption did not rise (in fact it decreased)
  • and finally we start to see the following error messages in our monitoring tools:
Deadlock found when trying to get lock; try restarting transaction
  • by that time the system was not responding anymore
  • we checked the processlist from the database to see what it is doing
  • and found loads of REPLACE INTO statements and several big DELETE statements (running since minutes) on the same table

What is so special to that table?

It’s a cache lookup table. Everytime a page gets stored in our frontend cache, we do store some meta information about that cache entry in that database table, mostly dependencies to object data. And when that object data changes (for example the article stock), we fetch all cache entries depending on that object, flush them from the frontend cache and remove the information about it from the lookup table via a simple DELETE statement.

New entries in that lookup table are not created via INSERT, but via a REPLACE INTO statement. This is because it could happen, that two requests hit the same page, both are a cache miss and both try to create an entry in that lookup table. Easy solution …

The InnoDB gap lock

While digging deeper into that problem with our deadlocks, i found an article from Jervin Real from Percona about the InnoDB gap lock where he states the following:

For a non-INSERT write operation where the WHERE clause does not match any row, I expected there should’ve been no locks to be held by the transaction, but I was wrong

Until the moment i read that, i expected the same. But when you read that article from Jervin and the aging bug report in the MySQL bugzilla about the gap lock, you will understand that this makes perfect sense and you start to think about a REPLACE INTO statement being the right thing in this situation.

How we solved that problem?

We just replaced that REPLACE INTO statement with a INSERT ON DUPLICATE KEY UPDATE statement, which does exactly this. If the primary key is found, it updates that record, if not, the record gets inserted. This solved the problems with the deadlocks in that case.

You think that REPLACE INTO works exactly that way? Keep on reading.

The lesser known problem with REPLACE INTO

Do you know what REPLACE INTO really does? Let’s see:

CREATE TABLE a (
id INT(1) UNSIGNED NOT NULL PRIMARY KEY,
data VARCHAR(255) NULL
);
CREATE TABLE b (
id INT(1) UNSIGNED NOT NULL PRIMARY KEY,
fk INT(1) UNSIGNED NOT NULL,
data VARCHAR(255) NULL,
FOREIGN KEY (fk)
REFERENCES a(id)
ON DELETE CASCADE
);
INSERT INTO a VALUES (1, ‘test’);
INSERT INTO b VALUES (1, 1, ‘test’);
REPLACE INTO a VALUES (1, ‘data’);

What do you think will the result of a SELECT * FROM b; look like?

Exactly, the table is empty. This is because the REPLACE INTO first deletes the found record (which triggers the delete cascade) and then creates a new one. This might be expected behavior, depending on the situation, but to my understanding this would normally not be the intended behavior and at least this was new to me.