Mysql: How to delete records from large Tables?

Aryan01
3 min readMar 17, 2023

--

(Introduction to partitioning)

1. What is the problem with large Tables?

Suppose you have a large database table with millions of records that go back several years. After some analysis, you determine that you only require data from the past few months and that the older data is not necessary for your needs. In fact, keeping the old data is causing your query response times to slow down, and the old data is simply taking up space that you don’t need.

What's the first intuitive approach?

DELETE FROM tb_original WHERE date > NOW() - INTERVAL 4 MONTH

Easy right? Try doing it on your staging environment you will realize that it would be the last thing you would like to do.

Deleting data from a large table can be a resource-intensive operation, especially if the table has many indexes, triggers, or foreign key constraints.
When you delete data from a table, the database needs to find the rows to be deleted, mark them as deleted, update indexes and other related data structures, and release the disk space they occupy.
This process can cause disk I/O, CPU usage, and memory usage, which can affect the performance of the database server and potentially cause replication delays for SLAVES.

2. So what is this SLAVE we are talking about?

In database systems, the terms “master” and “slave” are commonly used to refer to a replication topology in which one database server (the “master”) is responsible for managing the primary copy of the data, while one or more other database servers (the “slaves”) replicate that data for backup or read-only access purposes.
The master server is typically the source of all write operations to the database.
The slave servers, on the other hand, do not accept write operations from clients. Instead, they connect to the master server and replicate its data to maintain a copy of the database that is consistent with the master.

3. Disk space relinquished cannot be reclaimed?

When data is deleted from a table, the database does not immediately release the space that the deleted rows occupied on disk. Instead, the space is marked as free and made available for future use by the database. However, this space remains allocated to the database until it is explicitly released.

4. Why this behavior?

The reason for this behavior is that releasing disk space is a resource-intensive operation that involves modifying the database’s file system and potentially moving other data to fill the gaps left by the deleted data. Releasing disk space immediately after a delete operation would cause unnecessary overhead and could degrade the performance of the database.As a result, over time, deleting data from a large table can cause the database’s disk usage to grow, even if the amount of data stored in the table decreases.

5. So this brings us to the original question what is an effective way to delete?

Solution —

A. Create a new table using the structure of tb_original with the name tb_a_new with the primary key id and date. (Why date we will use it in the next article to use it to partition so that we don't have to manually delete old rows again and again).

B. Copy relevant data from tb_original to tb_a_new.
( To ensure data consistency do this operation on an offline slave.)

INSERT INTO tb_a_new (id, c1, c2,date) 
SELECT id, c1, c2, date FROM tb_original
WHERE date > NOW() - INTERVAL 4 MONTH;

3. After the operation is complete, Rename the new table tb_a_new to tb_original and tb_original to tb_old.

4. Put the offline slave back to replication and once the slave lag is over, all the previously sluggish queries were directed to tb_a_new only this time the query time is reduced to a few milliseconds.

5. Drop the tb_old.

In our next article, we will see how we can automate this process, without creating a new table, using Partitioning.

--

--