Soft deletes are tedious! Does an ideal deletion without loss even exist?

Shubham Sonawane
Published in Geek Culture
7 min read · Jun 30, 2021
Photo by Devin Avery on Unsplash

“What…? You want to actually delete the database record!!! Data loss? Why would someone do such a thing?”

This is the type of paranoia that pervades the software industry. It may not matter to an entry-level developer, but senior developers and especially business folks take data loss very seriously. And why shouldn’t they? After all, data, no matter how little, is still data.

Any data that seems inconsequential now could be a source of insights in the near future. So data-loss paranoia is understandable, as is the preference for logical deletes over physical ones. Sequelize, a Node.js ORM, even calls its soft-delete feature “paranoid”, which suggests just how real the paranoia is.

But what exactly is a logical, or soft, delete? Put simply: instead of actually removing a record from the database, the record is merely flagged as deleted.
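A minimal sketch of the pattern, using a hypothetical employees table with a nullable deleted_at column (the names here are illustrative):

```sql
-- Soft delete: the row stays in the table, only a flag changes.
ALTER TABLE employees ADD COLUMN deleted_at DATETIME NULL DEFAULT NULL;

-- "Deleting" a record just stamps the flag...
UPDATE employees SET deleted_at = NOW() WHERE id = 42;

-- ...so every ordinary read must remember to filter flagged rows out.
SELECT * FROM employees WHERE deleted_at IS NULL;
```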

But here’s the thing: no matter how important it is for a business to have a soft delete function, the fact that the data isn’t actually gone and is still part of my table bothers me.

The fact is, the database should be backed up regularly anyway. That way you never lose data to a deletion unless the record was created and deleted within a single backup interval (we will get to that in a while). Furthermore, soft deletes force an extra predicate on the deleted-flag field into every query. For simple queries this is a waste of resources that only adds performance cost, and it gets worse when you join numerous tables. This type of design does not appeal to me at all. I believe in an architecture where a table holds only the relevant data.
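To see how the flag predicate leaks into joins, here is a hypothetical query over four soft-deleted tables (all table and column names are illustrative); every table contributes its own deleted_at check, and forgetting even one silently resurrects deleted rows:

```sql
-- Each joined table needs its own deleted_at IS NULL filter.
SELECT e.name, d.name AS department, p.title AS project
FROM employees e
JOIN departments d ON d.id = e.department_id AND d.deleted_at IS NULL
JOIN assignments a ON a.employee_id = e.id   AND a.deleted_at IS NULL
JOIN projects p    ON p.id = a.project_id    AND p.deleted_at IS NULL
WHERE e.deleted_at IS NULL;
```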

Embarking on a quest for facts!

Photo by Jezael Melgoza on Unsplash

Let me ask you a question. What are you hoping to achieve by not deleting the table entries? Just the fact that you’ll be able to view those rows in the future, correct? Then why not just make an archive table and move the deleted data there? What’s the issue with that? Is it a matter of convenience? And, if so, do you really value convenience over efficiency? I, for one, would not. But, then again, I suppose my engineering principles forbid me from doing so.

To be honest, I’ve worked on a lot of projects where the primary deletion functionality was soft-delete. To be more honest, I used to believe that soft-deletes were the only way deletion worked in the industry.

But then I discovered something while working on CRUD APIs with Strapi, a popular headless CMS. Thinking it was obvious, I began looking for a way to enable soft-deletes in Strapi’s documentation, only to discover that there is no such feature in Strapi. This meant that users of Strapi, such as Walmart, IBM, and NASA, probably did not use soft-deletes.

For me, this fundamental finding significantly changed the entire deletion paradigm. Of course, this does not imply that I will begin deleting data recklessly at will. But now, more than ever, I will be able to uphold my principles. And thus, I embarked on a quest to find alternatives to both soft and hard deletion techniques. A middle ground in which the table entry is totally erased while no data is lost.

Evaluating soft deletes in-depth

Adopting an alternative to soft delete can be a challenging decision. So, as with every comparison study, let’s begin with a list of pros and cons. To that end, I compiled a list of comparison factors for both soft and hard deletes.

Comparison factors for both soft and hard deletes

Another thing we should do is thoroughly assess the benefits of soft deletes. According to this article by Brent Ozar on implementing soft deletes, there are four key reasons to use this design pattern: faster undeletes, deletion history tracking, easier reconciliation during disaster-recovery failover, and a reduced workload on an Availability Group’s secondary. I believe the above table summarises the first three. But the last reason, ‘reduced workload for the Availability Group secondary’, piques my interest.

What exactly is an Availability Group? According to Microsoft Docs, “an availability group supports a failover environment for a discrete set of user databases, known as availability databases, that fail over together.” When you configure an Always On Availability Group, your data is considered highly available, and disaster management is stricter than in the average case.

This changes the behaviour of deletes, particularly on MSSQL servers. Deletes produce ghost records (and version ghost records) as rows from a queue table are consumed and removed, and the ghost-cleanup process cannot keep up with the volume of deletes against the table because the database may be in use as a readable secondary. This is where soft deletes come in handy, as you will not be deleting anything. It’s true that soft-deleting a row adds it to one index while removing it from another, but cleanup on these indexes should be much easier because you’re not scattering ghost records across a table; instead, you’re concentrating those ghost records in a single non-clustered index.

But the very fact that you enable Availability Groups means you are already paranoid enough about your data. So, of course, you will do everything in your power to ensure that you always have it. That is perfectly reasonable. But if disabling Availability Groups would not jeopardise your data preservation, why not do it? It will undoubtedly improve performance. Furthermore, such a scenario should not be the sole reason for choosing a specific approach, particularly, in our case, soft deletes. The scenario is very system-specific, since Availability Groups may not even be supported by other database systems.

So… What are the alternatives? Regular Backups, Archives…

Photo by Marten Bjork on Unsplash

Archive Table

Instead of maintaining a flag column in each table, I propose creating a separate table for all of your deleted records. This is achievable by leveraging the JSON support in modern database systems. Another option is to keep two identical tables, one for live rows and one for deleted ones, such as a table employees and another called deleted_employees. In that case you’ll have to keep both table schemas in sync. This may appear to be a hassle, but both approaches are actually quite simple to manage. A new archive table can be created with something like CREATE TABLE deleted_employees LIKE employees;. The first option is even simpler: just create a single table with a table_name column and a JSON record column. An example with MySQL 8 is shown below.

MySQL 8 example for creating an archive table.
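A minimal sketch of such a generic archive table (the table and column names are illustrative):

```sql
-- Generic archive table: one row per deleted record, stored as JSON.
CREATE TABLE archived_records (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  table_name VARCHAR(64) NOT NULL,
  record     JSON        NOT NULL,
  deleted_at DATETIME    NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- The two-table variant needs just one statement per source table:
CREATE TABLE deleted_employees LIKE employees;
```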

We’ll perform two operations here: an INSERT INTO the archive table and a DELETE FROM the original table. To undo the deletion, run another INSERT and DELETE in the opposite direction. If you’re concerned about partial failure, wrap the statements in a transaction. An even better option is a BEFORE DELETE trigger that inserts the row into the archive table, as illustrated below.
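Assuming the generic JSON archive table and a hypothetical employees table with id, name, and salary columns, the transactional version might look like this:

```sql
-- Move a record to the archive and delete the original atomically.
START TRANSACTION;
INSERT INTO archived_records (table_name, record)
SELECT 'employees', JSON_OBJECT('id', id, 'name', name, 'salary', salary)
FROM employees WHERE id = 42;
DELETE FROM employees WHERE id = 42;
COMMIT;

-- Undo: restore the row from its JSON copy, then drop the archive row.
START TRANSACTION;
INSERT INTO employees (id, name, salary)
SELECT record->>'$.id', record->>'$.name', record->>'$.salary'
FROM archived_records
WHERE table_name = 'employees' AND record->>'$.id' = '42';
DELETE FROM archived_records
WHERE table_name = 'employees' AND record->>'$.id' = '42';
COMMIT;
```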

MySQL 8 example for setting up before delete archive trigger
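Along the same lines, a sketch of such a trigger for the hypothetical employees table (column list assumed, adjust to your schema):

```sql
-- Archive each row automatically just before it is deleted.
DELIMITER //
CREATE TRIGGER employees_archive_before_delete
BEFORE DELETE ON employees
FOR EACH ROW
BEGIN
  INSERT INTO archived_records (table_name, record)
  VALUES ('employees',
          JSON_OBJECT('id', OLD.id, 'name', OLD.name, 'salary', OLD.salary));
END//
DELIMITER ;
```

With this in place, a plain DELETE FROM employees WHERE id = 42; archives the row with no extra application code.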

The benefit of employing a trigger is that cascaded deletes are archived as well, eliminating the need for manual intervention. MySQL users should be aware, though, of a bug that prevents delete triggers from firing for cascaded deletes. It’s an old bug that has been open since 2005… yes, 2005. Come on, Oracle, you’re better than this. No worries, though: there is a workaround that performs the archive inserts for the whole chain of referencing tables inside the parent table’s trigger.

With archives, all of your deleted records live in one place. A simple union query returns live and deleted information in a single result set, which can subsequently be used for data analysis. Data restoration is also simple with a reverse insert; it’s not as effortless as a soft delete, but it’s still a good deal in terms of performance.
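Continuing with the same illustrative schema, such a union query could look like:

```sql
-- Live and archived employees in one result set for analysis.
SELECT id, name, 'active' AS status
FROM employees
UNION ALL
SELECT record->>'$.id', record->>'$.name', 'deleted' AS status
FROM archived_records
WHERE table_name = 'employees';
```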

Regular Backups

Data Backup is the obvious choice. Backing up data is critical regardless of whether you use a soft delete or a hard delete. Backing up your data is like putting it in a safe that you can access quickly and easily in the case of a catastrophe. The confidence that comes with knowing that your most precious data is safe and secure is pretty cool.

In some ways, regular, scheduled backups solve the problem of data loss. Only data that is inserted and deleted between two consecutive backups is at risk. This isn’t an issue if your application writes to the database at long intervals. However, for an application that is always active, it is a major concern. To avoid any hassles, such applications should maintain an archive table as well.

Let’s wrap things …

Deciding on a soft delete design paradigm isn’t something to be taken lightly. It may not appear to be much at first, but as your database grows in size, things become more complicated. Although I believe there are numerous advantages to not using soft delete, many developers prefer it due to their concerns about data loss. The proposed alternative will solve that problem, and taking that leap will only reveal to you how much of a difference it really makes.

