Soft Delete: It’s Good, but …
Besides deadlines, data loss is also a nightmare for some application developers. It could even have a bigger bad impact because it is not just the developer who needs the data. But unfortunately, this is very likely to happen due to an error or failure of an application in processing something, or user negligence.
My team and I, have experienced data loss in the application we made. Looking at the pattern of the table in which data were missing, we know why it happens, that there is a function of this application to erase data from those tables. But it’s just that and we still clueless about who did it and for what reason. Even so, we have taken several preventive and corrective actions suggested by several advisers from the IT directorate who are, of course, more experienced in developing applications.
From their suggestions, I also got to know about a database table design that can reduce our worries about data loss, namely soft deletion. It is an operation in which a flag is used to mark data as unusable, without literally erasing the data itself from the database.
Although this soft delete can eliminate the worry of losing data, it turns out that many developers are also against this concept. Here are some of the challenges that they complained about, and somehow, apparently there are tricks to overcome some of these challenges.
Application Support
Because the soft delete process uses a marker to mark whether the data is used or not, this means we have to add an is_deleted or is_active column or other markers into the table. This means, we also have to make sure all the queries used in the application load the WHERE clause to filter data that has been marked “deleted” from not being selected. This is of course very hard to do, especially if the table is called in almost all queries used in applications that are already running.
However, if it needs to implement soft delete in these conditions, the developer can create a view table to replace the old one, with the same name so that it will not affect the query in the application.
Database Support: Unique Key and Other Constraints
Using soft delete is almost the same as we “trick” the users by making the data appear as if it doesn’t exist, even though it still exists. But, the database can’t. It will throw a problem and cause an error if the table has:
A unique key, for example, email or username in the User table. This case often occurs when an application is made to the public, where people can easily register themselves to get access to the application.
However, if the application model is like some of the applications in Paragon, where the user is already defined and the access rights are directly regulated by the IT team (or if there is a System Administrator to maintain this task), this error will rarely occur. The need to create a user table with soft delete (is_active) is more necessary because some users can be inactivated for a certain period. If there is no soft delete, it will take more effort if the user is also part of several user groups on a task. So, it’s easier to just change state active or deactivate than to remapping that user into some group of users, right?
But if the application model is general and there is a unique key in the table, the developer can still trick it by using two columns as one unique key (compounded keys). This method can be done by adding or replacing is_deleted with deleted_timestamp and making the unique key and deleted_timestamp as compounded keys.
Besides using compounded keys, developers can also put extra effort into checking whether the unique data to be entered already exists or not. If it already exists, the process that occurs is not input but updates the row (and vice versa).
Another solution to this is that the developer can use an algorithm to do a soft delete, then delete the data after a certain amount of time. This solution is like the one used by Twitter, where when we want to deactivate an account, the account is not automatically deleted and we can activate it again by logging in. Only after 30 days, the account will be permanently deleted from the database.
Foreign keys, especially with the provisions of cascading delete.
One of the advantages of using a database is the presence of foreign keys which makes it easier if there are interdependent tables so that data consistency can be maintained. By applying soft delete to a table that has a foreign key, inevitably the other table must also apply soft delete, which means that we need double extra effort on the two / more tables.
Well, those are some challenges that need to be considered when implementing soft deletion in our database table design. Seems quite complicated, right?
Alternative Solution
But, there is also suggested another alternative that seems more ideal to avoid data loss, which is caused by the backend functionality of the application, compared to soft delete; it named soft archives. To implement this, we have to create a table with the following columns:
- (Optional) Primary id key
- original_id keeps the id of the deleted record
- table_name to keep the name of the original table
- payload stores JSON-stringified data of the deleted record
- (Optional) deleted_by to store user who deletes the data
- (Optional) created_at to store deletion timestamp
This method will have more positive effects besides just keeping deleted data, which include:
- Have only one place to archive records from any table, regardless of the deleted record’s table structure (no need to replicate the original table structure).
- No worry about the unique index and foreign key constraint in the original table
- No more WHERE clause in any query to check for the deletion or no need to create a view table!
So, the concept of soft deletion is a good idea and very tempting. But in practice, we need to pay attention to several things such as adding a WHERE clause to each query, and thinking about errors that may occur due to unique keys and other constraints that we deliberately make to maintain data consistency in our database. If it is necessary to keep the deleted data and hard to deal with some of those challenges, it may be better to use the soft archive method instead of soft delete.
References: