Soft Deletes: Harder Than You Think

Brightloom
Brightloom
Published in
7 min readJan 8, 2019

Contributed by Trisdan Leyson, Platform Engineer @ eatsa

Our restaurant partners work hard to keep customers engaged and happy, and that means they need to be able to update their menus. Perhaps that seasonal offering of liver and onions isn’t selling quite as well as they had hoped. Or maybe a recipe simply needs some adjustments; that vegan chili would be great if it didn’t have those pickles on top. If you write software for restaurants, like we do at eatsa, you need some way to delete items from the menu.

Most of the time, however, you don’t really want to delete the item. You want to make it impossible to order it, but still want to keep a record of its existence. Maybe you want to do some analysis, figure out who ordered it, possibly see what kind of profile they fit, or whatever else you’d like to know. Maybe the purple fingerling potato and truffle oil quinoa salad had a small, devoted following that you’d like to better understand. Maybe you simply want to let customers see what they’ve ordered in the past. There’s no reason to purge an order from history just because it contained an item you don’t serve anymore.

We on the eatsa API team chose to do this by updating our data model in Postgres to support soft deletes. Soft deletes are a fairly simple idea: instead of data actually getting removed from the database, it gets marked as “deleted.” When the mobile app requests the current menu, the code filters out all the soft deleted items so customers can’t order them. However, when customers want to see all the orders they’ve ever made, the code leaves in the soft deleted items. That way, if an eatsa partner lets the CEO’s nephew invent a recipe and then they remove it soon after, customers can still see the time they ordered that brown sugar, yam, marshmallow, and jello salad on a dare.

For such a simple concept, soft deletes have a lot of subtleties, starting with how you mark the data as deleted. The simplest way is to add a true / false flag to each record. If the record is deleted, set the flag to true; otherwise set it to false. This works, but sometimes you realize later that you want multiple types of “deletion.” Perhaps there are some menu items that you want to remove from both the current menu and the order history, but you don’t want to fully delete them. You can’t represent this with just true and false. Instead you could store a character or a short string with each record that represents its current state. With this solution you can invent new codes every time you want a new type of deletion.

Another subtlety is how to track when an item was deleted. One option would be to do this with a separate date field on each record, but you could also use the date to determine if a record is deleted. If the date isn’t set, the record has not been deleted; if it is set, the record has been deleted, and you know when that happened. This is the approach we went with. The part of our API that manages menus is in Ruby, which has a library called Discard for working with this approach to soft deletes.

You could also completely remove the deleted item from the main database table and store it in a separate table. We could have had an “items” table and a “deleted items” table. This approach avoids some of the challenges mentioned below. But it makes the code more complex since you have to search two different tables whenever you need soft deleted items. Searching two tables can also slow down the code when it needs to look up an item that can be either soft deleted or not; now it has to do two lookups from the database instead of one.

Once we’d chosen a “deleted at” date as our implementation of soft deletes, we had to address a few other subtleties. Before, I said “your code filters out all the soft deleted items,” which sounds simple, but can easily get complicated. You have to track down everywhere in your code that looks up data that can be soft deleted and decide whether that context calls for filtering out the soft deleted data or not. If you forget to do this filtering somewhere, it can result in bugs. When we first rolled out soft deletes, we had a bug like this that went uncaught since no one had soft deleted anything from the table that was affected, even during pre-release testing. Luckily, it was caught and fixed the first time someone tried to soft delete an item from that table, which happened in a pre-production environment, so no customers were ever impacted.

A bigger challenge came when we couldn’t support something everyone takes for granted: deleting a record and then creating it again. When you hard delete a record from a database table, it’s completely gone. If you want it back, you can create the same record again. You lose this ability with soft deletes since the record still exists in the table, it just appears deleted from a user’s perspective, so trying to recreate it can run afoul of data integrity checks that the database does. There are two approaches you can take to getting back this behavior, both more complicated from an implementation perspective than what you get with hard deletes. You can allow multiple records with the same data, or you can force there to be only one record.

Either way, you have to define an identity for records so you know when a new record is the same as an existing one. Allowing multiple identical records makes defining an identity trickier. You can’t rely on a generated ID, because each “identical” record will have its own ID. You have to rely on other values to define a record’s identity, which can get very complicated. If you use menu item name as part of the identity, you have to decide if “Burrito Bowl” and “burrito bwol” are the same thing. You have to decide if “No-Worry Curry” is the same thing as “no worry curry” and “no worry, curry.”

Ensuring data consistency can get a lot harder when you allow multiple records with a single identity. You need to write extra code to check validity, and a lot of integrity checks that your database can do become hard or impossible to use. On one project I worked on at a previous company, users were identified by email address. The database forced every record to have a unique email address. When I started on the project, it was impossible to delete users except by going into the database and directly removing the records. Since it was an internal tool used by a small team, this wasn’t a problem for a long time, but eventually an employee with the same name as a previous employee needed to log into the tool. I instituted soft deletes for users, allowing multiple rows with the same identity since that was how it was done in other parts of the code whose example I followed. To make this work I had to remove the uniqueness constraint on the email address. Even though it was valid for multiple records to have the same email address, only one of those records was allowed to be active; the others had to be soft deleted. There was no way to tell the database to enforce this constraint, though, so it had to be checked in code. This made the code more complicated, and removed the last line of defense against inconsistent data being stored in the database.

You also have to be careful about letting your generated IDs get outside your code. Suppose someone uploads a menu with a “Perfect Salad” on it and assigns the “Perfect Salad” an ID. The ID is given to various clients so they can send requests to the eatsa platform to order a “Perfect Salad.” But then it turns out the “Perfect Salad” was a mistake; it was supposed to be a “Pear-fect Salad.” If you’ve implemented soft deletes in a way that allows multiple rows per unique record, deleting the “Perfect Salad” and replacing it with a “Pear-fect Salad” creates a new record that has a new ID, and the one that was given to clients for making orders is now wrong. You need to make sure they all get the new ID for the “Pear-fect Salad” and stop using the one for the “Perfect Salad,” which will fail now because the “Perfect Salad” has been taken off the menu. Even if the process of distributing this ID to the clients is fully automated, there could be a time period where the “Perfect Salad” is deleted but the new IDs haven’t been synched yet due to caching or other factors. During this period, clients who try to order the “Perfect Salad” will see errors.

For all these reasons, we decided to restrict each record to exist only once in a table, and use UUIDs as the identity mechanism. If two records have the same UUID, they’re considered identical. This was straightforward, since our tool for updating menus at eatsa passes a UUID for every item, so the code always knows what record it’s supposed to be looking for. If it doesn’t find the record, it creates one. If it finds the record and that record already exists in the database, but is soft deleted, the soft deleted record is resurrected and modified to whatever state the user requested. If the record exists in the database and is not soft deleted, it’s merely updated to the requested state.

Getting to a good implementation of soft deletes was a lot more challenging than I expected, but in the end we had something that allowed our restaurant partners to update their menus without losing the data that will let them track the sales performance of temporary seasonal items, search for commonalities among their most popular dishes, or just help customers find that amazing burger they ordered six months ago.

--

--