Orphaned Data: What it is and how to deal with it

Isaac Oseas
4 min readJun 17, 2019

--

Orphaned data sounds strange if you are not familiar with relational databases. So it might be best to start with the relationship between different databases first.

Another way of thinking about this is that you need to build an object that belongs to another object. So, for this example we will have a Chat that has many messages. So, now the Message object has an attribute of chat_id, which works to connect that Message to the Chat. The chats and messages are for that reason stored in separate tables in the database, but the message has a foreign key for the chat that creates the association. Now, this would be a parent table of chats and a child table of messages.

If you assumed that the deletion of a parent creates an orphaned data you are correct.

Now, the problem with orphaned data is that it is at best waisting space in the database and at worse potentially exposing the data needlessly. One example of this, using the chats and messages, would be that chat is deleted and a new chat is created. If the new chat is created immediately after the old chat is deleted it will have the same id as the old chat. Now that means it has a relationship to the messages from the deleted chat, and might appear.

The first issue we can deal with is finding orphaned messages. Depending on your database there are a few different ways to do this. The first is a fairly simple use of #where. Here is the example, Message.where(‘`messages`.`chat_id` NOT IN (?)’, Chat.pluck(:id))

The pluck statement is grabbing or plucking the id column from the chat table, and the where statement is returning all messages with a chat_id that is not included in the Chat Id column. If you want to double check you can set a variable equal to the statement. You will get an array and every element in the array should be a Message that returns nil when you try to find the associated Chat.

An alternative method, which might be more efficient depending on the size of your data, is to use #left_outer_joins to get the data. This would work by running the code : Message.left_outer_joins(:chat).where(chats: {id: nil}). This also has an added advantage of only running a single SQL query in the database, instead of the two required when using #pluck.

Source: w3resource

Both methods are only locating the data, and not doing the more important task of destroying it. While we could run a loop to delete all the orphaned data one by one, the better way is to locate it and delete it using the method #delete_all. You should also be aware that this method is “much more efficient than #destroy_all,” according to the apidock.com, because it deletes the messages with a single request.

Taking care of the orphaned data after the fact in Rails is the hard part. If this is a reoccurring or large scale problem you should research how your database recommends handling orphaned data.

If you have orphaned data it probably means that there was a mistake made either by you or someone on a development team. This is because the orphaned data can be avoided all together in the models section of the Rails application. Simply put the following code as part of the relationship:

The dependency destroy will then run a query to delete every message. This has a pro and a con. On the one hand you are guaranteeing that there is no orphaned messages, but it might be inefficient to run as many queries as you have messages. You might want to handle this issue in the controller by simply running the #where method in the controller with the id of the parent followed by a #delete_all. It would look like this Message.where(‘`messages`.`chat_id` IS (?)’, chat.id).delete_all

Either way the most important aspect is preventing orphaned data that would have to be cleaned out later.

--

--