What is relational data?

Say that you have data for Songs and Artists. You could represent it like this:
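Here is a minimal sketch of that first representation, using plain Python dicts. The song titles and the field names (`title`, `artist`, `name`) are illustrative assumptions, not a required schema.

```python
# Hypothetical sample data: every song carries its own copy of the artist.
songs = [
    {"title": "Stronger", "artist": {"name": "Kanye West"}},
    {"title": "Gold Digger", "artist": {"name": "Kanye West"}},
    {"title": "Heartless", "artist": {"name": "Kanye West"}},
]

for song in songs:
    print(song["title"], "by", song["artist"]["name"])
```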

Pretty straightforward. But what if Kanye feels like changing his rap name to ‘Ye’? If he did, we’d have to update our data in THREE different places!

And we have a really small data set. What if we had a huge data set with all 137 of Kanye’s songs? Well, then we’d have to change the name ONE HUNDRED AND THIRTY-SEVEN times!
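To make the cost concrete, here is a rough sketch of what a rename looks like when the name is copied into every song. The helper `rename_artist` and the sample data are hypothetical; the point is only that the number of writes grows with the number of songs.

```python
# Hypothetical sample data: the artist's name is duplicated in each song.
songs = [
    {"title": "Stronger", "artist": {"name": "Kanye West"}},
    {"title": "Gold Digger", "artist": {"name": "Kanye West"}},
    {"title": "Heartless", "artist": {"name": "Kanye West"}},
]


def rename_artist(songs, old_name, new_name):
    """Rewrite the name in every matching song and return how many were touched."""
    updated = 0
    for song in songs:
        if song["artist"]["name"] == old_name:
            song["artist"]["name"] = new_name
            updated += 1
    return updated


updated = rename_artist(songs, "Kanye West", "Ye")
print(updated)  # 3 (one write per song)
```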

Wouldn’t it be great if we could change the name once, and have it reflected everywhere?

Well, we could do so if we structure the data like this:
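As a sketch of that second structure in Python: artists live in their own collection, and each song only stores an `artist_id`. The ids, titles, and field names here are assumptions made up for illustration.

```python
# Hypothetical sample data: artists are stored once, keyed by id.
artists = {
    1: {"name": "Kanye West"},
}

# Each song stores only a reference (artist_id) to its artist.
songs = [
    {"title": "Stronger", "artist_id": 1},
    {"title": "Gold Digger", "artist_id": 1},
    {"title": "Heartless", "artist_id": 1},
]

for song in songs:
    # Follow the reference from the song to the artist.
    print(song["title"], "by", artists[song["artist_id"]]["name"])
```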

See how the arrows from artist_id point to the artist? Now if we wanted to change ‘Kanye West’ to ‘Ye’, we’d only have to do it in one place, and it’d be updated everywhere.
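A small sketch of that single update, again with made-up sample data and a hypothetical helper `artist_name`: the name is changed in one place, and every song picks it up because it only holds a reference.

```python
# Hypothetical sample data: one artist record, referenced by three songs.
artists = {1: {"name": "Kanye West"}}
songs = [
    {"title": "Stronger", "artist_id": 1},
    {"title": "Gold Digger", "artist_id": 1},
    {"title": "Heartless", "artist_id": 1},
]


def artist_name(song):
    """Resolve a song's artist_id to the artist's current name."""
    return artists[song["artist_id"]]["name"]


# One write, no matter how many songs there are.
artists[1]["name"] = "Ye"

names = [artist_name(song) for song in songs]
print(names)  # ['Ye', 'Ye', 'Ye']
```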

I’m sure you can imagine that if you’re dealing with a lot of data, structuring it using the latter approach can save you a lot of duplication.

Relational Data

So is the data for our Songs and Artists relational? Well… that question is sort of malformed. Data itself isn’t inherently relational or not relational. Data can be structured in a relational way, or a non-relational way.

Consider the Song and Artist data that we dealt with above. The first way we structured it is non-relational. The second way we structured it is relational. So it’s not the data itself that is or isn’t relational, it’s how we structure it.

So what makes a structure relational?

Consider:

In the former structure, the Artist is part of the Song data structure. In the latter structure, the Song data structure has a reference to the Artist. This reference is what makes it relational.

Think about how the artist_id refers/relates to something else.
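One way to see the difference is to look at the shape of the types rather than the values. The dataclasses below are a hypothetical sketch (the class names are my own): in one, the Song contains an Artist; in the other, the Song contains only an id that points at an Artist stored elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Artist:
    id: int
    name: str


@dataclass
class EmbeddedSong:
    """Non-relational shape: the Artist is part of the Song itself."""
    title: str
    artist: Artist


@dataclass
class ReferencingSong:
    """Relational shape: the Song only refers to an Artist by id."""
    title: str
    artist_id: int


kanye = Artist(id=1, name="Kanye West")
embedded = EmbeddedSong(title="Stronger", artist=Artist(id=1, name="Kanye West"))
referencing = ReferencingSong(title="Stronger", artist_id=kanye.id)

print(embedded.artist.name)   # the name is stored inside the song
print(referencing.artist_id)  # only a reference; the name lives elsewhere
```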


Consider a second example: user settings. For my laptop, I have a bunch of user settings (username, password, default_browser…). There would be multiple instances of UserSettings, but there isn’t another data type, and the instances don’t relate to each other at all. So this data would be represented well in a non-relational structure.
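As a rough illustration, each UserSettings instance can simply be a self-contained record. The values below are placeholders I made up; note that no field points at any other record.

```python
# Hypothetical UserSettings records: each one is complete on its own.
user_settings = [
    {"username": "alice", "password": "placeholder-1", "default_browser": "Firefox"},
    {"username": "bob", "password": "placeholder-2", "default_browser": "Safari"},
]

# Reading one user's settings never requires looking at another record.
alice = user_settings[0]
print(alice["default_browser"])  # Firefox
```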

Which to choose?

We’ve talked about how the non-relational structure has a lot of duplication, and how if something changes, we might need to update it in many different places. This is a disadvantage of non-relational structures.

One advantage of non-relational structures is that the queries are easy. To get the Artist of a Song in the non-relational structure, we just have to query for the Song, and the Artist will be part of the data structure that is returned. To get the Artist of a Song in the relational structure, it’s a two-step process:

  1. Get the Song.
  2. Using the artist_id from the Song, get the Artist.
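The two steps above can be sketched in Python like this. The in-memory dicts stand in for a real data store, and `get_song` and `get_artist` are hypothetical helpers, not any particular database’s API.

```python
# Hypothetical in-memory "tables", keyed by id.
artists = {1: {"name": "Kanye West"}}
songs = {10: {"title": "Stronger", "artist_id": 1}}


def get_song(song_id):
    return songs[song_id]


def get_artist(artist_id):
    return artists[artist_id]


# Step 1: get the Song.
song = get_song(10)
# Step 2: using the artist_id from the Song, get the Artist.
artist = get_artist(song["artist_id"])

print(artist["name"])  # Kanye West
```

In the non-relational structure, step 2 disappears, because the artist would already be sitting inside `song`.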

So there is a trade-off between the ease of querying and the time it takes to update things when something changes. How to navigate this trade-off depends entirely on the specifics of your situation.


It may also be the case that we’re dealing with data that doesn’t relate to each other at all, like Songs and Candy. In that case, it wouldn’t make sense to use a relational database. From what I understand, there are two main reasons for this:

  1. Non-relational databases can be faster and more memory-efficient than relational databases for structures that don’t relate to each other at all (like Songs and Candy). This has to do with how they’re designed under the hood, and I don’t know enough to explain that further.
  2. The query languages for these non-relational structures are usually easier to deal with than those of relational structures. For example, querying for a document in Mongo returns a POJO. When querying a relational structure, you usually have to use SQL, which can be cumbersome (in my opinion), and the response is a new relation/table that you ultimately have to convert into a data structure, like a POJO, that you can use.
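To give a feel for that conversion step, here is a small sketch using Python’s built-in `sqlite3` module, with a Python dict standing in for the POJO. The table and column names are assumptions for illustration. Note that a SQL `JOIN` lets the database follow the `artist_id` reference for us, so the two lookups can be written as one query, but the result still comes back as rows that we turn into dicts ourselves.

```python
import sqlite3

# In-memory database with a hypothetical schema for this example.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
conn.executescript("""
    CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE songs (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        artist_id INTEGER NOT NULL REFERENCES artists(id)
    );
    INSERT INTO artists (id, name) VALUES (1, 'Kanye West');
    INSERT INTO songs (title, artist_id) VALUES ('Stronger', 1), ('Gold Digger', 1);
""")

# The query returns a new table of (title, artist) rows.
rows = conn.execute("""
    SELECT songs.title AS title, artists.name AS artist
    FROM songs
    JOIN artists ON artists.id = songs.artist_id
    ORDER BY songs.id
""").fetchall()

# Convert the tabular result into plain dicts we can work with.
song_dicts = [dict(row) for row in rows]
print(song_dicts)

conn.close()
```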