What if I told you there are no tables in relational databases?

The relational model explained using fashion, two crucial sentences and common sense.

Morpheus in Matrix explaining that there are no spoons — especially table spoons.

I’ve seen one sentence about relational databases repeated on the Web many times. I’ve seen it in many comments, I’ve seen it in few articles. Recently I’ve even seen it in one book — which finally made me write this article. The sentence in question goes like this:

“Ironically, relational databases deal poorly with relationships.”

Read it carefully. Think about it for a moment. I’m sure it must sound perfectly reasonable — for anyone who doesn’t understand the meaning of both “relational” and “irony.”

There are two crucial misunderstandings going on here.


Irony

First — the irony. Personally I blame Alanis Morissette for the almost universal misunderstanding of the word. In her song “Ironic” she presents various stories and asks repeatedly “isn’t it ironic… don’t you think?” — a question to which the answer is always, without exception: “No, it isn’t.” Isn’t this song itself an example of irony in action? Sadly, no.

So what is irony, then? I highly recommend reading the article published in the Guardian in 2003: The final irony or at least the very first sentence of the introductory paragraph of the Irony article in Wikipedia. The concept originated in the ancient Greece and I think that after few thousand years there is really no excuse for ignorance.

Irony has been explained so many times that I will not add any value here. All I can say is that every single time I hear the word “irony” spoken by anyone, I immediately ask myself: is the speaker more like Plato and Aristotle or more like Alanis Morissette? Please keep that in mind.

Plato and Aristotle, a detail of The School of Athens, a fresco by Raphael.

Relations

Now, to a more important topic. Relations.

The word “relational” in a “relational database” has nothing to do with relationships. It’s about relations form relational algebra.

So what is a relation, then? Relation is a set of tuples.

This is crucial so let me repeat it:

Relation is a set of tuples.

Tuples are finite ordered lists of elements. Every tuple in a relation has the same number of elements — it’s an n-tuple in general and specifically a 1‑tuple is called a singleton, a 2‑tuple is called an ordered pair and a 3‑tuple is a triple or triplet — and the elements on the same position are members of the same domains.

A domain is a set of every possible values that a given element of the tuple can have. A domain, for example, can be a set of integers, or a set of two values “yes” and “no.”

So relation is a set of tuples. but it’s not just any set — it is a subset of a Cartesian product of the domains of every element in the tuple.

This is also crucial, so let me repeat:

Relation is a subset of a Cartesian product of the domains.

From those two sentences we can derive almost everything else — which we’ll do in a moment. But first we need to understand few misconceptions.


Misconceptions

Relations are what most of people mistake for “tables.” But there are no tables in the relational model. Relational databases are not spreadsheets, even though people often think about them as such and that leads to countless problems.

At this point you may ask — if there are no tables in a relational database then what do I create with the “create table” command? You create relations. With a “table” keyword. Yes, it’s confusing.

Think of it this way: The “table” keyword in SQL doesn’t create tables but relations, just like the “class” keyword in ECMAScript 6 doesn’t create classes but prototypes. There are no classes in JavaScript even though we have a “class” keyword and similarly there are no tables in relational databases even though we have a “table” keyword.

(As a side note, I really hope that every JavaScript programmer understands that there are no classes in JavaScript or otherwise we’re going to see even more problems than those caused by the false impression that there are tables in relational databases.)

Sometimes the keywords in programming languages can be misleading and we have to deal with it. But even if the names are confusing it doesn’t mean that we should be confused. We need to understand the underlying concepts as an abstract meaning of particular names and symbols that have been chosen (often poorly) to represent some ideas.

That having been said, of course when you get the data out of a relational database it is usually presented as a table, with rows and columns, as in a spreadsheet. Also, the internal representation of data can use certain lookup tables and other table-like structures. But that is irrelevant, that is just an implementation detail. All we care about is that conceptually we are dealing with relations, just like when we are thinking about factorial it’s important to know that we are dealing with a function, no matter how it is implemented or stored in memory — as an array of opcodes, an abstract syntax tree or an s-expression.

What’s the big deal?

So, people say “tables” to mean “relations” — what’s the big deal? They’re used to spreadsheets and they have tables there. Isn’t a database really just a big spreadsheet, anyway?

First of all, what is a table in a spreadsheet? It is a list of rows, each of which is divided into columns.

Table is a list of rows.

You have the first row, the second row… Do you see the problem already?

Relation is a set of tuples.

Maybe it sounds like different names for the same things but the difference is fundamental.

I always say that you should pause after the first word of every definition:

Table is a list…

It’s a list of something, never mind of what right now.

Relation is a set…

It’s a set of something. And a set is not a list. A set doesn’t have an order. And in a set no element can exist more than once. Every element of a set has to be distinct.

And here you go, from the very first word of a definition we already see two important and yet obvious features of relations, that would seem awkward if we thought about tables.

After all, why a table couldn’t have two rows that are the same? It doesn’t make any sense. Also why a table couldn’t have an order? Doesn’t make any sense either. If you think that there are tables in relational databases then those two restrictions seem completely arbitrary.

Every table has an order and saying about the third row always means something. On the other hand saying about the third element of a set is just nonsense without even having to know anything about relational databases. It’s just a set so there is no third element. And no element can be a member of the same set twice. Simple as that.

From that simple fact you can also derive everything you need to know about primary keys, candidate keys, prime attributes and many more properties and restrictions of the relational model that would be impossible to guess simply by thinking about tables.


Sets to the rescue

The most important consequence of the fact that we are dealing with sets is that we have the entire set theory at our disposal. If we have sets, then we can have subsets, unions, intersections and — wait for it — Cartesian products!

Everyone intuitively understands what is an intersection of two sets. It is their common subset — the set of elements that they both include.

A Venn diagram illustrating the intersection of two sets.

Not everyone knows what is a Cartesian product, though.

Cartesian product

What is a Cartesian product of two sets? Let’s say that we have a set of pants and a set of tops in our wardrobe, each containing two elements:

Pants = {jeans, shorts}
Tops = {t-shirt, tank-top}

Those are sets so {jeans, shorts} is the same as {shorts, jeans} and it would make no sense to have {shorts, shorts} with the same shorts present twice — the element “shorts” can either be a member of the set or not, it can’t be a member twice. (Of course we could have few different pieces of shorts, but those would not be the same and we should name them differently, like shorts1 and shorts2, to distinguish them. Here we only have two pieces of pants: one shorts and one jeans.)

Now, a Cartesian product of our pants and tops is simply every possible outfit that we can wear:

Pants × Tops = {
(jeans, t-shirt),
(jeans, tank-top),
(shorts, tank-top),
(shorts, t-shirt)
}

I intentionally ordered it randomly because there is no order in all possible outfits that we can wear (if you take everything out of your wardrobe at once then “order” is the last thing that comes to mind). There is no “third outfit” unless we number it ourselves somehow, but the same jeans with the same t-shirt is the same outfit, however we’d like to number it.

But there is an order inside of each tuple. After all, wearing a t-shirt as pants somehow doesn’t feel right…


So we have a Cartesian product which is a set of all possible outfits. But what if we hate some of those outfits? The combination of our shorts with that certain tank top looks weird so we don’t want to waste our time with that outfit every time we choose what to wear. We need to exclude it somehow. We need a subset that will include only nice outfits. And this is our first relation:

NiceOutfits = {
(jeans, t-shirt),
(jeans, tank-top),
(shorts, t-shirt)
}

It is obvious that the set of nice outfits must be a subset of all possible outfits. Not every possible outfit must be nice, not even every top must be present it the relation — after all we could have a top that looks terrible with every pants — but we can’t have a top in the relation that isn’t member of the Tops set (which is a domain of the second element of our tuples here).


NULL

Now, what about the NULL value that is present in relational databases? Right now we were acting like both pants and tops in our relation were “NOT NULL” but what if, say, tops didn’t have that restriction? Simple. We could be topless. And we would now have 6 possible outfits:

{
(jeans, t-shirt),
(jeans, tank-top),
(shorts, tank-top),
(shorts, t-shirt),
(shorts, NULL),
(jeans, NULL)
}

If both tops and pants could be NULL then we could be fully clothed, topless, bottomless or fully naked, with 9 ways of combining clothes, or lack thereof:

{
(jeans, t-shirt),
(jeans, tank-top),
(shorts, tank-top),
(shorts, t-shirt),
(shorts, NULL),
(jeans, NULL),
(NULL, tank-top),
(NULL, t-shirt),
(NULL, NULL)
}

The NULL which means a lack of value can be though of as an implicit member of every domain, unless you have a NOT NULL restriction. We won’t talk about NULL in the next examples for the sake of simplicity and decency.


Combining relations

What else can we do now? Let’s say we have three sets — Pants, Tops and NiceOutfits:

Pants = {jeans, shorts}
Tops = {t-shirt, tank-top}
NiceOutfits = {
(jeans, t-shirt),
(jeans, tank-top),
(shorts, t-shirt)
}

What can we do to see which nice outfits include pants that don’t look good with every top? Sounds simple, we see that those pants that don’t look good with every top are shorts because there is no shorts with tank-top in NiceOutfits, and we see that those shorts with t-shirt looks good. But how to actually do it? Or, more importantly, how to explain how to do it to a computer in a way that would work for any data?

Thinking imperatively

Most people who think in terms of tables think about it imperatively:

First we need to take a “list” of all pants and remove from it those that are not present in the “table” GoodOutfits alongside with every possible tops from the Tops “table.” We might count occurrences in GoodOutfits separately for every pants and compare it to the total number of tops. If those numbers are the same then we remove that certain kind of pants and if those numbers are different then we keep it. Then, for every element from the list that wasn’t removed we search the NiceOutfits “table” again but this time looking for “rows” that include that certain kind of pants that we deal with in each iteration. And that’s it. It will work and it’s not even that hard to understand.

It will do the trick. But is it the right way to think about that problem? We’re not using the relational model to our advantage.

Thinking declaratively

We need to think declaratively instead of imperatively. We need to explain what we want, not how to get it. But it’s hard to think declaratively if we ignore all of the relational model and set theory because we don’t really have good words to explain the result that we want to get. We don’t have the right language.

So here is another approach. Remember Cartesian products? Let’s make a Cartesian product of NiceOutfits with its complement and see what we get.

An absolute complement of NiceOutfits is just a difference of all possible outfits with the set of nice outfits. We can call it BadOutfits for simplicity:

NiceOutfits = {
(jeans, t-shirt),
(jeans, tank-top),
(shorts, t-shirt)
}
BadOutfits = (Pants × Tops) \ NiceOutfits = {
(shorts, tank-top)
}
NiceOutfits × BadOutfits = {
(jeans, t-shirt, shorts, tank-top),
(jeans, tank-top, shorts, tank-top),
(shorts, t-shirt, shorts, tank-top)
}

We always flatten the tuples like this. One of our sets had 3 elements and the other had only one element so the result will have 3 elements, but with bigger tuples. It may seem like there is some duplication but note that every tuple is still unique.

If we think in a relational way then we may think about elements of the Cartesian product of NiceOutfits and its complement, and take only those elements whose pants coming from the NiceOutfits are the same as the pants coming from its complement. That’s it. We don’t say how to find those elements, what steps need to be taken, we don’t care. That’s the job of the database. We just need to tell it what we want.

We also touched another topic, predicates. We were not talking about the entire Cartesian product of two sets, but about only those elements for which some predicate is true. If we are talking about some elements of a given set, then we are talking about its subset. Understanding the set theory is crucial here, but we also touch the predicate calculus.

Consequences

Some people think that expressing algorithms imperatively or declaratively is just a matter of preference but there are some crucial consequences of choosing the one or the other.

There is a fundamental difference of both of those approaches to finding pants. In the first, imperative example we didn’t tell the computer what we wanted at the end, we just described every step that needs to be taken. The computer couldn’t make our algorithm more efficient automatically, it didn’t have any other choice but to follow our instructions step by step.

On the other hand in the declarative example, we explained what we need and it’s a job of the database authors to implement the algorithms that perform the right operations on sets to give us what we need. One day they could optimize a certain algorithm or write it in a way to work completely differently and as long as the result is still correct with respect to the set theory and relational algebra then the specific steps that the computer takes to finally give us the result is just an implementation detail.

A declarative way of explaining algorithms is like saying “turn right, turn left” to someone driving a car. He can only follow our instructions and can’t choose a better way to get us where we want to be — he doesn’t even know what are our intentions.

If, on the other hand, we say “I need to buy some milk” then the driver can choose a supermarket or a grocery store nearby that is open on that day and choose an optimal way to get us there. In the first example he just couldn’t know from our “turn right, turn left” instructions that it is our poor attempt to find a good way to a store where we can buy some milk.

Crucial knowledge

In my opinions there are three main subjects that need to be known to understand relational databases:

  • set theory
  • predicate calculus
  • relational algebra

Without it you will never truly appreciate the relational model.

Quick Summary


Relations are sets of tuples. Relations are what most people call “tables” and tuples are what they call “rows.” But there are no rows. And there are no tables. And relational databases are not spreadsheets.

Every relational database has a “table” keyword. ECMAScript 6 has a “class” keyword and it doesn’t have classes either. Those are just keywords, often chosen poorly even with good intentions.

What’s the difference between tables and relations? Tables are just lists of rows. And relations are sets of tuples. The difference is the “sets” part. Sets are not ordered. They can’t have the same element more than once. You can think about unions, intersections, Cartesian products and more.

The most important thing to remember is that you use the table keyword in SQL to define relations (not tables) just like you use the class keyword in ES6 to define objects and prototypes (not classes). Fortunately in ES6 you don’t have to use the class keyword — and there are very good reasons to avoid it altogether — but still I expect it to cause at least as much confusion as the table keyword in SQL, if not more.


Conclusion

The choice is yours: you can either pretend that you’re dealing with tables in spreadsheets with weird constraints that don’t make much sense, or you can invest some time and energy to learn the underlying theories and realize that it not only makes prefect sense, but it is in fact quite obvious.

Now every time you hear someone asking: “Isn’t it ironic that relational databases deal poorly with relationships … don’t you think?” I hope you will answer: “Yes, sure. It’s like ray-ee-ain on your wedding day” smiling ironically.


Rafał Pocztarski is a Node.js developer based in Warsaw, Poland, who can be reached as @pocztarski on Twitter or other social networks listed on pocztarski.com. Every feedback is highly appreciated.

See also:

Further reading: