Relational Algebra: the Underpinnings of SQL

tiefengeist
5 min readFeb 20, 2019

--

Recently, I’ve been working with SQL, a DSL used for managing structured data, where different elements of said data are related through shared attributes. The way the language is structured seemed similar to the structure of set theory, a topic I’m fairly familiar with. After looking into the matter, I was unsurprised to find SQL is based on an algebra with considerable similarities to set theory, known as relational algebra.

Illustration of Elements of Relational Algebra

It may be best to start with a lay description of an algebra; for our purposes, an algebra may be defined as a number of objects (in our case relations, tuples, and attributes) and a set of operations which alter these objects. Roughly speaking, relations correspond to entire tables, attributes to columns, and tuples to rows.

Set Operations

There are a number of operations in relational algebra. The simplest of these are the set operations of intersection, union and difference, illustrated below.

Source: https://www.leda-tutorial.org/en/official/ch02s08s02.html

The thing is, these operations are pretty limited. That’s because for these operations to be valid, they need to satisfy something called union-compatibility. That’s a fancy way of saying they need to have the same number of attributes (columns), and those columns need to have the same type. The latter condition makes sense, but the former is really restrictive for most SQL tables. But relational algebra is rich enough to circumvent that problem.

Selection and Projection

Beyond the set operations, there are a number of useful operations, only some of which I’ll cover today. The first of these is renaming, which is self-explanatory. We then have the selection and projection operators. These operators, which can be represented by σ and π, respectively, change attributes and tuples within a relation. This all sounds a bit abstract, but don’t fret! You can think about these operators in a pretty straightforward way.

Selection essentially corresponds to a “WHERE” in SQL — that is, it takes a condition and picks out the rows in the table that satisfy it. Projection is a little cruder than that. It takes in a column name, and gets rid of all the columns that don’t have that name.

Here’s a simple example. Consider the following table:

We may operate on this table by selection, projection, or both. For example, the following code:

SELECT name, age FROM person WHERE age > 34;

uses selection to pick out the name and age attributes, and projection to pick out those tuples which have an age attribute over 34. The resulting table is:

which could be represented in relational algebra as

where R1 is the relation “Person”.

More Complicated Operations

Beyond these operations, we have the cross join, equivalent to the Cartesian product, illustrated here:

Source: http://www.sqltutorial.org/sql-cross-join/

The Cartesian product, or cross join, is a familiar operation in set theory. However, having used SQL, we know there are others. However, this intuition is misleading — the operators we’ve described thus far are the primitive operators of relational algebra; its other operators are simply compositions of these five primitive operators.

Two operations in particular make working with databases way simpler. These are known as the natural join and the theta join.

The natural join takes two relations and returns the set of all different combinations of tuples which share an attribute name, without duplicating their shared attribute. In simpler terms, it combines rows from different tables that have one or more matching columns. Natural joins are represented by this neat little bowtie: ⋈. Here’s an example:

Natural Join in Action

It’s worth noting that this is essentially INNER JOIN, but without repeated columns.

The theta join combines two relations which satisfy a boolean-valued condition which you may specify. For example, if you have a list of cars and boats, and their prices, want to buy both a car and a boat, but don’t want to spend more on the boat than the car, you can do the following:

This can be expressed in SQL, using the ON clause, as

SELECT * FROM cars JOIN boats ON (carprice >= boatprice);

What’s interesting about the theta join is that it’s just a composition of two products we’re familiar with — first we take the Cartesian product of the Car and Boat relations, then we use the selection operator, returning those tuples for which the boolean value of the condition returns TRUE. In relational algebra, assuming Car = R1 and Boat = R2, the theta join may be represented as

Just like in SQL, in relational algebra all of these operations are composable, meaning you can nest as many functions as you’d like.

Conclusion

SQL queries often have confusing or opaque syntax, which, though they utilize elements of natural language, ironically obscure their structure. For the more mathematically-inclined among us, understanding the structure of a tool like SQL through an algebra is clarifying and reassuring, as well as furnishing a number of useful theorems.

--

--

tiefengeist

A blog by Sahir Nambiar for the general reader. About math-y programming topics.