Arel: how does it work

Gabriela Matos Guedes
Qulture.Rocks Product Blog
11 min readNov 29, 2022

Arel is a great tool for writing queries using Ruby. It already comes together with the ActiveRecord gem and allows us to have a more readable code (when we compare it to pure SQL).

In this article, we will talk about:

  1. Arel’s overall structure
  2. Abstract Syntax Tree — What is it, how does Arel build it and how does it look like
  3. Visitor Pattern — What is it and how does Arel implement it

The version of ActiveRecord used along the article is 6.0.6.

Overall structure

Arel main folder contents

This is Arel’s main folder. It has some files and 4 folders. The main files for us here are the table.rb and the manager files (select_manager.rb, insert_manager.rb, update_manager.rb and delete_manager.rb), which all of them inherits from the tree_manager.rb . We will see more details about them along the text, by now we only need to know that they exist, they are somehow related to trees and have similar names to the SQL commands.

In the attributes folder, we have only a single file, the Attribute class. It represents the columns we get from the table. We have some operations available for it that come from the included modules (for example: as and sum). The files for these modules are the ones in the main folder above.

Attributes class

In the second folder, we have the Collectors. They are the entity responsible for gluing the parts into the SQL query we want. The main files here are the sql_string.rb and the plain_string. The PlainString class only stores the generated string and comes with a method to append it, while the SqlString, which inherits from the PlainString, also has a method to take care of the binds.

Collectors folder
PlainString and SQLString classes

The third folder is the nodes folder and it is very big because, for Arel, everything is a node. It will be more clear why in the AST section.

A piece of the nodes folder

The last one is the visitors folder, which contains the specific info about each database. We will have a section only about it.

Visitors folder

Abstract Syntax Tree

What is it

A tree.

It is basically another way of representing a text, but it has a logical order. Here we have a simple example from Wikipedia, adapted for ruby language:

while b != 0
if a > b
a = a - b
else
b = b - a
end
end
return a

And then, with this code, we have the tree:

An abstract syntax tree

We can start reading it from the first node, the root of the tree, which is a statement sequence. Then we execute first what is on the left of each node, which in this case is the while node. Before we execute the while body, we need to execute its condition, that checks if b != 0. After that, you execute the body, which has a branch due to the if statement and its reading follows the same logic of the while. In the end, it goes to that rightmost node and returns the a variable.

How does Arel build it

Arel uses the nodes from that folder we saw above and that is why everything for Arel is a node. In the Arel AST, the root of the tree will always be a statement (select, update, insert or delete statement). Here we will mainly talk about the SelectStatement, which is instantiated by the SelectManager, one of those main classes from the Arel root folder. Every time an instance from SelectManager is created, its ast attribute is filled with the SelectStatement node:

SelectManager class with its initialize method

But where is the SelectManager instantiated? I mean, when we look at an Arel query, we see that we don’t need to define it.

It is created from the Arel::Table class! In this class we also have lots of methods that represent SQL commands, like join, group or order, and all of them call the from method, which instantiates the SelectManager. Then, we can notice that each of these methods are actually calling the corresponding method from the SelectManager, since they are all doing things like from.group.

Arel::Table methods

One cool thing is that when we go to the SelectManager file, we see that its methods return self, which allows us to concatenate methods like join(table).group(column).

SelectManager methods

Here there is a simple example of how to do the SELECT * FROM table query using Arel and what it returns. For this example, we are using a table called contracts.

Simple Arel query and its returns

We can see that it has already instantiated lots of classes and nodes: the SelectManager, SelectManager, etc, but what has Arel done to build this structure? It has followed these steps:

Steps for building nodes

It starts by initializing the table, then it executes the Arel.star method, which creates the Arel::Nodes::SqlLiteral, then the project method calls the from to instantiate the SelectManager, which starts our AST with the node Nodes::SelectStatement as root. When initializing this root, it initializes two more nodes: the SelectCore and the JoinSources (which has nothing to do with the SQL commandinner join, it is just used to represent from where we are getting the data). Then, with the SelectManager instanciated, we call its project method passing the *.

How does it look like

After all this, we finally have our ast!

Arel AST

We have here all the nodes we saw on those steps and we can see that it gets the * from the contracts table. We can get this tree view by calling the method to_dot on an instance from SelectStatement, write it on a file and then convert it, with the Graphiviz software, to png:

contracts = Arel::Table.new(:contracts)
select_manager = contracts.project(Arel.star)
File.write("ast.dot", select_manager.to_dot)
$ dot ast.dot -T png -o ast.png

This tree can become very complex by only adding more things to our query. For example, when adding a join and two where conditions, like the query below, we can have this big tree:

Arel query
Arel big tree

Visitor Pattern

Let’s talk about how Arel uses the ast: it does so by visiting each node from the tree and translating it to the database language.

What is the Visitor Pattern

A design pattern.

Visitor Pattern diagram

It allows us to extract the elements logic from its implementation. For example, in Arel, we want the details of each database language to be separated from the nodes themselves. This pattern may be a little confusing to understand, but here is an example to help us. In this example, the elements are foods and the visitors are countries. Each country has its price for each food. So, to avoid doing mapping for each country price in the food class, we use the visitor pattern, it looks like this:

class VisitorBrazil
def visitSoyPrice
return 3.50
end
end

class ElementSoy
def acceptSoyPrice(visitor)
visitor.visitSoyPrice()
end
end

brazil = VisitorBrazil.new
soy = ElementSoy.new
soyPriceInBrazil = soy.accpectSoyPrice(brazil)

The Soy element doesn’t need to know the price in each country, it just needs to know how to call the price from the country. Each country will need to implement a visit method for each food. If we wanted to add new foods, we just need to create a class for it and a new method in each visitor:

class VisitorBrazil
def visitSoyPrice
return 3.50
end

def visitCornPrice
return 2.50
end
end

class ElementSoy
def acceptSoyPrice(visitor)
visitor.visitSoyPrice()
end
end

class ElementCorn
def acceptSoyPrice(visitor)
visitor.visitCornPrice()
end
end

How does Arel implement it

Arel has 3 main visitors: DepthFirst, Dot and ToSql, the three inheriting the Visitor class. All the other ones that are on the visitors folder inherit from ToSql and each one represents a database.

Visitors classes

When we have an Arel query, there are 3 main methods that we can call: to_dot, to_sqland each.

The to_dot we have already seen. It is used to construct the AST image we have above. We can call it on any TreeManager instance and it uses the Dot visitor to construct the AST image.

Dot visitor being used

The each is a new one! It is a method that accepts a block as param and is used to iterate over each node from our tree. It uses the DepthFirst visitor and we can call it on any Node instance.

DepthFirst visitor being used

Finally the most important one: to_sql. This method has a very intuitive name, we know that it’ll grab our tree and construct the SQL string for it. It works for TreeManager instances as well as for Nodes instances. It is from it that Arel starts reading our tree to pass the generated string to ActiveRecord execute.

ToSql visitor being used

By the picture above, we see that it is not directly calling the ToSql visitor and, also, we are not telling Arel which database visitor to use! The reason why is that ActiveRecord already knows which database we are using, so it does this for us:

Active record defining the engine

We now know what are the visitors, but what about the elements? And why all the visitors inherit the Visitor class? For Arel, the nodes are the Elements and the visitors are the databases. Arel implemented the visitor pattern a little bit differently, because the accept method, which execute operations on the visitor, isn’t on each node class, it is on the Visitor class!

Visitor class

It is a bit strange, but Arel is doing this because it defined the “elements” dynamically, thanks to ruby features. Continuing on the Visitors class, we see that it defines the method dispatch_cache, which is a hash that defines a default value for a key whose value you want to access.

Arel dispatch_cache
Arel defining hash key-value dynamically

This way, Arel doesn’t need to define an Element class for each node, since we can get the visitor method to be called through this hash. To access this hash, Arel gets the name of the class and each node has its class on the nodes folder we saw above. This way, when we want to read the node Arel::Nodes::Offset, for example, we ask the dispatch the name of the corresponding method for it and the dispatch returns visit_Arel_Nodes_Offset. When we go to the ToSql visitor, we see that it has a method for each single Arel node:

Some methods from ToSql

So, to wrap up, what is the flow when calling to_sql on a tree? This, in order:

We call the accept method of the visitor passing our AST and the collector, which will store the SQL string. The visitor comes from what ActiveRecord has defined, so for example, it can be the Postgres visitor, which inherits the ToSql, which inherits the Visitor class.

The accept method is in the Visitor class. This method calls the visit method, passing the tree and the collector.

The visit method gets the name of the dispatch method it should call by the class of the object. The object at this point is our AST and the class from it is the class of the root of the AST, which is the SelectStatement. Then, the visit_Arel_Nodes_SelectStatement, from the ToSql, will be called and it will start calling the visit method for other nodes that are below in the tree. To be more clear, here we have one of the simplest visit methods for a node, the visit_Arel_Nodes_GreaterThanOrEqual:

This methods gets the node and the first thing to do is visit the left node (remember that we started reading the AST from left to right?), then it stores the result on the collector, append it with operator >= and visit the node from the right. The key point to understand here is that Arel does a kind of recursion, calling the visiting each child node until all nodes have been read.

Now you may be questioning why does each database have its visitor if these methods we saw, for example, are in the ToSql visitor, right? Because there are some specific things for each database that need to be addressed in each corresponding visitor. In postgres, we can have a case sensitive different treatment when matching a string, so the Postgres visitor overwrites this method from the ToSql:

ToSql x PostgreSQL

That’s it!!

--

--