Building a query language in Ruby on Rails

Stefan Rendevski
Web Factory LLC
Published in
6 min readSep 7, 2020
Abstract Syntax Tree

URL parameters are the easiest way to support filtering in a rails app. If you have a /offersendpoint which returns all offers you might be interested in, then you could filter it by any property by adding query parameters, like for example /offers?category=tech&status=active&relevance=popular and so forth. Parsing these parameters from inside a Rails controller and passing them to ActiveRecord ‘s where method is trivial and works out of the box.

A typical controller would look something like this.

Unfortunately, this only works for exact matches. If i wanted to get all offers created within a certain date or price range, I would be out of luck.

Simple query parameters can only get us so far. They are comprised of 2 parts, a key, indicating the property, and a value. But we need a third part as well, the operator we ought to apply to the property and the value.

This article goes more in-depth about the different syntax that can be used in order to construct wiser query parameters, as well as the pros and cons of each approach. Today, we will take a look at how to implement the translation between our filtering language and the language of our database.

For the purpose of this demonstration, I chose a syntax similar to MongoDB’s query language.

{
field: {
operator:value,
operator:value
},
or: [
{
field: {
operator:value
},
field: {
operator:value
}
}
]
}

There are a couple of things going on here, so let’s break it apart

  • A field can be filtered by multiple operators , all logically connected with AND .
  • Filtering by multiple fields is equivalent to AND -ing them together. You can, however, specify to use OR instead. In the absence of a logical operator, AND is assumed. There can be as many levels of nesting using combinations of AND and OR as desired.
  • Finally, the query in the example would be roughly translated to field operator value AND field operator value AND (field operator value OR field operator value)

Getting this sort of structure from query parameters can be somewhat tricky. Fortunately for us, Rack can parse this out of the box without the need for a custom parameter binder, with the following format: field[operator]=value&field[operator]=value&or[[field[operator]]=value . Admittedly, it does not look pretty, especially with many levels of nesting and and or , but it is effective.

Translation

Assuming we have managed to parse our query string into the desired structure, we now need to parse it and convert it into a query that a database could understand.

We will do this process in two steps.

  • Translate our structure into another, intermediate representation
  • Translate this intermediate representation into a database query

The reason for breaking this into two steps is to give ourselves more flexibility and ease of extension over the types of databases we could support. This intermediate representation should be easier to parse than our initial structure.

The data structure that we will use to form our intermediate representation is called an Abstract Syntax Tree(AST) . It encodes the syntax of our language using nodes for every construct we have in it. Before we proceed any further, let’s define these constructs more formally.

Language constructs

The smallest construct in our filtering language is the field filter:
field: {operator:value, operator:value} .
It is composed of three parts, the name of the field, the operator(s) and the value(s).

Next is the filter group: field: {operator:value}, field: {operator:value} . It is a logical grouping of 1 or several field filters.

And finally, we have an expression. An expression is just several filter groups connected together with a logical operator. To help us in our parser implementation, we can write down the Backus-Naur form of our grammar, explicitly defining the recursive relations between our constructs.

  • <filter>: field: {operator:value}
  • <logical operator>: and | or
  • <filter group>: (<filter> | <expression>)*
  • <expression>: (<logical operator> <filter group>)

Abstract Syntax Tree

Back to our syntax tree. Every elementary construct, one that is not built up using others, will be represented by a node. A class hierarchy could look something like this:

All the different tokens will be represented by their types as well.

Now, having defined our grammar formally, implementing the parser is simply implementing the recursive equations:

Invoking our parser withparser.parse will give us back the root node of our AST. In our case, it will be very similar to a binary tree, where the leaf nodes will be descendants ofFilterNode , and the intermediate nodes will have two children, left and right, as implemented by BinaryNode .

Implementing the lexer object may be the more complicated part, and I may revisit this in a 2nd part, or add a link to a github gist.

Traversing our tree

Now that we have our intermediate representation of the query, we need to translate it into a database-understandable query. To make the code cleaner and easier to extend, we will use the Visitor Pattern. In short, the Visitor pattern solves the problem of extending an existing data structure with new algorithms. In our case, the data structure is the Abstract Syntax Tree, and the different algorithms will be the translations between different database query languages. It does this by separating the implementation in two, the data structure hold all the necessary data, and the visitor object itself implements the required algorithm. It visits each node in the collection, performs a computation, stores the result in its private state, and moves on to the next one. For a detailed description of the visitor pattern and its use in implementing parsers, you can check out this amazing post, which was the inspiration for my own post.

Back to our visitor. For the purposes of simplicity, we will stick with supporting only SQL databases. Fortunately for us, ActiveRecord already uses a library for generating SQL strings, called Arel , which works very similarly to our own parser and AST. Arel builds up an AST when you invoke its predicate methods, and spits out the SQL with the help of the Visitor pattern. The library has visitor objects implemented for every database it supports, and it picks the correct implementation based on your configuration. Instead of reinventing the wheel, we can take advantage of this implementation, and delegate all of the SQL-generation to Arel. Our task then is to map our own AST into one that Arel can understand.

Visitor implementations usually rely on double-dispatch, where each node accepts a visitor object, def accept(visitor) , and then passes it’s self reference to the visitor, def visit(node) . The reason for this double dispatching is because we don’t know either the visitor or the node type until runtime. The visitor would then override visit for every type of node, and rely on method overloading to invoke the correct version. Ruby, however, is a dynamically-typed language, so our approach will be slightly different.

First, we define the Visitor class:

We use ruby’s dynamic method invocation feature, using our node’s class as a descriminator. For example, when visit is invoked with a AndNode , it will delegate to the correct visit_AndNode method.

The collector object will hold the result as we build it up by visiting nodes.

Here we are completely relying on Arel’s methods for implementing our own node types, which means that as long as we don’t have any operators that Arel itself does not support, we will have no problems.

One note of caution when using this approach: we must make sure the values we receive via URL parameters are properly sanitized. I chose to do this inside the visitor , because it gives the most control over the type of sanitization (for example, sanitizing a LIKE query would require using sanitize_sql_like instead of sanitize_sql ).

To wrap things up, using all parts to get to a simple query like in our starting example:

parser = Parser.new(Lexer.new(query_params))
visitor = ArelVisitor.new(Offer.arel_table)
root_node = parser.parse
root_node.accept(visitor)
Offer.where(visitor.to_query)

References:
RESTful filtering: https://www.moesif.com/blog/technical/api-design/REST-API-Design-Filtering-Sorting-and-Pagination/

Explanation of AST/Visitor/Parser: https://ruslanspivak.com/lsbasi-part7/

Using Arel to compose queries: http://radar.oreilly.com/2014/05/more-than-enough-arel.html

--

--