Volcanic piglet, or do-it-yourself SQL

Vladimir Kazanov
Bumble Tech
16 min read · Jan 22, 2020


Collecting, storing and transforming data are the main tasks of a data engineer. In any one 24-hour period, the Bumble Business Intelligence unit receives and processes over 30 billion events sent from clients’ devices or application servers.

Researching and interpreting all this data is no trivial task; sometimes it requires going beyond the capabilities of existing databases. And once you have plucked up the courage and decided to do something new, the first thing you need to do is familiarise yourself with existing approaches.

In a nutshell, this article is aimed at curious and determined developers not afraid of custom solutions. In it, you will find a description of a traditional model for executing queries in relational databases based on the example of the demonstration query language, PigletQL.

Background history

Our engineering team works on the backends and APIs that provide data analysis and exploration capabilities within the company. Our standard tools are a distributed database running on dozens of servers (Exasol) and a Hadoop cluster consisting of hundreds of machines (Hive and Presto).

A large proportion of queries to these databases is analytical, i.e. touching anywhere from hundreds of thousands to billions of records. Depending on the solution being used and the complexity of the query, they can take minutes or even hours to execute. When a user or analyst is working manually, such waiting times are considered acceptable, but they are not suitable for interactive exploration via a user interface.

Over time, we identified popular analytical queries and queries which were difficult to express in terms of SQL and developed small-scale, specialised databases for these. They contain a subset of data in a suitable format for lightweight compression algorithms (for example, streamvbyte), which allows you to store several days’ worth of data in the memory of a single machine and to execute queries within seconds.

The initial query languages for this data, and their interpreters, were implemented ad hoc; we constantly had to rework them, and each time this took up an inordinate amount of time.

These languages proved to be insufficiently flexible, although there weren’t any obvious reasons for their capabilities being limited. So, we turned to the experience of SQL interpreter developers and, thanks to them, we were able to solve some of the problems which arose.

Later on, I will tell you about the most widespread model for executing queries in relational databases, namely Volcano. Appended to this article is the source code of the interpreter for the primitive SQL dialect, PigletQL, so anyone interested can easily familiarise themselves with the details in the repository.

SQL interpreter structure

Most popular databases provide an interface to data in the form of the declarative query language SQL. A query in the form of a string is transformed by a syntactic analyser into an internal representation of the query, similar to an abstract syntax tree. Simple queries can already be executed at this stage; however, for optimising transformations and subsequent execution this approach isn’t suitable. That’s why most databases introduce additional intermediate representations.

Relational algebra has become the template for intermediate representations. This is a language in which transformations (operators) performed on data are explicitly described: the predicate-based choice of data subset, joining data from different sources, etc. Moreover, relational algebra is algebra in the mathematical sense, i.e. a large number of equivalent transformations have been formalised for it. For this reason, it is useful for optimising transformations to be performed on the query in the form of a tree of relational algebra operators.
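For instance, a query like SELECT a FROM t WHERE a > 1 corresponds to the relational algebra expression π_a(σ_{a>1}(t)): the selection σ filters the tuples of t by the predicate, and the projection π keeps only the attribute a. Equivalences such as pushing a selection down below a join are exactly the kind of formalised transformations an optimiser relies on.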

Because there are important differences between internal representations in databases and original relational algebra, it would be more accurate to refer to it as logical algebra.

Verification of the validity of a query is usually performed when compiling the initial representation of the query into a logical algebra operator tree, and corresponds to the semantic analysis stage in ordinary compilers. The database catalogue fulfils the role of the symbol table in typical programming language compilers — this is where information on how the data is arranged and its metadata (tables, table columns, indices, permissions, etc.) is stored.

Database interpreters differ from general-purpose language interpreters in one more respect: the volume of the data being queried, and of the metadata describing it, varies. Tables (or, in terms of relational algebra, relations) may contain differing quantities of data, indexes may or may not be built on particular columns (relation attributes), and so on. In other words, depending on how the database is arranged and on the volume of data in the tables, both the algorithms used to execute a query and the order in which they are applied will vary.

To handle this, another intermediate representation is introduced, namely physical algebra. Depending on whether there are indexes on the columns, on the volume of data in the tables and on the structure of the logical algebra tree, several variants of the physical algebra tree are proposed, and the optimal one is chosen from among them. It is precisely this tree that databases display as the query plan. This stage roughly corresponds to the register allocation, instruction scheduling and instruction selection stages in ordinary compilers.
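For example, a logical tree that sorts the result of scanning table t by column a admits at least two physical plans: if there is an index on t.a, an index scan already returns tuples in the required order and the explicit sort disappears; if there isn’t, the whole table has to be scanned and the intermediate result sorted. The variant estimated to be cheaper, based on the metadata in the catalogue, becomes the query plan.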

The final stage in the interpreter’s work is execution of the physical algebra operator tree.

The Volcano query execution model

Interpreters of physical algebra operator trees have almost always been developed inside closed commercial databases, which is why the academic literature usually refers to the experimental optimiser Volcano, developed in the early ‘90s.

In the Volcano model, each physical algebra tree operator turns into a structure with three functions: open, next and close. Besides the functions, the operator contains a working state. The open function initialises the operator’s state, the next function returns either the next tuple or NULL if there are no tuples left, and the close function finishes off the operator’s work.

Operators may be nested inside one another to form a tree of physical algebra operators. Each operator thus iterates either over the tuples of a physically existing relation, or over a virtual relation formed by iterating over the tuples of the nested operators.

In contemporary high-level language terms the tree of such operators represents a cascade of iterators.
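As a minimal, generic sketch of the idea (the names below are placeholders for illustration only; PigletQL’s actual definitions appear later in the article), such an operator and the loop pulling tuples out of a tree of them might look like this:

/* A generic Volcano-style operator: three functions plus private state.
 * iterator_t and tuple_t here are illustrative placeholders. */
typedef struct iterator_t iterator_t;
struct iterator_t {
    void (*open)(iterator_t *it);      /* initialise the operator's state */
    tuple_t *(*next)(iterator_t *it);  /* return the next tuple or NULL */
    void (*close)(iterator_t *it);     /* finish the operator's work */
    void *state;                       /* operator-specific state */
};

/* The "pull model": the consumer drives the whole tree from its root */
void drain(iterator_t *root)
{
    root->open(root);
    for (tuple_t *tuple = root->next(root); tuple != NULL; tuple = root->next(root)) {
        /* consume the tuple: print it, aggregate it, pass it upwards, ... */
    }
    root->close(root);
}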

Even industrial query interpreters in relational DBMS use the Volcano model as a point of reference, which is why I used this as the basis for the PigletQL interpreter.

PigletQL

In order to demonstrate the model, I developed an interpreter for the limited query language, PigletQL. It is written in C and supports the creation of SQL-style tables, but is limited to a single data type, namely 32-bit unsigned integers. All the relations are in-memory. The system works in a single thread and has no transaction mechanism.

PigletQL does not have an optimiser and so SELECT queries are compiled directly to the physical algebra operator tree. The remaining queries (CREATE TABLE and INSERT) work directly from primary internal representations.

Example of a user session in PigletQL:

> ./pigletql
> CREATE TABLE tab1 (col1,col2,col3);
> INSERT INTO tab1 VALUES (1,2,3);
> INSERT INTO tab1 VALUES (4,5,6);
> SELECT col1,col2,col3 FROM tab1;
col1 col2 col3
1 2 3
4 5 6
rows: 2
> SELECT col1 FROM tab1 ORDER BY col1 DESC;
col1
4
1
rows: 2

Lexical and syntactic analysers

PigletQL is a very simple language and its implementation did not require the use of third-party tools at the lexical and syntactic analysis stages.

The lexical analyser is written by hand. The analyser object (scanner_t) is created from the query string and then hands out tokens one by one:

scanner_t *scanner_create(const char *string);
void scanner_destroy(scanner_t *scanner);
token_t scanner_next(scanner_t *scanner);

Syntactic analysis is performed using the recursive descent method. First, the parser_t object is created, which, having obtained a lexical analyser (scanner_t), fills out the query_t object with information regarding the query:

query_t *query_create(void);
void query_destroy(query_t *query);

parser_t *parser_create(void);
void parser_destroy(parser_t *parser);

bool parser_parse(parser_t *parser, scanner_t *scanner, query_t *query);
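Putting the pieces together, a query string travels through the scanner and the parser roughly as follows (a sketch based on the signatures above; error handling is reduced to a single check):

/* Parse a query string into a query_t (sketch based on the API above) */
query_t *query = query_create();
scanner_t *scanner = scanner_create("SELECT col1 FROM tab1;");
parser_t *parser = parser_create();

if (parser_parse(parser, scanner, query)) {
    /* query now describes the statement and can be validated and executed */
}

parser_destroy(parser);
scanner_destroy(scanner);
query_destroy(query);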

The result of parsing is a query_t holding one of the three types of query supported by PigletQL:

typedef enum query_tag {
    QUERY_SELECT,
    QUERY_CREATE_TABLE,
    QUERY_INSERT,
} query_tag;

/*
 * ... query_select_t, query_create_table_t, query_insert_t definitions ...
 */

typedef struct query_t {
    query_tag tag;
    union {
        query_select_t select;
        query_create_table_t create_table;
        query_insert_t insert;
    } as;
} query_t;
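With a tagged union like this, the top level of the interpreter simply dispatches on the tag. A sketch, assuming a parsed query_t *query and a catalogue_t *cat; only eval_select is shown later in the article, the other branches are placeholders:

/* Dispatch a parsed query to the appropriate handler (sketch) */
switch (query->tag) {
case QUERY_SELECT:
    eval_select(cat, &query->as.select);
    break;
case QUERY_CREATE_TABLE:
    /* register the new relation in the catalogue */
    break;
case QUERY_INSERT:
    /* append a tuple to an existing relation */
    break;
}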

The most complicated form of query in PigletQL is SELECT. It corresponds to the query_select_t data structure:

typedef struct query_select_t {
    /* Attributes to output */
    attr_name_t attr_names[MAX_ATTR_NUM];
    uint16_t attr_num;

    /* Relations to get tuples from */
    rel_name_t rel_names[MAX_REL_NUM];
    uint16_t rel_num;

    /* Predicates to apply to tuples */
    query_predicate_t predicates[MAX_PRED_NUM];
    uint16_t pred_num;

    /* Pick an attribute to sort by */
    bool has_order;
    attr_name_t order_by_attr;
    sort_order_t order_type;
} query_select_t;

The structure contains a description of the query (array of attributes requested by the user), a list of data/relation sources, an array of predicates used for tuple filtering and information on the attribute used for sorting results.
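For instance, a query such as SELECT a1, a2 FROM rel1 WHERE a1 > 3 ORDER BY a1 DESC would fill the structure roughly as follows (the exact sort_order_t constants are not shown here, so the order is described in words):

/* SELECT a1, a2 FROM rel1 WHERE a1 > 3 ORDER BY a1 DESC (roughly): */
/*   attr_names = {"a1", "a2"},  attr_num = 2                       */
/*   rel_names  = {"rel1"},      rel_num  = 1                       */
/*   predicates = {a1 > 3},      pred_num = 1                       */
/*   has_order  = true, order_by_attr = "a1", order_type = descending */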

Semantic analyser

In ordinary SQL, the semantic analysis phase includes checking that the tables listed and their columns exist, and checking the types in the query expressions. The checks on tables and columns use the database catalogue, where all the information on the data structure is stored.

PigletQL doesn’t have any complex expressions, so query verification boils down to checking the metadata of the tables and columns against the catalogue. SELECT queries, for example, are checked by the validate_select function. Here it is in abbreviated form:

static bool validate_select(catalogue_t *cat, const query_select_t *query)
{
    /* All the relations should exist */
    for (size_t rel_i = 0; rel_i < query->rel_num; rel_i++) {
        if (catalogue_get_relation(cat, query->rel_names[rel_i]))
            continue;

        fprintf(stderr, "Error: relation '%s' does not exist\n", query->rel_names[rel_i]);
        return false;
    }

    /* Relation names should be unique */
    if (!rel_names_unique(query->rel_names, query->rel_num))
        return false;

    /* Attribute names should be unique */
    if (!attr_names_unique(query->attr_names, query->attr_num))
        return false;

    /* Attributes should be present in relations listed */
    /* ... */

    /* ORDER BY attribute should be available in the list of attributes chosen */
    /* ... */

    /* Predicate attributes should be available in the list of attributes projected */
    /* ... */

    return true;
}
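The helpers rel_names_unique and attr_names_unique are not shown here. A naive pairwise version might look something like the sketch below; this is an illustration rather than the repository’s actual code, and it assumes (judging by the %s format string above) that rel_name_t is a fixed-size character buffer:

/* Sketch of a pairwise uniqueness check (illustrative, needs <string.h>) */
static bool rel_names_unique(const rel_name_t *names, size_t num)
{
    for (size_t i = 0; i < num; i++) {
        for (size_t j = i + 1; j < num; j++) {
            if (strcmp(names[i], names[j]) == 0) {
                fprintf(stderr, "Error: relation '%s' is listed twice\n", names[i]);
                return false;
            }
        }
    }
    return true;
}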

If the query is valid, then the next step is to compile this representation into an operator tree.

Compiling queries into an operator tree

As a rule, fully-fledged SQL interpreters have two intermediate representations: logical algebra and physical algebra.

A simple PigletQL interpreter executes CREATE TABLE and INSERT queries directly from primary internal representation, i.e. from the query_create_table_t and query_insert_t structures. More complex SELECT queries are further transformed into a physical algebra operator tree which will be executed by the interpreter.

The operator tree is built from the leaves to the root in the following order:

  1. From the right-hand part of the query (“… FROM relation1, relation2, …”) come the names of the target relations, for each of which a scan operator is created.
  2. Scan operators which extract tuples from relations are joined together into a left-deep binary tree via the join operator.
  3. Attributes requested by the user (“SELECT attr1, attr2, …”) are selected by the project operator.
  4. If any predicates are specified (“… WHERE a=1 AND b>10 …”), then a select operator is added to the top of the tree.
  5. If an option for sorting the result is specified (“… ORDER BY attr1 DESC”), then the sort operator is added to the top of the tree.

Compiling in PigletQL code:

operator_t *compile_select(catalogue_t *cat, const query_select_t *query)
{
    /* Current root operator */
    operator_t *root_op = NULL;

    /* 1. Scan ops */
    /* 2. Join ops */
    {
        size_t rel_i = 0;
        relation_t *rel = catalogue_get_relation(cat, query->rel_names[rel_i]);
        root_op = scan_op_create(rel);
        rel_i += 1;

        for (; rel_i < query->rel_num; rel_i++) {
            rel = catalogue_get_relation(cat, query->rel_names[rel_i]);
            operator_t *scan_op = scan_op_create(rel);
            root_op = join_op_create(root_op, scan_op);
        }
    }

    /* 3. Project */
    root_op = proj_op_create(root_op, query->attr_names, query->attr_num);

    /* 4. Select */
    if (query->pred_num > 0) {
        operator_t *select_op = select_op_create(root_op);
        for (size_t pred_i = 0; pred_i < query->pred_num; pred_i++) {
            query_predicate_t predicate = query->predicates[pred_i];
            /* Add a predicate to the select operator */
            /* ... */
        }
        root_op = select_op;
    }

    /* 5. Sort */
    if (query->has_order)
        root_op = sort_op_create(root_op, query->order_by_attr, query->order_type);

    return root_op;
}

Once the tree has been formed, optimising transformations are usually performed, but PigletQL moves straight to the stage of executing the operator tree.

Execution of intermediate representation

The Volcano model assumes an interface for working with operators via the three operations they share, namely open/next/close. In essence, each Volcano operator is an iterator from which tuples are ‘pulled’ one after another; consequently, this execution approach is also called the pull model.

Each of these iterators is itself able to call the same functions of the attached iterators, form temporary tables with intermediate results and transform incoming tuples.

Execution of SELECT queries in PigletQL:

bool eval_select(catalogue_t *cat, const query_select_t *query)
{
    /* Compile the operator tree: */
    operator_t *root_op = compile_select(cat, query);

    /* Eval the tree: */
    {
        root_op->open(root_op->state);

        size_t tuples_received = 0;
        tuple_t *tuple = NULL;
        while((tuple = root_op->next(root_op->state))) {
            /* attribute list for the first row only */
            if (tuples_received == 0)
                dump_tuple_header(tuple);

            /* A table of tuples */
            dump_tuple(tuple);
            tuples_received++;
        }
        printf("rows: %zu\n", tuples_received);

        root_op->close(root_op->state);
    }

    root_op->destroy(root_op);

    return true;
}

The query is first compiled by the compile_select function, which returns the root of the operator tree, after which the open/next/close functions are called on the root operator. Each next call returns either the next tuple or NULL. NULL means that all tuples have been extracted and the close function, which shuts down the iterator, should be called.

The tuples obtained are counted and output as a table to the standard output stream.

Operators

The most interesting thing about PigletQL is the operator tree. Let’s look at the structure of some of them.

Operators have a shared interface which consists of pointers to the open/next/close functions and of the additional service function, destroy, which frees up resources from the whole operator tree in one go:

typedef void (*op_open)(void *state);
typedef tuple_t *(*op_next)(void *state);
typedef void (*op_close)(void *state);
typedef void (*op_destroy)(operator_t *op);

/* The operator itself is just 4 pointers to related ops and operator state */
struct operator_t {
    op_open open;
    op_next next;
    op_close close;
    op_destroy destroy;

    void *state;
};

Besides functions, the operator may contain an arbitrary internal state (state pointer).

Later on, I will look at the structure of two interesting operators: the most basic scan operator and the sort operator, which creates an intermediate relation.

Scan operator

The operator which represents the starting point for executing any query is scan. It simply iterates over all the tuples of a relation. The internal scan state holds a reference to the relation from which tuples will be extracted, the index of the next tuple in the relation, and the tuple structure that is handed back to the caller:

typedef struct scan_op_state_t {
    /* A reference to the relation being scanned */
    const relation_t *relation;
    /* Next tuple index to retrieve from the relation */
    uint32_t next_tuple_i;
    /* A structure to be filled with references to tuple data */
    tuple_t current_tuple;
} scan_op_state_t;

To create a scan operator state all you need is the source relation; everything else (the pointers to the relevant functions) is known in advance:

operator_t *scan_op_create(const relation_t *relation)
{
    operator_t *op = calloc(1, sizeof(*op));
    assert(op);
    *op = (operator_t) {
        .open = scan_op_open,
        .next = scan_op_next,
        .close = scan_op_close,
        .destroy = scan_op_destroy,
    };

    scan_op_state_t *state = calloc(1, sizeof(*state));
    assert(state);
    *state = (scan_op_state_t) {
        .relation = relation,
        .next_tuple_i = 0,
        .current_tuple.tag = TUPLE_SOURCE,
        .current_tuple.as.source.tuple_i = 0,
        .current_tuple.as.source.relation = relation,
    };
    op->state = state;

    return op;
}

In the case of scan, the open/close operations simply reset the state back to the first tuple of the relation:

void scan_op_open(void *state)
{
    scan_op_state_t *op_state = (typeof(op_state)) state;
    op_state->next_tuple_i = 0;
    tuple_t *current_tuple = &op_state->current_tuple;
    current_tuple->as.source.tuple_i = 0;
}

void scan_op_close(void *state)
{
    scan_op_state_t *op_state = (typeof(op_state)) state;
    op_state->next_tuple_i = 0;
    tuple_t *current_tuple = &op_state->current_tuple;
    current_tuple->as.source.tuple_i = 0;
}

The next call returns the next tuple, or NULL if the relation has no more tuples:

tuple_t *scan_op_next(void *state)
{
    scan_op_state_t *op_state = (typeof(op_state)) state;
    if (op_state->next_tuple_i >= op_state->relation->tuple_num)
        return NULL;

    tuple_source_t *source_tuple = &op_state->current_tuple.as.source;
    source_tuple->tuple_i = op_state->next_tuple_i;
    op_state->next_tuple_i++;

    return &op_state->current_tuple;
}
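Taken on its own, a scan operator can already be driven through the standard open/next/close cycle. A sketch, assuming rel points to a relation_t obtained from the catalogue:

/* Count the tuples of a relation by pulling them through a scan operator */
operator_t *scan = scan_op_create(rel);

scan->open(scan->state);
size_t count = 0;
while (scan->next(scan->state))
    count++;
scan->close(scan->state);

scan->destroy(scan);   /* frees the operator together with its state */
printf("tuples in relation: %zu\n", count);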

Sort operator

The sort operator hands out tuples in the order requested by the user. To do this, it materialises the tuples obtained from the nested operators into a temporary relation and sorts it.

Internal state of the operator:

typedef struct sort_op_state_t {
    operator_t *source;
    /* Attribute to sort tuples by */
    attr_name_t sort_attr_name;
    /* Sort order, descending or ascending */
    sort_order_t sort_order;

    /* Temporary relation to be used for sorting */
    relation_t *tmp_relation;
    /* Relation scan op */
    operator_t *tmp_relation_scan_op;
} sort_op_state_t;

Sorting is performed on a temporary relation (tmp_relation) based on the attributes specified in the query (sort_attr_name and sort_order). All this occurs while the open function is being called:

void sort_op_open(void *state)
{
    sort_op_state_t *op_state = (typeof(op_state)) state;
    operator_t *source = op_state->source;

    /* Materialize a table to be sorted */
    source->open(source->state);
    tuple_t *tuple = NULL;
    while((tuple = source->next(source->state))) {
        if (!op_state->tmp_relation) {
            op_state->tmp_relation = relation_create_for_tuple(tuple);
            assert(op_state->tmp_relation);
            op_state->tmp_relation_scan_op = scan_op_create(op_state->tmp_relation);
        }
        relation_append_tuple(op_state->tmp_relation, tuple);
    }
    source->close(source->state);

    /* Sort it */
    relation_order_by(op_state->tmp_relation, op_state->sort_attr_name, op_state->sort_order);

    /* Open a scan op on it */
    op_state->tmp_relation_scan_op->open(op_state->tmp_relation_scan_op->state);
}

Iteration over the tuples of the temporary relation is performed by a scan operator over that relation, tmp_relation_scan_op:

tuple_t *sort_op_next(void *state)
{
    sort_op_state_t *op_state = (typeof(op_state)) state;
    return op_state->tmp_relation_scan_op->next(op_state->tmp_relation_scan_op->state);
}

A temporary relation is deallocated in the close function:

void sort_op_close(void *state)
{
    sort_op_state_t *op_state = (typeof(op_state)) state;

    /* If there was a tmp relation - destroy it */
    if (op_state->tmp_relation) {
        op_state->tmp_relation_scan_op->close(op_state->tmp_relation_scan_op->state);
        scan_op_destroy(op_state->tmp_relation_scan_op);
        relation_destroy(op_state->tmp_relation);
        op_state->tmp_relation = NULL;
    }
}

This clearly shows why sort operations on columns without indexes can take quite a long time.

Query tree examples

It is perhaps useful to give several examples of PigletQL queries and the corresponding physical algebra trees.

The simplest example is where all the tuples are selected from the relation:

> ./pigletql
> create table rel1 (a1,a2,a3);
> insert into rel1 values (1,2,3);
> insert into rel1 values (4,5,6);
> select a1 from rel1;
a1
1
4
rows: 2
>

The simplest queries use only scan, which extracts tuples from the relation, and project, which picks out the single requested attribute of each tuple.
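Reconstructed from compile_select, the operator tree for this query is:

project (a1)
  scan (rel1)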

Selection of tuples using a predicate:

> ./pigletql
> create table rel1 (a1,a2,a3);
> insert into rel1 values (1,2,3);
> insert into rel1 values (4,5,6);
> select a1 from rel1 where a1 > 3;
a1
4
rows: 1
>

Predicates are expressed through the select operator:
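As reconstructed from compile_select:

select (a1 > 3)
  project (a1)
    scan (rel1)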

Selection of tuples with sorting:

> ./pigletql
> create table rel1 (a1,a2,a3);
> insert into rel1 values (1,2,3);
> insert into rel1 values (4,5,6);
> select a1 from rel1 order by a1 desc;
a1
4
1
rows: 2

In the open call, the sort operator creates (materialises) a temporary relation, puts all the incoming tuples into it and sorts them. After that, each next call returns tuples from the temporary relation in the order specified by the user:
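As reconstructed from compile_select (there is no predicate here, so no select operator appears):

sort (a1, descending)
  project (a1)
    scan (rel1)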

Joining the tuples of two relations given a predicate:

> ./pigletql
> create table rel1 (a1,a2,a3);
> insert into rel1 values (1,2,3);
> insert into rel1 values (4,5,6);
> create table rel2 (a4,a5,a6);
> insert into rel2 values (7,8,6);
> insert into rel2 values (9,10,6);
> select a1,a2,a3,a4,a5,a6 from rel1, rel2 where a3=a6;
a1 a2 a3 a4 a5 a6
4 5 6 7 8 6
4 5 6 9 10 6
rows: 2

The join operator in PigletQL uses no sophisticated algorithms; it simply forms the Cartesian product of the tuple sets from the left- and right-hand subtrees. This is very inefficient, but it will do for a demonstration interpreter:
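As reconstructed from compile_select, the predicate is applied by a select operator above the project:

select (a3 = a6)
  project (a1, a2, a3, a4, a5, a6)
    join
      scan (rel1)
      scan (rel2)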

Conclusions

For future reference, let me add that, if you are thinking of making an interpreter for a language similar to SQL, it is probably worth simply using one of the many relational databases already available. Thousands of man-hours have been invested in contemporary optimisers and query interpreters for popular databases and, at best, it takes years to develop even the simplest general-purpose database.

The PigletQL demonstration language imitates how SQL interpreters work, but in reality we only use individual elements of the Volcano architecture in our work, and only for those (rare) types of queries which are difficult to express in terms of the relational model.

Nevertheless, I repeat: even a superficial acquaintance with the architecture of these kinds of interpreters will prove useful in cases where you need to work flexibly with data flows.

Reading

If you are interested in basic questions about developing databases, then you won’t find a better book than “Database system implementation” (Garcia-Molina H., Ullman J. D., Widom J., 2000).

Its only drawback is its theoretical slant. Personally, I prefer material to be accompanied by specific examples of code or even a demo project. If this is what you are looking for, then consult “Database design and implementation” (Sciore E., 2008), which includes the Java source code of a reasonably complete relational database.

Most popular relational databases continue to use variations of the Volcano model. The original publication was written in very accessible language and is easy to find on Google Scholar: “Volcano — an extensible and parallel query evaluation system” (Graefe G., 1994).

And, even though SQL interpreters have changed quite a lot over recent decades, the underlying structure of these systems hasn’t changed for a long time. For an idea of this, I recommend the overview by the same author, entitled, “Query evaluation techniques for large databases” (Graefe G., 1993).
