Replacing SQLite with Cell, part 2: Complex queries and custom types

Giovanni Z.
The Startup
Published in
8 min readMay 22, 2020

In part 1 we saw how to define a database schema in Cell and we wrote a few basic queries. Those queries were simple enough to be expressed in SQL, and the datasets they returned were a good fit for the tabular data format that is forced on you by SQL. For more complex queries that’s often not the case. As a first example, let’s say that we need to create a report that shows the quarterly sales of all products in the database, organized by customer and year, and we want the products to be grouped by category. We can retrieve the data we need with this SQL query:

whose output looks like this:

A subset of the output of the previous query

The query works, but the tabular format is a terrible way to organize that information. There’s a lot of redundancy in it, and we’ll have to do extra work on the client side just to rearrange it in a more palatable format. For computational purposes, what you probably want to do is to arrange your data hierarchically: a list of categories, containing a list of products, containing a list of customers, containing a list of years, each with the corresponding sales data. It could look more or less like this:

That’s what the following Cell query will output:

The first part of body of quarterly_orders iterates through all items in all orders, and adds them to the subtotals, arranged hierarchically: productcustomeryearquartersubtotal. The second part sorts and fleshes out the results before returning it. This implementation is a bit more complex that it would have been if we had chosen the tabular output format, but this extra effort is more than made up for by what we gain on the C# side. This is the signature of the corresponding method in the generated class:

A better way

Having a method like QuarterlyOrders() is certainly an improvement over having to process the results of a SQL query, but often in Cell we can do a lot better than that. Let’s say the report we need to generate has to be rendered as an HTML page. Think for a second of how cumbersome the whole process is when working with an SQL database: we need to send a query to the database engine, retrieve a result, rearrange the structure of the data into a better format, convert it to native data structures, and finally generate the report.

With Cell on the other hand we’ve a general-purpose language at our disposal, so we can generate the HTML code directly in Cell and return it as a string to the client. With the help of a tiny report generation library that you can find in the github repository, we can create the report like this (the first two blocks of code are the same as before, only the last four lines are new):

Creating the report directly in Cell

Being able to do any sort of computation with no loss of performance on the database side is one of the reasons you can usually avoid having to do any object/relational mapping with Cell. You only have to retrieve and convert the final result instead of all the input data that is required to produce it, which often in practice tends to be a net win.

Now, the idea of doing the computation in the database may sound strange to you. But it’s not some wacky fringe idea: it’s what we all do all the time when we design our code. We try to put together the data and the code that manipulates it. We even have a name for this practice: encapsulation, or information hiding. If you think of a relational automaton in Cell as a large class that contains an entire database it all makes sense.

More queries

We now want to see the list of top-grossing products in each category that make up a given percentage (let’s say it’s 90%) of the total category revenues. For each product we want to see total revenues, revenues as a percentage of the category, and also a cumulative percentage. This is more or less the kind of output we’re looking for:

This is a possible implementation in Cell:

The same query can also be written in SQL, but it’s a bit of a pain in the neck to do so. This is what I came up with:

For the next query, let’s say we’re reviewing discontinued products, to see if it might be worth it to restock any of them again. For each product, we want to see the 3 most recent orders that were placed in the last year of availability, grouped by customer. We want to focus on repeat customers, so we’re going to filter out those that ordered a product only once in the last year. We also want for every order to see the date, the amount and the discount that was applied. This is the sort of output we’re looking for:

Here’s the Cell implementation:

The SQL version is, again, longer and harder to read:

Custom types

By following a few simple rules to map Cell types to equivalent C# one, the compiler can generate a corresponding C# method with a clean and intuitive signature for all the queries we’ve seen so far. We’ll now see what to do when those basic rules fall short, which happens when dealing with polymorphic or recursive types. As an example, let’s say we want to create a method that given a phone number, searches the database for all the people or companies that number might belong to:

Since a phone number can belong to an employee, or a customer, or a supplier or a shipping company the first step in the above snippet of code is to declare a type, ContactId, which is the union of the identifiers of all those entities. If we check the signature of the generated methods we’ll see that ContactId has been mapped to a C# string, which is unexpected:

How does that work? As we briefly mentioned in part 1 when we discussed how data is stored, every value in Cell has a literal, that is, a textual representation that can be used to denote that value in the code and which is also used as the default storage format. So what the compiler does when it encounters a polymorphic type like ContactId is to use, in the absence of other directions from the developer, that textual representation as a data exchange format between your hand-written code and the one it generates.

That’s not particularly elegant nor efficient, but in some cases like this one it works just fine. The strings returned by PhoneOwners(..) can for example be passed back as they are to BasicInfo(..) to retrieve some basic information about the person or company they identify. For more complex cases though you can ask the compiler to generate a corresponding C# class for any number of types defined in your codebase. As an example, let’s say you want to create an organizational chart that includes the name of all employees in the company, with all the sales data:

The first column (indented so as to reflect the reporting structure of the company) shows first and last name of each employee, the second one the total of all sales they’ve made and the last one, when present, shows the sum of their sales and those of all employees that report to them, either directly or indirectly.

The first step is to define the following type, that stores that kind of information for an entire branch of the organizational chart:

Note that SalesTree is a recursive types, which reflects the structure of the organizational chart it’s meant to model. With that type defined, this is how we’ll implement this query:

All we need to do to generate custom types is create a text file containing the names of the types we want to generate (just SalesTree in this case), which we’ll pass to the compiler as a command line argument. This is the definitions of the generated class:

The generated class has the same name and structure of the Cell type it derives from. Note that it’s declared as partial, so you can easily add your own methods and member variables to it in a separate file. The signature of the generated SalesTree(..) and SalesTrees() methods is just what you’d expect:

The compiler can also generate native classes for polymorphic types, which are mapped to class hierarchies in C#. We won’t discuss the details here, but if you’re interested you can check the Interfacing with C# page on the official website.

Next steps

That’s all for part 2. In part 3, which will be available soon, we’ll be discussing updates and persistence models. In the meantime, if you’re interested, you might want to take a look at Cell’s data model and type system. It’s pretty simple stuff, and we will take it for granted here, but since there’s a good chance they’re unlike anything you’re familiar with, it might be a good idea to acquaint yourself with them if you’re going to read the rest of this (long) series of articles.

As already mentioned at the end of part 1, you can find all the code in these posts in a couple of ready-to-build github repositories, for Windows and for Unix-like systems. They’re set up so that you can run all the queries we’ve seen so far, and also easily implement your own.

--

--