Rust crate `derive_sql` example

Collate and interrogate Museum of Modern Art data using a Rust CLI

Julien de Charentenay
Rustaceans
5 min readSep 19, 2024

--

I am the author of the crate derive_sql, which is the first (and only) library that I develop and make available. This article shows how to use it, as well as allow me to try it out in a practical context and improve it.

The example is inspired from the Postgres — Rust Cookbook Aggregate data. It describes the development of a simple command line (CLI) tool using derive_sql to collect Museum of Modern Art (MOMA) Collection database — available on github — records in an SQL database and subsequently query these records. The story also touches on unit testing.

The source code is available on github.

The views/opinions expressed in this story are my own. This story relates to my personal experience and choices. The story, code extract(s), and source code are provided in the hope that they will be useful without any warranty.

Image by Pete Linforth from Pixabay

The aim of the story is to develop a CLI tool to retrieve the MOMA collection records, store these records in an SQL database and run two queries against these records: (a) calculate the number of artists of each nationalities and returns the top 25 nationalities, and (b) calculate the number of artworks by artists nationalities and returns the top 25 nationalities.

The MOMA collection is composed of two JSON files: Artists.json a list of ‘all the artists who have worked MOMA’s collection’ and Artworks.json a list of ‘all of the works that have been accessioned into MoMA’s collection’. The list of artworks contains an array potentially storing multiple artists thaat may have contributed to one artworks. The above collection of information is collected in a database composed of 3 tables describing: (a) each artist, (b) each artwork and (c) artist to artwork relation. The following code extract shows the definition of each table.

The struct Artwork and ArtworkAttribution do not allow for deserialisation of the Artworks.json JSON data structure directly. The file Artworks.json is parsed using a dedicated JSON data structure that is then used to create instances of Artwork and ArtworkAttribution structs.

The functionality of collecting and storing artist and artwork information consists of 3 steps: (a) prepare the SQL table, (b) download the information and (c) store the information. Step (a) and (b) order can be reversed. The following code shows how this was done for the artist information alongside the method signature. The struct SqlArtist is generated by the derive macro derive_sql::DeriveSqlStatement applied on the Artist struct.

The method signature allows for the data manipulation and insertion logic to be tested as shown below. The testing uses rusqlite in-memory database functionalities — but the testing could also be done using a mock implementation of the derive_sql traits, or a temporary connection to a MySQL database using mysql. The functions to retrieve artists and artworks are injected so that API calls that would be time consuming and not controlled are avoided.

The above describe the implementation of the retrieval of the MOMA data and storage in a SQL database. The next step is to implement the ability of running a query of the database to retrieve the top 25 nationalities by number of artists, alongside the number of artists for each of these nationality. The SQL query statement is shown in the Rust Cookbook:

SELECT nationality, COUNT(nationality) AS count 
FROM artist GROUP BY nationality ORDER BY count DESC LIMIT 25

The implementation of this query is done in 3 steps. In the first step, we define a struct that the query result will be converted into. The macro DeriveSqlStatement automatically implements trait derive_sql::traits::TryFromRefRow to allow conversion of query results into the result struct:

The second step is to define a struct that will provide the SQL query by implementing the trait derive_sql::traits::SelectStatement. This trait underpin SELECT queries only. The formatting of the statement uses the SQL table name and field names defined in the struct supporting the artist SQL table. This allow for any change to the latter struct to be reflected here — if the table name were changer or result in a compilation error if the field nationality was spelt differently.

The last step is to make the query call using the functionality provided in the SelectV2 trait. This traits has a blanket implementation for any struct that implement the SelectStatement trait and includes variations to undertake query call with filtering, ordering, limit and offset options.

This functionality can be tested similarly to the original test by starting an SQLite in-memory database, populating it with test data, running the query and checking the results as shown below:

The second query follows a very similar process to the exception that the SQL query to be run uses SQL joins to allow the query to span all 3 tables. This results in a code that is nearly identical to the one above — the QueryResult struct could be reused, but is kept separate so that each query is self contained within its own file . The main difference lay in the implementation of the SelectStatement trait. The SQL statement shown below exceeds the needs required for the set goals — listing nationality by number of artworks only require artwork association, but artwork details. The use of the artwork detail is shown so that one could apply a filtering on for example the department field of the Artwork table.

The command line interface parameters are handled using the clap crate (‘builder’ approach) — see the github repository for details.

Bonus Notes:

The aim was to support SQLite (rusqlite crate) and MySQL (mysql crate) in the same compiled executable. The first attempt using a function returning a dynamic trait led to Rust compilation error E0038 — `error[E0038]: the trait `...` cannot be made into an object. This issue was new to me, and allowed me to learn (and probably not fully understand) about ‘object-safe traits’. Initially I started along the road of re-writing the library traits into ‘object-safer’ traits. But I decided to backtrack and approach the problem with using the current library as building blocks. This was done by creating an enum that encapsulates the supported SQL connections and implement the relevant derive_sql traits — see file database.rs.

The MOMA data results in 15,000+ artists and 150,000+ artworks to be inserted in database. Each data was originally inserted in a sequential manner — no batch operation is/was available in derive_sql. Optimising this operation lead to (a) introducing a new trait to insert multiple entries as batch and (b) learning about SQL transactions. This allowed for the data insertion time to decrease from minutes down to tens of seconds.

Another lesson learnt related to crate versioning, the ability of cargo to handle multiple version of the same crate and the side effects of this ability.

The video below shows the CLI in action — note the use of export RUST_LOG=info to display log information. The top country by number of artists and artworks is America.

Rustaceans 🚀

Thank you for being a part of the Rustaceans community! Before you go:

  • Show your appreciation with a clap and follow the publication
  • Discover how you can contribute your own insights to Rustaceans
  • Connect with us: X | Rust Bytes Newsletter

--

--

Julien de Charentenay
Rustaceans

I write about a story a month on rust, JS or CFD. Email masking side project @ https://1-ml.com & personal website @ https://www.charentenay.me/