What would SQLite look like if written in Rust? — Part 2

Writing a SQLite clone from scratch in Rust

← Part 0 — Overview

← Part 1 — Understanding SQLite and Setting up CLI Application and REPL

Image for post
Image for post

Alright! We are building a clone of SQLite, aka SQLRite, and last time all we did was a simple CLI application that will use for external commands and some accessory functions and also a simple REPL that would take a simple command to exit the application gracefully.

Now we are taking another couple of steps towards our goal. First we want to parse the input to be able to differentiate if the input is a MetaCommand or a SQLCommand . A MetaCommand start with a dot and take direct actions like .open , .help and .exit . And aSQLCommand is, well, you know.

The second step we want to take is to be able parse each of the command types and take the appropriate action. For now, we wont go too far on the database side, but we do want to be able to differentiate between different SQL Statements and have their components broken down into parts and ready to be executed. So next time we can focus on getting the parsed SQL Statement and execute it. Even against a simplified in-memory storage.

Image for post
Image for post
SQLRite Flow Diagram (https://github.com/joaoh82/rust_sqlite)

The frontend of SQLite parses the SQL statements, optimizes them, and as I mentioned on the previous post of the series, generates equivalent SQLite internal representation called bytecode

This bytecode is then passed to the virtual machine, on the backend side, which executes it.

Here is a diagram os the SQLite Architecture to refresh our minds.

Image for post
Image for post
SQLite Architecture (https://www.sqlite.org/arch.html)

Breaking the logic into steps like this has a couple advantages:

  • Reduces the complexity of each part (e.g. virtual machine does not worry about syntax errors).
  • Allows compiling common queries once and caching the bytecode for improved performance.

With this in mind, let’s get started!

First let’s look at some changes in our main.rs . Since this code can get quite extensive I won’t be commenting line by line, but instead focusing on the main points and design choices. For that reason I will try to add as many comments as possible and of course make the code as readable as I can. Nevertheless, please do not hesitate to start a discussion board, create an issue or contact me directly in case you have any questions.

Getting back to the project, as I mentioned above, the first thing I wanted to do is to be able to differentiate between aMetaCommand and aSQLCommand . You can see that I take care of that on line 64 by calling get_command_type(command: &String) that returns an enum of type repl::CommandType with the choices repl::CommanType::SQLCommand(String) and repl::CommanType::MetaCommand(String) . This way I can easily differentiate between the two types of inputs and take the appropriate action on each of them.

Also, as I mentioned on the previous post of the series, I like to keep the main.rs as clean as possible, and just like a point of entry to the application. The main.rs should have the least amount of clutter as possible.

Meta Commands

Next we will move to the meta_command module, which in the repository you will find it in src/meta_command/mod.rs . The idea here is to write a code that is scalable up to a point. I want to easily be able to add more MetaCommands in the future. There are four points on this module.

First the enum type definition, that to improve user experience I added an option Unknown to pick up any MetaCommands not yet defined. After that we have an impl block for the fmt::Display trait, that helps us configure how custom types would be printed out in case we want to use them in a println! macro for example. Then on line 25 you will see another impl block with a fn new method, acting as a constructor for our MetaCommand type. I say acting because Rust is not an Object Oriented language, so the fn new is not like a constructor would be in languages like Java , in fact you could call it anything you want instead of new .

And last but not least we have the pub fn handle_meta_command function, that is responsible for matching the inputed MetaCommand to it’s appropriate command and taking action. You will notice that is returns a Result<String, SQLRiteError> , so we can return a message to the user with ease.

Alright people! We are finally going to do some database stuff! I bet everyone was like “Wasn’t this guy suppose to build a database?”, well yeah, but you gotta build a base first.

Like when you are building a house, laying a nice foundation is one of the most important things you can do for your software.

Database Stuff

Image for post
Image for post

This is our sql module and in the github repository you will find it in src/sql/mod.rs . This actually doesn’t look that different from our meta_command module, at least structure wise. We have an enum , defining the types of queries we plan to support at first. Then a impl block with a fn new method, again acting as a constructor .

And then a fn process_command function returning a Result<String, SQLRiteError>, that if you can remember is invoked from our main.rs . On this function is where the magic starts to happen. You will notice that right at the beginning of the fn process_command function we make use of the sqlparser-rs crate, that did a great job building a Extensible SQL Lexer and Parser for Rust with a number of different SQL dialects, including a SQLite dialect, so for the time being I decided to go with them instead of writing a completely new SQL Lexer . By calling Parser::parse_sql() I am getting it back a Result<Vec<Statement>, ParserError which I do some basic checking and pass it it to a match statement to determine which type of SQL Statement was inputed or if there was an error during the process, if so I just return the error. The Statement returned is a sqlparser::ast::Statement , which is an enum of all the possible statements, as you can see in the link I added from the sqlparser documentation.

For now, the only SQL Statement I managed to actually build the parser was CREATE TABLE , for the rest so far we are only identifying the type of SQL Statement and returning to the user. In the match statement block that matches with CREATE TABLE we call another module parser::create which contains all the logic for the CREATE TABLE . I have this one right after this block.

This is our sql::parser::create module. Here we have two struct types defined. The first one being ParsedColumn , well, representing a column in a table and the second one being CreateQuery , representing a table. As you can see the CreateQuery struct has a property called columns which is a vector of ParsedColumns . And our main method on this module, which is the fn new , returns a Result<CreateTable, SQLRiteError> , which will then be inserted into our database data structure that is still to be defined in the code, although I already have a pretty good idea of what is going to look like in my head and my design notes.

Dealing with errors

You may have noticed that throughout the entire code I am making reference to a SQLRiteError type. That is an error type I defined as an enum using the thiserror crate, that is a super easy to use library that provides a convenient derive macro for the standard library’s std::error::Error trait. If you check the commits in the github repository, you may notice that I first wrote my own implementation of the std::error::Error trait. But then I bumped into this trait, that basically takes care of a lot of the boiler plate, and let’s face it, the code looks super clean! This is our error module so far, located in src/error.rs .

Summary

Alright! This time we managed to parse the user’s commands to differentiate between MetaCommand and SQLCommand . We also implemented a somewhat scalable MetaCommand module that makes it easy to add more commands in the future. We added the sql module, which by making use of the sqlparser-rs crate we are successfully parsing SQL Statements and being able to generate an ast from each SQL Statement. And we are already parsing and generating at least a simplified version of a bytecode from the CREATE TABLE SQL Statement, that is ready to go into the database (which we will do in the next chapter). And to finish it off, we also created an error module, so we have a standardised way of dealing with error throughout the application.

I would say that we are starting out with a nice base. What do you think?

I added a Project Progress and Roadmap sections to the github repository README.md , to add some visibility on where we are and where we are going.

Next time, we will finish parsing the basic SQL Statements we plan to be compatible with and start working on a In-Memory simplified version of out database .

View on Github (pull requests are more then welcome)

If you wanna follow this track don’t forget to follow me here on Medium and also give a couple of claps!

The Polyglot Programmer

Software Engineering, Coding, Algorithms, Tutorials…

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store