Paper Review 📄
Access Path Selection in a Relational Database Management System
SQL isn’t a procedural language. Your SQL query doesn’t say “do this, then do that, and that will give me the result I want”. Instead, your query declares what you want from the database, without laying out the exact recipe for fetching it. Of course, this leaves figuring out the recipe to the database software itself.
Access Path Selection in a Relational Database Management System is a 1979 paper that lays out a method of guessing the most performant recipe for fetching the results of a given SQL query.
This paper is the “bible” of query optimization. At 12 pages, it’s succinct, giving a little background at first, but then going into the procedure for estimating the cost of different “access plans” for different types of queries.
How I found it
I found this paper when I was reading a more recent and broader paper on database architecture, which referenced this paper in a way that made it really hard to avoid reading. I’m glad I did.
If you haven’t done SQL querying of a database before, then this paper will mean nothing to you. In particular, you need to have done queries involving joins. You’ll also need some basic knowledge of disk and memory storage.
One good thing to note is that the paper uses the word “relation” for what you’d probably call a “table”. Finally, this paper also mentions indexes a lot, but never defines them. Otherwise you should be good.
First and last cost formulas
The first cost formula listed in the paper is this one:
cost(single relation) = page fetches + W * RSI calls
The last cost formula is:
cost(inner scan of sorted list) = temp pages / N + W * RSI calls
N = product of cardinalities of all the join relations * product of all selectivity factors of all applicable predicates.
What is an RSI call? What is a selectivity factor? What is an “applicable predicate”? I didn’t know either before reading the paper, but I do now. You’ll have to read it yourself to find out!
Most striking thing
What struck me from reading this paper is how much these query optimizers are mini programmers just like you or me, embedded into one layer of a database system. They’re trying to write the most efficient procedure for getting the requested results, but they have to guess and explore and prune their decision trees.
Other names for the paper
This is also known as “the Selinger paper”, for its primary author, Patricia Selinger.