Couple ideas for working with explain plans

--

I can clearly remember, back in the day, being fascinated by databases’ query explain plans.

Fascinated and overwhelmed. To be honest I could not understand much beyond full scan or index scan.

Now a few years later I wouldn’t say I’m 100% able of understanding every single explain plan in the world but I use a couple things that help me.

First one is simple: How would you do it?

The computer is not smarter than you and is not doing magic either. It is just doing whatever the programmer told it to do.

So think yourself in front of a huge book or a huge Excel table and think which steps would you take to retrieve the information you want.

Let’s use a simple query as an example:

SELECT name, surname
FROM people
WHERE age > 38 AND country = 'Spain';

What would you do?

  1. If you have no hint at all?
    Read all the records and filter those where the age and country criteria match, right? Just like the database would!
  2. If you knew the entries are sorted by the age?
    Quickly skim through the records, possibly jumping back and forth until you find where the 38 years old threshold sits and then read all records forward looking for those where country = ‘Spain’, right?
    Good guess!! Just like the database would!

That simple exercise helps myself have a rough idea of what to expect.

Now on to a real explain plan.

Seq Scan on people
Filter: (country = "Spain" && age > 28)

What does this mean?

  1. Seq scan means the database is going to read (scan) the full table sequentially and, for every record it will
  2. Apply a filter consisting of two predicates, combined with AND:
    1. country = “Spain”
    2. age > 38

Makes sense, right? That is actually the exact same thing we said we would do!

Now on to the second idea: Visualize it as a tree

This comes handy specially with more complex plans

The best way to represent a query plan is a tree where the data sources are the leaf nodes and information bubbles up to the root where the final result is produced. Intermediate nodes are usually filters and transformations.

Here we can clearly see how information flows upwards and only the records we want make it to the root (“OUT”) node.

See this other example for a slightly more complicated example involving joining two tables/collections and sorting.

SELECT name, surname, age
FROM people, jobs ON (people.id = jobs.people_id)
WHERE people.age > 38 AND people.country = 'Spain'
ORDER by age;
Sort 
Sort Key: people.age
-> Hash Join
Hash Cond: (people.id = jobs.people_id)
-> Seq Scan on jobs
-> Hash
-> Seq Scan on people
Filter: (country = "Spain" && age > 28)

In this case the database will:

  1. scan and filter records from the people table and build a hash table with them
  2. scan records from the jobs table and join them with the hash table previously built
  3. finally sort the resulting records

Conclusion

If you are into databases you need to understand what the database is doing to resolve your queries. This is hard, specially on your first times.

Thinking like the database and specially visualizing complex plans as a tree definitely helped me, hope it hels you too!

--

--

Carlos Alonso | Software engineering

Software engineering, Systems architecture, Scalability and Databases. Senior software engineer at MongoDB. Adjunct professor at IE School