Four Things Tropical Trees Taught Me About Troubleshooting SQL
April 10, 2016. Somewhere in Ecuador.
When I work, I usually like to have headphones in. There’s something relaxing about the constant background noise. Still, the mosquito buzzing around my ear isn’t really doing it for me. I’d swat it, but my hands are full— The crook of my left arm is wrapped around a thick vine, one tendril of a massive strangler fig drooping down the trunk of its host tree, and my right hand is outstretched towards a pretty white flower. My right leg dangles precariously into space as I give one last giant heave towards the flower and finally snag it. Back on the ground with my field guide, I work through the pages, referring to the flower every now and then with a magnifying glass.
4. Petioles trigonal, thick, 2–3.5 cm long; sepals 4, with conspicuous latex canals, the outer sepals rounded; petals imbricate, coriaceous, obovate; anthers ovoid…… go to 8
I turn to the next page to look for section 8, but there’s a giant spider squashed all over it and the text is unreadable. I climb back up and try to find a fruit.
April 10, 2018. Santa Cruz, California.
It’s a beautiful day outside, and the ray of sun that’s been lurking on the far wall of the office for a few hours now has just begun to creep into the corner of my computer screen. Some ancient part of my lizard brain knows that at this time of year, that means it’s almost 2pm and with a little luck I’ll be at the beach in 10 minutes.
Incoming Chat from: ——
(01:55:38 PM) *** Izzy joined the chat ***
(01:55:41 PM) Izzy: Hey — — , how’s it going?
(01:55:47 PM) — — : Hi Izzy, I am doing great
(01:55:49 PM) — — : How are you?
(01:56:01 PM) Izzy: I’m doing well! Friday afternoon :) What can I help with?
(01:56:18 PM) — — : My SQL isn’t working.
When I first started working in the Department of Customer Love (yep, that’s actually what it’s called!) at Looker, those were the 4 words that scared me the most. SQL was a prerequisite for the job, but the scraps of SQL I’d taught myself for the interviews still felt perpetually rusty. We had documentation written for almost every possible use case of Looker itself, and the product was understandably new to me so I never felt embarrassed asking questions about it. SQL, on the other hand, I was supposed to already know, and so I often felt like I had to figure it out all by myself.
In college, I majored in Environmental Studies and Biology, and was mostly interested in plants. People sometimes ask me what it’s like to have a job that’s not in “my field”, and I like to joke that even more than not being in “my field”, this is the first job I’ve ever had that isn’t literally in a field. Like, an actual meadow. There’ve been a couple exceptions, sure— A part-time internship here, a couple-week-long summer stint at a law firm there— but for the most part, it’s true. Every single one of my real jobs prior to this one has been out of doors, usually in a field of some kind, doing botany research.
But it’s really not so different. Sure, I’ve got a roof over my head (and free snacks!) but I’ve found myself applying the same concepts I learned studying tropical trees in Ecuador to my daily troubleshooting of Looker customers’ problems. Maybe it’s just seeking comfort from the unknown in the known, but the time I find myself pulling most from my botanical past is when I’m troubleshooting SQL.
Whether you’ve also made a career switch from biology to data, have a broken SQL query that’s been sitting on your desktop for weeks, or just think plants are cool, I hope these four tips help spark some troubleshooting inspiration for your next stumper.
1. Key It Out
That little white flower I plucked from the canopy of a tree earlier was for use in a dichotomous key, one of the handiest identification tools in a botanist’s arsenal. Dichotomous, which means divided into two parts, is just a fancy way of saying “it’s either this or that”. Faced with an unknown problem of potentially vast complexity, it can be a giant relief to just be asked “well is it this, or is it that?” Armed with nothing but this binary logic and a single flower, or sometimes even just a chunk of bark, you can start “keying out” a plant. The farther you get, the fewer species you’ll have left as options.
The earlier excerpt about a flower was one choice at a decision point on a dichotomous key for the family Clusiaceae, where the other option was:
4. Petioles marginate, thin, 1–1.5 cm long; sepals 7, without conspicuous latex canals, the outer sepals acute to acuminate apically; petals decussate, thickly carnose, suborbicular; anthers obovoid…… go to 13
The only important thing to notice about that opaquely scientific sentence is that it is distinct in every way from the other option. There is no overlap, no ambiguity— It doesn’t say “petioles are thin”, it says “petioles are thin, 1–1.5cm long”, and the other choice says “petioles are thick, 2–3.5cm long”. I promise all those other made-up sounding words are also opposites. If you ask a customer (or yourself) “Does your query have expensive joins?”, that’s a lot different than saying “Do you have 3 or more joins, and are any of them full outer joins?”. One gives you the ability to check something off a list, and the other just gives you a moldy breadcrumb covered in grains of salt.
This is an especially powerful framework to apply to customer-facing SQL troubleshooting, because it keeps you on track. Halfway through, you may still have absolutely no clue what the end result will be, but you’ll usually know enough to answer the next question — or at least enough to confidently ask it.
You may not always have a ready made key for the problem you’re facing, but coming up with key-like questions in your head as you go is still a useful exercise. Before you ask a question, take a moment to see if you can generate an unambiguous inverse of it and make sure you know what your logical followup questions for each potential answer are. Keys work well because they provide only 2 options, so that instead of having to come up with reactive questions to an unknown curveball answer, you can pre-generate controlled followups for each choice. Even if you’re thinking of good questions, if you’re not asking them in a way that’s guaranteed to give you an answer you’re anticipating, you can lose control of the situation and get put back on the troubleshooting defensive — leaving you either asking clarifying questions to avoid getting confused, or worse, accidentally embracing the confusion and diving headfirst down a rabbit hole.
2. Does Your Query Have Fur?
But you can’t always jump straight into a key. It can just be a bit too overwhelming sometimes. For example, there are around 25,000 unique plant species in Ecuador, and in the specific patch of cloud forest that I studied, there’s about 1,600. One of my tasks was to identify every single species within randomly selected 500m² plots in the middle of the jungle. The human brain isn’t designed to remember more than a handful of things at the same time, and 1,600 is a preposterous amount of objects to load even into long-term memory. You can’t really begin to key something out when there’s that many options— The key would be hundreds of pages long and you’d have no idea where to begin. Luckily, that’s not the lowest resolution grouping we can achieve of these species. To take the biggest possible step back, every single one of those 1,600 plants belongs to the kingdom of Plantae. Case closed!
Well, maybe not quite. But bear with me— If I saw a gopher under a tree in my plot, I’d know it’s not very closely related to that tree and not worth taking note of. That’s common sense, but if you wanted to win an award for being pedantic, you could say “Those are two very different things. One has fur, so it’s in the kingdom Animalia. The other does not have fur, and has leaves, so it is in the kingdom Plantae. I’m studying plants, so the gopher is out of the scope of my study”. That’s not a very useful distinction, practically, but if you could instead say something like “That tree has extremely unique leaf venation, so I know it’s in the family Melastomataceae”, then you’re cooking. In a split second, based on one easy to remember observation, you’ve gone from 25,000 potential species to just the 553 in the Melastomataceae family, a 98% reduction.
Before you even touch the SQL itself, look at the error again from a birds eye view.
Failed to retrieve data — Syntax error: Unexpected keyword FROM at [3:1] is a wholly unrelated problem to
Error: Resources exceeded during query execution: The query could not be executed in the allotted memory. The former? Time to start looking at the actual SQL for a syntax error, probably near a FROM. Start thinking up some dichotomous questions! The latter? Maybe the best place to start is in the admin options of your database— How much RAM are you giving it? Then you can move into the SQL, probably looking at an entirely different set of clauses than the first option like LIMIT, FULL OUTER JOIN, CROSS JOIN— Things that make memory go boom. Do you have a WHERE clause? You might want to add one.
Keys are generally most useful when they start at the family level, so they can maintain a good degree of precision without being too long. To reach that level, you need to ask really simple observational questions to narrow the scope just a little. In this pre-key triage step, you might ask things like: Are there subqueries or derived tables involved? Is the error a syntax error, or something gnarlier? Is there even an error, or are the numbers just wrong?These high level questions that don’t really require you to get your hands dirty will let you take a deep breath, push all the irrelevant options out of your mind, and start barking up the right tree.
3. Potato, Patata
There’s a tree in Ecuador that grows all over the place, and locals mostly call it Guarumo. Or Sacha warumu, in Quichua. Cecropia engleriana in Latin. In English: Snakewood, pop-a-gun, Cecropia, Mexican bean tree, or trumpet tree. Ask a local from the next town over what a Guarumo is and they might point at a Pourouma cecropiifolia instead, known colloquially as Guarumo de pava but part of a completely different genus. There’s actually 61 different species of Cecropia (only some of which play host to hordes of aggressive ants), and depending on who you ask, they could all just be guarumo or they might all have their own names. Between Spanish, Quichua, Waorani, English, and Latin, you could sit around all day talking about different plants only to realize that everyone was thinking of the same one.
If you’re looking at a problem with Snowflake SQL, don’t expect to be able to replicate it accurately in BigQuery. Don’t call a CTE a subquery or a TIMESTAMP a DATETIME. It sounds like common sense, but being precise is harder than you’d think. At Looker, we do chat-based support, and often the real “troubleshooting” lies in figuring out exactly what everyone’s talking about. 30 minutes into a chat where the horizon still looks bleak, you might go “Oh, when you said join 20 minutes ago, you meant a left join?! That’s the problem!” and you wrap it up in 45 seconds.
|| means concatenate, and you can say
SELECT field1 || field2 AS concatenatedfield. In MySQL, though,
OR , as I found out the hard way on a chat with a customer. If I had a.) figured out what dialect the customer was on and made sure I was on the same page or b.) used some higher-level standard language like the word
concatenate, the mistake wouldn’t have happened and I would have saved a bunch of everyone’s time. Basically, I said Guarumo when I should have said Panpa warumu. If I’d learned to just say Cecropia engleriana instead, it wouldn’t have mattered. These days, if I’m 100% sure what dialect of SQL a customer is speaking, I use the “Latin name” and am as precise as possible. If not, I standardize my troubleshooting and my advice to high level concepts and let the customer lead with the common name of what they think they’re dealing with.
4. Someone Always Knows
On the ecological reserve I did my research at, there was a man named Noé who was my secret weapon. When my books and keys failed me, I’d collect a specimen and hope to see him at lunchtime. Without fail, he’d know what it was. Sometimes I just got a one word common name that’d send me down another rabbit hole of searching, but more often than not, he knew the Latin for what I’d brought him. Sometimes, he even had a pretty good guess about which bend in the trail I’d cut the sample from.
Further south, near another reserve called Cerro Candelaria, there’s a guy named Lou who’s discovered about 50 new species of orchid in the last 10 years. Just by looking at the flower of an orchid there, Lou can pretty much tell you whether anyone’s ever seen it before or not.
There’s someone who knows which plants grow between 1800 and 1900m above sea level on one specific mountain in Ecuador, and some of those species grow nowhere else in the world. There’s someone who knows what’s wrong with your damn SQL query. I promise you. Don’t be afraid to ask the experts.
Besides, sometimes a separate pair of eyes is all you really need. At Looker, I spend all day long sitting at the communal ‘kitchen table’ so that when my eyes start to glaze over and my SQL starts looking suspiciously gopher-like, I can just slide my computer over to my neighbor and get a fresh take on it.
So just ask. Seriously. I’m not saying I know how to fix every SQL query that grows between 0 and 100m in Santa Cruz, California, but if you post your darnedest query on the Looker community forums, I promise I’ll try and answer your question. Unless a real expert gets there before me.