SQL-Inspired Querying of Spreadsheets in Google Sheets

Marco Sanchez-Ayala
The Startup
Published in
4 min readMay 3, 2020

The first day of my data science bootcamp last October, the head instructor said, “You will never use Excel again.” In this world though, the vast majority of people still use spreadsheets to keep track of and process data.

I was recently helping a friend with some work-related data processing in Google Sheets. He wanted to aggregate his data to calculate some statistics, but really only knew how to do this by aggregating everything one-by-one. He could do it pretty easily on Google Sheets, but it would take him a long time and would be very prone to errors.

His problem would be so easily solved if he could just use SQL to query the data in such a way:

SELECT
A,
SUM(C)
FROM
A2:D60
GROUP BY
A

It turns out you can!

The Query Function

Perhaps the most powerful Google Sheets function, as it lets you execute SQL-like queries on data as if it were stored in an actual database. This function takes in three parameters.

QUERY(data, query, [headers])
  • data: The cell range to query e.g. A2:D60.
  • query: The query to execute against data. This is written in the Google Visualization API Query Language e.g. "select A, SUM(C) group by A". Note that there is no FROM clause because this has been defined in data.
  • headers: This is an optional argument that denotes the number of header rows at the top of data. If omitted or set to -1, then Sheets infers this number.

For example, to reproduce our hypothetical SQL query above we would type the following into a cell in our Google Sheets spreadsheet.

=QUERY(A2:D60, "select A, SUM(C) group by A")

After pressing enter, we will see the result of our query generated by Google Sheets. This assumes data exists in columns A and C for rows 2–60, and that C is numerical (and thus can be summed).

Google’s Query Language

According to the documentation:

The query language provides the ability to send data manipulation and formatting requests to the data source, and ensure that the returned data structure and contents match the expected structure.

It turns out this language was born from the need to format data in a way appropriate for visualizing data. That’s why it’s called the Google Visualization API Query Language. I’ll refer to it as Google Query Language (GQL) from here on out for simplicity.

GQL is quite similar to SQL, so if you’re familiar with SQL then this should all be straightforward. Below is a very brief overview of how the language works. I recommend checking out the docs or my example at the bottom of this post if you have questions.

Allowed Clauses

(from documentation)

Aggregation Functions

(from documentation)

Aggregation functions can be used in select, order by, label, format clauses. They cannot appear in where, group by, pivot, limit, offset, or options clauses. (GQL Documentation)

The Rest

The above should get you through most simple queries. Check the documentation for things like scalar functions and language elements.

How to Implement

Let’s see this in action! Here’s an excerpt of the data I’m working with.

I have the above categorical and numerical information for a little over 1300 salsa and bachata tracks.

Let’s find the average song duration by artist. My query function would look like this:

=QUERY(A1:E1365, "select B, AVG(D) group by B")

Note: I reference the columns by their column letter rather than by their actual column name. But other than that, the syntax pretty much follows SQL exactly.

You can do any type of queries that’s allowed by the clauses and functions discussed above. You can even now query this new table created from the first query!

Storing a Query in Another Cell

When you have a bunch of queries like this, it can be hard to remember what your query actually looks like. I like to keep track of this by writing out the query in the cell directly above where I want my result to go. Then I reference that cell in my query function like this.

The query is written in cell G2. My query function then references G2 through the query parameter. The result looks a little cluttered but you can style it however you want. Some light formatting results in the following.

GQL is quite flexible and powerful for quick work in a spreadsheet that needs to be sent around a company to non-technical users, for instance. I was really excited to discover this feature in Google Sheets, but have not seen the same functionality from Excel or Numbers. I wonder what other secrets Google has waiting to be uncovered…

--

--