# Scaling Google BigQuery analytic functions

Google BigQuery, Google’s data warehouse solution, has many functions and capabilities.

The analytic functions compute values over a group of rows and return a single result for *each* row. This is different from an aggregate function, which return a single result for *a group* of rows.

These powerful functions on datasets that get bigger and bigger might reach the ‘resources exceed exception’ error, due to the per node memory limit of BigQuery.

We’ll look at two of the most used functions are RANK() and DENSE_RANK() and how to work around with bigger partitions for these functions.

**Partitioning by date**

You can avoid operating on a single node by partitioning the window function by date:

RANK() OVER (PARTITION BY date ORDER BY metric DESC)

If this is not enough for you, you can only use the faster and simpler analytical functions: ROW_NUMBER() and LAG(), and compute the RANK() and DENSE_RANK() manually.

## An example

We’ll use the BigQuery public dataset on wikipedia *`bigquery-public-data.wikipedia.pageviews_2020`* to compute the rank of the most viewed pages per hour and wiki.

To limit the cost our BQ test we can scan less data, see the WHERE condition above. We can drop this filter later.

You can see how RANK(), DENSE_RANK() and ROW_NUMBER() compare on the same window.

For each partition identified by *datehour, wiki* in my example, we can compute the RANK() by using only ROW_NUMBER():

We can compute DENSE_RANK() by using only ROW_NUMBER():

You can verify how these results compare to BQ’s built in functions:

## Does it scale better ?

Let’s try partitioning by *wiki* for the whole year 2020, using the wiki “*en*” corpus with the DENSE_RANK() function:

This will fail, with the “Resources Exceeded during query execution” error.

Using the alternative DENSE_RANK():

We can create tables for the *my_rnk* and *my_drnk* subqueries to make the processing faster.

# Conclusion

BigQuery’s analytic functions are powerful tools that can reduce difficult and expensive-to-compute queries to more simple solutions.

Now, when your data is too big for the analytic functions, you can get the same results in a more manual and less magical way, until BigQuery will handle it all.