Joining Time-Series Data with the Snowflake ASOF Join
Some time ago, I wrote a post titled “Create missing records with Snowflake SQL”. That was before Snowflake introduced the ASOF join. In this post, I will explore how to work with missing data without creating missing records.
There are many examples where we have to perform a join on time-series data, but some records are missing. For example, you may require exchange rates for each day within a time period, but some days are missing. This could be because the exchange rates weren’t received on a non-working day or there was a technical issue that caused a day to be missing. Another example might be recording measurements every day, such as taking warehouse inventory or a temperature reading, but you may have some missing days because the measurement wasn’t taken or wasn’t recorded.
One way to solve such problems in SQL is to create a table (or view or UDTF or CTE) that contains all dates in the desired range and fill the missing values with the latest known value.
Creating missing values from the latest known value works well when we have discrete time intervals, such as days. But with data that has a lower granularity and a continuous distribution, such as timestamps at the seconds level, creating missing records would require too much granularity and would have too much overhead in terms of the number of missing records that would be generated.
This is where the ASOF join comes to the rescue. We find an explanation of the ASOF join in the Snowflake documentation here. The documentation states:
The assumption is that the time-series data you need to analyze exists in two tables, and there is a timestamp for each row in each table. This timestamp represents the precise “as of” date and time for a recorded event. For each row in the first (or left) table, the join uses a “match condition” with a comparison operator that you specify to find a single row in the second (or right) table.
Basic example
Let’s explore how the ASOF join works with an example. Take an EXCHANGE_RATES table that contains exchange rates for each day, but some days are missing:
Next, let’s take a PAYMENTS table with payments of various amounts on various dates:
Say we want to convert the payment amount into a different currency using the exchange rate from the EXCHANGE_RATES table. If we just join the payment date in the PAYMENTS table with the day in the EXCHANGE_RATES table using a left join, like this:
select P.payment_id, P.payment_date, P.payment_amount,
P.payment_amount * E.rate as payment_amount_in_currency
from PAYMENTS P
left join EXCHANGE_RATES E
on P.payment_date = E.day;
We get values for the payment amounts only for the days when the exchange rates are available:
Let’s see how we can improve the previous query by using an ASOF join that takes the exchange rate value on the last known date before or on the payment date.
As documented in the Snowflake documentation here, the syntax of the ASOF join is:
- select from the left table (in our example, the left table is the PAYMENTS table)
- ASOF JOIN the right table (in our example, the right table is the EXCHANGE_RATES table)
- the order of the tables is important in the condition
- include the MATCH_CONDITION keyword and specify the condition as
(<left.timestamp_col> <comparison operator> <right.timestamp_col>)
- the comparison operator can only be one of the following: greater than, less than, greater than or equal, less than or equal
- the timestamp column can be of a date, time, or timestamp data type
- then add additional join conditions on columns if needed
In our example, we want to convert the payment amount into a different currency by taking the last available exchange rate for the current payment date. To accomplish this, we will ASOF join the PAYMENTS table with the EXCHANGE_RATES table by taking the payment date in the PAYMENTS table that is greater than or equal to the day in the EXCHANGE_RATES table, like this:
select P.payment_id, P.payment_date, P.payment_amount,
P.payment_amount * E.rate as payment_amount_in_currency
from PAYMENTS P
ASOF join EXCHANGE_RATES E
MATCH_CONDITION (P.payment_date >= E.day);
The result of this query is:
As we can see in the results, all payment amounts are converted into a different currency, even on the dates when the exchange rate is not available. Looking closely, although there is no exchange rate on 2023–04–02, the query used the exchange rate from the day before. Similarly, there are no exchange rates on 2023–04–06 and 2023–04–07, but the query used the exchange rate from the last known date which is 2023–04–05.
More complex example
Now let’s look at a more complex example using several currencies.
We have an exchange rates table with multiple currencies where some exchange rates are missing on some dates:
We also have a table with payments on various dates in various currencies:
Similarly as before, we can use an ASOF join that takes the PAYMENTS_CURRENCY table and joins with the EXCHANGE_RATES_CURRENCY table by taking the payment date in the PAYMENTS_CURRENCY table that is greater than or equal to the day in the EXCHANGE_RATES_CURRENCY table, for the corresponding currency. The query is similar as before, except we are adding an ON clause where we specify the join condition on the currency:
select P.payment_id, P.payment_date, P.payment_amount, P.payment_currency,
P.payment_amount * E.rate as payment_amount_in_currency
from PAYMENTS_CURRENCY P
ASOF join EXCHANGE_RATES_CURRENCY E
MATCH_CONDITION (P.payment_date >= E.day)
on P.payment_currency = E.currency
order by P.payment_date, P.payment_currency;
The result of this query is:
As we can see in the results, all payment amounts are converted into the corresponding currency, even on the dates when the exchange rate is not available for the payment currency. In such cases, the last known exchange rate for the given currency is used.
When (not) to use the ASOF join
The ASOF join can be used in many scenarios where you need to analyze time-series data, for example where data changes rapidly:
- Financial trading data
- Weather observations
- Readings from sensors
- Audit trails
The ASOF join can also be useful in scenarios where data doesn’t change rapidly, but can have missing values:
- Exchange rates (as described in this post)
- Stock inventory
A scenario where the ASOF join may not be useful is in business intelligence and reporting solutions where the semantic layer doesn’t (yet) understand the ASOF join. For usage in reporting solutions it might still be beneficial to create tables with missing records so that users can join them using the more common joins. Another option would be to create views in Snowflake that use the ASOF join but present the data to the reporting tools that is already joined to the corresponding tables.
I’m Maja Ferle, Snowflake Data Superhero and a senior consultant at In516ht. You can get in touch with me on LinkedIn.