BigQuery User Defined Aggregate Functions (UDAF) to handle large precision
When you migrate data with precision and scale that is too large to store in BigQuery, you should store the number as string or reduce precision. But then how are you going to make calculations and especially aggregations on it?
BigQuery User Define Aggregate Functions (UDAF) let you use Javascript to overcome such issues.
Not enough precision?
First off, BigQuery supports 64 bit Integers, Floats, Numeric with precision of 38 digits and BigNumeric with a precision of 76.76 digits (the 77th digit is partial). It supports quite large numbers.
Recently I had a case where the precision was fine but scale needed to be 41, as you see above BigNumeric supports maximum 38.
Solution for large precisions
- Store it as Float64 or BigNumeric. You will lose precision but the question to end users, do you really need all the precision anyways?
- Store it as String. Now you hold the whole number but can’t operate on it.
Maybe the first option combined with second one is enough for your use case. Lucky you! But what if you really need to operate on full precision?
Rescue: BigQuery UDAFs
BigQuery UDAF allows you to define an aggregate function in SQL or Javascript where you can pass a set of data and it returns a single value. Below you see that I used JS BigInt library to perform a sum operation.
CREATE OR REPLACE AGGREGATE FUNCTION `PROJECT_ID.DATASET.SumBigInt`(x STRING)
RETURNS STRING
LANGUAGE js
AS r'''
export function initialState() {
return {sum: BigInt(0)}
}
export function aggregate(state, x) {
state.sum += BigInt(x);
}
export function merge(state, partialState) {
state.sum += partialState.sum;
}
export function finalize(state) {
return state.sum.toString();
}
''';
WITH
temp_table AS (
SELECT
'XQ945' AS flight,
'-10000000000000000000000000000000000000000' AS value
UNION ALL
SELECT
'XQ945',
'1'
UNION ALL
SELECT
'XQ944',
'10000000000000000000000000000000000000000'
UNION ALL
SELECT
'XQ944',
'1' )
SELECT
flight, `PROJECT_ID.DATASET.SumBigInt`(value) as bigint_value
FROM
`temp_table` group by temp_table.flight;
The result is very large number in string:
You may even add your own or public libraries to extend JS: