Using Collation in Google BigQuery
How to Compare and Sort Strings easily with SQL
After Google has made headlines with new tools in the area of Data Lakehouse (click here for more info), there are now again novelties for Data Analysts working with BigQuery SQL. Google now offers the functionality of case-insensitive collation [1].
What is Collation?
Collation determines how your data is sorted and compared. It’s for example important when working international texts and chars. Collations provide sorting rules, case, and accent sensitivity properties for your data. Collations that are used with character data types, such as char and varchar, dictate the code page and corresponding characters that can be represented for that data type [2].
Example
Here is a small example of how to use it within BigQuery with distinctive representation of the letter ‘c’. One time lowercase, one time uppercase and one time the Czech letter Č.
SELECT * FROM UNNEST([
COLLATE(‘Č’, ‘und:ci’),
‘c’,
‘C’
]) AS character
ORDER BY character