How to deduplicate rows in a BigQuery table

Samet Karadag
Nov 19, 2020 · 2 min read

Duplicate data sometimes can cause wrong aggregates or results. You probably need to remove those duplicate rows before doing any aggregation, join or calculation. There are various ways to deal with duplicate data and you can find one of these methods to deal with the duplicate keys/columns/rows in this post .

Case-1; Get only one unique row per a key column

Example data:

Here you see there are some duplicates of full rows, you can use below query to deduplicate this table:

create table dataset.table_name_dedup as (select * except(row_num) from (SELECT*,ROW_NUMBER() OVER (PARTITION BYDB_PKORDER BYDB_PK) row_numFROMdataset.table_name) tWHERE row_num=1)

As you may have recognised we are creating a new table rather than overwriting the existing table. If you want to overwrite existing table, you can use the same name with create or replace:

create or replace table dataset.table_name as (select * except(row_num) from (SELECT*,ROW_NUMBER() OVER (PARTITION BYDB_PKORDER BYDB_PK) row_numFROMdataset.table_name) tWHERE row_num=1)

Case-2; Deduplicate with a criteria such as keep newest

If you have some criteria for the removal, such as a date column, you can change the ordering key like:

create or replace table dataset.table_name as (select * except(row_num) from (SELECT*,ROW_NUMBER() OVER (PARTITION BYDB_PKORDER BYcreation_date desc) row_numFROMdataset.table_name) tWHERE row_num=1)

Case 3; Deduplicate based on multiple columns

If you have multiple column keys or remove duplicates based on some columns you can use those columns within PARTITION BY clause:

Example: deduplicate based on Firstname and Lastname

create or replace table dataset.table_name as (select * except(row_num) from (SELECT*,ROW_NUMBER() OVER (PARTITION BYFirstname, LastnameORDER BYcreation_date desc) row_numFROMdataset.table_name) tWHERE row_num=1)

Please let me know your preferred method by commenting in this post.

Google Cloud - Community

Google Cloud community articles and blogs