Loading large compressed files into BigQuery (BQ)

Sagar Gangurde
Published in Data Engineering
Mar 9, 2023

BQ load jobs have a 4 GB size limit for compressed CSV files. If we try to load a compressed CSV file larger than 4 GB into BQ, the job fails with a limit exceeded error.
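For example, a quick size listing shows whether a compressed file already crosses that limit (the path and file name here are simply the ones used in the example below):

ls -lh /mnt/workspace/file.csv.gz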

To resolve this, we need to split the compressed CSV file into smaller chunks that are each less than 4 GB.

Let's assume we have a 10 GB compressed tab-separated CSV file called file.csv.gz stored in the local directory /mnt/workspace/. We first need to split the file into smaller chunks.

zcat file.csv.gz | split -l 1000000 --filter='gzip > $FILE.gz' - file.

This splits file.csv.gz into smaller compressed files of 1 million records each, named as follows:

file.aa.gz
file.ab.gz
.
.
file.az.gz
file.ba.gz
.
.

Once the splitting is finished, we can get rid of the source file.

rm file.csv.gz

To load the smaller files into BQ, we first need to copy them to Google Cloud Storage. Let’s do that.

gsutil -m cp -r /mnt/workspace/ gs://<bucket>/<folder>/
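If the local directory contains anything besides the split files, we can instead copy just the chunks (the wildcard assumes the naming produced by the split command above):

gsutil -m cp /mnt/workspace/file.*.gz gs://<bucket>/<folder>/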

Once the files are copied, we can use the following command to load them into a BQ table.

bq load --replace --source_format=CSV --field_delimiter=tab <dataset>.<table_name> gs://<bucket>/<folder>/*
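As an illustrative variant, assuming a dataset called mydataset, a table called mytable, and letting BQ detect the schema, the load could look like this:

bq load --replace --source_format=CSV --field_delimiter=tab --autodetect mydataset.mytable gs://<bucket>/<folder>/file.*.gz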

The same 4 GB limit applies to compressed JSON files, so we can use the same approach to load large compressed JSON files into BigQuery (BQ).
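Here is a minimal sketch of the JSON variant, assuming file.json.gz contains newline-delimited JSON (one record per line, which is what BQ expects) and the same bucket and folder as above:

zcat file.json.gz | split -l 1000000 --filter='gzip > $FILE.gz' - file.

bq load --replace --source_format=NEWLINE_DELIMITED_JSON --autodetect mydataset.mytable gs://<bucket>/<folder>/file.*.gz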
