Better Error Handling with BigQuery Python client

Mazlum Tosun
Google Cloud - Community
10 min readOct 26, 2023

--

1. Explanation of the use case presented in this article

This article shows how to have better error handling with BigQuery Python client.

By default, when we ingest data to BigQuery without error handling, the errors are not clear and the stacktrace indicates to check the errors collection.

In this case, it’s not easy to understand the errors.

Sometimes we can have tables with complex schema, records and nested fields, and it’s almost impossible to understand the underlying errors natively in this situation.

To remedy this, it is recommended to add an error handling with a try/except block on a certain exception type, because it gives access to the errors collection.

With this error handling and the access to the errors collection, the errors are clear and we can understand them very easily.

Here you can see the diagram of this use case :

I also created a video on this topic in my GCP Youtube channel, please subscribe to the channel to support my work for the Google Cloud community :

English version

French version

Some explanations :

  • We will execute firstly a Python script and ingest data to BigQuery without error handling => the errors are not clear
  • We will then execute another Python script containing the same logic, but with error handling => the errors are clear because we can access to the errors collection

2. Structure of the project

2.1 Python local environment

The Python local environment uses PipEnv as a package manager and to automate the creation of virtual env.

You can check this video from my GCP Youtube channel that shows :

  • How having a Python comfortable local environment with PyEnv, PipEnv, DirEnv and Intellij IDEA and navigate in all the files, classes and methods
  • How to automate the creation of the virtual env for our Python project

2.2 The code logic

The code logic is :

  • Read a NEWLINE_DELIMITED_JSON file with the players stats for the Qatar Fifa World cup
  • Apply some transformations and join the data with another file
  • Write the result to a BigQuery table

We simulate errors in the input file : world_cup_players_stats_domain_with_errors_ndjson.json :

{"teamName": "Argentina", "teamTotalGoals": "7", "nationalTeamKitSponsor": "Adidas", "topScorers": {"players": [{"playerName": "Lionel Messi \ud83d\udc10", "playerDob": "Jun 24, 1987", "position": "FW", "club": "PSG", "brandSponsorAndUsed": "Adidas", "appearances": 7}], "goals": 7}, "bestPassers": {"players": [{"playerName": "Lionel Messi \ud83d\udc10", "playerDob": "Jun 24, 1987", "position": "FW", "club": "PSG", "brandSponsorAndUsed": "Adidas", "appearances": 7}], "goalAssists": 3}, "bestDribblers": {"players": [{"playerName": "Angel Di Maria", "playerDob": "Feb 14, 1988", "position": "FW", "club": "Juventus", "brandSponsorAndUsed": "Adidas", "appearances": 5}], "dribbles": 6.83}, "goalKeeper": {"playerName": "Emiliano Martinez", "appearances": "7", "savePercentage": "46.67%", "cleanSheets": "43%", "club": "Aston Villa"}, "playersMostAppearances": {"players": [{"playerName": "Emiliano Martinez", "playerDob": "Sep 2, 1992", "position": "GK", "club": "Aston Villa", "brandSponsorAndUsed": "Adidas", "appearances": 7}, {"playerName": "Nicolas Otamendi", "playerDob": "Feb 12, 1988", "position": "DF", "club": "Benfica", "brandSponsorAndUsed": "Nike", "appearances": 7}, {"playerName": "Nahuel Molina", "playerDob": "Apr 6, 1998", "position": "DF", "club": "Atletico Madrid", "brandSponsorAndUsed": "Nike", "appearances": 7}, {"playerName": "Cristian Romero", "playerDob": "Apr 27, 1998", "position": "DF", "club": "Tottenham", "brandSponsorAndUsed": "Nike", "appearances": 7}, {"playerName": "Rodrigo De Paul", "playerDob": "May 24, 1994", "position": "MF", "club": "Atletico Madrid", "brandSponsorAndUsed": "Adidas", "appearances": 7}, {"playerName": "Enzo Fernandez", "playerDob": "Jan 17, 2001", "position": "MF", "club": "Benfica", "brandSponsorAndUsed": "Nike", "appearances": 7}, {"playerName": "Lionel Messi \ud83d\udc10", "playerDob": "Jun 24, 1987", "position": "FW", "club": "PSG", "brandSponsorAndUsed": "Adidas", "appearances": 7}, {"playerName": "Julian Alvarez", "playerDob": "Jan 31, 2000", "position": "FW", "club": "Manchester City", "brandSponsorAndUsed": "Adidas", "appearances": 7}], "appearances": 7}, "playersMostDuelsWon": {"players": [{"playerName": "Leandro Paredes", "playerDob": "Jun 29, 1994", "position": "MF", "club": "Juventus", "brandSponsorAndUsed": "Nike", "appearances": 5}], "duels": 9.24}, "playersMostInterception": {"players": [{"playerName": "Exequiel Palacios", "playerDob": "Oct 5, 1998", "position": "MF", "club": "Bayer Leverkusen", "brandSponsorAndUsed": "Adidas", "appearances": 3}], "interceptions": 3.83}, "playersMostSuccessfulTackles": {"players": [{"playerName": "Leandro Paredes", "playerDob": "Jun 29, 1994", "position": "MF", "club": "Juventus", "brandSponsorAndUsed": "Nike", "appearances": 5}], "successfulTackles": 4.02}}
{"teamName": "France", "teamTotalGoals": "-", "nationalTeamKitSponsor": "Nike", "topScorers": {"players": [{"playerName": "Kylian Mbappe", "playerDob": "Dec 20, 1998", "position": "FW", "club": "PSG", "brandSponsorAndUsed": "Nike", "appearances": 7}], "goals": 8}, "bestPassers": {"players": [{"playerName": "Antoine Griezmann", "playerDob": "Mar 21, 1991", "position": "FW", "club": "Atletico Madrid", "brandSponsorAndUsed": "Puma", "appearances": 7}], "goalAssists": 3}, "bestDribblers": {"players": [{"playerName": "Kylian Mbappe", "playerDob": "Dec 20, 1998", "position": "FW", "club": "PSG", "brandSponsorAndUsed": "Nike", "appearances": 7}], "dribbles": 7.24}, "goalKeeper": {"playerName": "Hugo Lloris", "appearances": "6", "savePercentage": "70.83%", "cleanSheets": "17%", "club": "Tottenham"}, "playersMostAppearances": {"players": [{"playerName": "Aurelien Tchouam\u00e9ni", "playerDob": "Jan 27, 2000", "position": "MF", "club": "Real Madrid", "brandSponsorAndUsed": "Nike", "appearances": 7}, {"playerName": "Kylian Mbappe", "playerDob": "Dec 20, 1998", "position": "FW", "club": "PSG", "brandSponsorAndUsed": "Nike", "appearances": 7}, {"playerName": "Antoine Griezmann", "playerDob": "Mar 21, 1991", "position": "FW", "club": "Atletico Madrid", "brandSponsorAndUsed": "Puma", "appearances": 7}, {"playerName": "Ousmane Dembele", "playerDob": "May 15, 1997", "position": "FW", "club": "Barcelona", "brandSponsorAndUsed": "Nike", "appearances": 7}], "appearances": 7}, "playersMostDuelsWon": {"players": [{"playerName": "Marcus Thuram", "playerDob": "Aug 6, 1997", "position": "FW", "club": "Borussia Monchengladbach", "brandSponsorAndUsed": "Nike", "appearances": 5}], "duels": 18.49}, "playersMostInterception": {"players": [{"playerName": "Ibrahima Konate", "playerDob": "May 25, 1999", "position": "DF", "club": "Liverpool", "brandSponsorAndUsed": "Nike ", "appearances": 5}], "interceptions": 2.16}, "playersMostSuccessfulTackles": {"players": [{"playerName": "Eduardo Camavinga", "playerDob": "Nov 10, 2002", "position": "MF", "club": "Real Madrid", "brandSponsorAndUsed": "Nike", "appearances": 2}], "successfulTackles": 9.06}}
{"teamName": "Croatia", "teamTotalGoals": "2", "nationalTeamKitSponsor": "Nike", "topScorers": {"players": [{"playerName": "Andrej Kramaric", "playerDob": "Jun 19, 1991", "position": "FW", "club": "Hoffenheim", "brandSponsorAndUsed": "Adidas", "appearances": 7}], "goals": 2}, "bestPassers": {"players": [{"playerName": "Ivan Perisic", "playerDob": "Feb 2, 1989", "position": "FW", "club": "Tottenham", "brandSponsorAndUsed": "Nike", "appearances": 7}], "goalAssists": 3}, "bestDribblers": {"players": [{"playerName": "Mateo Kovacic", "playerDob": "May 6, 1994", "position": "MF", "club": "Chelsea", "brandSponsorAndUsed": "Nike", "appearances": 7}], "dribbles": 2.39}, "goalKeeper": {"playerName": "Dominik Livakovic", "appearances": "7", "savePercentage": "78.12%", "cleanSheets": "28%", "club": "Dinamo Zagreb"}, "playersMostAppearances": {"players": [{"playerName": "Dominik Livakovic", "playerDob": "Jan 9, 1995", "position": "GK", "club": "Dinamo Zagreb", "brandSponsorAndUsed": "Adidas", "appearances": 7}, {"playerName": "Josko Gvardiol", "playerDob": "Jan 23, 2002", "position": "DF", "club": "RB Leipzig", "brandSponsorAndUsed": "Adidas", "appearances": 7}, {"playerName": "Luka Modric", "playerDob": "Sep 9, 1985", "position": "MF", "club": "Real Madrid", "brandSponsorAndUsed": "Nike", "appearances": 7}, {"playerName": "Mateo Kovacic", "playerDob": "May 6, 1994", "position": "MF", "club": "Chelsea", "brandSponsorAndUsed": "Nike", "appearances": 7}, {"playerName": "Mario Pasalic", "playerDob": "Feb 9, 1995", "position": "MF", "club": "Atalanta", "brandSponsorAndUsed": "Adidas", "appearances": 7}, {"playerName": "Lovro Majer", "playerDob": "Jan 17, 1998", "position": "MF", "club": "Rennes", "brandSponsorAndUsed": "Nike", "appearances": 7}, {"playerName": "Ivan Perisic", "playerDob": "Feb 2, 1989", "position": "FW", "club": "Tottenham", "brandSponsorAndUsed": "Nike", "appearances": 7}, {"playerName": "Andrej Kramaric", "playerDob": "Jun 19, 1991", "position": "FW", "club": "Hoffenheim", "brandSponsorAndUsed": "Adidas", "appearances": 7}], "appearances": 7}, "playersMostDuelsWon": {"players": [{"playerName": "Bruno Petkovic", "playerDob": "Sep 16, 1997", "position": "FW", "club": "Dinamo Zagreb", "brandSponsorAndUsed": "Nike", "appearances": 6}], "duels": 14.48}, "playersMostInterception": {"players": [{"playerName": "Josko Gvardiol", "playerDob": "Jan 23, 2002", "position": "DF", "club": "RB Leipzig", "brandSponsorAndUsed": "Adidas", "appearances": 7}], "interceptions": 1.57}, "playersMostSuccessfulTackles": {"players": [{"playerName": "Mateo Kovacic", "playerDob": "May 6, 1994", "position": "MF", "club": "Chelsea", "brandSponsorAndUsed": "Nike", "appearances": 7}], "successfulTackles": 3.38}}
{"teamName": "Morocco ", "teamTotalGoals": "2", "nationalTeamKitSponsor": "Puma", "topScorers": {"players": [{"playerName": "Youssef En-Nesyri", "playerDob": "Jun 1, 1997", "position": "FW", "club": "Sevilla", "brandSponsorAndUsed": "Adidas", "appearances": 7}], "goals": 2}, "bestPassers": {"players": [{"playerName": "Achraf Hakimi", "playerDob": "Nov 4, 1998", "position": "DF", "club": "PSG", "brandSponsorAndUsed": "Adidas", "appearances": 7}, {"playerName": "Yahia Attiyat Allah", "playerDob": "Mar 2, 1995", "position": "DF", "club": "Wydad AC", "brandSponsorAndUsed": "Nike", "appearances": 6}, {"playerName": "Hakim Ziyech", "playerDob": "Mar 19, 1993", "position": "FW", "club": "Chelsea", "brandSponsorAndUsed": "Nike", "appearances": 7}], "goalAssists": 1}, "bestDribblers": {"players": [{"playerName": "Abderrazak Hamdallah", "playerDob": "Dec 17, 1990", "position": "FW", "club": "Al-Ittihad", "brandSponsorAndUsed": "Nike", "appearances": 4}], "dribbles": 4.22}, "goalKeeper": {"playerName": "Yassine Bounou", "appearances": "6", "savePercentage": "58.33%", "cleanSheets": "50%", "club": "Sevilla"}, "playersMostAppearances": {"players": [{"playerName": "Achraf Hakimi", "playerDob": "Nov 4, 1998", "position": "DF", "club": "PSG", "brandSponsorAndUsed": "Adidas", "appearances": 7}, {"playerName": "Sofyan Amrabat", "playerDob": "Aug 21, 1996", "position": "MF", "club": "Fiorentina", "brandSponsorAndUsed": "Adidas", "appearances": 7}, {"playerName": "Selim Amallah", "playerDob": "Nov 15, 1996", "position": "MF", "club": "Standard Liege", "brandSponsorAndUsed": "Nike", "appearances": 7}, {"playerName": "Azzedine Ounahi", "playerDob": "Apr 19, 2000", "position": "MF", "club": "Angers", "brandSponsorAndUsed": "Adidas", "appearances": 7}, {"playerName": "Hakim Ziyech", "playerDob": "Mar 19, 1993", "position": "FW", "club": "Chelsea", "brandSponsorAndUsed": "Nike", "appearances": 7}, {"playerName": "Sofiane Boufal", "playerDob": "Sep 17, 1993", "position": "FW", "club": "Angers", "brandSponsorAndUsed": "Nike", "appearances": 7}, {"playerName": "Youssef En-Nesyri", "playerDob": "Jun 1, 1997", "position": "FW", "club": "Sevilla", "brandSponsorAndUsed": "Adidas", "appearances": 7}], "appearances": 7}, "playersMostDuelsWon": {"players": [{"playerName": "Youssef En-Nesyri", "playerDob": "Jun 1, 1997", "position": "FW", "club": "Sevilla", "brandSponsorAndUsed": "Adidas", "appearances": 7}], "duels": 8.23}, "playersMostInterception": {"players": [{"playerName": "Achraf Dari", "playerDob": "May 6, 1999", "position": "DF", "club": "Brest", "brandSponsorAndUsed": "Nike", "appearances": 3}], "interceptions": 2.89}, "playersMostSuccessfulTackles": {"players": [{"playerName": "Achraf Hakimi", "playerDob": "Nov 4, 1998", "position": "DF", "club": "PSG", "brandSponsorAndUsed": "Adidas", "appearances": 7}], "successfulTackles": 3.84}}
{"teamName": "Netherlands ", "teamTotalGoals": "3", "nationalTeamKitSponsor": "Nike", "topScorers": {"players": [{"playerName": "Cody Gakpo", "playerDob": "May 7, 1999", "position": "FW", "club": "PSV", "brandSponsorAndUsed": "Puma", "appearances": 5}], "goals": 3}, "bestPassers": {"players": [{"playerName": "Denzel Dumfries", "playerDob": "Apr 18, 1996", "position": "DF", "club": "Inter", "brandSponsorAndUsed": "Adidas", "appearances": 5}, {"playerName": "Davy Klaassen", "playerDob": "February 21, 1993", "position": "MF", "club": "Ajax", "brandSponsorAndUsed": "Nike", "appearances": 4}], "goalAssists": 2}, "bestDribblers": {"players": [{"playerName": "Memphis Depay", "playerDob": "Feb 13, 1994", "position": "FW", "club": "Barcelona", "brandSponsorAndUsed": "Puma", "appearances": 2}], "dribbles": 3.88}, "goalKeeper": {"playerName": "Andries Noppert", "appearances": "5", "savePercentage": "81.82%", "cleanSheets": "40%", "club": "Heerenveen"}, "playersMostAppearances": {"players": [{"playerName": "Andries Noppert", "playerDob": "Apr 7, 1994", "position": "GK", "club": "Heerenveen", "brandSponsorAndUsed": "Nike", "appearances": 5}, {"playerName": "Daley Blind", "playerDob": "Mar 9, 1990", "position": "DF", "club": "Ajax", "brandSponsorAndUsed": "Adidas", "appearances": 5}, {"playerName": "Virgil van Dijk", "playerDob": "Jul 8, 1991", "position": "DF", "club": "Liverpool", "brandSponsorAndUsed": "Nike", "appearances": 5}, {"playerName": "Denzel Dumfries", "playerDob": "Apr 18, 1996", "position": "DF", "club": "Inter", "brandSponsorAndUsed": "Adidas", "appearances": 5}, {"playerName": "Nathan Ake", "playerDob": "Feb 18, 1995", "position": "DF", "club": "Manchester City", "brandSponsorAndUsed": "Nike", "appearances": 5}, {"playerName": "Frenkie de Jong", "playerDob": "May 12, 1997", "position": "MF", "club": "Barcelona", "brandSponsorAndUsed": "Nike", "appearances": 5}, {"playerName": "Maarten de Roon", "playerDob": "Mar 29, 1991", "position": "MF", "club": "Atalanta", "brandSponsorAndUsed": "Adidas", "appearances": 5}, {"playerName": "Teun Koopmeiners", "playerDob": "Feb 28, 1998", "position": "MF", "club": "Atalanta", "brandSponsorAndUsed": "Adidas", "appearances": 5}, {"playerName": "Cody Gakpo", "playerDob": "May 7, 1999", "position": "FW", "club": "PSV", "brandSponsorAndUsed": "Puma", "appearances": 5}], "appearances": 5}, "playersMostDuelsWon": {"players": [{"playerName": "Luuk de Jong", "playerDob": "Aug 27, 1990", "position": "FW", "club": "PSV", "brandSponsorAndUsed": "Adidas", "appearances": 1}], "duels": 14.46}, "playersMostInterception": {"players": [{"playerName": "Xavi Simons", "playerDob": "Apr 21, 2003", "position": "MF", "club": "PSV", "brandSponsorAndUsed": "Nike", "appearances": 1}, {"playerName": "Noa Lang", "playerDob": "Jun 17, 1999", "position": "FW", "club": "Club Brugge", "brandSponsorAndUsed": "Adidas", "appearances": 1}], "interceptions": 12.86}, "playersMostSuccessfulTackles": {"players": [{"playerName": "Teun Koopmeiners", "playerDob": "Feb 28, 1998", "position": "MF", "club": "Atalanta", "brandSponsorAndUsed": "Adidas", "appearances": 5}], "successfulTackles": 4.15}}

We set a String instead of a Int for the field teamTotalGoals and we passed the character - in the second line.

When we ingest this file in BigQuery, the operation fails with an error, because the character - can’t be converted to a Int

2.2.1 Execute the Python script without error handling

We firstly execute the script without error handling in the file insert_world_cup_stats.py :

from google.cloud.bigquery import job

from world_cup_stats.load_world_cup_players_stats_bigquery import load_world_cup_players_stats_bigquery, \
input_players_stats_with_error_domain_file_path


def insert_world_cup_stats(input_player_stats_domain_file: str):
load_job: job.LoadJob = load_world_cup_players_stats_bigquery(input_player_stats_domain_file)

load_job.result()


if __name__ == '__main__':
insert_world_cup_stats(input_players_stats_with_error_domain_file_path)

print("Data was inserted correctly to BigQuery")

In the BigQuery Python script, we call the result() method from the LoadJob object instance. This method waits for the result of the operation.

The logic of the ingestion is factorized in a separate file, because it will be invoked in two different scripts, this one and the other with error handling.

It corresponds to a load job on BigQuery, the file name is load_world_cup_players_stats_bigquery.py :

import pathlib
from typing import List, Dict

from google.cloud import bigquery
from google.cloud.bigquery import job

from world_cup_stats.file_loader import load_nd_json_file_as_string
from world_cup_stats.root import ROOT_DIR
from world_cup_stats.team_player_stats_with_fifa_ranking_mapper import to_stats_domain_dicts_with_fifa_ranking

input_players_stats_domain_file_path = f'{ROOT_DIR}/files/world_cup_players_stats_domain_ndjson.json'
input_players_stats_with_error_domain_file_path = (
f'{ROOT_DIR}/files/world_cup_players_stats_domain_with_errors_ndjson.json'
)


def load_world_cup_players_stats_bigquery(input_player_stats_domain_file: str) -> job.LoadJob:
input_teams_fifa_ranking_file_path = f'{ROOT_DIR}/files/team_fifa_ranking.json'

# Given.
teams_players_stats_domain_as_byte = (
load_nd_json_file_as_string(input_player_stats_domain_file)
.encode('utf-8')
)

teams_fifa_ranking_as_byte = (
load_nd_json_file_as_string(input_teams_fifa_ranking_file_path)
.encode('utf-8')
)

teams_player_stats_domain_with_fifa_ranking: List[Dict] = to_stats_domain_dicts_with_fifa_ranking(
teams_players_stats_domain_as_byte,
teams_fifa_ranking_as_byte
)

current_directory = pathlib.Path(__file__).parent
schema_path = str(current_directory / "schema/world_cup_team_player_stat_schema.json")

bigquery_client = bigquery.Client()
schema = bigquery_client.schema_from_json(schema_path)

job_config = bigquery.LoadJobConfig(
create_disposition=bigquery.CreateDisposition.CREATE_NEVER,
write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
schema=schema,
source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
)

table_id = "qatar_fifa_world_cup.world_cup_team_players_stat"

return bigquery_client.load_table_from_json(
json_rows=teams_player_stats_domain_with_fifa_ranking,
destination=table_id,
job_config=job_config
)

The result is :

google.api_core.exceptions.BadRequest: 
400 Error while reading data,
error message: JSON table encountered too many errors, giving up.
Rows: 2; errors: 1.
Please look into the errors[] collection for more details.

As we explained previously, natively without error handling, the errors are not clear and the stacktrace indicates that it is necessary to access the errors collection for more details.

2.2.2 Execute the Python script with error handling

We execute the second script but with error handling insert_world_cup_stats_with_error_handling.py :

from google.api_core.exceptions import ClientError
from google.cloud.bigquery import job

from world_cup_stats.load_world_cup_players_stats_bigquery import load_world_cup_players_stats_bigquery, \
input_players_stats_with_error_domain_file_path


def insert_world_cup_stats_with_error_handling(input_player_stats_domain_file: str):
load_job: job.LoadJob = load_world_cup_players_stats_bigquery(input_player_stats_domain_file)

try:
load_job.result()
except ClientError as e:
print(load_job.errors)
raise e


if __name__ == '__main__':
insert_world_cup_stats_with_error_handling(input_players_stats_with_error_domain_file_path)

print("Data was inserted correctly to BigQuery")

We added a try/except bloc on the ClientError exception.

With this exception type and the except block, we can access to the errors collection in the LoadJob instance object : load_job.errors

In this example, we print the errors collection in the console to show the errors are clear and very understandable.

Developers can apply the best and needed error handling logic based on the errors collection. The print was used for simplicity here.

The result of the errors collection is :

[
{
'reason': 'invalid',
'message': 'Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 2; errors: 1. Please look into the errors[] collection for more details.'
},
{
'reason': 'invalid',
'message': 'Error while reading data, error message: JSON processing encountered too many errors, giving up. Rows: 2; errors: 1; max bad: 0; error percent: 0'
},
{
'reason': 'invalid',
'message': 'Error while reading data, error message: JSON parsing error in row starting at position 2833: Could not convert value \'string_value: \t "-"\' to integer. Field: teamTotalGoals; Value: -'
}
]

The last object in the Json array, explains clearly the error on the teamTotalGoals field :

Could not convert value \'string_value: \t "-"\' to integer. Field: teamTotalGoals; Value: -

We didn’t show the code of the business transformations, because the most important was to be focused on the ingestion logic and error handling.

Conclusion

This article showed how to have better error handling with the BigQuery Python client.

We wanted to show the execution of the Python script with the native approach without error handling, which results in not clear errors.

We then replaced it with a script containing an error handling, which gives access to the errors collections and very clear errors.

We really recommend adding error handling, to save time, have a better understanding of errors and to have a more robust code.

All the code shared on this article is accessible from my Github repository :

If you like my articles, videos and want to see my posts, follow me on :

--

--

Mazlum Tosun
Google Cloud - Community

GDE Cloud | Head of Data & Cloud GroupBees | Data | Serverless | IAC | Devops | FP