Calculating Mean, Median and Mode in BigQuery

Straightforward code to calculate these simple aggregations

Sneha Thanasekaran
Analytics Vidhya


BigQuery is Google Cloud's serverless data warehouse for running fast SQL queries over large datasets. In this article, I will show code examples that calculate the mean, median and mode of a simple dataset in BigQuery. When we start exploratory data analysis, these are among the first metrics to calculate on numerical fields to understand how the data is distributed.

If you would like to learn more about these three summary statistics, you can look into this comprehensive video by Khan Academy.


Dataset

We’ll be using the FIFA World Cup Dataset from Kaggle. It is one of the most downloaded datasets and contains the results of the football matches played at the tournaments, which have been held every four years since 1930. The questions we are interested in are:

  • What is the average number of goals scored by the home teams each year?
  • What is the middle value (median) of the goals scored by the home teams each year?
  • What is the most frequent number of goals scored by the home teams each year?

Note that we will compute these statistics separately for each year in which the games were played.
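As a rough sketch, the three questions above could be answered in one BigQuery Standard SQL query. The table name `my_project.fifa.world_cup_matches` and the column names `year` and `home_team_goals` are assumptions for illustration; they depend on how you load the Kaggle file, so adjust them to your own schema. This is one possible approach, not the only one.

```sql
-- Assumed (hypothetical) table and columns: adjust to match your own load
-- of the Kaggle file. `year` and `home_team_goals` are taken to be INT64.
WITH matches AS (
  SELECT year, home_team_goals
  FROM `my_project.fifa.world_cup_matches`
  WHERE year IS NOT NULL
    AND home_team_goals IS NOT NULL
),

mean_and_mode AS (
  SELECT
    year,
    -- Mean: plain aggregate average per year
    AVG(home_team_goals) AS mean_goals,
    -- Mode: APPROX_TOP_COUNT returns an array of STRUCT<value, count>;
    -- take the value of the single most frequent entry.
    -- The function is approximate and returns only one value on ties.
    APPROX_TOP_COUNT(home_team_goals, 1)[OFFSET(0)].value AS mode_goals
  FROM matches
  GROUP BY year
),

median AS (
  -- Median: PERCENTILE_CONT is used as a window function here, so it is
  -- computed per year partition and the repeated rows are then deduplicated.
  SELECT DISTINCT
    year,
    PERCENTILE_CONT(home_team_goals, 0.5) OVER (PARTITION BY year) AS median_goals
  FROM matches
)

SELECT
  m.year,
  ROUND(m.mean_goals, 2) AS mean_goals,
  d.median_goals,
  m.mode_goals
FROM mean_and_mode AS m
JOIN median AS d
  ON m.year = d.year
ORDER BY m.year;
```

The median sits in its own subquery because the window form of `PERCENTILE_CONT` cannot be mixed into the `GROUP BY` aggregation. If an exact mode or explicit tie handling is needed, counting rows per `(year, home_team_goals)` and ranking the counts is a common alternative to `APPROX_TOP_COUNT`.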

Sneha Thanasekaran
Analytics Vidhya

Data Scientist | Learning Enthusiast with focus on Statistics, Machine Learning and Analytics. Finding new aspirations/dreams and having fun along the way!