Calculating Gini Coefficient in BigQuery with SQL

The Gini coefficient, also known as the Gini index, is a common econometric tool for measuring inequality of asset distribution.

Here is the query that outputs Gini coefficient for each day given daily non-zero (anonymous) account balances:

It uses 1 — 2B formula from this Wikipedia page https://en.wikipedia.org/wiki/Gini_coefficient, where B is the area under the Lorenz curve:

  • balance * (rank — 1) is the area of the rectangular horizontal slice under the Lorenz curve.
  • balance / 2 is the area of the triangle on the left of the rectangular slice.
  • all slices are then summed: sum((balance * (rank — 1) + balance / 2))
  • count(*) is needed to normalize the x axis to the range 0 to 1
  • sum(balance) is needed to normalize the y axis to the range 0 to 1

The result can be verified on this website http://shlegeris.com/gini.

You can find a comparison of Gini Indexes of the major cryptocurrencies, which used this SQL, in the following article https://cloud.google.com/blog/products/data-analytics/introducing-six-new-cryptocurrencies-in-bigquery-public-datasets-and-how-to-analyze-them.

--

--

--

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Evgeny Medvedev

Evgeny Medvedev

Creator of https://github.com/blockchain-etl, Co-founder of https://d5.ai and https://nansen.ai, Google Cloud GDE, AWS Certified Solutions Architect

More from Medium

Aggregate Vertex AI model training logs in a BigQuery Table

Streaming Data to BigQuery with Dataflow and Updating the Schema in Real-Time

Optimize costs in BigQuery — 9 solutions

How to query the latest partition data in BigQuery