Olympic Games Analytics with Snowflake

End-to-End Demo: Snowflake Notebooks, External Access Integration, and Streamlit

Image generated by AI — Bing Creator

Introduction

I’ve always been fascinated by the stats TV commentators share during the Olympics. As a kid, I wondered how they knew so much about every athlete and their stats. I imagined them sifting through heaps of paper for details.

In the meantime, I grew up but I’m still fascinated by numbers and statistics. So, here’s a fun Snowflake demo to get you ready for the upcoming Olympics in Paris!

What better way to anticipate the games than by analyzing statistics from the entire Olympic Games history? Imagine enjoying all these stats while learning some cool stuff in Snowflake! If this sounds good to you, let’s start!

Getting Started

Following these simple steps, you’ll build in your Snowflake account, on top of a complete Olympic dataset, a cozy data exploration dashboard. To build this you’ll explore these three Snowflake features:

  • Streamlit in Snowflake
  • Snowflake Notebook
  • External Access Integration (you’ll do something really interesting here).

All the code is available in this GiHub Repository.

Step 1: Run Setup Script

First, run the setup script in a Snowsight worksheet to create the necessary database objects. This script sets up the database, schema, warehouse, and external access integration needed for the project.

Step 2: Push Data from GitHub to Tables via Notebook

Next, use the provided notebook to fetch the Olympic Games data from GitHub and push it into Snowflake tables. The notebook will handle the data ingestion and table creation.

Yes, you read that right! There is no need to locally save the dataset and then push it to some stage. This time, we’ll use the External Access Integration created in the previous step to pull data from the GitHub folder and load records directly into the Snowflake tables.

  • Download & Import the notebook
  • Import plotly package in the top right “Packages” button.
  • Enable in the Notebook Settings (three dots icon, top right) the External Access for Github created in the previous step.
Run the Notebook in Snowflake to Ingest Olympic Dataset from GitHub

Step 3 Import the Streamlit App into Snowflake

To explore and visualize the data using the Streamlit app, follow these steps:

  • Create a New Streamlit App:
  • Click on New Streamlit.

In the popup page

  • Enter a name for your app, such as “Olympic Games Data Hub”.
  • Choose the Database: OLYMPIC_GAMES
  • Select the Schema: RAW_DATA
  • Choose the Warehouse: OLYMPIC_GAMES_WH
  • Copy and Paste the app code in the app editor (Download the file olympic_games_data_hub.py )
  • NOTE: import plotly package in the top left “Packages” button.
  • Run the App: Click on the Run button to start the app and begin exploring the data!

Olympic Games Data Hub Analytics

We are Ready to Go but I’m planning to integrate the Cortex Analyst feature recently announced as soon as it will be in PuPr. Stay tuned and we’ll build our Olympic Games chatbot on the same dataset!

Olympic Games Data Hub — Streamlit

These are the analytics currently available in the Olympic Games Data Hub. Additional statistics can be produced either in the app or directly in the exploration notebook!

  • Gold Medal Comparison by Country: Compare the gold medal counts across different countries and editions. Visualize the performance of selected countries in a grouped bar chart.
  • Performance Trends by Country: Analyze the performance trends of the selected country over the years, with separate lines for summer and winter editions (Spoiler: USA won 83 gold medals in the 1984 edition, wow!).
  • Olympic Medals Distribution Over Time: View the distribution of gold, silver, and bronze medals over different years. Track the trends in medal counts through interactive line charts.
  • Top Athletes by Medals: Discover the top athletes based on their medal counts. This section highlights the athletes with the most medals across different events (Spoiler: Michael Phelps and Larisa Latynina are quite impressive!).
  • Event Participation Analysis: Examine the number of athletes participating in each edition. Get insights into the participation patterns of Olympic athletes. (Spoiler: 11579 athletes for London 2020 vs less than 250 athletes for the 1986 Edition!).

Credits & Resources

  • Logo and photos used in the Streamlit app are hosted on Wikimedia.
  • All content and visuals provided are for educational and informational purposes. Please adhere to copyright and usage guidelines when sharing or using the data and images.
  • The dataset is provided under a public license from Kaggle; Data was web-scraped from Olymedia.org.
  • GitHub Repository

If you have any questions or suggestions or if you happen to discover a bug — I’m sure there might be a few well-hidden, please feel free to post them in the comments below or reach out to me on LinkedIn.

--

--

Matteo Consoli
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Sales Engineer @ Snowflake ❄️ | Data Engineering | Data Analytics | Data Science | Python | SQL | Book | Music Composer 🎼