Exploratory Data Analysis in R: Tokyo 2020 Olympics

Albernazgui
5 min read · Nov 26, 2021

--

I have a Sports Science background, and ever since I was a kid, the Olympic Games and the FIFA World Cup have been the two competitions I always looked forward to. Fast forward a few years and I am now learning Data Analytics, so I decided to practice my R programming skills by analyzing data from the Tokyo 2020 Olympic Games (held in 2021).

I wanted to use R to find out which countries sent the most athletes, which sports had the most athletes, which countries won the most medals, which sports had the most male and female athletes, and which sports had the biggest gender gap.
To do this, I downloaded a dataset I found on Kaggle (here) and started playing with it. If you want to see my analysis in detail, with the code running, you can check it out here.

Data Preparation

After importing the Athletes, Gender, and Medals .csv files into R, I used the clean_names function from the janitor package to convert all column names to lowercase and replace spaces with underscores:

library(janitor)

# Standardize column names: lowercase, with spaces replaced by underscores
athletes <- clean_names(Athletes)
gender <- clean_names(Gender)
medals <- clean_names(Medals)

The Analysis

I used the ggplot2 and dplyr packages for this analysis. To begin with, let’s see how many countries, sports, and athletes took part in these Games:

[Code screenshot: counting the athletes, sports, and countries participating]
Result: 11,085 athletes from 206 countries, competing in 46 different sports.
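
A minimal sketch of how counts like these could be computed with dplyr. The toy data frame and its column names (name, noc, discipline) are assumptions made for illustration, not the actual Kaggle table:

```r
library(dplyr)

# Toy stand-in for the cleaned athletes table.
# Column names (name, noc, discipline) are assumed, not taken from the article.
athletes <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  noc = c("Japan", "Japan", "Brazil", "Kenya", "Brazil"),
  discipline = c("Judo", "Athletics", "Football", "Athletics", "Football"),
  stringsAsFactors = FALSE
)

# Count the distinct athletes, countries, and sports in one pass
totals <- athletes %>%
  summarise(
    n_athletes = n_distinct(name),
    n_countries = n_distinct(noc),
    n_sports = n_distinct(discipline)
  )

print(totals)
```

Using n_distinct rather than a plain row count keeps the numbers honest if the same name, country, or sport appears on more than one row.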

Let’s see the top 15 countries by number of participating athletes:
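
A ranking like this could be built roughly as follows: count the rows per country with dplyr, keep the top entries, and draw a horizontal bar chart with ggplot2. The toy data, the column names (name, noc), and the labels are assumptions for illustration; the real analysis would keep 15 rows instead of 2:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the cleaned athletes table (assumed columns: name, noc)
athletes <- data.frame(
  name = c("A", "B", "C", "D", "E", "F"),
  noc = c("Japan", "Japan", "Japan", "Brazil", "Brazil", "Kenya"),
  stringsAsFactors = FALSE
)

# One row per country with its athlete count, largest first
top_countries <- athletes %>%
  count(noc, name = "n_athletes", sort = TRUE) %>%
  slice_head(n = 2) # use n = 15 on the full dataset

# Horizontal bars, ordered so the largest delegation sits at the top
p <- ggplot(top_countries, aes(x = n_athletes, y = reorder(noc, n_athletes))) +
  geom_col() +
  labs(x = "Athletes", y = NULL, title = "Top countries by number of athletes")

print(top_countries)
```

The same pattern works for the ranking of sports by swapping noc for the sport column.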

Then, let’s see the top 15 sports by number of participating athletes:

Let’s talk medals now… How many medals were won in total at the Olympics?
First, I needed to convert the medal counts to numbers, sum them, compute each country’s percentage of the total, round the result, and reorder the columns:

[Code screenshot: conversion of data types]
[Code screenshot: creation of the % of total column, and reordering of columns]
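
The steps above could be sketched like this. The toy table, the column names (team_noc, gold, silver, bronze), and the idea that the counts arrive as text are all assumptions for illustration:

```r
library(dplyr)

# Toy stand-in for the cleaned medals table.
# Assumption: medal counts were read in as character strings.
medals <- data.frame(
  team_noc = c("Team A", "Team B", "Team C"),
  gold = c("3", "1", "0"),
  silver = c("2", "2", "1"),
  bronze = c("1", "0", "0"),
  stringsAsFactors = FALSE
)

medals_pct <- medals %>%
  # Convert the medal columns from text to numbers
  mutate(across(c(gold, silver, bronze), as.numeric)) %>%
  # Sum medals per team, then express each total as a rounded % of all medals
  mutate(
    total = gold + silver + bronze,
    pct_total = round(100 * total / sum(total), 1)
  ) %>%
  # Reorder the columns so the totals come first
  select(team_noc, total, pct_total, gold, silver, bronze) %>%
  arrange(desc(total))

print(medals_pct)
```

If the dataset already ships a total column, the sum can be skipped and only the percentage step is needed.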

Now, let’s see the top 15 countries by number of total medals won:

And by percentage of total medals won:

And now, let’s check the champions! Take a look at bronze, silver, and gold medals won:

[Code screenshot: bronze medals plot]
[Code screenshot: silver medals plot]
[Code screenshot: gold medals plot]

Now let’s take a look at the differences in participation between male and female athletes (note: there are other genders, but the dataset only splits athletes into male and female, which is why I analyzed it this way).

First, I had to convert the data to numeric, then compute the gender difference in athlete counts and the percentages:
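
A minimal sketch of that preparation step, assuming a table with one row per sport and columns named discipline, female, and male (names and numbers here are made up for illustration):

```r
library(dplyr)

# Toy stand-in for the cleaned gender table.
# Assumption: the athlete counts were read in as character strings.
gender <- data.frame(
  discipline = c("Sport A", "Sport B", "Sport C"),
  female = c("40", "60", "50"),
  male = c("80", "60", "30"),
  stringsAsFactors = FALSE
)

gender_gap <- gender %>%
  # Convert the counts from text to numbers
  mutate(across(c(female, male), as.numeric)) %>%
  mutate(
    total = female + male,
    gap = male - female, # positive means more men than women
    pct_female = round(100 * female / total, 1),
    pct_male = round(100 * male / total, 1)
  ) %>%
  arrange(desc(gap))

print(gender_gap)
```

A positive gap marks a sport with more men, a negative gap a sport with more women, so sorting by gap puts the most male-heavy sports first.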

Which sports had the most and the fewest male athletes participating in Tokyo 2020?

Now, let’s check which sports had the most and the fewest female athletes participating:

Let’s examine the gender gap between sports, first in numbers of athletes, then in percentage:

[Code screenshot: visualizing the number of female athletes compared to male athletes]
[Code screenshot: visualizing the percentage of female athletes compared to male athletes]

And to finish, let’s take a look at the number of male athletes per female athlete in the top 15 sports with the biggest gender gaps:
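
The ratio itself is just the male count divided by the female count for each sport. A small sketch with made-up numbers and assumed column names (discipline, female, male); the real analysis would keep 15 rows instead of 2:

```r
library(dplyr)

# Toy stand-in for the gender table, with counts already numeric
gender <- data.frame(
  discipline = c("Sport A", "Sport B", "Sport C"),
  female = c(40, 60, 50),
  male = c(80, 60, 30),
  stringsAsFactors = FALSE
)

male_per_female <- gender %>%
  # Men per woman: 1 means parity, 2 means two men for each woman
  mutate(ratio = round(male / female, 2)) %>%
  # Keep the sports with the highest ratios, largest first
  slice_max(ratio, n = 2) # use n = 15 on the full dataset

print(male_per_female)
```

A ratio makes sports of very different sizes comparable, which the raw difference in athlete counts does not.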

Key Takeaways

  • The USA was the best-performing country at the Olympics, winning more gold, silver, bronze, and total medals than any other;
  • China was the second-best team, finishing second in gold, silver, bronze, and total medals won;
  • 1/4 of total medals were won by the USA, China, and Russia (competing as ROC);
  • Athletics was the sport with the most athletes (both male and female), followed by Swimming and Football;
  • Athletics is also the sport with the largest absolute difference between men and women (103 more male athletes);
  • In relative terms, Wrestling is the most unequal sport, with more than 2 men per woman athlete at the Olympics;
  • Only four sports had more women than men (Rhythmic Gymnastics and Artistic Swimming are sports where only female athletes compete at the Olympics);
  • The top 5 most unequal sports in Tokyo 2020 were Wrestling, Cycling Road, Boxing, Equestrian, and Baseball/Softball. All of these had more than 1.5 men for each woman athlete competing.

Final Acknowledgements

Competing in a high-performance sport is no easy task. Athletes face tough challenges on a daily basis and have their results measured in one competition, one moment, one fraction of a second. Shout out to all athletes training and competing worldwide, and thank you for being an inspiration!

This is my second analysis using R and there is plenty of room for improvement. If you have suggestions or feedback, please reach out to me.
