The Best Data Storyteller Competition

Jay Chung
Published in USF-Data Science
4 min read · Nov 2, 2023

One of the foundational courses in USF’s MSDS program is Exploratory Data Analysis (EDA) and Data Visualization, taught by Professor Shan Wang. EDA is the first critical step in analyzing a dataset: you familiarize yourself with the data and perform any necessary cleaning. Data visualization then lets you both uncover hidden insights and present your findings clearly to your audience. By the end of the course, students are equipped with the skills to explore a given dataset and tell a story with it.

For the course’s final group project, students were to choose one of three open-source datasets on Kaggle: Airbnb, NBA, and Netflix. After choosing a dataset, each group cleaned and analyzed the data, then visualized their findings as if they were data scientists at the respective companies presenting to business teams.

Students voted on the top presentations for each dataset. This was a rewarding experience: students practiced their storytelling skills and learned from each other’s creative approaches to slicing and visualizing the data.

Below are select visualizations from the winning teams in the competition.

Airbnb Winner: Bassim Eledath, Pranavi Avadhanam, and Sai Vamsi

The winning team for the Airbnb dataset focused on uncovering the factors that contribute to the success of the top-performing properties. They compared the top and bottom 100 properties, with performance defined by the number of days booked in the next year. When the team analyzed the images in the listings, they discovered a striking pattern: the average image size of the top-performing properties was 37% larger than that of the lowest-performing properties. The team brought it home by overlaying the scatterplot on top of a standard MacBook to help the audience visualize how large the images would be on their laptops.
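The top-versus-bottom comparison above can be sketched roughly as follows. This is only an illustration, not the team's actual code: the field names `days_booked` and `image_kb`, the helper `relative_gap`, and the four toy rows are all hypothetical, and "image size" is assumed here to mean file size.

```python
from statistics import mean


def relative_gap(listings, n, rank_key, metric_key):
    """Percent by which the top-n average of metric_key exceeds the bottom-n average.

    Listings are ranked by rank_key, highest first.
    """
    ranked = sorted(listings, key=lambda row: row[rank_key], reverse=True)
    top, bottom = ranked[:n], ranked[-n:]
    top_avg = mean(row[metric_key] for row in top)
    bottom_avg = mean(row[metric_key] for row in bottom)
    return (top_avg - bottom_avg) / bottom_avg * 100


# Made-up toy rows, purely for illustration (not from the Kaggle dataset).
toy_listings = [
    {"days_booked": 300, "image_kb": 150},
    {"days_booked": 280, "image_kb": 130},
    {"days_booked": 20, "image_kb": 100},
    {"days_booked": 10, "image_kb": 100},
]

gap = relative_gap(toy_listings, n=2, rank_key="days_booked", metric_key="image_kb")
print(f"Top listings' images are {gap:.0f}% larger on average")
```

With the real data, `n` would be 100 and the result would be read the same way as the 37% figure in the presentation.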

NBA Winner: Mark Lam, Max Sivolella, and Ting Pan

The NBA dataset was rich with regular season and playoff game data from the past 75 years. From this haystack of data, the winning team produced a simple graph that compares the efficiency scores of the all-time top players (as ranked by the NBA). The team used a swarm plot with dynamically colored points that added a finishing visual touch. The graph also contains labels for interesting outliers, like how LeBron had his best season (according to the efficiency score) at age 33.

Netflix Winner: Irene Garcia, Jiaxuan Ouyang, and Varsha Moturi

The Netflix winners first blew the audience away with the crisp color scheme of their graphs and an aesthetic presentation that mirrored the Netflix platform. In one of the graphs, they compared the content release times of the two biggest content-producing regions, the US and India. Here, they saw a difference that raised their eyebrows: while Netflix mostly released its content in the US on Friday, it released the most content in India on Thursday. The winning team was able to deduce that this was because the dataset’s release timestamp was based in a US time zone, and India is hours ahead of the US, so a Friday release in India can be recorded as Thursday.
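The weekday shift the team deduced can be reproduced with a small sketch. The release time below is hypothetical, and because the article does not say which US time zone the dataset used, a fixed UTC-8 offset (Pacific Standard Time) is assumed; India Standard Time is UTC+5:30.

```python
from datetime import datetime, timedelta, timezone

# India Standard Time is a fixed UTC+5:30 offset.
IST = timezone(timedelta(hours=5, minutes=30), "IST")
# Assumption: the dataset's US time zone is taken to be Pacific Standard Time (UTC-8).
PST = timezone(timedelta(hours=-8), "PST")


def weekday_in(timestamp, tz):
    """Return the weekday name of an aware timestamp as seen in time zone tz."""
    return timestamp.astimezone(tz).strftime("%A")


# Hypothetical release: Friday, January 8, 2021 at 9:00 AM local time in India.
india_release = datetime(2021, 1, 8, 9, 0, tzinfo=IST)

print(weekday_in(india_release, IST))  # weekday viewers in India experience
print(weekday_in(india_release, PST))  # weekday a US-based timestamp would record
```

Under these assumptions, the same moment is Friday morning in India but still Thursday evening in the US, which is consistent with the team's explanation.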

In this competition, students not only delved into the nuances of data analysis but also embraced the art of storytelling with data. Each team’s unique approach illuminated the multifaceted nature of data science, leaving a lasting impression on their peers.

Jay Chung
USF-Data Science

Data + AI Product Manager. I'm passionate about ungatekeeping AI and write about AI for non-technical audiences.