2020 Wrapped: A Year in Data

Jack Gorman
Published in Analytics Vidhya
8 min read · Jan 14, 2021

2020. What a strange year. Few words can describe the events of the past year, but data can. Inspired by the popular Spotify Wrapped, I built my own version last year. Every day we produce data, often without our knowledge, and technology companies monetize it, for better or worse. I decided to tap into this untapped power and analyze my own data for the past year. Extracting data is simple for services such as Instagram, Spotify, Strava, and Netflix. Manual tracking (drinking, traveling, reading) is also helpful. My tools were Python, for data manipulation and machine learning, and Power BI, for visualization. Through data, I gained a personal perspective on the dumpster fire of a year that was 2020.

2020 Travelling

Source: Manual Data Collection via Excel

Summary: I did not expect the travel schedule I ended up with, but who could have predicted COVID? Luckily, before COVID shut the world down, I was able to squeeze in a month-long trip to Asia. I explored four new countries: Vietnam, Cambodia, Thailand, and the United Arab Emirates. Despite travel restrictions, I was also able to travel home to Chicago and visit friends in Charleston, Nashville, and Kiawah Island.

Insights: I spent 68% of the year in Atlanta. Before COVID, I was expecting this number to be less than 40%.
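A share like this can be computed from a manual travel log. The sketch below is a minimal illustration, assuming the log is a list of (city, first day, last day) entries; the function name and the sample rows are made up for the example and are not the actual spreadsheet.

```python
from collections import Counter
from datetime import date, timedelta


def share_of_days_by_city(stays, year):
    """Return {city: percent of the year's days spent there}.

    `stays` is a list of (city, first_day, last_day) tuples, both ends inclusive.
    """
    days = Counter()
    for city, start, end in stays:
        day = start
        while day <= end:
            if day.year == year:  # ignore days that spill into another year
                days[city] += 1
            day += timedelta(days=1)
    total = (date(year + 1, 1, 1) - date(year, 1, 1)).days  # 366 for 2020
    return {city: round(100 * n / total, 1) for city, n in days.items()}


# Hypothetical log rows, for illustration only
sample_stays = [
    ("Atlanta", date(2020, 1, 1), date(2020, 1, 31)),
    ("Chicago", date(2020, 2, 1), date(2020, 2, 29)),
]
shares = share_of_days_by_city(sample_stays, 2020)
print(shares)
```

Counting days rather than trips keeps a long stay from being weighted the same as a weekend visit.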

Goal For 2021: Visit 4 new cities and get out of the country once

2020 Fitness Data

Source: Web scraping Strava and data manipulation via Python

Summary: Throughout the craziness of 2020, working out kept me sane. I imagine that on 70% of days I left my apartment only to work out. During the heart of quarantine, I did the Insanity program and would go on a run at night. Since I wasn’t training for anything this year, most of my runs were on the shorter side.

Insights: I would have expected the early months of the pandemic to keep me from working out, but they were the months in which I worked out the most. I believe the main reason is that I stuck to a training plan.

Goal for 2021: Go on longer runs and rides. Find a training plan for the year. Train for a road race or triathlon if things go back to normal.

2020 Social Media

Source: Users can download their personal data from any of the big social media platforms. I used Python for data processing and the TextBlob package for sentiment analysis.

Summary: It is quite astonishing what data social media companies possess on their users. Tech platforms record every click, search, and like to keep users hooked on their platforms. I thought this exercise was useful as it allowed me to understand my social media habits. The data was not as interesting as I would have liked since I do not like posts on social media often. To gather better data, I am going to start liking more posts on these platforms.

Insights: The prime time for me to like a picture on Instagram is between 7:00 pm and 9:00 pm. The average subjectivity of my tweets is 0.36 (on TextBlob's scale from 0, objective, to 1, subjective) and the average polarity is +0.14 (on a scale from -1, negative, to +1, positive).
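The "prime time" figure comes down to counting likes per hour of the day. Here is a minimal sketch, assuming the like timestamps have already been pulled out of the data download as ISO 8601 strings in local time. The function name and sample timestamps are hypothetical, and the real export format may differ.

```python
from collections import Counter
from datetime import datetime


def likes_by_hour(timestamps):
    """Count likes per hour of day (0-23) from ISO 8601 timestamp strings."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


# Made-up timestamps, assumed to already be in local time
sample_likes = [
    "2020-03-01T19:05:00",
    "2020-03-02T20:40:00",
    "2020-03-05T19:55:00",
    "2020-04-11T08:15:00",
]
counts = likes_by_hour(sample_likes)
peak_hour, peak_count = counts.most_common(1)[0]
print(f"Most likes happen in the hour starting at {peak_hour}:00")
```

If the export stores timestamps in UTC, they would need to be converted to local time first, otherwise the peak shifts by the timezone offset.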

Goal for 2021: Decrease time on social media (hard to track). Increase the number of Twitter likes by 150%

2020 YouTube Data

Source: YouTube Data Download and Python Manipulation

Summary: One of the best things about the internet is YouTube. YouTube (like all Google products) allows you to download your data. While you can't see how long you watched a video, you can tell the number of times you clicked on a video. In 2019, I wasted too much time on YouTube, watching 3,871 videos. This year, I watched 48% fewer. Some of the top channels remained (SNL, Yes Theory, ONE Media), but there was an increase in data science, meditation, and fitness videos. Last year’s data started a positive trend of consciously changing how I interacted with YouTube. 2020’s data will only strengthen the foundation for building good habits.

Insights: 11% of the videos I watched had the word "trailer" in the title, signifying a movie or TV show. The next biggest categories were "SNL" and "data" with around 3% each.
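Percentages like these can be produced with a simple keyword match over video titles. The sketch below assumes the titles have already been extracted from the YouTube download into a list of strings; the function name and the sample titles are invented for illustration.

```python
def keyword_share(titles, keyword):
    """Return the percent of titles containing `keyword`, ignoring case."""
    if not titles:
        return 0.0
    needle = keyword.lower()
    hits = sum(needle in title.lower() for title in titles)
    return 100 * hits / len(titles)


# Made-up titles, not taken from the actual watch history
sample_titles = [
    "Some Movie | Official Trailer",
    "SNL: Cold Open",
    "Intro to Data Science in Python",
    "10 Minute Guided Meditation",
]
for word in ("trailer", "snl", "data"):
    print(word, keyword_share(sample_titles, word))
```

A plain substring match is crude: "data" would also match "update", so matching on whole words would give cleaner categories.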

Goal For 2021: Fewer junk videos and more educational YouTube videos.

2020 Reading

Source: Manual Google sheet kept throughout the years

Summary: I have always made time to read. Even with the introduction of a busy work schedule, this did not change. Besides the two rereads of Man’s Search for Meaning and The Great Gatsby, other favorites for the year included How to Take Smart Notes, The Man Who Solved the Market, and Anathem.

Insights: Science fiction books had the highest average rating at 4.71. Business books had the lowest at 3.36. This reinforces my belief that most business books are trash!
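The per-genre averages can be reproduced from a reading log with a small group-by. This is a minimal sketch assuming each row of the sheet reduces to a (genre, rating) pair; the function name and sample ratings are made up for the example.

```python
from collections import defaultdict
from statistics import mean


def average_rating_by_genre(books):
    """Return {genre: mean rating rounded to 2 decimals} from (genre, rating) pairs."""
    ratings = defaultdict(list)
    for genre, rating in books:
        ratings[genre].append(rating)
    return {genre: round(mean(values), 2) for genre, values in ratings.items()}


# Hypothetical rows, not the actual reading list
sample_books = [
    ("Science Fiction", 5),
    ("Science Fiction", 4),
    ("Business", 3),
    ("Business", 4),
]
averages = average_rating_by_genre(sample_books)
print(averages)
```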

Goal For 2021: Have an average rating of 4.0. Read 12 audiobooks.

Eating Out for 2020

Source: Credit Card Transactional Data via Mint and Python Data Manipulation

Summary: It was a weird year for eating out. The restaurant industry was one of the hardest hit by COVID. I define eating out as any time I order delivery, get a coffee, or go to a restaurant/brewery. It should not be shocking that UberEats was my top choice during a year filled with being unable to go out. Compared to last year, I saw a decrease in the number of times I ate out. Granted, my average spend per order has gotten a bit higher.

Insights: 35% of the restaurants I ate at were new. I would like to see that increase for next year.
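One way to arrive at a "new restaurants" share is to compare this year's merchants against those seen in earlier years. The sketch below assumes the figure counts distinct restaurants rather than individual visits, and that merchant names have already been cleaned up from the transaction data. The function name and sample names are hypothetical.

```python
def new_restaurant_share(this_year, previous_years):
    """Percent of this year's distinct restaurants never visited before."""
    visited = set(this_year)
    if not visited:
        return 0.0
    new = visited - set(previous_years)
    return 100 * len(new) / len(visited)


# Made-up merchant names, one entry per transaction
visits_2020 = ["Taqueria A", "Cafe B", "Cafe B", "Brewery C", "Diner D"]
visits_before = ["Taqueria A", "Cafe B", "Pizzeria E"]
print(new_restaurant_share(visits_2020, visits_before))  # 50.0
```

Counting visits instead of distinct places would answer a different question: how often I try something new, rather than how many of my spots are new.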

Goal for 2021: 40% of the restaurants I eat at are new. Decrease the average spend per meal.

2020 Streaming

Source: Python Web Scraping, Manual Tracking

Summary: As a consequence of COVID and sheltering in place, my television time increased, even though I tend not to watch much television. I enjoyed all the shows I watched and can’t wait for their new seasons to come out (The Witcher, Ted Lasso, The Boys, etc.).

Insights: Netflix dominated my list, which surprised me. With 64% of my personal market share, Netflix has the best show selection for me. 2021 could be the year when other streaming services finally catch up.

Goal: Find 3 new shows to watch.

2020 Sleeping

Source: CSV downloads from AutoSleep, Apple Health, and Sleep Cycle. Python manipulation to combine datasets.

Summary: After reading Why We Sleep a few years ago, I made sleep a priority. This year I decided to start tracking sleep to help me perform better. Earlier in the week I tend to get a lot less sleep, which appears to cause some issues. Tracking my sleep made me more conscious of increasing the time I slept each day.

Insights: Average sleep was lowest when COVID first hit in March (the first two months’ data is garbage). I am not quite sure how the “Sleep Quality” score for both apps is determined, but it seems strange to me.

Goal: Increase sleep quality by 5%.

Drinks for 2020

Source: Chatbot SMS tracking tied to a Google Sheet

Summary: One of the projects I worked on this year was building a chatbot that would help me track metrics more easily via SMS. I decided that every time I had a drink (water, coffee, alcohol, etc.) I would log it. This was a bit hard at times (especially with alcohol), but I thought it was effective in helping me understand some habits.

Insights: I realized that my coffee consumption was higher than expected after 2 pm. Due to this, I only have one coffee after lunch now.

Goals: Increase consumption of water by 25%

Source: Spotify Data, Spotify Web API + Python Data Manipulation

Summary: I transitioned from Apple Podcasts to Spotify at the end of September. With Spotify, I can track my podcast consumption (for better or worse, it is difficult to retrieve Apple’s data). I devour podcasts, so it was exciting to look at my podcast behavior. I enjoy interview podcasts such as The Tim Ferriss Show, Invest Like The Best, and The Rich Roll Podcast.

Insights: Podcast consumption is inversely related to work hours. Consumption decreases over the course of the week, with Friday being an outlier.
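A weekday breakdown like this can be built by summing play time per day of the week. The sketch below assumes each listening record carries an `endTime` string and an `msPlayed` count in milliseconds, which is how I understand Spotify's streaming-history export to be shaped. Treat both field names and the sample records as assumptions.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MS_PER_HOUR = 3_600_000


def hours_by_weekday(plays):
    """Sum listening hours per weekday from a list of play records."""
    totals = defaultdict(float)
    for play in plays:
        # Assumed timestamp format: "YYYY-MM-DD HH:MM"
        ended = datetime.strptime(play["endTime"], "%Y-%m-%d %H:%M")
        totals[WEEKDAYS[ended.weekday()]] += play["msPlayed"] / MS_PER_HOUR
    return dict(totals)


# Made-up records for illustration
sample_plays = [
    {"endTime": "2020-10-05 09:30", "msPlayed": 1_800_000},  # a Monday
    {"endTime": "2020-10-05 18:00", "msPlayed": 1_800_000},  # same Monday
    {"endTime": "2020-10-09 12:15", "msPlayed": 3_600_000},  # a Friday
]
weekday_hours = hours_by_weekday(sample_plays)
print(weekday_hours)
```

Dividing each total by the number of times that weekday occurs in the period would give an average per day, which is fairer when the data starts mid-week.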

Goal: Discover 3 new podcasts.

Source: Spotify Data, Spotify Web API, Python for data manipulation, and Sklearn for K-Nearest Neighbors

Summary: Everyone loves to share their Spotify Wrapped with all their friends. What lies underneath that Instagram story you share? I decided to take a deeper dive into the data to get a better sense of how the sausage is made. Spotify offers a fantastic API to retrieve data on specific songs’ characteristics. Features include tempo, energy, speechiness, and others. I used the K-Nearest Neighbors algorithm to group together similar songs among all the ones I had played.
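To make the nearest-neighbors idea concrete, here is a minimal standard-library sketch, not the scikit-learn code used for the analysis. It rescales each feature to the 0-1 range and returns the songs closest to a chosen song by Euclidean distance. The function names, song names, and feature values are all made up.

```python
import math


def min_max_scale(rows):
    """Rescale each column to 0-1 so tempo (in BPM) does not dominate."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        [(value - low) / span for value, low, span in zip(row, lows, spans)]
        for row in rows
    ]


def k_nearest(names, rows, target, k=2):
    """Return the k song names closest to `target` by Euclidean distance."""
    scaled = dict(zip(names, min_max_scale(rows)))
    origin = scaled[target]
    others = [
        (math.dist(origin, vector), name)
        for name, vector in scaled.items()
        if name != target
    ]
    return [name for _, name in sorted(others)[:k]]


# Hypothetical feature rows: (tempo in BPM, energy, speechiness)
sample_names = ["slow_a", "slow_b", "fast_a", "fast_b"]
sample_rows = [
    (98, 0.40, 0.05),
    (102, 0.45, 0.04),
    (150, 0.90, 0.10),
    (146, 0.85, 0.12),
]
print(k_nearest(sample_names, sample_rows, "slow_a", k=1))  # ['slow_b']
```

Scaling matters here because tempo is measured in beats per minute while energy and speechiness run from 0 to 1. Without it, distance would be driven almost entirely by tempo.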

Insights: I spend on average 2.91 hours a day listening to Spotify. That’s a lot of time! The hours I listened to Spotify lined up exactly with my work schedule. Drastic breaks around lunch and dinner are also shown in the bottom left chart. Another interesting insight is the tempo. The average tempo of my songs was 116 BPM. The songs I played the most had a tempo around 100 BPM, suggesting that I may prefer slower-paced songs.

Goal: Diversify music

Conclusion

2020 was one crazy year. Taking the time at the end of the year to analyze my data is fantastic, but there is more to be achieved. One of my big initiatives for 2021 is more continuous monitoring of my data. As Peter Drucker said, “What gets measured gets managed.” Without proper goals and systems in place, what is the point in tracking data? I want to set up more automatic monitoring of my data to reach the goals I have for 2021. Through automation, I can free up time for more interesting analytical experiments. I would like to build the “One ring to rule them all” for my data. I want to centralize my data sources to find correlations and discover data-driven recommendations for my habits. By unifying my data, the potential to increase my performance is limitless. Thanks for reading until the end, and I hope that 2021 treats us better than 2020!
