Analyzing My Netflix Data using Python
How much time do I spend watching Netflix?
I was inspired, after reading Atomic Habits by James Clear, to look into my daily habits, one of which is watching my favorite and latest Netflix shows and movies in my free time or as background noise. I found a great Dataquest article that helped me through this project, which I’ve linked at the bottom of this article.
I thought it would be fun to use one of my favorite pastimes as a first step in my project to learn how to build a CI/CD pipeline for my code. First things first: you need an application to push code to in order to trigger the pipeline.
If there’s one thing I’ve learned, it’s that one of the best ways to pick up a new skill is to apply it to something you enjoy or are interested in. Today I will walk you through the steps I took to analyze how much I watch one of my favorite shows, The CW’s “All American”.
Why I Used Python to Analyze My Viewing Data
We could quite possibly use other software such as Excel, but your viewing data includes every title you’ve viewed, scrolled past, or even briefly hovered over, since even small trailer previews get incorporated into it.
It is because of this that Python is the better tool for sifting through the data. Python is great for sorting and filtering large datasets, and it’s also a pretty beginner-friendly language, so it’s worth taking a stab at.
Where did I find my viewing data?
Well, where can we find data about our viewing activity and history? Netflix lets you download a zip file that includes a plethora of data, including Content Interaction, Devices in Use, and Search History, to name a few.
First I navigated to the Netflix home screen, clicked Account in the top right corner, then clicked Download Personal Data. Next I requested my personal viewing data, and it took about a day for Netflix to get back to me.
Going through the data
After receiving the zip file from Netflix, I unzipped it and spent about an hour going through ‘ViewingActivity.csv’. It included helpful information such as Title, Duration, Device Type, and Viewing Country, to name a few. I realized I split a lot of time between shows and movies in the drama and sci-fi genres, which is to be expected: I love a good tearjerker as well as imaginative ideas of what life could look like in the future.
Adding my data to Jupyter Notebook
Luckily I already had the Anaconda application, so loading up a Jupyter Notebook was quick and easy. I like using Jupyter because its interactive, cell-based workflow is well suited to this kind of data analysis.
Once I had the Jupyter Notebook open, I imported the pandas library and read my Netflix data CSV into a DataFrame, ‘df’.
Next I ran df.shape in the notebook to see just how many rows and columns we were dealing with.
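The two steps above can be sketched as follows. A minimal, self-contained sketch: the sample rows and column names below are stand-ins I made up to mirror the fields described earlier (Start Time, Duration, Title, Device Type, Country), since in practice you would read the real file with pd.read_csv("ViewingActivity.csv").

```python
import pandas as pd
from io import StringIO

# A few made-up rows standing in for Netflix's ViewingActivity.csv.
# With the real export you would instead run:
#   df = pd.read_csv("ViewingActivity.csv")
sample_csv = StringIO(
    "Start Time,Duration,Title,Device Type,Country\n"
    "2021-10-03 21:12:00,00:41:30,All American: Season 1: Pilot,Smart TV,US\n"
    "2021-10-04 20:05:00,00:43:12,All American: Season 1: 99 Problems,Smart TV,US\n"
    "2021-10-05 19:30:00,00:02:10,Stranger Things: Trailer,Phone,US\n"
)
df = pd.read_csv(sample_csv)

# (rows, columns) — a quick check of how much data we're dealing with
print(df.shape)  # → (3, 5)
```

With the real export, the row count will be far larger, since every view (including trailer previews) gets its own row.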
Making the data pretty
Before getting down to the nitty-gritty, I wanted to cut out some unnecessary columns. For this project I only wanted to see how much time I spent watching All American, so I only needed Start Time, Duration, and Title.
To do this I made use of df.drop(), passing two arguments: the list of columns I wished to drop, and axis=1 to tell pandas to eliminate columns rather than rows.
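Here is a sketch of that cleanup step, plus a filter down to the show itself. The DataFrame contents are made-up stand-ins for the Netflix export, and the str.contains() filter on the Title column is my assumption about how to isolate the "All American" rows.

```python
import pandas as pd

# Made-up stand-in for the Netflix export; real code would start from
# pd.read_csv("ViewingActivity.csv").
df = pd.DataFrame({
    "Start Time": ["2021-10-03 21:12:00", "2021-10-05 19:30:00"],
    "Duration": ["00:41:30", "00:02:10"],
    "Title": ["All American: Season 1: Pilot", "Stranger Things: Trailer"],
    "Device Type": ["Smart TV", "Phone"],
    "Country": ["US", "US"],
})

# Drop the columns we don't need: a list of labels, plus axis=1
# so pandas removes columns instead of rows.
df = df.drop(["Device Type", "Country"], axis=1)
print(list(df.columns))  # → ['Start Time', 'Duration', 'Title']

# Keep only rows whose title mentions the show; na=False guards
# against missing titles.
all_american = df[df["Title"].str.contains("All American", na=False)]
print(len(all_american))  # → 1
```

Episode titles in the export include the show name, season, and episode, which is why a substring match on the Title column is enough to pull out every All American row.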
A Visual Look at How Often I watch “All American”
I used the %matplotlib inline magic so my chart would show up in the Jupyter notebook I made earlier, then imported matplotlib into the notebook.
The way I went about organizing this chart was to plot the data by day of the week, which required telling pandas the order of the days. To do this I used pd.Categorical, which lets you specify an explicit, ordered set of categories.
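A small sketch of the ordering trick, using a made-up list of viewing days: without pd.Categorical, value_counts() sorts by frequency, so the weekdays would come out in the wrong order on the chart's x-axis.

```python
import pandas as pd

# Hypothetical weekdays extracted from the Start Time of each
# "All American" viewing session.
days = pd.Series(["Mon", "Wed", "Mon", "Sat", "Fri", "Mon", "Sat"])

# Declare the weekday order explicitly so sorting follows the
# calendar, not the counts.
day_order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
days = pd.Categorical(days, categories=day_order, ordered=True)

# Count views per day; sort_index() puts them in Mon..Sun order,
# and unobserved days show up with a count of 0.
counts = pd.Series(days).value_counts().sort_index()
print(counts)

# In the notebook, the chart is then just:
#   %matplotlib inline
#   import matplotlib.pyplot as plt
#   counts.plot(kind="bar", title="All American views by weekday")
```

Because the categories are ordered, the bar chart's x-axis runs Monday through Sunday no matter which day has the most views.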
Below is the resulting output, which neatly displays the information in a tidy chart.
Key Takeaways
Although this was meant to be a sample project for a CI/CD pipeline, I thought it would be useful as a way to learn something meaningful along the way.
Jupyter notebooks allow you to write Python code in separate blocks, or cells, so you can run each block of code individually. It is also a very interactive environment that is great for data analysis, which is why it was great for this project.
Rather than using Excel, I used Python because it’s better at sorting, filtering, and preparing large datasets. Very large datasets are often referred to as “Big Data”.
Code also scales better when you’re working with multiple data files that need to be readily accessible for analysis.
Where you Can Find This Great Tutorial
Below I have attached a link to the article that helped me put together this magnificent piece of work. I hope you’ve enjoyed my project; stay tuned for a walkthrough of how I’ll build my CI/CD pipeline!
Link: https://www.dataquest.io/blog/how-much-spent-amazon-data-analysis/