YouTube Channel Analysis

Analysing the top YouTube channel ‘T-Series’ using Python

Ramesh Kasotiya
5 min read · Oct 10, 2020
Image source: PHD Media

Have you ever wondered how YouTube Analytics works? How a YouTube channel is growing? How to check your channel’s progress? How YouTube picks trending videos? How to check the statistics (i.e. subscribers, likes, dislikes, views, comments, etc.) of any channel? How websites like Tubics and Social Blade analyse a YouTube channel?

No? Don’t worry. ✌️ I have made a small project based on it. See YouTube Channel Analysis for the complete source code and to check how it works. You’ll get an idea of what’s happening behind the scenes.
Note: An understanding of the Python programming language is required.

Obviously, we need data to analyse anything. So we download the data from T-Series, the top YouTube channel, using the YouTube Data API and Python. Then we prepare our dataset in CSV format and remove the unwanted data. Once we’ve prepared and cleaned our dataset, we can use it to analyse and visualise the channel.
Let’s start -

Downloading the Dataset

We’re going to gather some data from the T-Series YouTube channel using the YouTube Data API, JSON, Python and its library Pandas. Here is the complete guide to doing this -

Guide for data scraping from YouTube channel using Python

If we follow this guide correctly, we’ll have our data stored in a Pandas DataFrame named data_df.
Let’s convert this data to a CSV file using the following command -

data_df.to_csv('tseries.csv', index=False)

Now we have the “tseries.csv” file in our current directory. This is our raw dataset.

Next, we extract the channel’s number of subscribers and videos. The code for this step is in the project’s source code.
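
The snippet for this step is not reproduced in this text. As a rough sketch, assuming the channel-level figures come from a YouTube Data API channels.list response requested with part=statistics, the counts sit under items[0].statistics and arrive as strings. The helper name extract_channel_stats is hypothetical, and the values in sample_response are placeholders for illustration, not real T-Series figures.

```python
def extract_channel_stats(response: dict) -> dict:
    """Pull subscriber and video counts out of a channels.list response."""
    stats = response["items"][0]["statistics"]
    # The API returns the counts as strings, so convert them to integers.
    return {
        "subscribers": int(stats["subscriberCount"]),
        "videos": int(stats["videoCount"]),
    }


# Placeholder payload shaped like a channels.list response (values are made up).
sample_response = {
    "items": [
        {
            "statistics": {
                "viewCount": "1000",
                "subscriberCount": "250",
                "videoCount": "12",
            }
        }
    ]
}

channel_stats = extract_channel_stats(sample_response)
print(channel_stats)
```

In a real run, the response would be the parsed JSON returned by the API request rather than a hand-written dictionary.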

Data Preparation and Cleaning

Now we’ll remove the unwanted data, make the dates readable, extract the date information (date, time, day, month, year) from our raw dataset and store it in separate columns. The steps are as follows -

Step 1
Load the raw data from the CSV file into a new DataFrame for processing -

import pandas as pd
tseries_raw_df = pd.read_csv('tseries.csv')
This is our extracted data from the T-Series YouTube channel

Step 2
Remove the unwanted columns -

tseries_df = tseries_raw_df.drop(columns=['channel_id', 'video_id'])
This is our required data, still with an unusual date format

Step 3
Now we’ll make the published date and time more readable by converting them to a standard datetime format.
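
The original snippet for this step is not included in this text. A minimal sketch of one way to do it, assuming the published_date column holds ISO 8601 strings such as 2020-10-09T12:30:05Z (the form the YouTube Data API uses). The two sample timestamps are made up for illustration.

```python
import pandas as pd

# Two made-up timestamps in the ISO 8601 form returned by the API.
tseries_df = pd.DataFrame(
    {"published_date": ["2020-10-09T12:30:05Z", "2011-01-15T08:00:00Z"]}
)

# Parse the strings as UTC, then drop the timezone so the values print
# as plain "YYYY-MM-DD HH:MM:SS".
parsed = pd.to_datetime(tseries_df["published_date"], utc=True)
tseries_df["published_date"] = parsed.dt.tz_localize(None)

print(tseries_df)
```

Keeping the column as a real datetime type, rather than a formatted string, makes the later date-part extraction and sorting straightforward.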

Step 4
Now we’ll split the published_date column into separate day, month, year, date and time columns.
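
The original snippet is not included here either. A possible sketch using the pandas .dt accessor, assuming published_date is already a datetime column. The new column names (year, month, day, date, time) and the choice of weekday name for "day" are assumptions, and the sample rows are made up.

```python
import pandas as pd

# Made-up sample rows; in the project this would be the cleaned tseries_df.
tseries_df = pd.DataFrame(
    {"published_date": pd.to_datetime(["2020-10-09 12:30:05", "2011-01-15 08:00:00"])}
)

published = tseries_df["published_date"]
tseries_df["year"] = published.dt.year
tseries_df["month"] = published.dt.month_name()  # e.g. "October"
tseries_df["day"] = published.dt.day_name()      # weekday name, e.g. "Friday"
tseries_df["date"] = published.dt.date
tseries_df["time"] = published.dt.time

print(tseries_df)
```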

This “tseries_df” is our prepared and cleaned dataset, which we’ll use for analysis and visualisation -

Prepared and cleaned dataset

Exploratory Analysis and Visualisation

Now we use Python and its plotting libraries Seaborn and Matplotlib to analyse and visualise the relationships among the channel’s statistics. The analysis below uses a dataset I downloaded earlier, so by the time you read this the figures may be out of date. You can run the same analysis and visualisation on a freshly downloaded dataset.

Relationship among statistics parameters using Pie Charts

Insights:

  • We can see that 99.44% of viewers don’t react to T-Series videos at all. Only a tiny percentage like, dislike or comment on this channel’s videos.
  • Among the reactions, 87.19% are likes.
  • 8.61% of the reactions are dislikes.
  • Comments make up 4.20% of the reactions, so fewer than 4.20% of the people who react are commenters, since one person can comment multiple times.
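
The percentages above can be derived from column totals. The following is a sketch of that arithmetic, assuming the dataset has views, likes, dislikes and comments columns (the column names are an assumption) and treating each view as one person, as the insights do. The helper names reaction_shares and plot_pies are hypothetical, and the sample numbers are made up.

```python
import pandas as pd


def reaction_shares(df):
    """Return (overall %, per-reaction %) computed from column totals."""
    totals = df[["views", "likes", "dislikes", "comments"]].sum()
    reactions = totals[["likes", "dislikes", "comments"]]
    reacted = reactions.sum()
    overall = pd.Series(
        {"no reaction": totals["views"] - reacted, "reaction": reacted}
    )
    return overall / totals["views"] * 100, reactions / reacted * 100


def plot_pies(overall_pct, reaction_pct):
    """Draw the two pie charts; Matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(10, 5))
    axes[0].pie(overall_pct, labels=overall_pct.index, autopct="%.2f%%")
    axes[1].pie(reaction_pct, labels=reaction_pct.index, autopct="%.2f%%")
    return fig


# Made-up sample data, not the T-Series figures.
videos = pd.DataFrame(
    {
        "views": [1000, 3000],
        "likes": [30, 50],
        "dislikes": [5, 5],
        "comments": [4, 6],
    }
)
overall_pct, reaction_pct = reaction_shares(videos)
print(overall_pct)
print(reaction_pct)
```

Calling plot_pies(overall_pct, reaction_pct) would draw one pie for reacting versus non-reacting views and one for the split among likes, dislikes and comments.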

Relationship among statistics parameters using Histograms

Insights:

  • T-Series has 155 million subscribers, but only around 20% of them watch its videos, and possibly fewer, since some viewers are not subscribed to the channel.
  • The average numbers of likes, dislikes and comments per video are negligible compared with the number of subscribers and viewers (Figures 1 and 2).
  • We can also see the ratio of the average numbers of likes, dislikes and comments on each T-Series video.
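
The "around 20%" figure can be read as average views per video divided by the subscriber count. A rough sketch of that comparison, assuming views, likes, dislikes and comments columns; the 155 million subscriber count is the one quoted above, while the per-video numbers are made up so that the arithmetic is easy to follow.

```python
import pandas as pd

SUBSCRIBERS = 155_000_000  # subscriber count quoted in the article

# Made-up per-video numbers for illustration only.
videos = pd.DataFrame(
    {
        "views": [20_000_000, 42_000_000],
        "likes": [100_000, 210_000],
        "dislikes": [10_000, 21_000],
        "comments": [5_000, 10_500],
    }
)

# Average of each statistic per video, expressed as a share of subscribers.
averages = videos.mean()
share_of_subscribers = averages / SUBSCRIBERS * 100

print(share_of_subscribers.round(3))
```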

Month-wise statistics of T-Series channel

Month-wise number of uploaded videos

Insights:

  • T-Series uploads the highest number of videos in ‘May’, two to three times as many as in other months.
  • T-Series uploads the lowest number of videos in ‘June’.
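
Upload counts per month can be obtained by counting rows per month name. A small sketch, assuming the dataset has a month column holding month names (an assumption about the column name); the sample rows are made up.

```python
import calendar

import pandas as pd

# Made-up sample: one row per uploaded video.
videos = pd.DataFrame(
    {"month": ["May", "May", "May", "June", "November", "November"]}
)

# Count uploads per month, then find the busiest and quietest months
# among those that have at least one upload.
uploads_per_month = videos["month"].value_counts()
busiest = uploads_per_month.idxmax()
quietest = uploads_per_month.idxmin()

# Put the counts in calendar order for plotting; months without uploads get 0.
month_order = list(calendar.month_name)[1:]
ordered = uploads_per_month.reindex(month_order, fill_value=0)

print(busiest, quietest)
print(ordered)
```

The ordered series can then be passed to a bar plot so the months appear from January to December.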

Month-wise statistics using scatterplots

Insights:

  • T-Series uploaded its most viewed video in the month of ‘November’.
  • T-Series uploaded its most liked video in the month of ‘November’.
  • T-Series uploaded its most disliked video in the month of ‘March’.
  • T-Series uploaded its most commented video in the month of ‘August’.
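
Besides reading them off the scatterplots, these months can be looked up directly by finding the row with the maximum of each statistic. A sketch assuming month, views, likes, dislikes and comments columns (the names are assumptions); the sample rows are made up.

```python
import pandas as pd

# Made-up sample rows for illustration.
videos = pd.DataFrame(
    {
        "title": ["a", "b", "c"],
        "month": ["November", "March", "August"],
        "views": [900, 500, 300],
        "likes": [90, 40, 30],
        "dislikes": [5, 20, 2],
        "comments": [10, 8, 50],
    }
)

metrics = ["views", "likes", "dislikes", "comments"]

# For each metric, find the row label of the video with the highest value,
# then map that label to the month in which the video was uploaded.
top_rows = videos[metrics].idxmax()
top_month = top_rows.map(videos["month"])

print(top_month)
```

Mapping the same row labels to a year column instead of month would give the year-wise version of these insights.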

Year-wise Statistics of T-Series channel

Year-wise number of uploaded videos

Year-wise statistics using scatterplots

Insights:

  • T-Series started uploading videos to its channel in the year ‘2011’.
  • T-Series uploaded its most viewed video in the year ‘2018’.
  • T-Series uploaded its most liked video in the year ‘2018’.
  • T-Series uploaded its most disliked video in the year ‘2019’.
  • T-Series uploaded its most commented video in the year ‘2020’.

Top 10 most viewed videos of T-Series

10 most viewed videos and their statistics

Top 10 least viewed videos of T-Series

10 least viewed videos and their statistics
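
Both rankings can be produced with pandas’ nlargest and nsmallest. A minimal sketch, assuming title and views columns (the names are assumptions) and using n=2 on made-up rows; on the real dataset n would be 10.

```python
import pandas as pd

# Made-up sample rows for illustration.
videos = pd.DataFrame(
    {"title": ["a", "b", "c", "d"], "views": [40, 10, 30, 20]}
)

# Rows with the highest and lowest view counts, already sorted by views.
most_viewed = videos.nlargest(2, "views")
least_viewed = videos.nsmallest(2, "views")

print(most_viewed)
print(least_viewed)
```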

Most famous video of T-Series

Most viewed and liked video, and its statistics

Most commented video of T-Series

Most commented video and its statistics

Most disliked video of T-Series

Most disliked video and its statistics

Recently uploaded videos of T-Series

10 recently uploaded videos and their statistics

Initially uploaded videos of T-Series

10 oldest videos and their statistics
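
The most recent and the oldest uploads can be found by sorting on the publication date. A short sketch, assuming published_date is a datetime column and using two rows per table on made-up data; on the real dataset head(10) would give ten.

```python
import pandas as pd

# Made-up sample rows for illustration.
videos = pd.DataFrame(
    {
        "title": ["first", "second", "third", "fourth"],
        "published_date": pd.to_datetime(
            ["2011-01-15", "2015-06-01", "2019-03-20", "2020-10-09"]
        ),
    }
)

# Newest first for recent uploads, oldest first for the initial uploads.
newest = videos.sort_values("published_date", ascending=False).head(2)
oldest = videos.sort_values("published_date").head(2)

print(newest)
print(oldest)
```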

Corona pandemic effect on T-Series YouTube channel

According to the year-wise statistics, T-Series has uploaded 58 videos so far in 2020, which is more than the total number of videos uploaded in 2019. The channel is also doing well in terms of views, likes and comments this year, so I think they have been able to keep the channel going through the pandemic with their music content, even though the situation has kept them from creating much new video content.

Inferences and Conclusion

In this project, we extracted video information from the T-Series YouTube channel using the YouTube Data API, Python, JSON and the requests library, and prepared our CSV dataset from it. We cleaned this raw dataset and performed some operations to make it more convenient to read and analyse. Then we explored the dataset and visually analysed the relationships among time, subscribers, views, likes, comments, dislikes, etc. Although we used only part of the data for this project, anyone can download a channel’s complete data with their own API key and use it for a full analysis.

Just as we analysed one channel, YouTube analyses all channels using its own internal mechanisms. It considers factors such as which videos are quickly becoming popular, how they perform relative to the channel’s other videos, and whether a video is healthy for the community. Using these factors and filters, YouTube shows trending videos and updates the list about every 15 minutes.

I hope you enjoyed this blog and found it useful. Check out the complete project and the source code for more information 👉 YouTube Channel Analysis.
Thank you for your time. I hope you learned something. If you like my work and want to give something back for the value you gained, you can buy me a coffee :)
