YouTube Data in Python

Sudharsan Asaithambi
Published in GreyAtom, Nov 7, 2017

For weeks I have been hunting for a specific kind of dataset: one I actually care about. There are thousands of public datasets out there, but most of them cover topics in which I have little or no stake.

Working through Titanic or German Credit Scoring is a good way to flex your data-crunching muscles, but sometimes you just want to indulge yourself.

One such indulgence was the FIFA18 players dataset on Kaggle. Having once played FIFA for 15 hours a day in a fit of madness, I have found working with this dataset a treat so far. A dataset you know a lot about lets you test all the crazy hypotheses you have ever dreamed of.

My next attempt in this series is to get some YouTube data into my playground. The objective of the tutorial below is to fetch video statistics and show you around Google’s YouTube Data API.

YouTube Data API v3

The YouTube Data API v3 gives us access to YouTube videos, channels, search, captions, comments and playlists. From a Python kernel you can call Google’s API, store the data in a DataFrame and then analyze it further.
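Under the hood, each of these resources is reached with an HTTPS GET request. The minimal sketch below uses only the standard library to build (but not send) a `search` request URL. The base URL and the `q`, `part`, `type`, `maxResults` and `key` parameters follow the public v3 reference; the helper name `build_request_url` and the placeholder key are illustrative assumptions.

```python
from urllib.parse import urlencode

# Base URL of the YouTube Data API v3 REST interface.
BASE_URL = "https://www.googleapis.com/youtube/v3"


def build_request_url(resource, api_key, **params):
    """Return the GET URL for a v3 resource such as "search", "videos",
    "channels", "playlists" or "commentThreads" (hypothetical helper)."""
    # Query parameters keep their order; the API key is appended last.
    query = dict(params, key=api_key)
    return f"{BASE_URL}/{resource}?{urlencode(query)}"


# Build a video search request for a query string; nothing is sent here.
url = build_request_url(
    "search",
    api_key="YOUR_DEVELOPER_KEY",  # placeholder, not a real key
    q="Imagine Dragons",
    part="id,snippet",
    type="video",
    maxResults=25,
)
print(url)
```

With a real developer key, opening the printed URL in a browser should return the search results as JSON.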

To use Google’s API:

1. Register your project as a Google project in your Google Developer account.

2. Go to Library, search for YouTube Data API v3 and enable it.

3. You will be given a Developer Key to authenticate your API calls.

4. Install the Google API Python Client on your machine by typing the following command into your command prompt:

pip install --upgrade google-api-python-client

Google provides sample code for using its APIs, and its guide explains all the types of API calls that are supported.

5. The code below calls the API and stores the data I need in a dictionary.

6. Save this script in your working directory and change the DEVELOPER_KEY variable to the developer key you received.
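A minimal sketch of what such a script might contain is shown here. It assumes google-api-python-client's `build()` together with the v3 `search.list` and `videos.list` methods; the `make_client` helper, the optional `client` argument and the keys of the returned dictionary are illustrative assumptions, not necessarily the contents of the original script.

```python
# Minimal sketch (assumption): a youtube_search-style helper built on
# google-api-python-client. Not the original script.
DEVELOPER_KEY = "REPLACE_WITH_YOUR_DEVELOPER_KEY"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

STAT_KEYS = ("viewCount", "likeCount", "commentCount")


def make_client(developer_key=DEVELOPER_KEY):
    """Build an API client (requires google-api-python-client)."""
    # Imported lazily so youtube_search can also run with a stand-in client.
    from googleapiclient.discovery import build
    return build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                 developerKey=developer_key)


def youtube_search(q, max_results=25, client=None):
    """Search for videos matching q; return a dict of equal-length lists."""
    youtube = client if client is not None else make_client()
    result = {"videoId": [], "title": [], **{k: [] for k in STAT_KEYS}}

    # 1) search.list: video IDs and titles (maxResults may be at most 50).
    search_response = youtube.search().list(
        q=q, part="id,snippet", type="video", maxResults=max_results
    ).execute()
    hits = [(item["id"]["videoId"], item["snippet"]["title"])
            for item in search_response.get("items", [])]
    if not hits:
        return result

    # 2) videos.list: statistics for all found IDs in a single call.
    videos_response = youtube.videos().list(
        id=",".join(video_id for video_id, _ in hits), part="statistics"
    ).execute()
    stats = {v["id"]: v.get("statistics", {})
             for v in videos_response.get("items", [])}

    for video_id, title in hits:
        s = stats.get(video_id, {})
        result["videoId"].append(video_id)
        result["title"].append(title)
        # Counts arrive as strings; missing ones (e.g. disabled comments) become None.
        for k in STAT_KEYS:
            result[k].append(s.get(k))
    return result
```

In a notebook, after setting DEVELOPER_KEY, calling `youtube_search("Imagine Dragons")` would return one list per column. The optional `client` argument lets you pass a stand-in object when you want to exercise the parsing logic without network access.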

Now let’s get some Imagine Dragons into the picture

7. Now open your Jupyter Notebook and use the function above. The youtube_search function takes a search query string as a parameter and returns a dictionary, which you can load into a pandas DataFrame to start your analysis.
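One detail worth handling before any numeric analysis: the API returns statistics such as viewCount as strings in its JSON. The minimal sketch below assumes the returned dictionary maps column names to equal-length lists (an assumption about the function's output shape); the helper name `to_int_counts` is hypothetical.

```python
def to_int_counts(result, columns=("viewCount", "likeCount", "commentCount")):
    """Return a copy of a dict of lists with count columns converted
    from the API's string values to int (None stays None)."""
    converted = dict(result)  # shallow copy; the input is left untouched
    for col in columns:
        if col in converted:
            converted[col] = [int(v) if v is not None else None
                              for v in converted[col]]
    return converted


# Hypothetical input in the same shape as a youtube_search result.
sample = {
    "title": ["Video A", "Video B"],
    "viewCount": ["1500", "320"],
    "likeCount": ["40", None],
}
print(to_int_counts(sample))
```

With pandas installed, `pd.DataFrame(to_int_counts(result))` then yields numeric columns; a column containing a missing count becomes float, with NaN in place of None.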

There is a lot of scope for analysis with YouTube data, especially since comment data is also available through the API. I will continue this series with an analysis of this data.
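Comments are served by the `commentThreads` resource, for example via `youtube.commentThreads().list(part="snippet", videoId=..., maxResults=100, textFormat="plainText").execute()` with the client library. The minimal sketch below flattens such a response into one row per top-level comment. The field names follow the v3 `commentThread` and `comment` resources; the helper name and the trimmed sample response are made up for illustration.

```python
def flatten_comment_threads(response):
    """Turn a commentThreads.list response into one dict per top-level comment."""
    rows = []
    for item in response.get("items", []):
        thread = item["snippet"]
        comment = thread["topLevelComment"]["snippet"]
        rows.append({
            "videoId": thread.get("videoId"),
            "author": comment.get("authorDisplayName"),
            "text": comment.get("textDisplay"),
            "likeCount": comment.get("likeCount"),
            "replyCount": thread.get("totalReplyCount"),
            "publishedAt": comment.get("publishedAt"),
        })
    return rows


# Trimmed, made-up response containing only the fields used above.
sample_response = {
    "items": [{
        "snippet": {
            "videoId": "VIDEO_ID",
            "totalReplyCount": 3,
            "topLevelComment": {"snippet": {
                "authorDisplayName": "Some User",
                "textDisplay": "Great song!",
                "likeCount": 12,
                "publishedAt": "2017-11-01T10:00:00Z",
            }},
        }
    }]
}
rows = flatten_comment_threads(sample_response)
print(rows[0]["text"], rows[0]["likeCount"])
```

A list of flat dicts like this can go straight into a DataFrame, one comment per row.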

Please comment below with the crazy hypotheses YOU have that this YouTube data could help test.
