Know Yourself Series

Spotify Music Data Analysis: Part 1

Data Gathering

Pragya Verma
Analytics Vidhya


Almost everyone listens to music, and I am hooked on it too. I need music no matter what I am doing. My taste is eclectic: the genres I listen to range from high-tempo dance music to sweet, mellow acoustic music.

As an analyst, what’s a fun way to investigate anything?

Let’s quantify the data.

In this article series, I explore my music streaming history to understand my varied taste in music. This analysis should give me a much better understanding of my listening taste and habits.

For this analysis, I will be fetching my data from Spotify, as I have been using it for the past two years and haven’t changed platforms since. Hence all of my listening history is on Spotify, and I can easily assemble the dataset required to study my music taste.

In this part of the series, we will gather the data: first by requesting it from Spotify, and then by scraping additional details using the Spotify API. Data cleaning and preprocessing, which keep the results accurate, follow in the next part.

Table of Contents

  1. Introduction to Spotify
  2. Downloading the Data from Spotify Dashboard
  3. Extracting Relevant Data
  4. Conclusion
  5. Links to other parts of this series

Introduction to Spotify

Spotify is a music streaming and media service where you can listen to music and podcasts from around the world. Spotify offers its service for free with some limitations; certain features are reserved for paying Premium users.

Downloading the Data from Spotify Dashboard

To download your Spotify data, the first step is to log in to your Spotify account. You can do this by visiting the official Spotify website.

Once logged in, go to the top right corner of the screen. You will see your Profile section. Select your Account from the dropdown menu.

Navigation List in your Spotify account dashboard

You will be routed to your account dashboard. Here you can navigate to the menu pane on the left side of the screen. From the navigation menu as shown beside, select Privacy Settings.

After selecting Privacy Settings, scroll down to the bottom of the page. Here you can find the Download your Data section, as shown below.

Spotify page — Download the Data

Click on the request button to initiate the process of data gathering. Spotify usually takes 4–5 days to email your data. However, they can sometimes take up to 30 days.

Once you receive the email from Spotify, download the zip folder and extract its contents. All of the information provided by Spotify is in JSON format.
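The extraction step can also be scripted. Below is a minimal sketch using Python's standard zipfile module. The archive name and output folder in the usage comment are assumptions, so substitute the names from your own download.

```python
import zipfile
from pathlib import Path


def extract_spotify_export(zip_path, dest_dir):
    """Extract the export archive and return the JSON files it contained."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # The JSON files may sit inside a subfolder, so search recursively.
    return sorted(dest.rglob("*.json"))


# Example usage (hypothetical file and folder names):
# for json_file in extract_spotify_export("my_spotify_data.zip", "spotify_data"):
#     print(json_file.name)
```

Listing the extracted files this way is a quick check that the export contains the JSON files described in the next section.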

Extracting Relevant Data

Once you have extracted the zip folder provided by Spotify, you will have access to information such as your streaming history, personal details, and the artists you follow. The JSON files in the zip folder are as follows:

  1. Follow.json: contains the accounts you follow as well as your current followers.
  2. Identity.json: contains the information shown on the Spotify app, such as your name, photo, and verification.
  3. Inferences.json: contains Spotify’s understanding of you as a user, i.e., the kind of content you consume on Spotify, such as education, business, dance, and so on.
  4. Payments.json: contains your payment information.
  5. Playlist1.json: contains information about the playlists you have created.
  6. SearchQueries.json: stores your search history, for example at what time and on which system you searched for a song, artist, or podcast.
  7. StreamingHistory0.json: contains your streaming history, i.e., which songs you listened to, when, and for how long.
  8. UserData.json: contains the personal information you provided at sign-up, for instance your username, date of birth, email, and gender.
  9. YourLibrary.json: contains the content you have saved or liked.
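To start working with the streaming history, here is a minimal sketch that merges the streaming history files into a single list of records using only the Python standard library. It assumes the export may split the history across several files (StreamingHistory0.json, StreamingHistory1.json, and so on) and that each file holds a JSON array of records. The function name and the folder name in the usage comment are hypothetical.

```python
import json
from pathlib import Path


def load_streaming_history(folder):
    """Merge every StreamingHistory*.json file in `folder` into one list."""
    records = []
    # Assumption: each file is a JSON array of play records.
    for path in sorted(Path(folder).glob("StreamingHistory*.json")):
        with open(path, encoding="utf-8") as f:
            records.extend(json.load(f))
    return records


# Example usage (hypothetical folder name):
# history = load_streaming_history("spotify_data/MyData")
# print(len(history), "plays loaded")
# print(history[0])  # inspect one record to see which fields your export has
```

Printing a single record first is a simple way to confirm the field names in your own export before building any analysis on top of them.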

However, these JSON files do not contain information such as the song ID, song features, and other such details. To capture this information, you can use the code written by Vlad Gheorghe. You can check out his data scraping code on his blog and GitHub.
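To give a sense of what such a script does, below is a rough sketch of looking up a song's ID through the search endpoint of the Spotify Web API. This is not Vlad's code: the helper names are hypothetical, and the sketch assumes you already hold a valid access token from a registered Spotify developer app.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.spotify.com/v1/search"


def build_search_request(track_name, artist_name, token):
    """Build a GET request that searches for one track by name and artist."""
    query = urllib.parse.urlencode({
        "q": f"track:{track_name} artist:{artist_name}",
        "type": "track",
        "limit": 1,
    })
    return urllib.request.Request(
        f"{SEARCH_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def extract_track_id(search_response):
    """Return the ID of the first matching track, or None if nothing matched."""
    items = search_response.get("tracks", {}).get("items", [])
    return items[0]["id"] if items else None


def find_track_id(track_name, artist_name, token):
    """Send the search request and return the track ID (needs network access)."""
    request = build_search_request(track_name, artist_name, token)
    with urllib.request.urlopen(request) as response:
        return extract_track_id(json.load(response))
```

The returned ID can then be used with other endpoints to fetch further details about a song. Check the current Spotify Web API documentation before relying on a specific endpoint, since availability can change over time.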

The first few rows of the data scraped with Vlad’s code are as follows:

Moreover, I have also gathered the playlist information. The code I used here was written by Vinci Hu. You can also check out her blog and GitHub for the same.

The first few lines of the playlist data scraped from Vincy’s code are as follows:

You can also find the aforementioned Python scripts for data scraping on my GitHub.

Conclusion

In this part, I downloaded my data from Spotify and scraped the additional song and playlist details using the Spotify API.

Now, in the next part of the series, I will check for abnormalities in the data and then preprocess the dataset.
