How to Extract Any Artist’s Data Using Spotify’s API, Python, and Spotipy
A quick guide to navigating Spotify’s Web API and accessing their data
As I worked on my web scraper guide, I touched briefly on the different ways we can extract data ourselves if we can’t find a dataset, and one of those ways was through APIs — they are a great source of unique and interesting datasets.
If you are unfamiliar with what an API is and does, I dive a little deeper into them in this article. In short, you can think of them as a shortcut into a web service’s database, providing a set of methods that a programmer can use to send and receive data.
Spotify gives developers access to some of its data on playlists, users, and artists through its Web API, and this guide shows you how to use it.
A lot of times, developers will use Web APIs for app integrations, but they can also be used for data extraction and analysis. Here is what I will be walking you through:
- Creating a Spotify developer account.
- Understanding Spotify’s Web API.
- What data we’ll extract.
- What Spotipy is and does.
- How to extract any artist’s data using Python and Spotipy.
Creating a Spotify Developer Account
The first step in using Spotify’s Web API is to create a developer account. Follow these steps to access your developer account and get a client ID and client secret.
Step 1: Log in or create an account
To set up a Spotify developer account, log in with your existing Spotify account or create a new one.
Step 2: Client ID
When you’ve accessed your developer account, you’ll need to create a client ID, which you can do by pressing the green CREATE A CLIENT ID button.
You will then get a three-page pop-up that you’ll need to fill out. Although we’re not trying to create an app, we will need this client ID to access the same data.
- Page one: Give it a name. I named mine “Album Analysis”. The description I put in is “analyzing Spotify artist’s data”, and then I checked the box that says “Desktop App” under “What are you building?”.
- Page two: Choose non-commercial use.
- Page three: Agree to all the terms and conditions and submit the form.
Step 3: Retrieve your client ID and client secret
Inside your developer dashboard, click on the new app you just created. On your app’s dashboard page, you’ll see your client ID on the top left-hand side.
Underneath your client ID, you’ll see “Show Client Secret” in green. If you press that, you’ll get your client secret. Keep these locations in mind as we work towards our data extraction, because we will need both values to gain access to the data.
Understanding Spotify’s Web API
I would recommend exploring Spotify’s user-friendly documentation and familiarizing yourself with different areas of how it works. Here are some quick links you can dive into:
- Spotify’s Web API documentation
- Web API tutorial
- Web API libraries
- Spotify Web API reference (useful for exploring the data endpoints we have access to and accessing the related documentation)
Spotify’s Web API console
Another great way to see the different data Spotify has to offer and to understand the Web API is to try their Spotify Web API console.
This console lets you explore different endpoints through an easy-to-use interface and test the various methods to preview the JSON outputs.
The Data We’ll Extract
For this guide, I want to extract the data from every album and single (no features) that Kehlani has put out. There are a few potential ways to create a dataset using this API.
We could request a list of the artist’s albums and then loop through each album track.
Or, instead, we can use a playlist that I created, which includes every album Kehlani has to offer on Spotify, and loop through that. This requires a little less code.
Feel free to create your own playlist if you want to work on a different artist’s data instead.
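For comparison, the first option (looping through each album) might look roughly like the sketch below. It assumes an authenticated Spotipy client is passed in as sp and that you already know the artist's Spotify ID. The function name getArtistTrackIDs is made up for this illustration; artist_albums, album_tracks, and next are Spotipy methods.

```python
def getArtistTrackIDs(sp, artist_id):
    """Collect track IDs from every album and single released by one artist.

    `sp` is assumed to be an authenticated spotipy.Spotify client.
    """
    ids = []
    # Request albums and singles separately; "appears_on" releases
    # (features on other artists' records) are never requested.
    for release_type in ("album", "single"):
        albums = sp.artist_albums(artist_id, album_type=release_type, limit=50)
        while albums:
            for album in albums["items"]:
                tracks = sp.album_tracks(album["id"], limit=50)
                while tracks:
                    for track in tracks["items"]:
                        # The same song can appear on more than one release.
                        if track["id"] not in ids:
                            ids.append(track["id"])
                    # next() returns None once there are no more pages.
                    tracks = sp.next(tracks)
            albums = sp.next(albums)
    return ids
```

The nested paging loops are the extra code the playlist approach avoids, which is why the rest of this guide uses a playlist.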
The meta and features data
When grabbing each track from an album, we can obtain track information such as track name, album, release date, length, and popularity.
More importantly, Spotify’s API allows us to extract a number of “audio features” such as danceability, energy, instrumentalness, liveness, loudness, speechiness, acousticness, and tempo.
What Spotipy Is and Does
Spotipy is “a lightweight Python library for the Spotify Web API”. With Spotipy, we can get full access to all of the music data provided by the Spotify platform.
I highly recommend going through Spotipy’s documentation to get an understanding of how it works and what you’re capable of doing while using this library.
Additionally, you will get a better understanding of how my code works. Check out their GitHub examples.
Writing the Code
I highly recommend following along using Repl. It is a simple-to-use yet powerful online IDE that works great and requires no setup, which makes it perfect for a one-time data extraction or dataset creation project.
Let’s start by importing the libraries that we need. Besides Spotipy, we’ll be using pandas to create a dataframe and save our dataset, and time to pause the execution of the loop.
Connect to the API
Next, we need to authenticate and connect to Spotify’s API. To do so, we need our “Client ID” and “Client Secret”.
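Here is a minimal sketch of the connection step, assuming Spotipy's client credentials flow. The connect helper and the placeholder strings are made up for this illustration; SpotifyClientCredentials and spotipy.Spotify come from Spotipy itself.

```python
client_id = "YOUR_CLIENT_ID"          # placeholder: paste your own client ID
client_secret = "YOUR_CLIENT_SECRET"  # placeholder: paste your own client secret


def connect(client_id, client_secret):
    """Return a Spotipy client authenticated with the client credentials flow."""
    # Fail early if the placeholders above were left untouched.
    if not client_id or not client_secret or client_id.startswith("YOUR_"):
        raise ValueError("Fill in your own client ID and client secret first.")

    # Imported inside the function so the check above runs even if
    # Spotipy has not been installed yet.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    auth_manager = SpotifyClientCredentials(
        client_id=client_id, client_secret=client_secret
    )
    return spotipy.Spotify(auth_manager=auth_manager)


# Once the placeholders are replaced:
# sp = connect(client_id, client_secret)
```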
In the code above, replace the Client ID and Client Secret values with your own, and make sure they are inside quotes.
Retrieve IDs for each track
As I mentioned earlier, I’ll be extracting Kehlani’s albums and singles from a playlist I created. It is a collection of 54 songs (~3 hours) containing every single and every song from every album she has released that is currently on Spotify, and I keep it updated with her new singles.
Now we’ll write a function to get the IDs for each track of this playlist.
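A sketch of such a function follows, assuming an authenticated Spotipy client is passed in as sp. It relies on Spotipy's user_playlist method (newer Spotipy releases recommend playlist instead) and on next for paging. The username and URI in the example call are placeholders.

```python
def getPlaylistTrackIDs(sp, user, playlist_id):
    """Return the ID of every track on a playlist.

    `sp` is assumed to be an authenticated spotipy.Spotify client.
    """
    ids = []
    playlist = sp.user_playlist(user, playlist_id)
    page = playlist["tracks"]
    while page:
        for item in page["items"]:
            track = item["track"]
            # Skip entries that come back without a usable track ID.
            if track and track["id"]:
                ids.append(track["id"])
        # next() returns None when there are no more pages of tracks.
        page = sp.next(page)
    return ids


# Example call; both strings are placeholders for your own values:
# ids = getPlaylistTrackIDs(sp, "your_username", "your_playlist_uri")
```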
At the bottom of this code, where it says ids = getPlaylistTrackIDs, the two quoted values separated by a comma are the username (found in the URL) of the person who created the playlist, and the playlist URI, which you can find by hitting the settings button on the playlist, where you’d also find the share link.
Now, let’s check what we have so far by printing the length of the ids list we grabbed to see if it matches the 54 songs we have on the playlist.
This is the result:
54
['7AiMnJSODcJoKDejQ3mnoJ', '73C4vh7W8u41Vll5HvBqv7', '5kYZbBLAGrrhFKNbOs6D95', '18z6OV5lknJmKnZi7aA1zH', '1gHtbcRP4tz1O1NsxPpBea', '3kJudfRjZMItdFYVCCaSi6', '6mzaCRuLTRiz1caGOum3zT', '5h4Uqkh9RpRZwm5ADLh5uj', '3rGew9pmFEmGD9nZ12F1tN', '0dYDmow4l5hbPs5E6QLMSC', '1B3jkf6CyuiF8CQcKlUx9y', '6DkmFhzJrkVhDlcgcEy7Pc', '5dKy6Cgv6xwiRY3j3AJ7Uq', '0oz4ZqHuUaz3uEkP2vD0u8', '389hKTL3ZBPPWP3VuXfEyv', '5cw9s2zGrbny2M2p3WRmGm', '3YaMX9Cf68dxiG6RKo0pSY', '1cAL4sFzXXRMbpZnTPa7Zi', '4w5BVeKJFCj2rrrEy31s0n', '4ta2AWru6ldjg1aHzww0aK', '45DJ0PbKPdbslnyrcM80HN', '7yNu82yd6dYmGQ0H1q0jKo', '4v4HwTfMPslhWAnJxIXchn', '6XptjfnUvLfejptpjPRhCT', '1y2SK8EjL3WSnJvJEMWOoq', '4UMp46x46Zmu9OEr8m3Gl2', '7nb50hgKYhnHJLHKZ7qiKO', '5y5OzukBTl0yTRMEdNmApJ', '3Hdl3BEFb1IEbL0Jq53enx', '0lsC0OkBgiLYbSsoHOzMnr', '0Pm1BZp4MpoMKkNxIXCfAu', '2droOB3xlZkhgfUM0owDTq', '2Nd2HLWrIq1DcNMiYPTQUC', '4j644tViOFAf4i0BYT12R8', '4k6hX9RKD096K1NCjjJZLc', '0aSW5EMeNnQSMJQ8QN3zIW', '0tkmYNfaEaH9HpR59ApRtE', '0kas95RruYRVqrOb07rgkh', '4BOikd4oZjOYMde9AXfrTo', '6ZRuF2n1CQxyxxAAWsKJOy', '1EGrDTfEuAiRzRdxlblpET', '32s2Dn9EVvO2f85MrpRoBV', '4jM3c9KLTO9iZPm9A7neiL', '6GCRnf1W9OKxok9fvNp3pz', '1Zm9qGPQkTAOBiVpGSnJUq', '3ucRKbRlikYHyoI17gfR0c', '6HUO25AttZZCoKAY0vUVtc', '5Qr7StTFbXhgHt9JlqJx0I', '1QzC4y8h6WFxHE4KlokhVr', '1xz905v9g71heS0BQQM9re', '5QTdOvIF2ehBMZpSIIGzIo', '7IJiDYPZy2AIJn3YVHhvD4', '23wuZgeX1oyJ43QYOTo9s7', '0tBBihoEWiWKqsO5ZlCbwS']
Create a function used to grab all track info from IDs
Now that we have a list of every track ID we need, we will write a function that grabs the track information (track name, album, release date, length, and popularity) along with the audio features that Spotify’s API has to offer for each track.
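One way to write this function is sketched below, assuming an authenticated Spotipy client is passed in as sp and that your app has access to the audio features endpoint. The names getTrackFeatures and FEATURE_NAMES are made up for this illustration; track and audio_features are Spotipy methods, and the dictionary keys follow Spotify's track and audio features objects.

```python
# Audio features to keep, in the order they will appear in each row.
FEATURE_NAMES = (
    "danceability", "energy", "instrumentalness", "liveness",
    "loudness", "speechiness", "acousticness", "tempo",
)


def getTrackFeatures(sp, track_id):
    """Return one row of track info followed by audio features.

    `sp` is assumed to be an authenticated spotipy.Spotify client.
    """
    meta = sp.track(track_id)
    # audio_features() returns a list with one entry per requested track;
    # fall back to an empty dict if no features are available.
    features = sp.audio_features([track_id])[0] or {}

    row = [
        meta["name"],
        meta["album"]["name"],
        meta["album"]["release_date"],
        meta["duration_ms"],  # track length in milliseconds
        meta["popularity"],
    ]
    # Missing features are stored as None rather than raising an error.
    row.extend(features.get(name) for name in FEATURE_NAMES)
    return row
```

Returning a plain list keeps each track as one row, which makes it easy to build a dataframe from the results later.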
Loop over tracks and apply the function
We’ll now loop over the tracks, applying the function we created, and save the dataset to a .csv file using pandas.
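The loop and the save step could be sketched like this. Here fetch_row stands for any function that turns a track ID into a row, such as the track-info function described in the previous section. The names collectTracks and saveDataset, the column labels, the half-second pause, and the spotify.csv file name are all assumptions made for this illustration.

```python
import time

# Column labels for the dataset; the order must match the rows produced
# by the function that fetches each track.
COLUMNS = [
    "name", "album", "release_date", "length", "popularity",
    "danceability", "energy", "instrumentalness", "liveness",
    "loudness", "speechiness", "acousticness", "tempo",
]


def collectTracks(ids, fetch_row, pause_seconds=0.5):
    """Call fetch_row for every track ID, pausing between requests."""
    tracks = []
    for track_id in ids:
        time.sleep(pause_seconds)  # short pause so we don't send requests too quickly
        tracks.append(fetch_row(track_id))
    return tracks


def saveDataset(tracks, path="spotify.csv"):
    """Build a dataframe from the rows and write it to a .csv file."""
    import pandas as pd  # imported here because only this step needs pandas

    df = pd.DataFrame(tracks, columns=COLUMNS)
    df.to_csv(path, index=False)
    return df
```

Keeping the loop separate from the save step means the collected rows can be inspected before anything is written to disk.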
My raw dataset looks like this:
There You Have It!
This is how you can extract Spotify data on any artist you like.
The easiest way I’ve found was simply creating my own playlist to loop through, which offered fewer errors and put everything all in one place, but there are many other ways you can go about this.
The next step would be to clean up this data and then analyze it to gain insights into the artist’s music over the years.
Different ways you can analyze and visualize Spotify’s data could be:
- The number of tracks over time.
- The length of tracks over time.
- Average danceability over time.
- Top collaborators.
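As a small taste of that kind of analysis, here is a sketch that averages one column per release year using only the standard library. It assumes the saved file has columns named release_date and danceability, and that release dates start with a four-digit year. The names loadRows and averageByYear are made up for this illustration.

```python
import csv
from collections import defaultdict


def loadRows(path):
    """Read a .csv file into a list of dictionaries, one per track."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def averageByYear(rows, column):
    """Average a numeric column for each release year."""
    totals = defaultdict(lambda: [0.0, 0])  # year -> [running sum, count]
    for row in rows:
        value = row[column]
        if value in ("", None):
            continue  # skip tracks with no value for this column
        year = row["release_date"][:4]  # first four characters are the year
        totals[year][0] += float(value)
        totals[year][1] += 1
    return {year: total / count for year, (total, count) in sorted(totals.items())}


# Example, assuming the dataset was saved as spotify.csv:
# print(averageByYear(loadRows("spotify.csv"), "danceability"))
```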
More to come in my next article, where I analyze and visualize Kehlani’s Spotify data. For now, I hope this was a quick and clear intro to Spotify’s Web API and Spotipy!