Using Spotify API for Data Collection

Dec 12, 2022

This article explains how to use Spotify's web API for music data collection. More specifically, we want to collect the audio features of tracks belonging to different music genres, to see whether this information can be used for music genre identification. I got a lot of inspiration from this amazing article written by Taha Ashtiani on the same subject, so go check that out too. Here, we discuss setting up access to the Spotify API and collecting genre-based music data.

1. Obtaining API Keys

Spotify provides free access to a wide range of music-related data through its web API, including artists, albums, playlists, tracks and audio features. Follow the steps below to set up API access.

Go here and create a Spotify for Developers account.
Go to the Dashboard and create an app by providing the required details.
Now you can access the Client ID and Client Secret (click “Show Client Secret”). Save these two values, as they are required for the authentication process. Do not hard-code them in your source code. Instead, save them in a separate file, in the format shown below.

Note: If you are using Git, make sure to add this file to .gitignore.

Client ID : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Client Secret : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
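The two-line layout above can be read back with a few lines of Python. The sketch below is only an illustration: the helper name parse_client_info and the sample values are made up, and it assumes each line has the form "Name : value".

```python
def parse_client_info(lines):
    """Parse 'Name : value' lines into a dict keyed by the lower-cased name."""
    creds = {}
    for line in lines:
        # partition splits on the first colon only, so a value that
        # itself contains a colon is kept intact
        key, sep, value = line.partition(":")
        if sep:  # ignore blank lines or lines without a colon
            creds[key.strip().lower()] = value.strip()
    return creds


# Sample lines standing in for the contents of the credentials file
sample = ["Client ID : abc123\n", "Client Secret : s3cr3t\n"]
creds = parse_client_info(sample)
print(creds["client id"], creds["client secret"])
```

Looking the values up by name rather than by line position means the file still works if the two lines are swapped.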

2. Authenticating with Spotify

There are two types of authentication.

With a specific user: we can obtain user-specific information such as the user's top tracks and artists, recently played tracks etc.
Without a user: we can obtain generic information from Spotify's catalogue such as tracks, playlists etc.

As we only require track-related information, we use the second method for authentication.
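For reference, authentication without a user corresponds to the OAuth Client Credentials flow: the Client ID and Client Secret are sent to Spotify's token endpoint, which returns a short-lived access token. The sketch below only builds the request and does not send it. The helper name build_token_request and the sample credentials are made up for illustration, and a client library would normally take care of this step.

```python
import base64

# Spotify's accounts service endpoint for requesting access tokens
TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Return (url, headers, form_data) for a Client Credentials token request."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        # HTTP Basic auth: base64 of "client_id:client_secret"
        "Authorization": "Basic " + base64.b64encode(raw).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    form_data = {"grant_type": "client_credentials"}
    return TOKEN_URL, headers, form_data


url, headers, form_data = build_token_request("my-id", "my-secret")
print(url)
print(form_data)
```

Sending these three values as a POST request with any HTTP client should return a JSON body containing the access token.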

3. Spotipy

Spotipy is a Python wrapper for Spotify's web API. First, install spotipy and import the required modules.

#!pip3 install spotipy

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

Connect to the API using the snippet below.

def authorization_spotify():
    #Read the API keys from the credentials file
    with open('client_info.txt') as f:
        lines = f.readlines()

    #Authentication - without user
    cid = lines[0].split(":")[1].strip()
    secret = lines[1].split(":")[1].strip()

    #Create a Spotify object sp to access the API
    client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
    sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager, status_retries=5)

    return sp

sp = authorization_spotify()

Now we can use the sp object to access the required information. You will encounter different parameters when sending requests to, or receiving responses from, the web API. (read more here)

One such parameter is the Spotify URI. This is the resource identifier used to locate a given resource, such as a track, an artist or a playlist, in Spotify's catalogue. Below is an example of a URI.

spotify:track:6rqhFgbbKwnb9MLmUQDhG6

The second part of the above URI indicates the resource type and the last part is the Spotify ID. The Web API's responses are normally formatted as a JSON object.
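As a small illustration of this structure, the sketch below splits a URI into its resource type and Spotify ID. The function name parse_spotify_uri is made up for this example, and it assumes the three-part form shown above.

```python
def parse_spotify_uri(uri):
    """Split 'spotify:<type>:<id>' into (resource_type, spotify_id)."""
    parts = uri.split(":")
    # Only the three-part form from the example above is handled here
    if len(parts) != 3 or parts[0] != "spotify":
        raise ValueError(f"Not a three-part Spotify URI: {uri!r}")
    _, resource_type, spotify_id = parts
    return resource_type, spotify_id


resource_type, spotify_id = parse_spotify_uri("spotify:track:6rqhFgbbKwnb9MLmUQDhG6")
print(resource_type, spotify_id)
```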

Now that all the prerequisites are in place, let's obtain some information about a given track. Here, we have used a random track. Refer to this to find out more about how to obtain Spotify URIs.

#Obtain audio features of a given track
sp.audio_features('2DRMuw0U0QbkVQxWxdJV3M')

The audio features for the above track ID are as follows.

[{'danceability': 0.59,
  'energy': 0.833,
  'key': 5,
  'loudness': -6.503,
  'mode': 0,
  'speechiness': 0.0615,
  'acousticness': 0.0142,
  'instrumentalness': 5.4e-06,
  'liveness': 0.105,
  'valence': 0.538,
  'tempo': 94.963,
  'type': 'audio_features',
  'id': '2DRMuw0U0QbkVQxWxdJV3M',
  'uri': 'spotify:track:2DRMuw0U0QbkVQxWxdJV3M',
  'track_href': 'https://api.spotify.com/v1/tracks/2DRMuw0U0QbkVQxWxdJV3M',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/2DRMuw0U0QbkVQxWxdJV3M',
  'duration_ms': 240439,
  'time_signature': 4}]
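The response mixes numeric audio features with metadata such as 'type', 'uri' and 'track_href'. For modelling, only the numeric fields are usually of interest. The sketch below keeps just those. The list FEATURE_KEYS and the helper to_feature_row are assumptions made for this example, and the sample dictionary is a shortened copy of the response shown above.

```python
# Numeric fields from the response above that describe the audio itself
FEATURE_KEYS = [
    "danceability", "energy", "key", "loudness", "mode", "speechiness",
    "acousticness", "instrumentalness", "liveness", "valence", "tempo",
    "duration_ms", "time_signature",
]


def to_feature_row(features):
    """Keep only the numeric audio features from one response item."""
    return {key: features[key] for key in FEATURE_KEYS}


# Shortened copy of the response shown above
response = [{
    "danceability": 0.59, "energy": 0.833, "key": 5, "loudness": -6.503,
    "mode": 0, "speechiness": 0.0615, "acousticness": 0.0142,
    "instrumentalness": 5.4e-06, "liveness": 0.105, "valence": 0.538,
    "tempo": 94.963, "type": "audio_features",
    "id": "2DRMuw0U0QbkVQxWxdJV3M",
    "uri": "spotify:track:2DRMuw0U0QbkVQxWxdJV3M",
    "duration_ms": 240439, "time_signature": 4,
}]

row = to_feature_row(response[0])
print(row)
```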

4. Data collection

In this part of the article, we discuss the data collection approach we adopted. As mentioned at the beginning, we aim to collect genre-wise audio features from the respective tracks, so we need a way to go from music genres → tracks → audio features. However, Spotify's web API does not provide a direct way to access a particular genre and the tracks that belong to it. We therefore searched for playlists under a given genre name and, for each playlist, obtained its track URIs. Once we have the genre-wise track URIs, we pass them to the API to obtain their audio features. In other words, we access the information as genre → playlist → tracks → audio features.

1. Genre identification: First, we gathered information about the major music genres and their sub-genres using the https://everynoise.com website, which maps the available music genres and their sub-genres. There, we can search for each major music genre and obtain its most popular sub-genres. We carried this out manually and created the genre to sub-genre list below.

genre_dict = {"pop": ["pop", "post-teen pop", "uk pop", "dance pop", "pop dance"],
 "rock": ["rock", "album rock", "permanent wave", "classic rock", "hard rock", "modern rock", "alternative rock", "heartland rock"],
 "hip hop": ["hip hop", "rap", "gangster rap", "hardcore hip hop", "east coast hip hop", "alternative hip hop", "southern hip hop", "trap"],
 "r&b": ["r&b", "urban contemporary", "contemporary r&b", "neo soul", "quiet storm", "alternative r&b", "indie r&b"],
 "edm": ["edm", "electronica", "downtempo", "alternative dance", "indietronica", "electropop", "deep house"],
 "country": ["country", "contemporary country", "texas country"],
 "classical": ["classical", "compositional ambient", "orchestral soundtrack", "soundtrack"],
 "metal": ["metal", "speed metal", "old school thrash", "power metal", "glam metal", "alternative metal", "nu metal", "screamo", "metalcore"],
 "jazz": ["jazz", "early jazz", "modern jazz", "vocal jazz", "cool jazz"],
 "blues": ["blues", "traditional blues", "acoustic blues", "texas blues", "chicago blues", "memphis blues", "modern blues", "country blues"]}

In total, we collected 10 major music genres. While doing so, we had to make sure that no sub-genre appears under more than one major genre.
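The overlap check was done by hand, but it can also be automated. The sketch below reports any sub-genre listed under more than one major genre. The function name find_shared_subgenres and the toy dictionary are made up for illustration. The same function could be applied to genre_dict.

```python
from collections import defaultdict


def find_shared_subgenres(genres):
    """Map each sub-genre listed under several major genres to those genres."""
    owners = defaultdict(set)
    for genre, subgenres in genres.items():
        for subgenre in subgenres:
            owners[subgenre].add(genre)
    # Keep only the sub-genres claimed by more than one major genre
    return {sub: sorted(names) for sub, names in owners.items() if len(names) > 1}


# Toy example with a deliberate overlap
toy = {
    "pop": ["pop", "dance pop"],
    "edm": ["edm", "dance pop"],
    "rock": ["rock"],
}
print(find_shared_subgenres(toy))
```

An empty result means that every sub-genre belongs to exactly one major genre.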

2. Request data from the API: The code snippet below requests the data from the Spotify API. It is a simplified version, kept short so that it is easy to follow at a glance. Refer to this for the full code with error handling and database calls.

import time

import numpy as np
import pandas as pd
from tqdm import tqdm

#For each major genre in genre_dict
for genre in tqdm(genre_dict.keys()):

    #Get the set of sub-genres
    for subgenre in genre_dict[genre]:

        print("Genre : ", genre, "Sub-genre : ", subgenre)

        #Get the first 3 playlist ids for each sub-genre
        #using the sp object created at authentication.
        results = sp.search(subgenre, type='playlist', limit=3)
        play_lists = [p['id'] for p in results['playlists']['items'] if p]

        #If no playlists are returned, move on to the next sub-genre
        if len(play_lists) < 1:
            print("No playlists found for sub-genre!")
            continue

        play_list_tracks = pd.DataFrame()

        #For each playlist get track info
        for play_list in play_lists:

            #Fetch the playlist items once instead of once per track
            items = sp.playlist_items(play_list)['items']

            #For each track in playlist get audio features and popularity
            for item in items:
                t_data = item['track']

                #Skip entries without track data
                if not t_data or not t_data.get('id'):
                    continue

                #track id
                track_id = t_data['id']

                #popularity
                track_popularity = t_data['popularity']

                #audio features
                features = sp.audio_features(track_id)

                #Wait before the next API call so as not to exceed the allowed request rate
                time.sleep(np.random.uniform(1, 3))

                #Skip tracks for which no audio features are returned
                if not features or features[0] is None:
                    continue
                audio_df = pd.DataFrame(features)

                #update dataframe with genre info
                audio_df['popularity'] = track_popularity
                audio_df['genre'] = genre
                audio_df['sub-genre'] = subgenre

                #Append track data to the dataframe created for each sub-genre
                play_list_tracks = pd.concat([play_list_tracks, audio_df], ignore_index=True)

        #Write the created dataframe to the db or to a csv.

        #To Postgres DB. Refer to the github repo for more information
        #on the write_to_db function (it is not defined in this snippet).
        write_to_db(play_list_tracks)

        #To CSV
        fname = str(genre) + "_" + str(subgenre) + ".csv"
        play_list_tracks.to_csv(fname)
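The loop above requests audio features one track at a time. The audio features endpoint also accepts several track IDs per request (Spotify documents a maximum of 100), so the number of calls can be reduced by batching. The sketch below illustrates the idea. The helpers chunked and fetch_audio_features are made up for this example, and FakeClient is a stand-in for the sp object so that the snippet runs without network access.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_audio_features(sp, track_ids, batch_size=100):
    """Request audio features in batches instead of one call per track."""
    features = []
    for batch in chunked(track_ids, batch_size):
        # Entries can be None when no features exist for a track, so drop them
        features.extend(f for f in sp.audio_features(batch) if f)
    return features


class FakeClient:
    """Stand-in for the sp object (assumption: used only for this demo)."""

    def __init__(self):
        self.calls = 0

    def audio_features(self, tracks):
        self.calls += 1
        return [{"id": track_id} for track_id in tracks]


client = FakeClient()
track_ids = [f"track{i}" for i in range(250)]
all_features = fetch_audio_features(client, track_ids)
print(len(all_features), "features fetched in", client.calls, "calls")
```

With the real sp object, the pause between calls from the loop above would still be needed between batches.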

And finally, we have the audio features of each sub-genre collected in a Pandas data frame and written to the database and to a CSV file.

Audio features for different music genres.

In the next article, let's draw some insights from this data and then build a classifier for genre classification. Until then, check out the GitHub repo.

Written by Navoda Senavirathne