APIs Zero to Hero: ft. Spotify
If you’ve read about prominent software companies, you’ve probably heard of APIs. For many rising stars, this is the product that they actually sell (see Plaid, a fascinating fintech firm). So what is an API, and why do people throw around the idea that we’re in an API economy?
An API (application programming interface) is simply software that allows two applications to talk to each other.
It’s a middleman that connects two entities, generally one that you would think of as an app and another that you would think of as a server, a machine that provides some functionality to the entities that request it. This all feels abstract, so allow me to provide a descriptive analogy for an API and then relate it back to the real technology…
I love the analogy of you going to a restaurant with chefs that like to keep to themselves and have a really rowdy kitchen!
You enter the restaurant, but you aren’t allowed to go into the kitchen yourself to order because the chefs care about their privacy and you just couldn’t handle how overwhelming it is.
- You order through a trained waiter who verifies who you are with your reservation.
- The waiter takes your input (your order) to the kitchen, where the waiter is also trusted.
- The kitchen reads and interprets your order, preparing your meal appropriately.
- Then, again because the chefs are shy and their leaving the kitchen would make the restaurant hectic, the waiter is the intermediary who has the clearance and ability to bring just what you need and only what you need (your specific order) back to you.
- Finally, you receive your food and probably will add some sauce or modify the meal in some way into a more favorable form.
This scenario illustrates how an API works, and now we can discuss how your everyday apps use them.
Companies don’t want to give you full access to their servers for security reasons, and there is so much data that your device probably couldn’t handle it anyway. The kitchen here represents the internet server that provides some utility, usually in the form of pulling data, for the client that requested it, such as a weather app on your phone. What bridges these two is the API, represented by the waiter, that both sides trust.
- Some app on your phone interprets taps on the screen as the input you provide and sends this input to an API. Just like the waiter verified your reservation, your keys (usually client key and secret key) verify whether you should be allowed to make requests to the server.
- The API relays these instructions to an internet server (the Weather Channel).
- These instructions cause the server to perform some task (probably pulling data, like temperature forecasts for the coming week).
- The data retrieved from the server is sent back to your device's application by the API, without giving you direct access to the server itself.
- Your application reads the information and displays it in a digestible way (what you have set your app preferences to or the specific city that you have pulled up, for example). No matter how you tweak the app settings on your end and even if you were to manipulate the data, the server data remains unchanged and correct.
From Apple’s weather app mentioned above accessing the Weather Channel, to websites showing their location through Google Maps, to a travel agent site like Expedia updating trip fares from the airline companies’ individual sites, APIs are everywhere. In these use cases, the benefits are obvious.
As an external developer, why reinvent the wheel, writing and maintaining code that already exists, when you can call on data and functionality from other applications? Also, why would you opt for public data that can be manipulated by other clients when you can get secure, reliable data that only the source can control, directly from the source?
From a company perspective, why would you not encourage (and potentially monetize) the use of functionalities you already offer?
Now that we know what an API is, let’s see how we use one with Spotify’s API!
As a beginner with APIs myself, I found Spotify’s much easier to use than others (such as Twitter’s). Making a developer account is very easy, and the access that you are granted after that is sufficient to do virtually anything you’d need.
To implement in Python, you must first install the spotipy package, and then import these useful libraries:
#import libraries for spotify API, and pandas for data manipulation
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd
Next, to receive access to Spotify’s developer functionalities through its API, we must authenticate our identity. After registering your developer account and creating an app in your dashboard, you can find your client ID and client secret (the client key and secret key). These can be thought of as allowing you to “sign in.”
#Create a client credentials object to store your keys
#NOTICE -- I did not include my actual keys here and you should keep yours confidential as well
client_credentials_manager = SpotifyClientCredentials(client_id = '<your client id>', client_secret = '<your secret id>')
#now that you have an object storing your keys, we can instantiate a spotipy object that has access to the methods we care about
sp = spotipy.Spotify(client_credentials_manager= client_credentials_manager)
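One way to keep the keys out of the notebook entirely is to read them from environment variables. The sketch below is an assumption about how you might set this up, not part of the original project: the helper name `load_spotify_credentials` is hypothetical, and it uses the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET`, which spotipy itself looks for when no keys are passed in.

```python
import os


def load_spotify_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables."""
    # Allow a plain dict to be passed in for testing; default to the real environment.
    env = os.environ if env is None else env
    client_id = env.get("SPOTIPY_CLIENT_ID")
    client_secret = env.get("SPOTIPY_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET before running."
        )
    return client_id, client_secret
```

The returned pair can then be passed as `client_id` and `client_secret` to `SpotifyClientCredentials`, so the secret never appears in code that you share.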
Now that we have access to API functionalities, we can provide the information of the playlist that we want to analyze and send a request to the API for its data.
#many methods rely on a uniform resource identifier (URI) or ID; the ID is the part of the URL link that serves as a unique id for any playlist, song, album, artist, etc.
my_playlist_link = 'https://open.spotify.com/playlist/1Vl23QvujvM5kROoqjF1fC?si=1c68f80304cf4bc0'
#extract just the ID (which I call the URI throughout), the string between the last '/' and the '?'
my_playlist_URI = my_playlist_link.split("/")[-1].split("?")[0]
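The string splitting above works for links of exactly this shape. As a slightly more defensive sketch, the standard library's `urllib.parse` can separate the path from the query string first. The function name `playlist_id_from_link` is hypothetical, and the check for a `playlist` path segment is an assumption about how these share links are laid out.

```python
from urllib.parse import urlparse


def playlist_id_from_link(link):
    """Return the playlist ID from an open.spotify.com playlist link."""
    # urlparse drops the '?si=...' query for us; filter out empty path segments.
    path_parts = [part for part in urlparse(link).path.split("/") if part]
    if len(path_parts) < 2 or path_parts[-2] != "playlist":
        raise ValueError(f"not a playlist link: {link}")
    return path_parts[-1]


example_link = "https://open.spotify.com/playlist/1Vl23QvujvM5kROoqjF1fC?si=1c68f80304cf4bc0"
example_id = playlist_id_from_link(example_link)
# The full Spotify URI form prefixes the ID with its resource type.
example_uri = f"spotify:playlist:{example_id}"
```

Strictly speaking, the extracted string is the playlist's ID, while the full URI has the form `spotify:playlist:<id>`; spotipy's playlist methods accept either.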
With this playlist URI, we can use the .playlist_tracks method of our spotipy object to pull the album, song, and artist data for every song on my playlist.
tracks_in_my_playlist_info = sp.playlist_tracks(my_playlist_URI)
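One caveat worth sketching: the Spotify API returns playlist tracks in pages rather than all at once, so a single call may not cover a long playlist. The helper below is a minimal sketch under assumptions, not the approach used in this project. The name `fetch_all_playlist_items` is hypothetical, and it relies on a client object offering `playlist_items` (which, as far as I know, spotipy provides as the newer name for `playlist_tracks`) and `next`, which returns the following page or `None`.

```python
def fetch_all_playlist_items(client, playlist_id):
    """Collect the 'items' entries from every page of a playlist."""
    page = client.playlist_items(playlist_id)
    items = list(page["items"])
    # Each page carries a 'next' link; it is None on the last page.
    while page["next"]:
        page = client.next(page)
        items.extend(page["items"])
    return items
```

With the `sp` object from earlier, a call would look like `fetch_all_playlist_items(sp, my_playlist_URI)`, and the returned list could stand in for `tracks_in_my_playlist_info["items"]` in the steps that follow.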
APIs will generally return your data in JSON format, which spotipy hands back to us as Python dictionaries.
But how is this data specifically stored? And how do we work with the tracks_in_my_playlist_info object?
I was lost when I was first learning, so I explored by printing a mass of data, but let’s do it more systematically.
type(tracks_in_my_playlist_info)
We now know that this is a dictionary object. To see the next layer of the data, we can investigate the keys:
tracks_in_my_playlist_info.keys()
One of these keys is ‘items,’ which is where we will find all of our info. But what values correspond to the ‘items’ key?
type(tracks_in_my_playlist_info["items"])
If we index the dictionary by the ‘items’ key, the result is a list type object.
Our first challenge arises here because we cannot index a list by a key label like we can a dictionary. Let’s dig a bit more and see what the first entry of the ‘items’ list is with this output:
tracks_in_my_playlist_info["items"][0]
This is a large output, but we can see that each entry in ‘items’ is a dictionary object covering one song in the playlist (song, artist, and album data). The final values that we care about (such as the name of the track, the artist, etc.) can be found further in nested dictionaries. This means that we are good to go back to label indexing. After selecting the appropriate integer list index to get each song’s main dictionary entry, we can label index to get to the track data and again to get specific fields like the track name.
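To make the list-then-dictionary pattern concrete without a live request, here is a sketch that uses a heavily trimmed, made-up stand-in for the response. The song, artist, and identifier values are placeholders; real responses carry many more fields.

```python
# Hypothetical, trimmed stand-in for the dictionary the API call returns.
mock_playlist_response = {
    "items": [
        {
            "track": {
                "name": "Example Song",
                "uri": "spotify:track:0000000000000000000000",
                "popularity": 50,
                "artists": [
                    {
                        "name": "Example Artist",
                        "uri": "spotify:artist:1111111111111111111111",
                    }
                ],
            }
        }
    ],
    "next": None,
}

first_entry = mock_playlist_response["items"][0]  # integer index into the list
first_track = first_entry["track"]                # label index into the dictionary
first_name = first_track["name"]                  # label index again for one field
first_artist = first_track["artists"][0]["name"]  # artists is a list, so take the first
```

The same chain of indexes applies to the real `tracks_in_my_playlist_info` object.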
#confirming that we are integer indexing through each dictionary entry of the 'items' list
print(type(tracks_in_my_playlist_info["items"][0]))
#label indexing to get to the nested track dictionary
print(type(tracks_in_my_playlist_info["items"][0]["track"]))
#and finally label indexing to get the name of the first song
print(type(tracks_in_my_playlist_info["items"][0]["track"]["name"]))
If we know the fields we want (for this project: track URI, track name, main artist name, main artist genre, and track popularity), we simply need to iterate through the entries of the ‘items’ list.
#create lists to store these fields for each of the playlist songs
song_uri = []
song_name = []
artist = []
artist_main_genre = []
song_popularity = []

#loop through the 'items' list; each 'entry' is one song's dictionary, so we can label index into the deeper nested dictionaries
for entry in tracks_in_my_playlist_info["items"]:
    #go further into the track dictionary and extract the URI, splitting the string to get just the identifier part
    song_uri.append(entry["track"]["uri"].split(":")[-1])
    #go further into the track dictionary to append the song name
    song_name.append(entry["track"]["name"])
    #go into the track dictionary and select the list of artists
    #from the list of artists (potentially more than one), select only the first one and append their name
    artist.append(entry["track"]["artists"][0]["name"])
    #Within this same artists list, we can pull the artist's unique URI
    #From the artist's URI, we can use the .artist method to get a list of the artist's genres-- a proxy for the song genres
    #Again, select only the first in the list and append this main genre
    try:
        artist_main_genre.append(sp.artist(entry["track"]["artists"][0]["uri"])["genres"][0])
    #However, some artist profiles do not have genres, so we must add an exception rule if the 'genres' list returns empty
    except IndexError:
        artist_main_genre.append("unknown")
    #append the song's popularity
    song_popularity.append(entry["track"]["popularity"])
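The empty-genres case can also be handled without an exception. The helper below is a small alternative sketch, not the code used in this project; the name `main_genre` and its `default` parameter are hypothetical. It takes the dictionary that the `.artist` method returns and falls back to "unknown" when the genre list is empty or missing.

```python
def main_genre(artist_info, default="unknown"):
    """Return the first genre listed for an artist, or a default if none."""
    # Treat a missing or None 'genres' value the same as an empty list.
    genres = artist_info.get("genres") or []
    return genres[0] if genres else default
```

Inside the loop, this would replace the try/except with something like `artist_main_genre.append(main_genre(sp.artist(entry["track"]["artists"][0]["uri"])))`.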
Finally, we can generate a pandas DataFrame of these 5 basic fields for easier manipulation:
#create a dataframe with these lists as values and their names as column names
basic_song_data = pd.DataFrame({'song_uri': song_uri,'song_name': song_name,'artist': artist,'genre' : artist_main_genre,'popularity': song_popularity})
#output the first 5 rows to make sure it's what we want
basic_song_data.head()
At least at a glance, we seem to have the name, artist, and a believable genre for the first 5 songs in my playlist, so everything looks correct. What we don’t yet have are musical metrics for the songs, which is what I’d say the Spotify API is known for. Thankfully, this is the easy part:
#Feed our song URIs to the audio_features method to pull various musical metrics
detailed_song_features = sp.audio_features(basic_song_data['song_uri'])
#if you were to print detailed_song_features, you'd receive a list of dictionary entries. Each dictionary represents one song, with metric names as keys and metric values as values. Given that, we can convert the keys that each dictionary shares into columns and the sets of values as rows of a pandas DataFrame using .from_dict()
detailed_song_features = pd.DataFrame.from_dict(detailed_song_features)
#output the first 5 rows to verify these steps
detailed_song_features.head()
Note the ‘id’ column here. This column contains the same identifier for each song that we stored in the song_uri column, so a merge between the basic_song_data DataFrame and detailed_song_features DataFrame is simple.
#merge the basic and detailed song dataframes on their columns that contain URIs of the songs
all_song_data_for_my_playlist = pd.merge(left = basic_song_data, right = detailed_song_features, left_on = "song_uri", right_on= "id")
#We probably do not need all of these columns. In the next step of this project, I plan to use danceability and valence, and duration may be interesting to visualize, so we can select those out of the musical metrics.
simple_song_data_for_my_playlist = all_song_data_for_my_playlist[["song_name", "artist", "genre", "popularity", "danceability", "valence", "duration_ms"]]
#output the first 5 rows to verify these steps
simple_song_data_for_my_playlist.head()
End Notes
I want to thank Cameron Watts for an incredibly helpful article that got me started with this project.
Click here for the full Jupyter notebook code without exploratory output statements.
Looks like we have all we need, and we’re ready to move on to analysis of my playlist data! We have learned to authenticate with Spotify’s API, using it to pull and manipulate data with methods that apply to playlists, artists, and songs. My next article will begin from this point, but until then, thank you for reading!