Find your favorite artists in Spotify playlists with Python
The Spotify API is a rich source of data. Not only can you use it to integrate Spotify into your applications, but you can also extract a lot of data from it.
Today we will go through the process of analyzing data from one playlist: specifically, which artists appear in it and how many songs each of them is featured in. We will write all the code in Python, with the Spotipy library serving as our gateway to the Spotify API. Then, pandas and Plotly (Express) will let us manipulate that data and plot it in a column chart.
The finalized code is split into two scripts: get_spotify_playlist_data and plot_spotify_playlist_data. The former has the code for interacting with the API and saving the retrieved data in a CSV file, while the latter’s job is simply to load the data and plot the final chart.
Create your Spotify Application
First things first: to use the Spotify API you need to have an application (a connection) set up in the Spotify for Developers website.
Go to the website, and log in (you can use the API whether you have a free or a premium account). Click on the “Create a Client ID” option and fill in the information for your application. Give it a name and a description and, for the “What are you building” question, “I don’t know” is enough for this demo. It will give you access to your playlists and that’s what we’re targeting.
When you finish creating the application, the application page will look something like this:
However, the only information we want from here is the Client ID and the Client Secret. These are the keys needed for authentication later in Python through Spotipy. As you can see from me blurring my Client ID and the Client Secret being hidden at first, you should keep these keys private.
Hence, now that you have access to the keys, it’s a good time to save them as environment variables on your machine. If you don’t know how to do this in your operating system, please refer to a Windows guide here, a Linux guide here and a Mac guide here. Instead of copy-pasting sensitive information into the script, it is safer to save that information in environment variables and then load those variables in the script.
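As a rough illustration of loading the keys from environment variables, here is a minimal sketch. The variable names SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET and the helper load_client_keys are assumptions made for this example; use whatever names you chose when saving the keys.

```python
import os


def load_client_keys(id_var="SPOTIFY_CLIENT_ID", secret_var="SPOTIFY_CLIENT_SECRET"):
    """Return (client_id, client_secret) read from environment variables.

    The default variable names are hypothetical; pass your own if they differ.
    """
    try:
        return os.environ[id_var], os.environ[secret_var]
    except KeyError as missing:
        # Fail with a readable message instead of a bare KeyError.
        raise RuntimeError(f"Environment variable {missing} is not set") from None
```

Raising an explicit error when a variable is missing makes a misconfigured machine easier to diagnose than a failed API call later on.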
Install Python libraries
The other setup we need before getting into the code is to install the necessary Python libraries: Spotipy, pandas and Plotly.
pip (the default Python package manager) makes these installations as trivial as entering the following commands in the terminal:
pip install spotipy
pip install pandas
pip install plotly
And that’s it for the setup. With the application keys saved as environment variables and these three libraries installed, we are all set to start writing code.
Script to get data from Spotify
The first script, get_spotify_playlist_data, is the one responsible for interacting with the Spotify API. This is where we authenticate ourselves using the client keys, get the playlist data, filter out what we are not interested in and, finally, write the data to a CSV file.
I will show you the complete script right off the bat and, since the code is broken down into small functions, I’ll guide you through each one.
The script is made up of five functions:
authenticate: authenticate into the Spotify API with the client keys
get_pl_length: helper function to get the number of songs in the playlist (this includes local files)
get_tracks_artist_info: retrieve information about the artists of each track of the playlist
get_artist_counts: count the frequency of each artist, using the previously retrieved data
save_artists_csv: save the artist frequencies in a CSV file
The last block of code, the one that starts at if __name__ == "__main__", specifies which code runs when the script is executed directly, as opposed to when it is imported by another script. You see, when you import a script in Python (i.e., a .py file), all the code in the script is executed.
Thus, if you notice, before that last code block there is no executing code; there are only imports and function definitions. The functions are only called inside the last conditional code block. If this script were to be imported by another script, only the imports and function definitions would be executed, that is, there wouldn’t be unexpected code running.
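To make the pattern concrete, here is a tiny standalone sketch, unrelated to Spotify. The function count_words and the file name word_count.py are made up for the illustration.

```python
def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


if __name__ == "__main__":
    # Runs only when the file is executed directly, e.g. `python word_count.py`.
    # It is skipped when another script does `import word_count`.
    print(count_words("only runs when executed directly"))
```

Importing this file from another script defines count_words but prints nothing.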
With this explanation, now we can take a closer look at those functions. In fact, looking at the conditional block shows how the script flows. First, it needs to load the client keys stored in environment variables. These are used to authenticate the user in the Spotify API (the authenticate function), which returns an authenticated instance of that connection. This instance is very important because, as you can see, it is passed afterwards to the get_tracks_artist_info function, alongside the playlist URI.
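For readers without the full script at hand, a minimal sketch of what such an authenticate function could look like follows. It assumes the client credentials flow through Spotipy’s SpotifyClientCredentials class, which may not be the flow the original script uses; that flow only reaches public data, so private playlists would need a user-authorization flow such as SpotifyOAuth instead.

```python
def authenticate(client_id, client_secret):
    """Return a Spotipy client authenticated with the application keys.

    Sketch only: assumes the client credentials flow, which is enough
    for reading public playlists.
    """
    # Imported inside the function so this file can be loaded even on a
    # machine where Spotipy is not installed yet.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    auth_manager = SpotifyClientCredentials(
        client_id=client_id, client_secret=client_secret
    )
    return spotipy.Spotify(auth_manager=auth_manager)
```

The returned object is the authenticated instance that the rest of the script passes around.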
(Oh, and in case you’re wondering how to get a playlist URI, right-click on the playlist and choose the option shown below. Make sure to change the target playlist URI, pl_uri, to one of your playlists.)
In one sentence, get_tracks_artist_info retrieves information about each song/track saved in the target playlist. In further detail, it keeps requesting songs from the API until it has gone through all the songs. Each call can only return information for a maximum of 100 songs, hence the while loop and the offset variable; think of it as if the playlist was paginated and each request returns one page of songs.
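The pagination idea can be sketched as follows. This is not the original function: it assumes Spotipy’s playlist_items method, stops when a page comes back empty instead of using the get_pl_length helper, and skips items whose track entry is missing, which is a precaution added for this example.

```python
def get_tracks_artist_info(sp, pl_uri, page_size=100):
    """Collect the artists list of every track in a playlist, page by page.

    `sp` is an authenticated Spotipy client (or anything exposing a
    compatible `playlist_items` method).
    """
    artists_info = []
    offset = 0
    while True:
        # Each request returns at most `page_size` items starting at `offset`.
        page = sp.playlist_items(pl_uri, limit=page_size, offset=offset)
        items = page["items"]
        if not items:
            break  # an empty page means we have walked past the last song
        artists_info.extend(
            pl_item["track"]["artists"]
            for pl_item in items
            if pl_item.get("track")  # assumption: skip items with no track data
        )
        offset += len(items)
    return artists_info
```

For a playlist with 250 songs this makes four requests: three that return songs (100, 100 and 50) and a final empty one that ends the loop.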
Then, for every batch of tracks, we loop through them in a list comprehension. It does exactly the same as a normal for loop, but is more concise. Spotify returns a big dictionary with an items entry. This is a list of the songs retrieved in the current batch and it is precisely this list we are looping through.
Each element of that list (each song) is itself a dictionary, with a single track key (pl_item["track"]). That key is in fact the information of a single song and looks like this:
Since what we want is the part about the artists (look at line 6 of this last gist), the syntax to access the information about the artists of each song is pl_item["track"]["artists"].
This information about the artists is another list because, well, a song can have more than one artist, and inside that list each artist is a dictionary of its own (I hope that first screenshot helps you make sense of this big tongue twister of nested lists and dictionaries). The next gist shows the complete information for the artists: the first list, line 1, is an example for a song with a single artist, and the second list, line 16, is for one with multiple artists.
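To see the nesting in isolation, here is a small sketch with a hand-written playlist item. The song and artist names are invented, and the dictionaries are heavily trimmed: real responses carry many more keys per track and per artist.

```python
# Hypothetical, heavily trimmed playlist item: only the keys discussed above.
pl_item = {
    "track": {
        "name": "Example Song",
        "artists": [
            {"name": "First Artist"},
            {"name": "Second Artist"},
        ],
    }
}

# A list with one dictionary per artist of the song.
track_artists = pl_item["track"]["artists"]

# Pull the name out of each artist dictionary.
artist_names = [artist["name"] for artist in track_artists]
print(artist_names)
```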
Anyway, to summarize the get_tracks_artist_info function, it returns a list where each element is the complete information about the artists of one song of the playlist (that is, each element is a list identical to the ones shown in the above gist).
This list of nested lists of artist information is then passed on to the get_artist_counts function and, thankfully, the work here with nested lists and dictionaries is simpler.
Because each element of the main list (artists_info) is itself a list of artists (dictionaries), we make use of equally nested for loops. The outer loop goes through one list of artists at a time (track_artists), and the inner loop goes through the artists of a single song (artist).
Now, since we can finally look at a single artist of a single song in each iteration of the nested loop, we get the name of the artist (artist_name = artist["name"]) and use it to update their frequency. The artist frequencies are stored in a new dictionary (artist_counts), where the key is the artist name and the value is the number of songs they feature in. TL;DR, the get_artist_counts function returns a simple dictionary that maps the artists to the number of songs they are featured in.
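A minimal sketch of the nested-loop counting described above, using the same variable names as the text. The exact way the dictionary is updated is an assumption of this example; the original may do it differently.

```python
def get_artist_counts(artists_info):
    """Map each artist name to the number of songs the artist appears in."""
    artist_counts = {}
    for track_artists in artists_info:  # one list of artists per song
        for artist in track_artists:  # one dictionary per artist
            artist_name = artist["name"]
            # Start at 0 the first time we meet an artist, then add 1.
            artist_counts[artist_name] = artist_counts.get(artist_name, 0) + 1
    return artist_counts
```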
Finally, after all the mumbo jumbo of nested data, the last step for this script is to execute the save_artists_csv function, which takes the dictionary returned by get_artist_counts, puts it in a pandas DataFrame and writes that to a CSV file. Supposing this file doesn’t exist yet, it is created automatically (if it does, it is overwritten).
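The saving step could look roughly like this. The column names artist and count and the default file name artists.csv are placeholders chosen for this sketch, not taken from the original script.

```python
import pandas as pd


def save_artists_csv(artist_counts, csv_path="artists.csv"):
    """Write the artist frequencies to a CSV file, one row per artist."""
    data = pd.DataFrame(
        list(artist_counts.items()),
        columns=["artist", "count"],  # hypothetical column names
    )
    # index=False keeps the DataFrame's row numbers out of the file.
    # An existing file at csv_path is overwritten.
    data.to_csv(csv_path, index=False)
    return data
```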
At this point, the CSV file is generated with the information desired about the artists and this first script, get_spotify_playlist_data, is finished. On to plotting the data with the second script!
Script to plot the data
The hard part of looking through nested data is behind us; now it is only a matter of loading the data and plotting it.
Since we only have two tasks to go through, the script has only two functions:
pre_process_data: load the data from the CSV and sort it by descending order of artist frequencies
plot_column_chart: plot a column chart of the artist frequencies with Plotly Express, including the necessary formatting changes
Again, this script makes use of the if __name__ == "__main__" conditional block to execute the code; before this block there are only imports and function definitions.
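For reference, the loading and sorting step can be sketched in a couple of lines of pandas. As before, the file name artists.csv and the count column are assumptions of this example.

```python
import pandas as pd


def pre_process_data(csv_path="artists.csv"):
    """Load the artist frequencies and sort them from most to least frequent."""
    data = pd.read_csv(csv_path)
    # Highest counts first, with a fresh 0..n-1 index after sorting.
    return data.sort_values("count", ascending=False).reset_index(drop=True)
```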
I think pre_process_data is self-explanatory so I’ll jump to plot_column_chart. px.bar is the function from Plotly Express responsible for plotting the column chart. It receives the data to be plotted as a DataFrame; data.head(n=10) means only the first ten rows of data, the ten most frequent artists, are plotted.
Since it receives a DataFrame, to specify which data is used for each axis we just need to give it the names of the respective columns: the artists for the horizontal axis (x), the frequencies for the vertical axis (y) and again the frequencies for text. The text argument represents the data labels to be shown for each column.
fig.update_traces formats the data labels, fig.update_layout formats general aspects of the plot such as the axes and the font and, lastly, fig.show shows the plot. This will open a tab in your browser with the finalized interactive column chart (yup, in case you are unfamiliar with Plotly, all plots and charts are interactive).
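Putting the plotting calls together, a sketch of plot_column_chart might look like the following. The column names, the title and the specific formatting values are assumptions of this example rather than the original styling, and the function returns the figure instead of showing it so that the caller decides when to open the browser tab.

```python
def plot_column_chart(data, top_n=10):
    """Build a column chart of the top_n most frequent artists.

    `data` is expected to be a DataFrame already sorted by frequency, with
    hypothetical columns named "artist" and "count".
    """
    # Imported inside the function so this file can be loaded even on a
    # machine where Plotly is not installed yet.
    import plotly.express as px

    fig = px.bar(data.head(n=top_n), x="artist", y="count", text="count")
    # Draw the data labels above the columns.
    fig.update_traces(textposition="outside")
    # General formatting: title, axis titles and font size.
    fig.update_layout(
        title_text="Most frequent artists in the playlist",
        xaxis_title="Artist",
        yaxis_title="Number of songs",
        font={"size": 14},
    )
    return fig
```

Calling plot_column_chart(data).show() then opens the interactive chart.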
Conclusions
After all these explanations, we finally obtain the result of the demo (the interactive result is not included in the article, but at least there’s a screenshot of it).
As we’ve seen throughout this demo, Spotify makes a lot of data available for every element of its platform, and going through the contents of a playlist is in itself plenty of data to dig through.
I think this example of plotting the most common artists in a playlist is a fun way of exploring a real dataset, as it goes through establishing the connection to the API (albeit through the Spotipy wrapper), processing the data and, finally, plotting it in a visualization.
You can find all the code for this demo on GitHub here.
Oh, as a last note for the curious readers, I refactored the get_artist_counts function (the one that counts artist frequencies) to use map-reduce instead of nested for loops.
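The refactored code is not reproduced here, so the following is only one possible map-reduce formulation, built on collections.Counter and functools.reduce; the actual refactoring may look different.

```python
from collections import Counter
from functools import reduce
from operator import add


def get_artist_counts(artists_info):
    """Count songs per artist with a map step and a reduce step."""
    # Map: turn each song's list of artists into a small Counter of names.
    per_track_counts = map(
        lambda track_artists: Counter(artist["name"] for artist in track_artists),
        artists_info,
    )
    # Reduce: add the per-song Counters together into one overall total.
    return dict(reduce(add, per_track_counts, Counter()))
```

It returns the same kind of dictionary as the nested-loop version, mapping each artist name to the number of songs they feature in.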