Scrape Data With Python Using the Spotify API and Visualize With Power BI
Insights & Good Spotify: Exploring Dunsin Oyekan’s Discography
In May, I took up my thinking hat.
Why?
The question I had was simple: wouldn’t it be nice to analyze the discography of some of your favourite artists on Spotify?
It would, and Zion Oluwasegun paved the way with his extensive work on Nathaniel Bassey’s Spotify releases.
The goal of this project was to use visuals to answer a series of questions about data related to the Nigerian music industry. From identifying chart toppers to understanding artist preferences, I decided to dive into the releases of Dunsin Oyekan, or as he is fondly called, The Eagle.
Beware of some dangerous music puns 😁
Striking The First Chord
Setting the stage
!pip install spotipy
# Importing requisite libraries
import spotipy
import pandas as pd
import numpy as np
from timeit import default_timer as timer
from datetime import timedelta
from pandas.api.types import CategoricalDtype
import sqlite3
Using Spotipy, a Python library for the Spotify Web API, all the goals were achievable. However, this required a Spotify developer account. You can set one up by visiting the developer website, navigating to the Dashboard, then to My Spotify Analytics, Settings, and Basic Information to obtain the necessary IDs. Or just visit this link
After creating the app, take note of the client ID and client secret assigned to it. These are the details you will use to interact with the API.
Knowing Our Keys
There are over twenty important features we would be gathering, including artist_name, track_name, album_name, release_date and popularity.
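Two of the audio features gathered in this project, key and mode, come back from Spotify as integers: key uses pitch class notation (0 = C, 1 = C♯/D♭, and so on, with -1 when no key was detected), while mode is 1 for major and 0 for minor. Here is a minimal sketch of a helper that turns them into readable labels for the dashboard. The names `PITCH_CLASSES` and `describe_key` are my own and not part of Spotipy.

```python
# Pitch class notation used by Spotify's `key` audio feature (0 = C ... 11 = B)
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def describe_key(key, mode):
    """Turn Spotify's integer key and mode into a label such as 'A minor'.

    Hypothetical helper for illustration; it is not part of Spotipy.
    """
    # -1 (or anything outside 0..11) means no key was detected
    if key is None or not 0 <= key <= 11:
        return "Unknown"
    # mode: 1 is major, 0 is minor; any other value leaves the scale blank
    scale = {1: "major", 0: "minor"}.get(mode, "")
    return f"{PITCH_CLASSES[key]} {scale}".strip()


print(describe_key(9, 0))
```

Once the dataframe exists, a label column could be built by calling this helper on each row's key and mode.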
Starting the scrape rhythm
# Setting up with your ID and token
from spotipy.oauth2 import SpotifyClientCredentials
client_id = 'ENTER YOUR ID'
client_secret = 'ENTER YOUR SECRET TOKEN'
client_credentials_manager = SpotifyClientCredentials(
    client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
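If you would rather not paste the ID and secret into the notebook, one option is to read them from environment variables. The sketch below assumes you have exported `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET`, which are the variable names Spotipy itself looks for. The function `load_spotify_credentials` is a made-up helper for illustration.

```python
import os


def load_spotify_credentials(env=None):
    """Read the client ID and secret from environment variables.

    Hypothetical helper; `env` defaults to os.environ and can be swapped
    for a plain dict when testing.
    """
    env = os.environ if env is None else env
    client_id = env.get("SPOTIPY_CLIENT_ID")
    client_secret = env.get("SPOTIPY_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET before running the scraper."
        )
    return client_id, client_secret
```

The returned pair can then be passed to `SpotifyClientCredentials` in place of the hard-coded strings.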
Don’t you feel the process should be monitored 🤔? Well, yes!
# Keeping track of runtime
def format_time(seconds):
    minutes, seconds = divmod(seconds, 60)
    if minutes > 0:
        return f"{minutes} minutes, {seconds} seconds"
    else:
        return f"{seconds} seconds"
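As a side note, the `timedelta` import from earlier offers a shorter route to the same goal, since a `timedelta` already prints in H:MM:SS form (125 seconds shows up as 0:02:05). This is only an alternative sketch; `elapsed_since` is a name I made up.

```python
from datetime import timedelta
from timeit import default_timer as timer


def elapsed_since(start):
    """Return the whole seconds elapsed since `start` as a timedelta."""
    return timedelta(seconds=int(timer() - start))


start = timer()
# ... long-running work would go here ...
print(f"Elapsed time: {elapsed_since(start)}")
```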
Where would the features we are getting be stored? Certainly not homeless😩
# Lists to hold scraped features for each track
artist_name = []
track_name = []
track_id = []
album_name = []
album_id = []
release_date = []
duration_ms = []
popularity = []
explicit = []
danceability = []
energy = []
key = []
loudness = []
mode = []
speechiness = []
acousticness = []
instrumentalness = []
liveness = []
valence = []
tempo = []
time_signature = []
featured_artists = []
# Some real action
import time
from requests.exceptions import ReadTimeout
start_time = timer()
# Search term kept in its own variable so it does not overwrite the artist_name list
search_artist = 'dunsin oyekan'
for offset in range(0, 1000, 50):
    try:
        track_results = sp.search(q=f'artist:{search_artist}', type='track', limit=50, offset=offset)
        items = track_results['tracks']['items']
        if not items:
            break  # no more results to page through
        for t in items:
            # get audio features first, so a failed request leaves every list the same length
            audio_features = sp.audio_features(t['id'])[0]
            if audio_features is None:
                continue  # skip tracks that have no audio features
            # get track details
            artist_name.append(t['artists'][0]['name'])
            track_name.append(t['name'])
            track_id.append(t['id'])
            album_name.append(t['album']['name'])
            album_id.append(t['album']['id'])
            release_date.append(t['album']['release_date'])
            popularity.append(t['popularity'])
            explicit.append(t['explicit'])
            # store audio features for track
            danceability.append(audio_features['danceability'])
            duration_ms.append(audio_features['duration_ms'])
            energy.append(audio_features['energy'])
            key.append(audio_features['key'])
            loudness.append(audio_features['loudness'])
            mode.append(audio_features['mode'])
            speechiness.append(audio_features['speechiness'])
            acousticness.append(audio_features['acousticness'])
            instrumentalness.append(audio_features['instrumentalness'])
            liveness.append(audio_features['liveness'])
            valence.append(audio_features['valence'])
            tempo.append(audio_features['tempo'])
            time_signature.append(audio_features['time_signature'])
            # get featured artists (every credited artist after the first)
            featured_artists.append([a['name'] for a in t['artists'][1:]])
    except ReadTimeout as e:
        print(f"Error: {e}. Pausing for 5 seconds before the next page...")
        time.sleep(5)  # short delay before moving on to the next offset
# create dataframe from lists
df = pd.DataFrame({
    'artist_name': artist_name,
    'track_name': track_name,
    'track_id': track_id,
    'album_name': album_name,
    'album_id': album_id,
    'release_date': release_date,
    'duration_ms': duration_ms,
    'popularity': popularity,
    'explicit': explicit,
    'danceability': danceability,
    'energy': energy,
    'key': key,
    'loudness': loudness,
    'mode': mode,
    'speechiness': speechiness,
    'acousticness': acousticness,
    'instrumentalness': instrumentalness,
    'liveness': liveness,
    'valence': valence,
    'tempo': tempo,
    'time_signature': time_signature,
    'featured_artists': featured_artists
})
# Return elapsed time for procedure
end_time = timer()
elapsed_time = int(end_time - start_time)
print(f"Elapsed time: {format_time(elapsed_time)}")
# Saving our work in Comma-Separated Value format
df.to_csv(f'{search_artist} Spotify Tracks.csv', index=False)
print("Done")
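The loop above asks for audio features one track at a time, and on a timeout it moves on rather than trying the same request again. Since Spotipy's `audio_features` accepts a list of up to 100 track IDs per call, a possible refinement is to collect the IDs first, then fetch the features in batches with a real retry. This is a sketch, not the method used for this project: `chunked`, `call_with_retries` and `fetch_audio_features` are hypothetical helpers, and the defaults (3 attempts, 5-second delay) are my assumptions.

```python
import time


def chunked(items, size):
    """Yield successive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def call_with_retries(func, *args, attempts=3, delay=5, retry_on=(OSError,)):
    """Call func(*args), retrying up to `attempts` times on the given exceptions.

    OSError covers Python's built-in timeout and connection errors; pass
    retry_on=(ReadTimeout,) to retry only on the exception the scraper catches.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts, let the caller see the error
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay} seconds...")
            time.sleep(delay)


def fetch_audio_features(sp, track_ids, batch_size=100, attempts=3, delay=5):
    """Fetch audio features for many tracks, `batch_size` IDs per request.

    `sp` is expected to be a spotipy.Spotify client (or anything that exposes
    an audio_features(list_of_ids) method).
    """
    features = []
    for batch in chunked(track_ids, batch_size):
        features.extend(
            call_with_retries(sp.audio_features, batch, attempts=attempts, delay=delay)
        )
    return features
```

With this in place, a call such as `fetch_audio_features(sp, track_id)` would replace the per-track requests. Entries for tracks without audio features come back as `None`, so they still need the same check as in the loop.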
That’s all great, but what if I want to scrape the data directly into Power BI?
Have no fear, click through this link to learn how to get data from the web into Power BI.
Through this project, I gathered comprehensive data on Dunsin Oyekan’s discography, including the diversity of his music in terms of audio features and the variety of artists he has collaborated with. This data can provide valuable insights into his musical style, popularity trends, and collaborative patterns.
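Before heading into Power BI, the same kinds of questions can be sanity-checked in pandas. The sketch below uses a tiny placeholder dataframe with made-up values (not real Spotify data) that mirrors the column names from the scraper. The function names `average_album_popularity` and `collaborator_counts` are my own.

```python
import pandas as pd

# Placeholder rows with made-up values, standing in for the scraped dataframe
sample = pd.DataFrame({
    "album_name": ["Album A", "Album A", "Album B"],
    "track_name": ["Track 1", "Track 2", "Track 3"],
    "popularity": [40, 50, 30],
    "featured_artists": [["Artist X"], [], ["Artist X", "Artist Y"]],
})


def average_album_popularity(df):
    """Mean track popularity per album, most popular album first."""
    return df.groupby("album_name")["popularity"].mean().sort_values(ascending=False)


def collaborator_counts(df):
    """Number of tracks each featured artist appears on."""
    # explode() gives one row per featured artist; empty lists become NaN and are dropped
    return df["featured_artists"].explode().dropna().value_counts()


print(average_album_popularity(sample))
print(collaborator_counts(sample))
```

One thing to watch: after a round trip through the CSV file, the featured_artists lists come back as plain strings, so they would need parsing (for example with `ast.literal_eval`) before `explode` can split them.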
Check out the dashboard below 😀
Some of the KPIs
Assessing Average Popularity of Albums
Assessing Average Popularity of Tracks & Releases
Assessing Track Releases & Popularity
To learn about scraping data into Power BI, click here.