Scrape Data With Python Using Spotify API and Visualize With PowerBI

Jul 9, 2024

Insights & Good Spotify: Exploring Dunsin Oyekan’s Discography

In May, I put on my thinking hat.

Why?
The question I had was simple: would it not be nice to analyze the discography of some of your favourite artists on Spotify?

Yes, it would! Zion Oluwasegun paved the way with his extensive work on Nathaniel Bassey’s Spotify releases.

The goal of this project was to use visuals to answer a series of questions about data related to the Nigerian music industry. From identifying chart-toppers to understanding artist preferences, I decided to dive into the releases of Dunsin Oyekan, or, as he is fondly called, The Eagle.

Beware of some dangerous music puns 😁

Striking The First Chord
Setting the stage

!pip install spotipy

# Importing requisite libraries

import spotipy
import pandas as pd
import numpy as np
from timeit import default_timer as timer
from datetime import timedelta
from pandas.api.types import CategoricalDtype
import sqlite3


Using Spotipy, a lightweight Python library for the Spotify Web API, all the goals were achievable. However, this required access to a Spotify developer account, which meant creating an account on the platform. You can set up a developer account by visiting the developer website, navigating to the Dashboard, then to My Spotify Analytics, Settings, and Basic Information to obtain the necessary IDs. Or just visit this link.

After creating your app, take note of the Client ID and Client Secret assigned to it. These are the credentials you will use to interact with the API.

Knowing Our Keys

We will be gathering over twenty features, including artist_name, track_name, album_name, release_date and popularity.
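These features come from two different API responses: track metadata (names, IDs, album, release date, popularity, explicit flag) sits on the track object returned by search, while danceability, energy, tempo and the other audio attributes come from a separate audio-features object. As a minimal sketch, here is a hypothetical helper, `track_to_row`, that flattens one track plus its audio features into a single record; the sample dictionary is made up and trimmed to the fields the helper reads.

```python
from typing import Any, Optional

# Audio attributes read from the audio-features object
AUDIO_FIELDS = ["danceability", "energy", "key", "loudness", "mode", "speechiness",
                "acousticness", "instrumentalness", "liveness", "valence",
                "tempo", "time_signature"]


def track_to_row(track: dict[str, Any],
                 features: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Flatten one search-result track and its audio features (hypothetical helper)."""
    features = features or {}  # the API can return None for a track without features
    row = {
        "artist_name": track["artists"][0]["name"],
        "track_name": track["name"],
        "track_id": track["id"],
        "album_name": track["album"]["name"],
        "album_id": track["album"]["id"],
        "release_date": track["album"]["release_date"],
        "duration_ms": track["duration_ms"],
        "popularity": track["popularity"],
        "explicit": track["explicit"],
        # every credited artist after the first is treated as a featured artist
        "featured_artists": [a["name"] for a in track["artists"][1:]],
    }
    for field in AUDIO_FIELDS:
        row[field] = features.get(field)  # None when the feature is missing
    return row


# Made-up sample shaped like a Spotify track object (only the fields used above)
sample_track = {
    "name": "Example Song", "id": "abc123", "duration_ms": 300000,
    "popularity": 42, "explicit": False,
    "artists": [{"name": "Primary Artist"}, {"name": "Guest Artist"}],
    "album": {"name": "Example Album", "id": "alb1", "release_date": "2021-05-14"},
}
row = track_to_row(sample_track, {"danceability": 0.5, "tempo": 120.0})
print(row["featured_artists"], row["tempo"], row["energy"])  # ['Guest Artist'] 120.0 None
```

Building one dictionary per track like this is an alternative to keeping many parallel lists in sync; a list of such rows can be passed straight to `pd.DataFrame`.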

Starting the scrape rhythm

# Authenticate with your app's Client ID and Client Secret (Client Credentials flow)

from spotipy.oauth2 import SpotifyClientCredentials

client_id = 'ENTER YOUR CLIENT ID'
client_secret = 'ENTER YOUR CLIENT SECRET'
client_credentials_manager = SpotifyClientCredentials(
    client_id=client_id, client_secret=client_secret)

sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
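Pasting the Client Secret straight into a notebook makes it easy to leak when you share the file. A minimal alternative, assuming you have exported two environment variables named `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` (the names Spotipy's `SpotifyClientCredentials` also falls back to when called without arguments), is to read them at runtime. `load_credentials` is a hypothetical helper.

```python
import os


def load_credentials() -> tuple[str, str]:
    """Read the Spotify app credentials from environment variables (hypothetical helper)."""
    client_id = os.environ.get("SPOTIPY_CLIENT_ID")
    client_secret = os.environ.get("SPOTIPY_CLIENT_SECRET")
    # Collect the names of any variables that are unset or empty
    missing = [name for name, value in
               [("SPOTIPY_CLIENT_ID", client_id), ("SPOTIPY_CLIENT_SECRET", client_secret)]
               if not value]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return client_id, client_secret


# Usage with spotipy installed:
# client_id, client_secret = load_credentials()
# sp = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials(
#     client_id=client_id, client_secret=client_secret))
```

Failing early with a named variable is friendlier than the authentication error you would otherwise get on the first API call.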

Don’t you feel the process should be monitored 🤔? Well, yes!

# Keeping track of runtime

def format_time(seconds):
    minutes, seconds = divmod(seconds, 60)
    if minutes > 0:
        return f"{minutes} minutes, {seconds} seconds"
    else:
        return f"{seconds} seconds"

Where will the features we collect be stored? Certainly not homeless 😩

# Lists to hold scraped features for each track 
artist_name = []
track_name = []
track_id = []
album_name = []
album_id = []
release_date = []
duration_ms = []
popularity = []
explicit = []
danceability = []
energy = []
key = []
loudness = []
mode = []
speechiness = []
acousticness = []
instrumentalness = []
liveness = []
valence = []
tempo = []
time_signature = []
featured_artists = []

# Some real action
import time
from requests.exceptions import ReadTimeout  # raised by requests when a response is too slow

start_time = timer()

# Search term, kept in its own variable so it doesn't overwrite the artist_name list above
search_artist = 'dunsin oyekan'

# Search returns at most 50 results per call, and the API caps the offset at 1,000
for offset in range(0, 1000, 50):
    # Retry the same page up to 3 times instead of silently skipping it
    for _ in range(3):
        try:
            track_results = sp.search(q=f'artist:{search_artist}', type='track',
                                      limit=50, offset=offset)
            break
        except ReadTimeout as e:
            print(f"Error: {e}. Retrying in 5 seconds...")
            time.sleep(5)
    else:
        print(f"Skipping offset {offset} after 3 failed attempts")
        continue

    items = track_results['tracks']['items']
    if not items:
        break  # no more results to page through

    for t in items:
        # get track details
        artist_name.append(t['artists'][0]['name'])
        track_name.append(t['name'])
        track_id.append(t['id'])
        album_name.append(t['album']['name'])
        album_id.append(t['album']['id'])
        release_date.append(t['album']['release_date'])
        duration_ms.append(t['duration_ms'])  # track length is also on the track object
        popularity.append(t['popularity'])
        explicit.append(t['explicit'])

        # get audio features for track; missing features are stored as None.
        # Spotify restricted this endpoint for newly created apps in November 2024,
        # so such apps may receive an error here instead of data.
        try:
            audio_features = sp.audio_features(t['id'])[0] or {}
        except (ReadTimeout, spotipy.exceptions.SpotifyException):
            audio_features = {}
        danceability.append(audio_features.get('danceability'))
        energy.append(audio_features.get('energy'))
        key.append(audio_features.get('key'))
        loudness.append(audio_features.get('loudness'))
        mode.append(audio_features.get('mode'))
        speechiness.append(audio_features.get('speechiness'))
        acousticness.append(audio_features.get('acousticness'))
        instrumentalness.append(audio_features.get('instrumentalness'))
        liveness.append(audio_features.get('liveness'))
        valence.append(audio_features.get('valence'))
        tempo.append(audio_features.get('tempo'))
        time_signature.append(audio_features.get('time_signature'))

        # get featured artists: every credited artist after the first
        featured_artists.append([a['name'] for a in t['artists'][1:]])

# create dataframe from lists
df = pd.DataFrame({
    'artist_name': artist_name,
    'track_name': track_name,
    'track_id': track_id,
    'album_name': album_name,
    'album_id': album_id,
    'release_date': release_date,
    'duration_ms': duration_ms,
    'popularity': popularity,
    'explicit': explicit,
    'danceability': danceability,
    'energy': energy,
    'key': key,
    'loudness': loudness,
    'mode': mode,
    'speechiness': speechiness,
    'acousticness': acousticness,
    'instrumentalness': instrumentalness,
    'liveness': liveness,
    'valence': valence,
    'tempo': tempo,
    'time_signature': time_signature,
    'featured_artists': featured_artists
})

# Return elapsed time for procedure
end_time = timer()
elapsed_time = int(end_time - start_time)
print(f"Elapsed time: {format_time(elapsed_time)}")

# Saving our work in Comma-Separated Values format
df.to_csv(f'{search_artist} Spotify Tracks.csv', index=False)
print("Done")
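Requesting audio features one track at a time means one HTTP call per song. Spotipy's `audio_features` also accepts a list of track IDs (the Web API allows up to 100 per request), so a sketch like the hypothetical `fetch_audio_features` below could replace the per-track call. It only assumes an object with an `audio_features(ids)` method, which is why the usage example uses a made-up stand-in client rather than a live connection.

```python
from typing import Any, Iterator, Optional, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_audio_features(sp: Any, track_ids: Sequence[str],
                         batch_size: int = 100) -> dict[str, Optional[dict]]:
    """Map each track ID to its audio-features dict, or None if none came back."""
    features_by_id: dict[str, Optional[dict]] = {tid: None for tid in track_ids}
    for batch in chunked(track_ids, batch_size):
        results = sp.audio_features(list(batch)) or []
        # Results come back in request order, with None for tracks that have no features
        for tid, feats in zip(batch, results):
            features_by_id[tid] = feats
    return features_by_id


class FakeClient:
    """Made-up stand-in for spotipy.Spotify that counts how many calls were made."""

    def __init__(self) -> None:
        self.calls = 0

    def audio_features(self, ids: list) -> list:
        self.calls += 1
        return [{"id": i, "tempo": 100.0} for i in ids]


client = FakeClient()
ids = [f"track{n}" for n in range(250)]
features = fetch_audio_features(client, ids)
print(client.calls, len(features))  # 3 250
```

With the real client, you would collect the IDs during the search loop (or use `df['track_id'].tolist()`) and make 3 calls for 250 tracks instead of 250.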

That’s all great, but what if I want to scrape the data directly into Power BI?

Have no fear: click through this link to learn how to get data from the web into Power BI.
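One detail worth handling before the CSV reaches Power BI: Spotify reports an album’s `release_date` at varying precision, so the column can mix values like `2016`, `2016-08` and `2016-08-19` (the album object also carries a `release_date_precision` field of `year`, `month` or `day`). The sketch below uses a hypothetical `normalize_release_date` helper that pads partial dates to the first day of the period, which is an assumption you may want to change, so the column can be typed as a date. It also converts `duration_ms` to minutes for friendlier visuals.

```python
from datetime import date


def normalize_release_date(value: str) -> date:
    """Turn 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into a date, padding missing parts with 1."""
    parts = [int(p) for p in value.split("-")]
    year = parts[0]
    month = parts[1] if len(parts) > 1 else 1
    day = parts[2] if len(parts) > 2 else 1
    return date(year, month, day)


def ms_to_minutes(duration_ms: int) -> float:
    """Convert a track length in milliseconds to minutes, rounded to 2 decimals."""
    return round(duration_ms / 60000, 2)


for raw in ["2016", "2016-08", "2016-08-19"]:
    print(raw, "->", normalize_release_date(raw).isoformat())
print(ms_to_minutes(300000))  # 5.0
```

On the scraped DataFrame this could be applied with `df['release_date'].map(normalize_release_date)` before calling `to_csv`.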

Through this project, I gathered comprehensive data on Dunsin Oyekan’s discography, including the diversity of his music in terms of audio features and the variety of artists he has collaborated with. This data can provide valuable insights into his musical style, popularity trends, and collaborative patterns.
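For the collaboration angle, note that `featured_artists` holds a Python list per track, which `to_csv` writes out as text such as `['Artist A', 'Artist B']`. One way to summarise collaborators before building visuals, sketched below with the standard library and made-up names, is to parse that text back with `ast.literal_eval` and count each featured artist. `count_collaborators` is a hypothetical helper.

```python
import ast
from collections import Counter
from typing import Iterable


def count_collaborators(cells: Iterable[str]) -> Counter:
    """Count featured artists across CSV cells like "['A', 'B']" or "[]"."""
    counts: Counter = Counter()
    for cell in cells:
        # literal_eval parses the list text safely, without executing arbitrary code
        counts.update(ast.literal_eval(cell))
    return counts


# Made-up cells as they would appear in the featured_artists column of the CSV
cells = ["[]", "['Artist A']", "['Artist A', 'Artist B']", "[]"]
print(count_collaborators(cells).most_common())  # [('Artist A', 2), ('Artist B', 1)]
```

The resulting counts can be saved as a small two-column table (artist, tracks) that Power BI handles more easily than list-valued cells.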

Check out the dashboard below 😀