Scraping Spotify Playlist using Python and Selenium

Felix Pratamasan
5 min read · Aug 18, 2023


In this article, I will explain my project on how to do data scraping using Selenium in Python. I will scrape data from a Spotify playlist and store it in a .csv file. You can see the whole project on my GitHub: https://github.com/lixx21/spotify-scrapping, and clone the repository to test the code. What we need to do is:

1. Install the libraries

We need to install the libraries to run the Python script. If you look at my GitHub repo, you will see the libraries that need to be installed in requirements.txt. If you cloned my project, you can simply run this command to install them:

pip install -r requirements.txt

But if you did not clone my project, you can install these two libraries directly (argparse is part of Python's standard library, so it does not need to be installed separately):

pip install selenium
pip install pandas

After you have installed the libraries, you can create a Python script and import them:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import pandas as pd
import time
import argparse

2. Download the web driver

Selenium uses a web driver to load data from the website. The driver will automatically open the browser when you run the script. Here is the link where you can download the web driver for Chrome: https://chromedriver.chromium.org/downloads. After you have downloaded the Chrome driver, you can access it from Python:

driver = webdriver.Chrome('{your_driver_dir_path}')
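
Note that passing the driver path positionally like this only works in older Selenium releases; it was deprecated in Selenium 4 and later removed. If you are on Selenium 4, a minimal sketch looks like this (the path is the same placeholder as above):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 style: wrap the driver path in a Service object.
# With Selenium 4.6+ you can even call webdriver.Chrome() with no
# arguments and let Selenium Manager download a matching driver.
service = Service('{your_driver_dir_path}/chromedriver')
driver = webdriver.Chrome(service=service)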

3. Define a function to scrape the data

Because we want to scrape a Spotify URL, which is a dynamic website, we have to add the .implicitly_wait() function, since a dynamic website takes more time to load its data than a static one. This function tells the web driver to wait for a certain amount of time before it throws a “No Such Element Exception”:

def scraping(spotify_playlist_url, output_csv):
    driver.get(spotify_playlist_url)
    driver.implicitly_wait(10)

In this example I used a 10-second wait. It means that if an element is not located on the web page within that time frame, Selenium throws an exception. Source: https://www.guru99.com/implicit-explicit-waits-selenium.html
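
An explicit wait is a useful alternative: instead of one blanket timeout for every lookup, you poll for a specific condition. Here is a minimal sketch, assuming the playlist page renders a main element (verify this in DevTools):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the main content area to appear,
# and fail fast instead of silently returning empty results.
wait = WebDriverWait(driver, 10)
main_area = wait.until(EC.presence_of_element_located((By.TAG_NAME, "main")))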

After that, we need to handle infinite scroll to fetch more data, because the website (Spotify) only loads more rows as we scroll down. Here is the code:

    time.sleep(3)

    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        driver.execute_script("window.scrollBy(0, document.body.scrollHeight)")
        time.sleep(5)

        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break

        last_height = new_height

To use this method, you also need to scroll down manually in the browser window that the Chrome driver opens, until the script finishes and the window closes automatically; Spotify renders the playlist inside an inner scroll container, so scrolling document.body alone may not load every row. You can read more about handling infinite scroll at this link: https://medium.com/@sagar.pndt305/how-did-i-solve-the-issue-of-infinite-scroll-while-scraping-a-website-7ef1a3b57309.
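
If you want to avoid the manual scrolling, one option is to scroll the inner container element itself instead of the page body. This is only a sketch: the selector below is an assumption, since Spotify's markup changes often, so inspect the page with F12 and adjust it:

from selenium.webdriver.common.by import By  # already imported at the top of our script

# Hypothetical selector for Spotify's inner scroll container --
# verify the real attribute in DevTools before relying on it.
panel = driver.find_element(By.CSS_SELECTOR, 'div[data-overlayscrollbars-viewport]')
# Scroll the container element rather than document.body.
driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", panel)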

After all the data has been fetched, you can finally find the XPath of the HTML elements on the page. To inspect the elements, press F12 on your keyboard.

To get the elements, we can use this code:

    # Now that all data is loaded, scrape it
    titles = driver.find_elements(By.XPATH, '//div[@class="Type__TypeElement-sc-goli3j-0 fZDcWX t_yrXoUO3qGsJS4Y6iXX standalone-ellipsis-one-line"]')
    artists = driver.find_elements(By.XPATH, '//span[@class="Type__TypeElement-sc-goli3j-0 bDHxRN rq2VQ5mb9SDAFWbBIUIn standalone-ellipsis-one-line"]')
    albums = driver.find_elements(By.XPATH, '//span[@class="Type__TypeElement-sc-goli3j-0 ieTwfQ"]')
    durations = driver.find_elements(By.XPATH, '//div[@class="Type__TypeElement-sc-goli3j-0 bDHxRN Btg2qHSuepFGBG6X0yEN"]')
    songs_img = driver.find_elements(By.XPATH, '//img[@class="mMx2LUixlnN_Fu45JpFB rkw8BWQi3miXqtlJhKg0 Yn2Ei5QZn19gria6LjZj"]')

From that code, I get the song title, artist name, album name, song duration, and the image URL for each song. Note that these class names are auto-generated by Spotify's build tooling and change frequently, so you will likely need to re-inspect the page and update the XPaths. You can find more explanation of XPath and how to use it at this link: https://www.lambdatest.com/blog/complete-guide-for-using-xpath-in-selenium-with-examples/
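
Because of that brittleness, attribute-based locators are usually more stable than generated class names. As a sketch, the data-testid value below is an assumption, so confirm it in DevTools before using it:

from selenium.webdriver.common.by import By  # already imported at the top of our script

# Hypothetical: assumes playlist rows carry a stable data-testid
# attribute; verify the exact value with F12 first.
rows = driver.find_elements(By.XPATH, '//div[@data-testid="tracklist-row"]')
print(len(rows))  # quick check that the locator matched something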

The final touch for this function is to iterate over all the data we fetched with XPath and store it in a CSV file using the pandas library. Here is the code:

    all_data = []

    for index in range(len(titles)):
        all_data.append({
            "title": titles[index].text,
            "artist": artists[index].text,
            "album": albums[index].text,
            "duration": durations[index].text,
            "album_image": songs_img[index].get_attribute('src')
        })

    df = pd.DataFrame(data=all_data)
    df.to_csv(output_csv, index=False)

    return f"data has been saved in {output_csv}"

The scraping() function is now complete, so we can move on to the last step.

4. Call the function

In this final section, we extend our Python script to call the function using Python's argparse module. This lets us pass the playlist URL and the output file path as command-line arguments. Here is the code:

if __name__ == "__main__":

    # Initialize parser
    parser = argparse.ArgumentParser()

    # Adding optional arguments
    parser.add_argument("-u", "--url", help="URL of the Spotify playlist")
    parser.add_argument("-o", "--output", help="output CSV file path")

    # Read arguments from command line
    args = parser.parse_args()

    print(scraping(args.url, args.output))
    driver.quit()  # close the browser once scraping is done

Here is the full code of this project:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import pandas as pd
import time
import argparse

driver = webdriver.Chrome(r'C:/Users/ASUS/Downloads/chromedriver_win32/chromedriver.exe') # use a raw string (r prefix) for Windows paths, and point to the chromedriver executable

def scraping(spotify_playlist_url, output_csv):
    driver.get(spotify_playlist_url)
    driver.implicitly_wait(10) # because this is a dynamic website, we need to use implicitly_wait()
    time.sleep(3)

    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        driver.execute_script("window.scrollBy(0, document.body.scrollHeight)")
        time.sleep(5)

        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break

        last_height = new_height

    # Now that all data is loaded, scrape it
    titles = driver.find_elements(By.XPATH, '//div[@class="Type__TypeElement-sc-goli3j-0 fZDcWX t_yrXoUO3qGsJS4Y6iXX standalone-ellipsis-one-line"]')
    artists = driver.find_elements(By.XPATH, '//span[@class="Type__TypeElement-sc-goli3j-0 bDHxRN rq2VQ5mb9SDAFWbBIUIn standalone-ellipsis-one-line"]')
    albums = driver.find_elements(By.XPATH, '//span[@class="Type__TypeElement-sc-goli3j-0 ieTwfQ"]')
    durations = driver.find_elements(By.XPATH, '//div[@class="Type__TypeElement-sc-goli3j-0 bDHxRN Btg2qHSuepFGBG6X0yEN"]')
    songs_img = driver.find_elements(By.XPATH, '//img[@class="mMx2LUixlnN_Fu45JpFB rkw8BWQi3miXqtlJhKg0 Yn2Ei5QZn19gria6LjZj"]')

    all_data = []

    for index in range(len(titles)):
        all_data.append({
            "title": titles[index].text,
            "artist": artists[index].text,
            "album": albums[index].text,
            "duration": durations[index].text,
            "album_image": songs_img[index].get_attribute('src')
        })

    df = pd.DataFrame(data=all_data)
    df.to_csv(output_csv, index=False)

    return f"data has been saved in {output_csv}"

# scraping("https://open.spotify.com/playlist/0L39AS9Z3C8A8Pt0YZvT9p", "playlist.csv")

if __name__ == "__main__":

    # Initialize parser
    parser = argparse.ArgumentParser()

    # Adding optional arguments
    parser.add_argument("-u", "--url", help="URL of the Spotify playlist")
    parser.add_argument("-o", "--output", help="output CSV file path")

    # Read arguments from command line
    args = parser.parse_args()

    print(scraping(args.url, args.output))
    driver.quit()  # close the browser once scraping is done

Then we can save the script and run it with this command:

python scraping.py -u {spotify_playlist_url} -o {csv_file.csv}

This is the example:

python scraping.py -u https://open.spotify.com/playlist/0L39AS9Z3C8A8Pt0YZvT9p -o mix.csv
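
Once the command finishes, you can quickly sanity-check the result with pandas (a small sketch; mix.csv is the file produced by the command above):

import pandas as pd

# Load the scraped playlist and preview the first few rows:
# title, artist, album, duration, album_image
df = pd.read_csv("mix.csv")
print(df.head())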

THEN FINALLY 🎉🎉: all the steps to scrape Spotify playlist data are a success !!! 👍👨🏻‍💻👏🏻.

If you have any questions or want to keep in contact with me, you can reach me on LinkedIn: https://www.linkedin.com/in/felix-pratamasan/
