How to Scrape Song Lyrics: A Gentle Tutorial
TL;DR: Use the Genius.com API and Beautiful Soup to fetch song lyrics and write them to a local text file. A future post will cover interesting things you can do with this data.
This is going to be a shockingly easy tutorial that you can follow to collect song lyrics into a local .txt file.
The only hard thing about this tutorial will be reading the code on Medium without syntax highlighting.
Prerequisites:
- Get a Genius.com API key (Free!): Follow instructions here
- Python 3 (what I use), with the requests and beautifulsoup4 packages installed
GENIUS_API_TOKEN='YOUR-TOKEN-HERE'
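If you would rather not paste the token into the script, one option is to read it from an environment variable. This is a minimal sketch, assuming you have exported a variable named GENIUS_API_TOKEN in your shell. The helper name load_genius_token is made up for this example and is not part of the original code.

```python
import os

# Hypothetical helper, not part of the original tutorial: read the Genius API
# token from an environment variable instead of hardcoding it in the script.
def load_genius_token(env=os.environ, var_name='GENIUS_API_TOKEN'):
    token = env.get(var_name, '').strip()
    if not token:
        raise RuntimeError('Set the {} environment variable first'.format(var_name))
    return token
```

With this in place, the assignment above could become GENIUS_API_TOKEN = load_genius_token(), which fails early with a readable message if the variable is missing.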
Imports
# Make HTTP requests
import requests
# Scrape data from an HTML document
from bs4 import BeautifulSoup
# I/O
import os
# Search and manipulate strings
import re
1. Get a list of Genius.com URLs for a specified number of songs by an artist
# Get one page of search results for an artist from the Genius API
def request_artist_info(artist_name, page):
    base_url = 'https://api.genius.com'
    headers = {'Authorization': 'Bearer ' + GENIUS_API_TOKEN}
    search_url = base_url + '/search'
    # Send the query as URL parameters; requests handles the encoding
    params = {'q': artist_name, 'per_page': 10, 'page': page}
    response = requests.get(search_url, params=params, headers=headers)
    return response

# Get Genius.com song URLs from the search results
def request_song_url(artist_name, song_cap):
    page = 1
    songs = []

    while True:
        response = request_artist_info(artist_name, page)
        hits = response.json()['response']['hits']
        # Stop when the search runs out of results, otherwise the loop never ends
        if not hits:
            break

        # Keep only hits whose primary artist matches artist_name
        song_info = []
        for hit in hits:
            if artist_name.lower() in hit['result']['primary_artist']['name'].lower():
                song_info.append(hit)

        # Collect song URLs from the song objects, up to song_cap
        for song in song_info:
            if len(songs) < song_cap:
                url = song['result']['url']
                songs.append(url)

        if len(songs) == song_cap:
            break
        else:
            page += 1

    print('Found {} songs by {}'.format(len(songs), artist_name))
    return songs
# DEMO
request_song_url('Lana Del Rey', 2)
For example, this might return:
['https://genius.com/Lana-del-rey-young-and-beautiful-lyrics',
'https://genius.com/Lana-del-rey-love-lyrics']
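To see what the filtering step in request_song_url does without calling the API, here is a small sketch that runs the same logic on a hand-written response. The sample payload is made up and only includes the fields the code above reads (response, hits, result, primary_artist, name, url), so treat it as an assumption about the shape and not as real API output. The function name urls_for_artist is hypothetical.

```python
# Hypothetical helper mirroring the filtering logic in request_song_url:
# keep hits whose primary artist matches, then collect up to song_cap URLs.
def urls_for_artist(search_json, artist_name, song_cap):
    urls = []
    for hit in search_json['response']['hits']:
        result = hit['result']
        if artist_name.lower() in result['primary_artist']['name'].lower():
            if len(urls) < song_cap:
                urls.append(result['url'])
    return urls

# Made-up sample with only the fields used above (not real API output)
sample = {
    'response': {
        'hits': [
            {'result': {'url': 'https://example.com/song-one',
                        'primary_artist': {'name': 'Lana Del Rey'}}},
            {'result': {'url': 'https://example.com/song-two',
                        'primary_artist': {'name': 'Someone Else'}}},
            {'result': {'url': 'https://example.com/song-three',
                        'primary_artist': {'name': 'Lana Del Rey'}}},
        ]
    }
}

print(urls_for_artist(sample, 'lana del rey', 2))
```

Because the comparison uses in, the match is a case-insensitive substring test: a search for a short name also keeps hits from any artist whose name contains it.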
2. Fetch lyrics from the URLs
# Scrape lyrics from a Genius.com song URL
def scrape_song_lyrics(url):
    page = requests.get(url)
    html = BeautifulSoup(page.text, 'html.parser')
    lyrics_div = html.find('div', class_='lyrics')
    # find() returns None if the page layout differs from what this selector expects
    if lyrics_div is None:
        raise ValueError('No lyrics container found at ' + url)
    lyrics = lyrics_div.get_text()
    # Remove identifiers like chorus, verse, etc.
    lyrics = re.sub(r'[\(\[].*?[\)\]]', '', lyrics)
    # Remove empty lines
    lyrics = os.linesep.join([s for s in lyrics.splitlines() if s])
    return lyrics

# DEMO
print(scrape_song_lyrics('https://genius.com/Lana-del-rey-young-and-beautiful-lyrics'))
This would print:
I've seen the world, done it all
Had my cake now
Diamonds, brilliant, and Bel Air now
Hot summer nights, mid-July
When you and I were forever wild
The crazy days, city lights
The way you'd play with me like a child
Will you still love me when I'm no longer young and beautiful?
Will you still love me when I got nothing but my aching soul?
I know you will, I know you will, I know that you will
Will you still love me when I'm no longer beautiful?
I've seen the world, lit it up as my stage now
Channeling angels in the new age now
Hot summer days, rock and roll
The way you'd play for me at your show
And all the ways I got to know
...[etc.]
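The two cleanup steps in scrape_song_lyrics are easy to try on their own. The sketch below applies the same regular expression and empty-line filter to a short made-up string. The sample text and the function name clean_lyrics are placeholders, and it joins with '\n' instead of os.linesep so the result looks the same on every platform.

```python
import re

# Hypothetical helper: the same two cleanup steps as scrape_song_lyrics
def clean_lyrics(raw):
    # Drop anything in (...) or [...], such as [Chorus] or (x2)
    text = re.sub(r'[\(\[].*?[\)\]]', '', raw)
    # Drop lines that are now empty
    return '\n'.join(line for line in text.splitlines() if line)

raw = '[Verse 1]\nFirst line of the song\n\n[Chorus]\nSecond line (yeah)\n'
print(clean_lyrics(raw))
```

Note that the pattern removes parenthesised ad-libs and backing vocals as well as section labels, and it leaves a trailing space where a bracket was cut from the end of a line. If you want to keep the ad-libs, one option is to match only the square brackets.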
3. Loop through all URLs and write lyrics to one file
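def write_lyrics_to_file(artist_name, song_count):
    # Create the output directory if it does not exist yet
    os.makedirs('lyrics', exist_ok=True)
    path = 'lyrics/' + artist_name.lower() + '.txt'
    urls = request_song_url(artist_name, song_count)
    # The with block closes the file even if a request fails midway
    with open(path, 'wb') as f:
        for url in urls:
            lyrics = scrape_song_lyrics(url)
            # End each song with a newline so songs do not run together
            f.write((lyrics + os.linesep).encode('utf8'))
    with open(path, 'rb') as f:
        num_lines = sum(1 for line in f)
    # Report the number of songs actually found, which may be below song_count
    print('Wrote {} lines to file from {} songs'.format(num_lines, len(urls)))

# DEMO
write_lyrics_to_file('Kendrick Lamar', 100)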
This would write lyrics from one hundred Kendrick Lamar songs to the file ./lyrics/kendrick lamar.txt and print something like:
Found 100 songs by Kendrick Lamar
Wrote 8356 lines to file from 100 songs
Here’s an example output for Young Thug: https://pastebin.com/raw/TKqrGiTg
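With a hundred requests in one run, a single page that fails to download or parse would abort the whole loop. One way to make the loop more forgiving is to skip songs that raise an error and pause briefly between requests. This is a sketch with hypothetical names (scrape_all, delay_seconds), not the code used to produce the output above. The scrape argument stands in for scrape_song_lyrics, and fake_scrape is a stub so the example can run without a network connection.

```python
import time

# Hypothetical helper: call scrape(url) for each URL, skip failures,
# and wait delay_seconds between consecutive requests.
def scrape_all(urls, scrape, delay_seconds=1.0, sleep=time.sleep):
    results = []
    for i, url in enumerate(urls):
        try:
            results.append(scrape(url))
        except Exception as error:
            print('Skipping {}: {}'.format(url, error))
        # No need to wait after the last URL
        if i < len(urls) - 1:
            sleep(delay_seconds)
    return results

# Stub standing in for scrape_song_lyrics (assumption for the demo)
def fake_scrape(url):
    if 'bad' in url:
        raise ValueError('no lyrics')
    return 'lyrics for ' + url

print(scrape_all(['a', 'bad', 'c'], fake_scrape, delay_seconds=0))
```

In write_lyrics_to_file you would pass scrape_song_lyrics as the scrape argument and write each returned string to the file.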
Next Steps
TBD what novel application I’m going to build with this lightweight tool, but here are some ideas that inspired me:
- Rap song writing recurrent neural network trained on Kanye West’s entire discography
- Same thing but Taylor Swift
- Another good lyric generation article
- Sentiment Analysis
- Lyric generation trained on data from two or more artists (Kanye + TSwift, Young Thug + Lana Del Rey, Travis Scott + Beach Boys…go wild my children)
- Lyric generation + Audio generation = AI generated song?