Spotify User Profile Analysis With Spotifyr — RStudio

Kemal Gunay
Nov 20, 2021

Spotifyr is an R wrapper for pulling track audio features and other information from Spotify's Web API in bulk. Because it batches API requests automatically, you can enter an artist's name and retrieve their entire discography in seconds, along with Spotify's audio features and track/album popularity metrics. You can also pull song and playlist information for a given Spotify user (including yourself!).

First, set up a developer account with Spotify and register an app on the Spotify for Developers dashboard to access the Web API. This gives you a Client ID and a Client Secret. Once you have those, you can pull an access token into R with get_spotify_access_token().

The easiest way to authenticate is to store your credentials in the environment variables SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET. The default arguments of get_spotify_access_token() (and of every other function in the package) read those variables. Alternatively, you can set the credentials manually, in which case you must pass your access token explicitly in each subsequent function call.
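Since every function falls back on those two variables, it can help to confirm they are actually set before making any requests. The helper below is a hypothetical base-R sketch, not part of spotifyr, and the placeholder credentials are assumptions to replace with your own.

```r
# Hypothetical helper (not part of spotifyr): report whether the two
# environment variables that spotifyr's defaults read are set and non-empty
spotify_credentials_set <- function() {
  vars <- c("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")
  values <- Sys.getenv(vars)  # unset variables come back as ""
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    message("Missing environment variables: ", paste(missing, collapse = ", "))
    return(FALSE)
  }
  TRUE
}

# Placeholder values; substitute the ones from your Spotify app
Sys.setenv(SPOTIFY_CLIENT_ID = "your-client-id",
           SPOTIFY_CLIENT_SECRET = "your-client-secret")
spotify_credentials_set()
```

With both variables set, get_spotify_access_token() works without arguments. If you manage the token yourself, spotifyr functions accept it through their authorization argument, for example get_artist_audio_features('david bowie', authorization = access_token). Keeping the two variables in your ~/.Renviron file rather than in a script also keeps the secret out of code you share.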

Authorization Code Flow

For certain functions, such as the ones that read your own listening history, you need to log in as a Spotify user. To do this, your Spotify Developer application needs a callback (redirect) URL. You can set it to any URL that works with your application, but a good default is http://localhost:1410/. For more information on authorization, see the official Spotify Developer Guide.

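library(spotifyr)
library(dplyr)   # count(), mutate(), select()
library(purrr)   # map_chr() for list-columns
library(knitr)   # kable() for printing tables

# Replace the placeholders with the credentials of your Spotify app
Sys.setenv(SPOTIFY_CLIENT_ID = 'xxxxxxxxxxxxxxxxxxxxxxx')
Sys.setenv(SPOTIFY_CLIENT_SECRET = 'xxxxxxxxxxxxxxxxxx')

# Client-credentials token: enough for public data such as artist audio features
access_token <- get_spotify_access_token()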

david_bowie <- get_artist_audio_features('david bowie')

# Most frequent keys: most of David Bowie's songs are in a major key
david_bowie %>%
  count(key_mode, sort = TRUE) %>%
  head(5) %>%
  kable()
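Spotify reports a track's key as an integer pitch class (0 = C, 1 = C#/Db, up to 11 = B) and its mode as 1 for major and 0 for minor. The key_mode column counted above is assumed here to be those two fields pasted together, as in "C major". The base-R sketch below rebuilds such labels from hypothetical toy values (not real David Bowie data) and counts them the way count(key_mode, sort = TRUE) does.

```r
# Pitch-class names in Spotify's 0..11 order
pitch_classes <- c("C", "C#", "D", "D#", "E", "F",
                   "F#", "G", "G#", "A", "A#", "B")

# Hypothetical toy values, not real David Bowie data
key  <- c(0L, 7L, 0L, 9L, 7L, 0L)
mode <- c(1L, 1L, 1L, 0L, 1L, 1L)

# R vectors are 1-indexed, so pitch class k maps to pitch_classes[k + 1].
# A key of -1 (no key detected) is not handled in this sketch.
key_mode <- paste(pitch_classes[key + 1],
                  ifelse(mode == 1, "major", "minor"))

# Base-R analogue of count(key_mode, sort = TRUE)
key_counts <- sort(table(key_mode), decreasing = TRUE)
key_counts
```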

What I Listened to Recently

library(lubridate)

# Needs the authorization code flow (logging in as a Spotify user)
get_my_recently_played(limit = 10) %>%
  mutate(artist.name = map_chr(track.artists, function(x) x$name[1]),  # first listed artist
         played_at = as_datetime(played_at)) %>%
  select(all_of(c("track.name", "artist.name", "track.album.name", "played_at"))) %>%
  kable()
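Each row returned by get_my_recently_played() carries its artists in the track.artists list-column (assumed here to hold one small data frame with a name column per track), which is why the pipeline above uses map_chr() to pull out the first name. The base-R sketch below shows the same flattening on hypothetical toy data, plus parsing an ISO 8601 timestamp of the kind played_at holds; the artist names, the timestamp and its exact format are made-up assumptions.

```r
# Hypothetical stand-in for the track.artists list-column: one data frame of
# artists per played track (assumed shape; real responses have more columns)
track_artists <- list(
  data.frame(name = c("Artist A", "Artist B"), stringsAsFactors = FALSE),
  data.frame(name = "Artist C", stringsAsFactors = FALSE)
)

# Same idea as map_chr(track.artists, function(x) x$name[1]): keep the first
# listed artist and insist on exactly one string per track
artist_name <- vapply(track_artists, function(x) x$name[1], character(1))

# Base-R counterpart of lubridate::as_datetime() for an ISO 8601 UTC string
# (made-up timestamp; fractional seconds and trailing "Z" are assumed)
played_at <- as.POSIXct("2021-11-20T09:15:30.123Z",
                        format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

artist_name
played_at
```

Using vapply() with character(1), like map_chr(), makes the call fail loudly if any track yields something other than a single string, instead of silently producing a list.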

My Top Artists and Their Genres

get_my_top_artists_or_tracks(type = 'artists',
                             time_range = 'long_term', limit = 10) %>%
  select(name, genres) %>%
  # genres is a list-column; collapse each artist's genres into one string
  rowwise() %>%
  mutate(genres = paste(genres, collapse = ', ')) %>%
  ungroup() %>%
  kable()
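The genres field is a list-column: each artist has a character vector of zero or more genre labels. Collapsing every vector into a single string, as the genres step above does, gives kable() one printable cell per artist. Here is a base-R sketch on hypothetical toy data (artist and genre names are made up).

```r
# Hypothetical toy data: one row per artist, genres as a list-column holding
# a character vector per artist (names and genres are made up)
top_artists <- data.frame(name = c("Artist A", "Artist B", "Artist C"),
                          stringsAsFactors = FALSE)
top_artists$genres <- list(c("genre 1", "genre 2"), "genre 3", character(0))

# Collapse each vector into a single comma-separated string
top_artists$genres <- vapply(top_artists$genres, paste, character(1),
                             collapse = ", ")

top_artists
```

An artist with no genres ends up with an empty string rather than NA, because paste(character(0), collapse = ", ") returns "".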

My Top Tracks and Their Artists (Long Term)

# Use time_range = 'medium_term' or 'short_term' for more recent listening
get_my_top_artists_or_tracks(type = 'tracks', time_range = 'long_term', limit = 10) %>%
  mutate(artist.name = map_chr(artists, function(x) x$name[1])) %>%
  select(name, artist.name, album.name) %>%
  kable()

Nick Drake’s Joy Analysis

library(ggplot2)
library(ggridges)  # geom_density_ridges(), theme_ridges()

joy <- get_artist_audio_features('nick drake')

# One valence density ("ridge") per album
ggplot(joy, aes(x = valence, y = album_name)) +
  geom_density_ridges() +
  theme_ridges() +
  labs(title = "Joyplot of Nick Drake's joy distributions",
       subtitle = "Based on valence pulled from Spotify's Web API with spotifyr")
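Valence is Spotify's 0.0 to 1.0 measure of how positive a track sounds: high values sound happier or more cheerful, low values sadder or angrier. The ridgeline plot draws one valence density per album. A numeric companion you can check against the plot is the per-album median, sketched here in base R on hypothetical toy values (the album names are placeholders, not Nick Drake's albums).

```r
# Hypothetical toy data in the shape of get_artist_audio_features() output
# (album names and valence values are made up)
tracks <- data.frame(
  album_name = c("Album 1", "Album 1", "Album 1", "Album 2", "Album 2"),
  valence    = c(0.20, 0.35, 0.50, 0.10, 0.30),
  stringsAsFactors = FALSE
)

# Median valence per album: a one-number summary of where each ridge sits
valence_by_album <- aggregate(valence ~ album_name, data = tracks, FUN = median)
valence_by_album
```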

Coldplay’s Album Analysis

library(ggplot2)
library(ggridges)

coldplay <- get_artist_audio_features("Coldplay")
View(coldplay)  # opens the RStudio data viewer; 39 variables for feature engineering

Coldplay Joy Plot

coldplay %>%
  # Drop the Prospekt's March reissue, which repeats the Viva La Vida tracks
  filter(!album_name %in% c("Viva La Vida (Prospekt's March Edition)")) %>%
  ggplot(aes(x = valence, y = album_name, fill = after_stat(x))) +
  geom_density_ridges_gradient() +
  labs(title = "Joyplot of Coldplay's joy distributions",
       subtitle = "Based on valence pulled from Spotify's Web API with spotifyr")
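The Coldplay filter drops one album by its exact name, which is precise but has to be updated by hand for every reissue. A pattern-based alternative is sketched below in base R. The album names are hypothetical, and the assumption that duplicate reissues carry "Edition" in their title will not hold for every catalogue.

```r
# Hypothetical album names, for illustration only
album_name <- c("Album X", "Album X (Deluxe Edition)", "Album Y")

# Exact-match exclusion, the approach used in the Coldplay code
keep_exact <- !album_name %in% c("Album X (Deluxe Edition)")

# Pattern-based exclusion: drop any title containing "Edition"
keep_pattern <- !grepl("Edition", album_name, fixed = TRUE)

album_name[keep_exact]
album_name[keep_pattern]
```

In the dplyr pipeline the same idea would read filter(!grepl("Edition", album_name, fixed = TRUE)). The trade-off is that a pattern can also remove albums you wanted to keep, so it is worth checking which titles it drops.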


PostDoc Data Scientist at University of Trento — NLP Enthusiastic & Communication Sciences https://gunaykemal.com