The DAP Journey: Offerings to Apollo

For the love of music and data analytics

Published in SMUBIA · 8 min read · May 14, 2019

In this Medium series, BIA shares the reflections of our Data Associates as they look back on their project journeys. This post features a music analytics project carried out by Ding Yang, Michelle and Bryan, and supervised by Gabriel Sidik.

Introduction

Hi hi! We’re Team Apollysis, Data Associates in the Data Associate Program (DAP) at SMU BIA, which ran for four months from January to April 2019.

Bryan

Aspirations:
“I’m interested in improving consulting services with data analytics and hopefully start my own business.”
So why are you interested in data analytics:
“Personally, I feel data is all around us and if we’re able to find meaning and value in it then we’re able to achieve so much more.”

Ding Yang

Aspirations:
“There is an increasing trend in the use of machine learning in cybersecurity. I would like to contribute to this movement to enhance current cybersecurity practices.”
So why are you interested in data analytics:
“I have always wanted to grasp machine learning concepts, but I never really dived into or prioritised it. With my love for cybersecurity, I would love to learn more machine learning methods to help achieve my aspirations.”

Michelle

Aspirations:
“The big dream would be to travel the world to experience new cultures and meet new people, but life is not all ‘dream’. And in between these dreams, I want to do something that I enjoy. Ever since I joined Information Systems, I’ve always been intrigued by everything that I’m learning and recently my interest in data analytics was piqued when my friends kept telling me what they could do with all these data. So right now, my aspiration is to use data analytics to build a model or system that makes many go ‘wow’, or in other words, make magic happen!”
So why are you interested in data analytics:
“I feel like the world is getting flooded with data. From social media to company data, everything is now available to almost everyone. However, even with all this data, most of us don’t know what to do with it. It’s out there, but we can’t do anything with it, while in fact, utilising it can build something as revolutionary as Google. This is why I am interested to learn more, to know how we can all use this data and hopefully one day come up with something that will change people’s worlds.”

Gabriel

Aspirations:
“Fundamentally, I want to add value to society, but right now, I’m not sure if I should do so through academia or directly through industry! I hope to discern the path I ought to take for the next few years though! (Especially since I’m graduating soon)”
So why are you interested in data analytics:
“I find it deeply satisfying when data can be converted into knowledge and add value to people’s lives.”

Why the Project?

There were many interesting topics that we could have explored, but we decided to dive deeper into music because it is a hobby enjoyed and appreciated by many. We wanted to tinker with musical data using data analytics and visualization tools and, ultimately, showcase music through the lens of a data analyst.

Approach

We decided to work with the two fundamental elements of songs: music and lyrics. We hoped to then combine both to deliver a summary of the top artists being streamed on Spotify.

Music

We chose to obtain data on music from Spotify.

To extract data from Spotify, we used Spotipy, a Python library that wraps the Spotify API. First, we had to register on the Spotify API site to obtain the credentials needed to make authorized requests and pull specific data.

We began pulling data by creating a Spotipy object, which lets us call the functions that extract the desired data.

Creating a Spotipy object

    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    cid = ""     # Client ID provided by Spotify
    secret = ""  # Client Secret token provided by Spotify
    client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
    sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

Selection

To gather data on the most-streamed artists, we used the Top Artists of 2018 playlist curated by Spotify. We first pulled the IDs of the top artists, which we then used to obtain all their albums and the album IDs. With the album IDs, we could retrieve the individual songs and their track IDs. Finally, using the track IDs, we extracted the song data. Each song's data contained the following nine attributes: danceability, acousticness, instrumentalness, valence, loudness, energy, liveness, speechiness and tempo.
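A minimal sketch of that chain of lookups, assuming `sp` is the Spotipy object created earlier. The helper names (`chunked`, `artist_track_ids`, `artist_audio_features`) are made up for illustration and are not the team's actual code, while `artist_albums`, `album_tracks` and `audio_features` are Spotipy methods. To stay short, it starts from an artist ID rather than from the playlist and reads only the first page of each result.

```python
def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def artist_track_ids(sp, artist_id):
    """Collect track IDs from an artist's albums (first page of results only)."""
    track_ids = []
    albums = sp.artist_albums(artist_id, limit=50)
    for album in albums["items"]:
        tracks = sp.album_tracks(album["id"], limit=50)
        track_ids.extend(track["id"] for track in tracks["items"])
    return track_ids


def artist_audio_features(sp, artist_id):
    """Return one audio-feature dict per track, skipping tracks without data."""
    features = []
    # Assumption: the audio-features endpoint takes at most 100 IDs per call,
    # so the track IDs are sent in batches.
    for batch in chunked(artist_track_ids(sp, artist_id), 100):
        features.extend(f for f in sp.audio_features(batch) if f is not None)
    return features
```

Calling `artist_audio_features(sp, artist_id)` with a real artist ID would then give a list of dictionaries, one per song, each holding the attributes listed above. A fuller version would also follow the paginated results so that artists with many albums are covered completely.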

Processing

We then aggregated the song data for each artist. For every attribute, we averaged the values across all of an artist's songs to form a general representation of the music that artist makes. This essentially paints a portfolio of the artist’s music career and therefore provides a better understanding of the artist.

We did this for the top artists streamed in 2018 and plotted a histogram for each metric. These are the results:
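The averaging step can be sketched as follows. This is an illustration rather than the team's code: the function name `artist_profile` is hypothetical, and it assumes each song is a dictionary keyed by the attribute names listed earlier.

```python
from statistics import mean

FEATURES = [
    "danceability", "energy", "loudness", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "tempo",
]


def artist_profile(track_features):
    """Average each audio feature over all of an artist's tracks."""
    if not track_features:
        raise ValueError("need at least one track to build a profile")
    return {
        name: mean(track[name] for track in track_features)
        for name in FEATURES
    }
```

Given a list with one feature dictionary per song, `artist_profile` returns a single dictionary with one average per attribute, which is the kind of per-artist summary that the histograms are built from.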

a) Danceability

Measure of how suitable a song is for dancing

*0.0 (Least Danceable) — 1.0 (Most Danceable)*

b) Energy

Measure of intensity and activity

*0.0 (Least Energy) — 1.0 (Highest Energy)*

c) Loudness

Measure of a song's overall loudness, in decibels (dB)

d) Speechiness

Measure of the presence of spoken words in a song

e) Acousticness

Confidence measure of whether a song is acoustic

*0.0 (Least Confidence) — 1.0 (Most Confidence)*

f) Instrumentalness

Predicts whether a track contains no vocals

*0.0 (Completely Vocal) — 1.0 (No Vocal Content)*

g) Liveness

Measure of how likely it is that a song was performed live

*0.0 (Least likelihood to be performed live) — 1.0 (Strongest likelihood to be performed live)*

h) Valence

Measure of positiveness of the song

*0.0 (Most Negative) — 1.0 (Most Positive)*

i) Tempo

Estimated overall tempo of a song in beats per minute (BPM), derived from the average beat duration.

Analysis

From the valence graph, we can infer a greater preference for artists who focus on sentimental and melancholic themes in their songs. Artists who produce highly danceable songs also appear to dominate the charts. So new artists who want to enter the music industry these days and ride the hype train might do well to pair downhearted themes with a substantial dance beat.

Lyrics

In analyzing lyric-packed datasets, we drew inspiration from many sources. Our mentor, Gabriel, was the first to introduce us to an interesting project called SongSim by Colin Morris. It visualizes song lyrics beautifully as an adjacency matrix, in which a cell is filled whenever the words at those two positions in the song match, as shown below.

Image from https://colinmorris.github.io/SongSim/#/
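To make the matrix idea concrete, here is a small sketch in the spirit of SongSim. It is not SongSim's own code, and the function names are invented for illustration. Each word of the lyrics is compared with every other word, and a cell is marked when the two words are identical.

```python
import re


def tokenize(lyrics):
    """Lowercase the lyrics and split them into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", lyrics.lower())


def self_similarity(words):
    """Build a matrix where cell [i][j] is 1 if word i equals word j, else 0."""
    return [[1 if a == b else 0 for b in words] for a in words]


def render(matrix):
    """Draw the matrix as text: '#' for a match, '.' for no match."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )


words = tokenize("Row, row, row your boat")
print(render(self_similarity(words)))
# ###..
# ###..
# ###..
# ...#.
# ....#
```

Repeated words show up as filled blocks away from the main diagonal, which is why choruses and hooks are so easy to spot in this kind of picture.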

Gabriel also introduced us to Soundex, a phonetic algorithm that indexes words by sound. It encodes each word by how it sounds, so words that sound alike receive the same code, and from there we understood why certain words were used in a song. After some research, we found a Python package that runs the algorithm: Fuzzy. We expected to use the output from Fuzzy as the basis for building our own adjacency matrix.
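For readers curious about what Soundex does under the hood, below is a hand-rolled sketch of the standard American Soundex rules. It is written from scratch for illustration and does not call Fuzzy, so the package's output may differ in edge cases. The function name `soundex` and the `length` parameter are our own choices.

```python
# Consonants that sound alike share a digit; vowels and y get no code.
CODES = {}
for digit, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(word, length=4):
    """Return the Soundex code of a word, e.g. a letter plus three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            continue  # h and w do not separate two letters with the same code
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeat after it is kept
    return (result + "0" * length)[:length]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Two words with the same code are treated as sounding alike, so replacing the exact word match in a lyrics matrix with a match on Soundex codes would highlight words that sound similar as well as exact repeats. Note that Soundex always keeps the first letter, so words that sound alike but start with different letters receive different codes.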

After working on the project for a while, we attended the first sharing session conducted in DAP, where many interesting projects were presented. One project performed sentiment analysis on YouTube comments, and that caught our attention as we quickly saw its value in analyzing lyrics. This could help unveil the emotions that artists meant to evoke through their lyrics, which would build a stronger portfolio around each artist. Soon, we found a tool called NLTK VADER, which proved handy in our project.

Given these ideas, we intended to use all of these tools to analyse the lyrics of the top artists on Spotify.

The result — Track Recommend-er

After analysing the different attributes of the top artists' songs and lyrics, we tried to use this data to build our own recommendation machine, which suggests songs based on a track that a user inputs.

Spotify itself has a recommendation function that suggests songs a user might like based on an input track. However, its results change from one request to the next. We suspect that Spotify is using collaborative filtering, which means using other users’ data to recommend songs to similar users.

We wanted to create a more accurate and static recommendation system to reassure users that the recommended songs are indeed analytically similar to their input track. We sent multiple requests to Spotify's recommendation function and counted how many times each song was recommended. Combining that count with the aforementioned audio measures, we assigned a similarity index to each song. The smaller the index, the more similar the song is to the input track. We are then able to rank these songs and give users the top-ranked songs.
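The post does not spell out the formula behind the similarity index, so the sketch below is only one plausible reading, and every detail of it is an assumption. It rescales each measure to a common range (min-max normalisation), takes the Euclidean distance between the input track and each candidate, and divides by the number of times the candidate was recommended. All function names are hypothetical.

```python
import math


def min_max_normalise(rows, names):
    """Rescale each named feature to the range [0, 1] across all rows."""
    scaled = [dict(row) for row in rows]
    for name in names:
        values = [row[name] for row in rows]
        low, span = min(values), max(values) - min(values)
        for row in scaled:
            row[name] = (row[name] - low) / span if span else 0.0
    return scaled


def similarity_index(target, candidate, names, times_recommended):
    """Assumed index: feature distance divided by the recommendation count."""
    distance = math.sqrt(sum((target[n] - candidate[n]) ** 2 for n in names))
    return distance / times_recommended


def rank(target, candidates, counts, names):
    """Return candidate track IDs, smallest (most similar) index first."""
    scaled = min_max_normalise([target] + candidates, names)
    scaled_target, scaled_candidates = scaled[0], scaled[1:]
    scored = [
        (similarity_index(scaled_target, c, names, counts[c["id"]]), c["id"])
        for c in scaled_candidates
    ]
    return [track_id for _, track_id in sorted(scored)]
```

Here `counts` maps each candidate's track ID to how often it appeared across the repeated requests, which could be tallied with `collections.Counter`. Under this reading, a song ranks higher when its measures are close to the input track and when Spotify recommended it often. Normalising first keeps a wide-ranging measure such as tempo from drowning out the measures that lie between 0.0 and 1.0.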

Lyrics visualisation of Rap God by Eminem

Songs our machine recommended as similar to the classic Rap God by Eminem

Closing

Presenting our final product to the rest of the DAs

Although we had some difficulty understanding exactly how Spotify's recommendation function works, through this project we acquired valuable mathematical and technical skills, such as normalisation and how to utilise available APIs. More importantly, our team also learned how to start our own project, take ownership of it, and finally build a product from the data available to us.

In the future, we hope to do more data analytics projects in other areas that interest us. Hopefully, we’ll be able to learn more through BIA and use what we learn to analyse data more accurately and make more sense of it. Till we meet again on our subsequent projects!