As we covered in our recent NLP blog, there are a lot of cool use cases for text / sentiment analysis. One recent instance we found really interesting came out of our May presentation at SeaTUG (Seattle Tableau User Group.) As part of our presentation / demo we decided to find out what some of the local Tableau users could do with trial access to Keboola; below we’ll highlight what Hong Zhu and a group of students from the University of Washington were able to accomplish with Keboola + Tableau for a class final project!
What class was this for and why did you want to do this for a final project?
We are a group of students at the University of Washington’s department of Human Centered Design and Engineering. For our class project for HCDE 511 — Information Visualization, we made an interactive tool to visualize music data from Last FM. We chose the topic of music because all 4 of us are music lovers.
Initially, the project was driven by our interest in having an international perspective on the popularity vs. obscurity of artists and tracks. However, after interviewing a number of target users, we learned that most of them were not interested in rankings in other countries. In fact, most of them were not interested in the ranking of artists/tracks at all. Instead, our target users were interested in having more individualized information and robust search functions, in order to quickly find the right music that is tailored to one’s taste, mood, and occasion. Therefore, we re-focused our efforts on parsing out the implicit attributes, such as genre and sentiment, from the 50 most-used tags of each track. That was when Keboola and its NLP plug-in came into play and became instrumental in the success of this project.
What specific data set(s) did you analyze and how did you collect it?
We extracted a large amount of data from Last FM’s API. On the high level, there are two main data sets: top 100 artists in each country and top 100 tracks in each country. For each of the two data sets, we extracted ranking, country name, total play count of all times, URL linking to the artist/track, and top 50 tags for each artist/track. The total number of data points was over 2 million. We eventually narrowed them down to 6 dimensions in our final visualization: Track Name, Artist, Play Count, URL, Top Tag, and Overall Sentiment Score. We also decided to only visualize the top tracks data set due to time constraint of the school quarter.
Latest cleaned data set with sentiment scores:
So you’ve got the data, now what?
Moving over to our trial access to the Keboola platform, it was a few simple clicks and an authorization to access the data through Dropbox and bring in the spreadsheet.
Once you have the data you want to analyze, it’s a matter of clicking into the Keboola app store, selecting the app you want to run (NLP highlighted below) and choosing the table you want to analyze.
Because the outputs from Geneea were 50 separate tables — one for each tag, we needed to combine them into one table and calculate the overall sentiment score for each track.
Due to our limited experience in Python and lack of SQL knowledge, we were only able to join 2 tables at a time using Transformation. In the end, we downloaded all the tables and joined them locally. (**Typically this would be done with a SQL, Python or R transformation within the platform.)
Ready to visualize
Once the data is ready for analysis, its simple to download it as a .TDE file from Keboola or send it directly to Dropbox or Google Drive for consumption in Tableau desktop. You can also create a live data connection directly to Tableau Server.
In this case, Tableau public was the right choice for data visualization. I’ve provided a screen shot below or you can check out the live viz here.
We’re glad we could lend a hand to the UW students (and thanks for letting us be part of a cool data project!) As mentioned at the outset of the blog, please check out our previous blog The value of text (data) and Geneea NLP app if you’d like to learn a bit more about the app or feel free to reach out.