Pandora’s Next Best Feature

Anushikha Sharma
Nov 13, 2017 · 9 min read


Anushikha Sharma, Benjamin Matase, Sierra Magnotta, Stefano Cobelli

Introduction

The theme of this HCI sprint was ‘Design for Wellbeing’, and our team used the assigned readings, brainstorming, and user observation to build the Next Best Pandora Feature.

Figure 1: Screenshot of the webpage of our prototype

At its core, our product emulates the music service Pandora with an added twist. The platform uses the Affectiva library and the webcam to personalize the music a user hears based on the emotions they show while listening. If the user seems to like a song or sings along to it, the platform asks them to give it a thumbs up so they keep hearing similar artists and genres. If the user appears unhappy with a song, the platform asks them to give it a thumbs down so the genre or artist can change; clicking thumbs down switches to a different playlist. If the user shows no sign of like or dislike, the platform continues cycling through the playlists without making any suggestions.

For our demo, we had three playlists spanning different artists, eras, and genres (a rough sketch of how we cycle between them follows the list):

  • Playlist 1: 2017 Pop: Justin Bieber, One Direction, DJ Snake
  • Playlist 2: 1980’s Pop: Michael Jackson, Madonna
  • Playlist 3: 2000s Classic Rock: Wolfmother
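
The snippet below is only a minimal TypeScript sketch of this playlist-cycling idea; the song filenames and helper names are illustrative placeholders rather than our actual code.

```typescript
// Illustrative sketch of the playlist data and cycling logic (placeholder song files).
interface Playlist {
  name: string;
  songs: string[]; // paths or URLs handed to the media player
}

const playlists: Playlist[] = [
  { name: "2017 Pop",           songs: ["bieber.mp3", "one_direction.mp3", "dj_snake.mp3"] },
  { name: "1980s Pop",          songs: ["michael_jackson.mp3", "madonna.mp3"] },
  { name: "2000s Classic Rock", songs: ["wolfmother.mp3"] },
];

let currentPlaylist = 0;

// A thumbs up keeps the listener on the current playlist of similar artists and genres.
function onThumbsUp(): Playlist {
  return playlists[currentPlaylist];
}

// A thumbs down switches to the next playlist, wrapping back around to the first.
function onThumbsDown(): Playlist {
  currentPlaylist = (currentPlaylist + 1) % playlists.length;
  return playlists[currentPlaylist];
}
```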

For the purposes of this prototype, we used an image of the actual Pandora background in our platform to help us effectively simulate the music service. The demo video below further demonstrates how to interact with our ‘Design for Wellbeing’ version of Pandora.

Video 1: Demo

Note: For the demo video we used free songs rather than the songs from our playlist.

Initial Brainstorming

In one of our readings, ‘We Need Computers with Empathy’ by Rana el Kaliouby, we saw an example of how Amazon’s Alexa misinterpreted Rana’s speech and began playing songs that not only interrupted her speech practice but also left her startled, annoyed, and frustrated. The frustration came from feeling she had lost control over a piece of technology that was designed to help her. This example underscores the need for the usability principle of Control, Transparency and Predictability from ‘Steps to take before Intelligent User Interfaces become real’ by Kristina Höök.

Keeping this principle in mind, we started by thinking about ways we could use facial emotions to improve everyday user experiences without taking too much control away from the user. Our initial brainstorm gave us room to put even the most ridiculous ideas on the board.

Figure 2: Brainstorming Session 1

The ideas included:

  • Helping students avoid procrastination (“You can’t do your homework because it keeps closing the webpage you’re using if you’re stressed”).
  • Cheering up a stressed-out user by repeatedly playing their favorite songs (“Play ‘The Boys are Back in Town’ till the person smiles”).
  • Recommending resources to help a stressed user calm down (“Takes the user to a meditation site in the middle of a Differential Equations homework”).
  • Suggesting a break if the user’s overall happiness has been consistently low (“Go take a walk, bud”).
  • Asking to close Netflix if the user looks like they’re falling asleep (“You look like you’re falling asleep, would you like to close out of Netflix for tonight?”).

From this list emerged the concept of improving the typical Spotify/Pandora music experience by switching across playlists when a user’s facial expressions showed they didn’t like a song. On platforms like Pandora, stations play similar songs but you can’t customize the songs or artists. Hence, we thought it would be useful if a user’s facial expressions could steer auto-generated playlists toward songs they might prefer.

Figure 3: Outlining the logic of switching between playlists
Figure 4: Outlining the thumbs up/thumbs down logic

User-Testing and Refining our Designs

We performed some initial testing to see how people react to music they like and don’t like. Early on, we discovered that people don’t act naturally if they know they are being watched and recorded, so we switched to a more informal style of testing. From this, we saw that people tend to show little facial expression when listening to music they don’t like, whereas people listening to songs they enjoy tended to move their heads and sing along.

Video 2: User-Testing

To create our product, we also had to consider the limitations of Affectiva, which we used for emotion detection. In line with our initial testing, we were able to use the mouthOpen parameter to judge whether a person was singing along. We had more trouble finding ways to determine whether the user wasn’t enjoying a song, but found that Affectiva has a valence parameter that captures the overall positivity or negativity of a person’s expression: positive for expressions like joy, negative for expressions like sadness or disgust. We thought this would pair well with mouthOpen to determine whether the person was enjoying the song.

A problem we were warned about early on is that Affectiva produces output constantly, which can make it difficult to draw conclusions. When playing around with the library, we saw the output change quickly and drastically, showing joy, then sadness, then shock within a one-second span in which the user’s face had barely changed. To deal with this, we averaged our chosen parameters over time: rather than telling the user (probably incorrectly) that they disliked a song as soon as Affectiva reported negative valence, we kept running averages of the valence and mouthOpen outputs and checked those averages, rather than any single data point, to determine the user’s emotion.
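
As a rough illustration of that averaging step, here is a minimal sketch that accumulates per-frame readings and judges enjoyment from the running averages; the threshold numbers are placeholders rather than the values we settled on through testing.

```typescript
// Sketch of smoothing noisy per-frame Affectiva output with running averages.
// Affectiva reports valence in roughly [-100, 100] and mouthOpen in [0, 100];
// the thresholds below are illustrative placeholders.
let valenceSum = 0;
let mouthOpenSum = 0;
let frameCount = 0;

// Called once per analyzed webcam frame with that frame's raw scores.
function recordFrame(valence: number, mouthOpen: number): void {
  valenceSum += valence;
  mouthOpenSum += mouthOpen;
  frameCount += 1;
}

// Judge the listener from the averages so far, never from a single frame.
function currentEstimate(): "enjoying" | "not-enjoying" | "neutral" {
  if (frameCount === 0) return "neutral";
  const avgValence = valenceSum / frameCount;
  const avgMouthOpen = mouthOpenSum / frameCount;
  if (avgMouthOpen > 30 || avgValence > 10) return "enjoying";   // singing along or visibly positive
  if (avgValence < -10) return "not-enjoying";                   // consistently negative expression
  return "neutral";                                              // no clear signal: keep cycling
}
```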

Overall, this was a reasonably effective way to check the user’s enjoyment of a song. The valence output was very helpful in detecting whether a user enjoyed a song, and the mouthOpen output was effective at catching when a person sang along, which fit our initial testing. However, people also open their mouths for other reasons, such as speaking to a friend or neighbor, regardless of what they think of the current music, so mouthOpen can lead to erroneous conclusions. We chose to keep using mouthOpen as an indicator of enjoyment since this was in line with our initial testing, but ideally we would revisit the issue with more user testing to see how we can capture singing while ignoring other forms of talking.

Development

One of our initial ideas was to explore the Pandora API in hopes of integrating our new feature into the service itself. That would have been useful if we wanted the service to thumbs a song up or down for the user, as opposed to merely suggesting it. From our research into design principles in emotional computing, we knew that transparency is imperative for usability, and so we decided to use adaptive prompts that suggest voting on a song. For simplicity, we also decided against the Pandora API: our own emulation of Pandora was easier to control and predict.

To emulate Pandora, we have a static screen and a media player on a webpage we hosted. The Affectiva library runs in the background and starts recording after permission is given. For the sake of transparency and clarity, we use a prompt box in the center to address the user.

Figure 5: Screenshot of the first screen the user interacts with

The prompt starts by making sure you fully understand that Pandora is reading your emotions through our imaginary “Emotional Recognition Beta”. We would not want users to be recorded without their consent, so we play out a scenario in which users voluntarily sign up for the experiment.

The Affectiva library is straightforward to set up: import the script and build a detector. The “onImageResultsSuccess” event fires for each frame the library captures from your webcam, and on each frame we analyze the valence and mouthOpen parameters. After user testing, we found appropriate thresholds over consecutive frames of valence and mouthOpen to estimate the user’s emotions. If the detector believes the user is happy, we prompt the user with a message asking if they want to thumbs up the song.
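
Wiring this up looks roughly like the sketch below, which follows the Affectiva JavaScript SDK’s CameraDetector and onImageResultsSuccess pattern mentioned above. The element IDs, dimensions, and the showPrompt helper are illustrative, and recordFrame / currentEstimate are the averaging helpers from the earlier sketch.

```typescript
// Minimal sketch of wiring up the Affectiva detector; IDs and helpers are illustrative.
declare const affdex: any; // global provided by the Affectiva SDK <script> tag

const cameraMount = document.getElementById("affdex_elements") as HTMLDivElement;
const detector = new affdex.CameraDetector(
  cameraMount, 640, 480, affdex.FaceDetectorMode.LARGE_FACES
);

// We only need valence (overall positivity) and mouthOpen (our proxy for singing along).
detector.detectEmotions.valence = true;
detector.detectExpressions.mouthOpen = true;

// Fired for every webcam frame the library analyzes.
detector.addEventListener(
  "onImageResultsSuccess",
  (faces: any[], _image: unknown, _timestamp: number) => {
    if (faces.length === 0) return;
    recordFrame(faces[0].emotions.valence, faces[0].expressions.mouthOpen);

    const estimate = currentEstimate(); // running-average helpers from the earlier sketch
    if (estimate === "enjoying") {
      showPrompt("Enjoying this one? Give it a thumbs up!");
    } else if (estimate === "not-enjoying") {
      showPrompt("Not feeling it? You can give this song a thumbs down.");
    }
  }
);

// Hypothetical helper: in our prototype this updates the central prompt box.
function showPrompt(message: string): void {
  const box = document.getElementById("prompt-box");
  if (box) box.textContent = message;
}

detector.start(); // asks for webcam permission, then begins streaming frames
```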

Figure 6: The prompt displayed when we detect the user opening their mouth

This is effective because it is completely non-invasive and transparent: no decision is made for the user; they simply receive a reminder to give a thumbs up. When the user doesn’t seem to be enjoying a song, the prompt asks if they would like to give it a thumbs down. Voting on a song affects the songs you hear in the future on Pandora, but it remains a conscious decision made by the user. We hope this subtlety and transparency would open users up to working with emotional computing.

Figure 7: Sierra testing our final product to see if it registers negative emotions
Figure 8: Ben singing along to MJ’s top hits while testing our prototype

Results: Strengths and Weaknesses of Our Design

Our design had several strengths that were reflected in the feedback we received from our peers. During the class demo, users liked that we had placed our additional feature on top of an actual Pandora screen, because it demonstrated how Pandora or Spotify could implement something similar, and they enjoyed the friendly tone of the on-screen text. Users especially appreciated the “non-invasive” nature of the project. There was a general consensus that the design kept the user experience in focus by using emotion detection only to prompt suggestions, not to disruptively change the song. This also counters the weaknesses of the Affectiva library: because the system only makes suggestions, nothing changes even if the input it receives isn’t related to the music at all.

Figure 9: Anushikha dancing along, wishing there was some way to register head nods and hand motions

Though the project had many positives, we also received some constructive feedback. Users suggested that, for more transparency between the technology and the listener, it could be useful to have an alert explaining how the Affectiva technology would use their demonstrated emotions to make playlist suggestions. Users also wished the system could register head nods (a primary way people signal that they’re enjoying music) and tell the difference between someone simply talking and someone singing along. Some testers also had trouble with the non-interactive nature of the Pandora-like background: their familiarity with products like Pandora led them to expect our platform to behave the same way, so they tended to click on several parts of the screen that were only props.

Keeping this feedback in mind, if we had more time we would have liked to experiment with connecting specific emotions to certain genres, and to build more interactive Pandora-like features into our platform to better simulate the experience we were hoping to provide our testers.

Conclusion

In conclusion, we learned several important lessons through this design sprint and are happy with our final product. As we worked to balance our creativity with the usability principle of ‘Control, Transparency and Predictability’, we came to understand that although as technologists we want to automate everything around us, the users these products are designed for should be able to understand the inner workings of the technology and control it to a certain degree.
