The Pursuit of Happiness: Quantifying Emotions for App UX Design
By Brian Adams, CEO and Founder of MightyTV
There’s no accounting for taste. When it comes to movies and television, our preferences are highly personal and entirely subjective. What’s more, our propensity to accept recommendations from outside sources is highly dependent on a number of emotional factors. At MightyTV we aim to build empathy into our user experience and product design. Measuring our users’ happiness as it relates both to the recommendations they receive and how they receive them is a critical goal for our team. Accounting for a person’s emotional experience can be hard to explain and quantify — but we believe it is essential to understand when it comes to the entertainment people choose to consume and share. With this idea in mind, we set out to determine the factors that drive user satisfaction in the MightyTV app and how we might be able to quantify the emotional responses of our users.
In designing MightyTV, we focused on a few core features that we thought would give us the most useful data for measuring the ever-elusive “happiness.” Incorporating swipe-style ratings and a service/price selector was a good start. MightyTV’s swipe-oriented interface makes it easy to like, love, dislike, or save videos to a watchlist. With every swipe, the app learns more about the user, and its algorithms match that user to better recommendations. The goal of our machine learning algorithm is to learn each user’s individual movie and television preferences. It is evaluated using the Graded Average Precision (GAP) metric, which measures how well the video catalog is sorted for each user. While this has been effective for tuning the preference prediction algorithm, it was clear early on that we needed to go deeper to understand the success of the user experience.
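To make "how well the catalog is sorted" concrete, here is a minimal sketch of plain binary average precision, a simpler relative of GAP that treats every title as either liked or not liked. It is not the GAP formula itself, which also accounts for graded ratings such as like versus love. The function name and the sample rankings are illustrative assumptions, not MightyTV code.

```python
from typing import Sequence


def average_precision(ranked_liked: Sequence[bool]) -> float:
    """Binary average precision of a ranked list.

    ranked_liked[i] is True when the title shown at position i is one
    the user actually liked. A higher score means liked titles sit
    closer to the top of the ranking.
    """
    hits = 0
    precision_sum = 0.0
    for rank, liked in enumerate(ranked_liked, start=1):
        if liked:
            hits += 1
            # Precision at this cutoff: liked titles seen so far / titles seen so far
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Same four titles, two liked and two not, in two different orders (made-up data)
good_order = [True, True, False, False]
poor_order = [False, False, True, True]

print(average_precision(good_order))
print(average_precision(poor_order))
```

The first ordering scores higher because both liked titles appear before anything the user did not like. A graded metric would additionally reward putting loved titles ahead of merely liked ones.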
When using the initial prototypes of our app, we felt happiest when we had a sense of being understood by our machine learning algorithm. That sense depended on both the positive choices made and the order in which they came. For example, if a user went through 100 titles and swiped with a mixture of likes and dislikes, we noticed that their confidence and happiness with the app varied greatly depending on exactly how those likes and dislikes were distributed, even when the like and dislike counts stayed the same. While this feeling was still subjective, we attempted to quantify it and track it properly.
We chose to focus on what made our users happy instead of relying solely on traditional metrics, such as how choices affected the number of sessions or time spent using the app. More specifically, we wanted to see if we could quantify the user’s happiness as an additional metric to put alongside the traditional ones. This approach came from some out-of-the-box thinking: with our smaller sample size, traditional A/B testing on standard metrics would have been less useful.
USER TESTING SESSIONS
In the beginning, we held user testing sessions to measure how well our machine learning algorithm understood our users. Our initial theory was that if users felt understood by our model, that feeling would correlate with their perceived sense of happiness. Our testing yielded some surprising twists to our assumptions. During one particular experiment, a tester told us that she felt the machine learning “kick in” at around swipes 70–80, creating a different experience afterwards. What prompted this reaction was that the app recommended several movies set in Boston after she swiped right on a few movies featuring the locale. Later, she revealed that she grew up in Boston. At the conclusion of the test, we asked her to draw two graphs: one to show how well she thought the app was learning about her, and the other to indicate how happy she felt while using the app with each swipe. There were some differences between the two graphs she drew, which indicated that there may be more to a user’s happiness than just how well our machine learning algorithm understands the user. Given that our main goal is to optimize a user’s happiness, we first needed to understand the direct relationship between how we present recommendations to the user and that user’s happiness, rather than focus on how well understood the user feels by our ML. So we set up systematic user testing, using various metrics to track how happy our test users felt throughout the test.
On the quantitative side, we recorded all of our testers’ choices: Like, Dislike, Love, Skip, or Add to Watchlist. We were also able to map various changes in happiness to particular choice sequences. One early observation was that adding a title to the watchlist was an extremely positive reaction, one that could nullify a large number of past dislikes in the experience. Furthermore, we inferred rules about how particular choices and their order affect user happiness and, ultimately, modify user behavior within the app.
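As a rough illustration of what an order-sensitive rule set could look like, the sketch below keeps a running happiness score over a sequence of swipes. Everything in it is an assumption made for the example: the action names, the weights, the rule that consecutive dislikes hurt progressively more, and the rule that a watchlist add wipes out an accumulated deficit. The article does not publish its actual rules.

```python
# Hypothetical per-action effects; these weights are invented for illustration.
ACTION_DELTA = {"like": 1.0, "love": 2.0, "skip": 0.0}
WATCHLIST_BONUS = 2.0


def happiness_trace(actions):
    """Return the running happiness score after each swipe."""
    score = 0.0
    dislike_streak = 0
    trace = []
    for action in actions:
        if action == "watchlist":
            # Assumed rule: saving a title cancels any accumulated deficit,
            # then adds a bonus on top.
            score = max(score, 0.0) + WATCHLIST_BONUS
            dislike_streak = 0
        elif action == "dislike":
            # Assumed rule: each consecutive dislike costs more than the last.
            dislike_streak += 1
            score -= dislike_streak
        else:
            dislike_streak = 0
            score += ACTION_DELTA[action]
        trace.append(score)
    return trace


# Same counts (three likes, three dislikes), different ordering
clustered = ["dislike"] * 3 + ["like"] * 3
alternating = ["like", "dislike"] * 3

print(happiness_trace(clustered)[-1])
print(happiness_trace(alternating)[-1])
print(happiness_trace(["dislike"] * 3 + ["watchlist"]))
```

Under these made-up rules the clustered sequence ends lower than the alternating one even though the like and dislike counts match, and a single watchlist add recovers from a run of dislikes.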
Once we were able to understand the relationship between how we presented our personalized recommendations to each user and that user’s happiness, we set out to optimize our system’s presentation strategy.
Using standard nonlinear optimization techniques, we were able to locate the parameterization of our presentation strategy that produces the best happiness score. As an illustrative example, consider the following surface plot. If our presentation strategy could be parameterized entirely by two parameters (Param A and Param B in the diagram), we would be able to visualize our happiness score over all parameterizations (i.e., all possible presentation strategies) as a 3D surface. Our goal is to find the parameterization that produces the best happiness score, which corresponds to finding the highest peak in the surface.
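The peak-finding idea can be sketched on a toy two-parameter surface. The article does not say which optimizer was used, so this example swaps in a simple derivative-free compass search with several starting points. The surface `toy_happiness`, its two peaks, and all function names are invented for illustration and are not the real happiness score.

```python
import math


def toy_happiness(a: float, b: float) -> float:
    """Made-up surface: a main peak near (0.3, 0.7), a lower one near (0.8, 0.2)."""
    main = math.exp(-20 * ((a - 0.3) ** 2 + (b - 0.7) ** 2))
    minor = 0.5 * math.exp(-20 * ((a - 0.8) ** 2 + (b - 0.2) ** 2))
    return main + minor


def compass_search(f, start, step=0.25, tol=1e-6):
    """Local maximization: try +/- step along each axis, halve the step when stuck."""
    point, best = list(start), f(*start)
    while step > tol:
        improved = False
        for axis in range(len(point)):
            for direction in (step, -step):
                candidate = list(point)
                candidate[axis] += direction
                value = f(*candidate)
                if value > best:
                    point, best, improved = candidate, value, True
        if not improved:
            step /= 2
    return tuple(point), best


def best_of_restarts(f, starts):
    """Run the local search from every start and keep the highest peak found."""
    return max((compass_search(f, s) for s in starts), key=lambda result: result[1])


# A coarse 5x5 grid of starting points over the unit square
starts = [(x / 4, y / 4) for x in range(5) for y in range(5)]
(best_a, best_b), best_score = best_of_restarts(toy_happiness, starts)
print(round(best_a, 2), round(best_b, 2), round(best_score, 3))
```

Restarting from several points matters because a single local search can settle on the lower peak. With more than two parameters the surface can no longer be drawn, but the same search applies.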
The qualitative and quantitative results of the happiness tests gave us more nuanced insight into the ingredients of a positive discovery experience. In our analysis, we were able to evaluate a user’s perceived sense of happiness at critical junctions of the user experience. The lessons from this analysis have led to material improvements in our UX design and have been used directly to modify the order in which we present our personalized recommendations to each user. As we continue to expand our platform to support discovery in other areas such as art and content, we see tremendous value in building products that recognize the value of a user’s emotional response.