Hey UX! Meet your new friend Statistics.

Part 2 in a series to make Quora a better mobile experience.

Sahil Khoja
4 min read · Dec 8, 2015

I recently went out in the field to test Quora’s mobile experience and received a ton of feedback, from changing simple verbiage, to making an entirely new Explore button with a UI similar to Medium.

But how should I prioritize all of this feedback? The problems have to be statistically significant for me to even sketch out a new experience or solution.

The issue with incorporating Statistics into User Research isn’t one of small sample sizes; it’s formative testing versus summative testing. In most cases, User Research falls under formative testing: the qualitative feedback we receive from small sample sizes. Here we look for UI Problems that prevent users from completing a task or delay their response. It’s the type of testing where we surface users’ frustrations and pain points, and create design recommendations based on the proportion of users who ran into similar issues. Most research, including mine, is purely formative.

So how do we quantify such a qualitative form of testing? Here’s where summative testing comes into play. We can create benchmark goals based on the daily use of a product. For example, it should take under 2 steps to ask a question on Quora. Based on the number of steps it takes users to complete the task, we can calculate how far users deviate from that benchmark. Comparative testing falls under summative testing as well, where we compare the ease of asking a question on Quora versus posting a story to Medium, and receive feedback in that manner.
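As a quick illustration, a benchmark check like the one above can be computed in a few lines. The step counts below are made up for the example; they aren’t data from the study:

```python
# Illustrative benchmark check; step counts are hypothetical, not study data.
benchmark = 2  # assumed goal: asking a question should take 2 steps or fewer
steps_taken = [2, 3, 2, 5, 4, 2]  # hypothetical observed step counts per user

# How many steps beyond the benchmark each user needed
deviations = [max(0, s - benchmark) for s in steps_taken]
avg_deviation = sum(deviations) / len(deviations)
print(avg_deviation)  # average steps beyond the benchmark
```

A nonzero average deviation is a quick signal that the flow is falling short of the benchmark goal.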

So how should I quantify feedback from 24 people to create a significant design recommendation for Quora?

Introducing the UI Problem Matrix

A UI Problem Matrix breaks down all of the UI Problems users went through and displays the proportion of users that went through that specific problem. In this experiment’s case, we can create a key for each problem:

  • P1: User could not follow a topic in under three steps
  • P2: User questioned why there was a ‘Write Answer’ function on their own question
  • P3: User clicked on ‘Write’ when asked to Ask a Question on Quora
  • P4: User clicked on ‘Edit’ when wanting to add a new topic
  • P5: User clicked on Profile or Personal Information because they believed that Topic-Follow Management belonged in this area

After identifying the key problems, we can create our UI Problem Matrix:

*Prop represents proportions and p is the average proportion

With this data, we can assign impact scores based on the frequency of each problem:
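As a rough sketch of how such a matrix and its impact scores might be computed, here’s one way to do it in code. The counts and the scoring thresholds below are my own illustrative assumptions, not the study’s raw data:

```python
# Illustrative sketch: the counts and scoring rule are assumptions, not study data.
n = 24  # participants in the field test
counts = {"P1": 22, "P2": 3, "P3": 21, "P4": 21, "P5": 18}  # hypothetical

# Proportion of users who hit each problem
props = {p: c / n for p, c in counts.items()}

# Assumed scoring rule: more frequent problems get a higher impact score
impact = {p: 3 if prop >= 0.75 else 2 if prop >= 0.5 else 1
          for p, prop in props.items()}

for p in counts:
    print(f"{p}: prop={props[p]:.2f}, impact={impact[p]}")
```

Problems with a low proportion (like P2 here) are the ones most likely to be user-specific rather than systemic.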

Using this information, we can easily determine which problems should receive greater attention and which problems might’ve been user-specific.

Let’s add some confidence

Since we have a proportion for each UI Problem, we can use them to create some confidence intervals. These intervals give us a range of values that (hopefully) contains the true proportion. Why do confidence intervals matter? Well, as most UX Researchers know, we never have access to an entire population when testing.

But the entire population of users would be affected if any design changes are made on the platform.

How can we be confident in claiming that a UI Problem is actually a UI Problem? Through sample proportions and confidence intervals.

The Adjusted Wald Method gives us an accurate interval, since it compensates for small sample sizes by adding two successes and two failures to the observed counts. Here’s what the calculation would look like if 7 out of 10 users could complete a task:

  1. Calculate the adjusted proportion (remember to add the 2 successes and 2 failures!):

p_adj = (successes + 2) / (n + 4) = (7 + 2) / (10 + 4) ≈ 0.64

1.96 is the z-score that comes from a 95% confidence interval.

  2. Then plug the adjusted proportion (and adjusted sample size) into the Wald Equation:

p_adj ± 1.96 × √( p_adj × (1 − p_adj) / (n + 4) ) = 0.64 ± 0.25

We now know that the true proportion of users who can complete that task lies between 39% and 90%. Here’s what the intervals would look like for our experiment:

P1: (0.73, 0.99)

P2: (0.03, 0.32)

P3: (0.68, 0.96)

P4: (0.68, 0.96)

P5: (0.54, 0.88)
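The two-step calculation above is easy to wrap in a small helper. This is a minimal sketch (the function name is mine), using the 7-out-of-10 example from earlier:

```python
import math

def adjusted_wald(successes, n, z=1.96):
    """95% Adjusted Wald interval: add 2 successes and 2 failures,
    then apply the standard Wald formula with the adjusted counts."""
    p_adj = (successes + 2) / (n + 4)
    margin = z * math.sqrt(p_adj * (1 - p_adj) / (n + 4))
    # Clamp to [0, 1] since proportions can't fall outside that range
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

lo, hi = adjusted_wald(7, 10)
print(f"({lo:.2f}, {hi:.2f})")  # roughly (0.39, 0.89)
```

Running the same helper over each problem’s counts would reproduce the P1–P5 intervals listed above.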

Though these intervals may not be too helpful for this specific iteration, it’s definitely nice to have this skill whenever you’re dealing with large sample sizes, in order to figure out whether an iteration is worth pursuing or shipping to millions of users on the platform.

Up Next

Here’s where we are in the process:

Completed field testing and quantifying research!

Up next is wireframing and prototyping!

Part 1: Field Testing

Part 3: Wireframing and Prototyping

I don’t work for Quora or represent Quora in any way. I’m just a curious design student. If you would like me to test or iterate on a product you love, let me know! And if you enjoyed reading this article and learned something new, please hit the ‘Recommend’ button below :)


Building studentswho.design. PM at Facebook. Cornell Alum. Cancer Survivor. Previously design at Instagram, Facebook, Intuit.