Learning overview

Common Statistics in UX Research

Joline
5 min read · Nov 7, 2023

*Notes on Measuring UX

Descriptive statistics

In UX, a few common descriptive statistics help us interpret data. Central tendency summarizes a set of values with a single representative number; common measures are the mean, median, and mode. Variability quantifies how widely data are dispersed across a range of values; common measures are the range, variance, and standard deviation.

  • Range (the maximum minus the minimum) gives a quick sense of spread, and an unusually large range can help flag outliers.
  • Variance describes the spread of data around the mean: it is the average squared deviation from the mean (for a sample, the sum of squared deviations is divided by n-1).
  • Standard deviation is the square root of the variance. Because it has the same unit as the original data, it is easier to interpret than variance. We often use standard deviation to identify outliers and anomalies in datasets, and to compare datasets to learn whether one is more spread out than another. A small standard deviation means the data points cluster closely around the mean.
Excel functions:
Standard deviation: STDEV.S()
Variance: VAR.S() (VAR() is the older equivalent)
Square root: SQRT()
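
As a rough illustration of the measures above, here is a minimal Python sketch using the standard-library `statistics` module. The task times are hypothetical, and one slow participant (90 s) is included on purpose to show how a single outlier stretches the range and standard deviation much more than the median. `statistics.variance()` and `statistics.stdev()` are sample statistics (dividing by n-1), so they should line up with VAR.S() and STDEV.S().

```python
from statistics import mean, median, mode, stdev, variance

# Hypothetical task-completion times (seconds) from 8 participants;
# the 90 s value is a deliberate outlier.
times = [34, 41, 38, 35, 90, 37, 41, 39]

summary = {
    "mean": mean(times),
    "median": median(times),
    "mode": mode(times),
    "range": max(times) - min(times),
    # Sample variance and standard deviation (divide by n - 1),
    # comparable to Excel's VAR.S() and STDEV.S().
    "variance": variance(times),
    "std_dev": stdev(times),
}

for name, value in summary.items():
    print(f"{name:>9}: {value:.2f}")
```

Try removing the 90 s value and rerunning: the mean, range, and standard deviation drop sharply, while the median barely moves.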

Confidence intervals estimate the range of values that likely contains the true population value of a statistic. For example, a 95% confidence interval around a sample mean is a range we are 95% confident includes the true population mean: if we repeated the study many times, about 95% of the intervals calculated this way would contain it. The width of a confidence interval depends on the sample size, the standard deviation, and the chosen alpha level (alpha = 100% minus the confidence level, usually 5% or 10%).

Confidence intervals are often displayed as error bars in charts, where they show how precisely the observed result estimates the population value, and therefore how reliable it is. A short error bar means the data points vary little around the mean (or the sample is large), so the observed result is more reliable. A long error bar means the data points vary more from the mean (or the sample is small), so the observed result is less reliable. Error bars also hint at whether two datasets differ significantly, judging by how much they overlap. If the error bars do not overlap, it's a clue that the difference may be significant, and we can run a statistical test to evaluate that assumption.

Excel functions:
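Confidence intervals: CONFIDENCE(alpha,sigma,n), which returns the margin of error (half the width of the interval, so the interval is mean ± margin) based on the normal distribution. CONFIDENCE.T() uses the t distribution instead, which suits small samples.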
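
A minimal sketch of the same calculation in Python, assuming the normal-distribution formula behind CONFIDENCE(alpha, sigma, n): margin = z × sigma / sqrt(n), where z is the critical value at 1 - alpha/2. The function name `confidence_margin` and the ratings are hypothetical. Plugging the sample standard deviation in for sigma is a common shortcut; for small samples, a t critical value (as in CONFIDENCE.T()) would give a slightly wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_margin(alpha: float, sigma: float, n: int) -> float:
    """Margin of error z * sigma / sqrt(n), the normal-based formula behind CONFIDENCE()."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z * sigma / sqrt(n)


# Hypothetical satisfaction ratings (1-7) from 12 participants
ratings = [5, 6, 4, 7, 5, 6, 6, 3, 5, 6, 7, 5]
m = mean(ratings)
margin = confidence_margin(0.05, stdev(ratings), len(ratings))
print(f"mean = {m:.2f}, 95% CI = ({m - margin:.2f}, {m + margin:.2f})")
```

The margin is what an error bar would show on either side of the mean in a chart.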

Resources:

Confidence Interval Calculator

Add, change, or remove error bars in a chart in Excel

Comparing means

When comparing means, it's important to consider whether we are dealing with the same sample or independent samples. We use a paired samples t-test when the same group of users is measured under two conditions. We use an independent samples t-test when comparing the means of two different user groups. For more than two groups, we use analysis of variance (ANOVA).

If we test 2 design versions on the same set of users, we are comparing means for the same subjects. For example, we could compare an initial rating and a final rating from the same users to learn about the effect of a design update. Or we might collect users' task success rates on an old web page and on a new version after the UI is improved. The paired samples t-test determines whether there is a significant difference in the metric before and after the design update. It works on each user's difference between the two conditions: if the confidence interval around the mean difference does not include 0, the difference is significant.
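
The paired test can be sketched in plain Python as below. This is an illustration under stated assumptions, not a reference implementation: the helper names (`t_two_tailed_p`, `t_critical`, `paired_t_test`) and the ratings are hypothetical, and because the standard library has no t distribution, the p-value is approximated by numerically integrating the t density with Simpson's rule. The test works on each user's difference (after minus before), and the 95% confidence interval is built around the mean of those differences.

```python
from math import exp, lgamma, log, log1p, pi, sqrt
from statistics import mean, stdev


def t_two_tailed_p(t: float, df: float, steps: int = 2000) -> float:
    """Approximate two-tailed p-value for Student's t (Simpson's rule on the density)."""
    t = abs(t)
    if t == 0:
        return 1.0
    # Log of the t density's normalizing constant
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(x: float) -> float:
        return exp(log_c - (df + 1) / 2 * log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # approximates P(0 < T < t)
    return max(0.0, 1.0 - 2.0 * area)


def t_critical(alpha: float, df: float) -> float:
    """Two-tailed critical t value, found by bisection on t_two_tailed_p."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_two_tailed_p(mid, df) > alpha:
            lo = mid  # p still too large: the critical value is further out
        else:
            hi = mid
    return (lo + hi) / 2


def paired_t_test(before, after, alpha=0.05):
    """Paired samples t-test on per-user differences (after - before)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = mean(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    t = mean_diff / se
    df = n - 1
    margin = t_critical(alpha, df) * se
    return {
        "mean_diff": mean_diff,
        "t": t,
        "df": df,
        "p": t_two_tailed_p(t, df),
        "ci": (mean_diff - margin, mean_diff + margin),
    }


# Hypothetical ease ratings (1-5) from the same six users, before and after a design update
before = [3, 4, 2, 5, 3, 4]
after = [4, 5, 4, 5, 4, 5]
result = paired_t_test(before, after)
low, high = result["ci"]
print(f"t({result['df']}) = {result['t']:.2f}, p = {result['p']:.4f}")
print(f"95% CI for the mean difference: ({low:.2f}, {high:.2f})")
```

With these made-up ratings, the interval for the mean difference stays above 0, which is the "does not include 0" check described above. In practice, a statistics library such as SciPy (`scipy.stats.ttest_rel`) would replace the hand-rolled helpers.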

In cases where we are comparing means across different sets of users, we use the independent samples t-test. For example, we might compare a button's placement in two design alternatives, with one group of users testing design A and another group testing design B. If we look at first-click success rates to learn which button placement makes more sense, the test helps us determine whether there's a significant difference in first-click success rates between the two groups. First, we assess how the confidence intervals overlap. No overlap indicates the two means are significantly different. Slight overlap means the two means might still be significantly different, so we need to run a t-test to draw a conclusion. Wide overlap suggests no significant difference. Running T.TEST() returns a p-value. For example, a p-value of 0.035 (3.5%) means that if there were no real difference between the designs, there would be only a 3.5% chance of observing a difference in means at least this large. If the p-value is below our alpha level (the acceptable error rate, e.g. 0.05), we can infer that the two groups have different means, and the difference is statistically significant.
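
A comparable sketch for two independent groups, again using only the Python standard library. It assumes Welch's version of the test, which does not require equal variances and corresponds to T.TEST type 3; the pooled-variance version (type 2) would give slightly different numbers. The helper names and the per-participant scores (correct first clicks out of 10 tasks) are hypothetical, and the p-value comes from a Simpson's-rule approximation of the t distribution.

```python
from math import exp, lgamma, log, log1p, pi, sqrt
from statistics import mean, variance


def t_two_tailed_p(t: float, df: float, steps: int = 2000) -> float:
    """Approximate two-tailed p-value for Student's t (Simpson's rule on the density)."""
    t = abs(t)
    if t == 0:
        return 1.0
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(x: float) -> float:
        return exp(log_c - (df + 1) / 2 * log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # approximates P(0 < T < t)
    return max(0.0, 1.0 - 2.0 * area)


def welch_t_test(a, b):
    """Independent samples t-test without assuming equal variances (Welch)."""
    se2_a = variance(a) / len(a)  # squared standard error of each group's mean
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom (can be fractional)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df, t_two_tailed_p(t, df)


# Hypothetical correct first clicks (out of 10 tasks) per participant
design_a = [6, 7, 5, 8, 6, 7, 6, 5]
design_b = [8, 9, 7, 8, 9, 8, 7, 9]
alpha = 0.05
t, df, p = welch_t_test(design_a, design_b)
print(f"t({df:.1f}) = {t:.2f}, p = {p:.4f}")
print("significant" if p < alpha else "not significant", f"at alpha = {alpha}")
```

The final comparison mirrors the decision rule above: a p-value below alpha counts as a statistically significant difference.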

If we have more than two groups or conditions to compare, we use analysis of variance (ANOVA) to learn whether there is a significant difference among the group means. For example, if we test 3 weight loss diets on 3 groups, with each group following one of the diets, ANOVA can determine whether there's a difference among the groups, but it won't identify which groups differ from each other. To find out, we can run the independent samples t-test mentioned earlier several times, comparing pairs of groups. In the previous example, suppose the ANOVA shows a significant difference among the three diet groups. We can then run an independent samples t-test for diet group A versus diet group B to learn whether there is a significant difference between those two specific diets. Because every extra test adds another chance of a false positive, the alpha level for these pairwise tests is usually adjusted (for example, a Bonferroni correction divides alpha by the number of comparisons), or a dedicated post hoc test such as Tukey's HSD is used.
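
The ANOVA step can be sketched as follows. This hypothetical `one_way_anova` helper computes only the F statistic and its two degrees of freedom; the p-value would come from the F distribution, for example Excel's F.DIST.RT(F, df_between, df_within). The weight loss numbers are made up, and the Bonferroni adjustment is just one common way to control false positives across the follow-up pairwise t-tests.

```python
from itertools import combinations
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Between-group variation: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: how far values sit from their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical weight loss (kg) for three diet groups
diets = {
    "A": [2.1, 3.0, 2.4, 2.8, 2.6],
    "B": [3.5, 4.1, 3.8, 3.2, 4.0],
    "C": [2.9, 3.3, 2.7, 3.1, 3.4],
}
f_stat, df_b, df_w = one_way_anova(list(diets.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")

# If the ANOVA is significant, compare pairs with a Bonferroni-adjusted alpha
pairs = list(combinations(diets, 2))
adjusted_alpha = 0.05 / len(pairs)
for first, second in pairs:
    print(f"t-test {first} vs {second} at alpha = {adjusted_alpha:.4f}")
```

A large F means the group means differ more than the variation inside each group would lead us to expect.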

Excel functions:
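T-test: T.TEST(array1, array2, tails, type), where type 1 = paired, 2 = two-sample equal variance, 3 = two-sample unequal variance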

Relationships between variables

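In exploring relationships between variables, the correlation coefficient (r) shows the strength and direction of a linear relationship. r ranges from -1 to +1: the closer r is to -1 or +1, the stronger the relationship; the closer r is to 0, the weaker the relationship. A positive r means the two variables increase together, and a negative r means one decreases as the other increases.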
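
A minimal sketch of Pearson's r, the coefficient that CORREL() computes, written out so the formula is visible: the co-variation of x and y divided by the product of their spreads. The helper name `pearson_r` and the data are hypothetical; on Python 3.10+, `statistics.correlation()` should give the same result.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient, the same quantity Excel's CORREL() returns."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    mx, my = mean(x), mean(y)
    # Sum of co-deviations from the means (covariance without the 1/(n-1) factor)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical data: task time (s) vs. ease rating (1-7) for 6 participants
task_time = [30, 45, 50, 62, 70, 85]
ease_rating = [7, 6, 6, 4, 3, 2]
print(f"r = {pearson_r(task_time, ease_rating):.2f}")
```

Here r comes out strongly negative: in this made-up data, longer task times go with lower ease ratings.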

To analyze distribution patterns, we use chi-square tests. For example, is a frequency distribution random, or is there an underlying significance to the pattern? The chi-square test tells us whether the differences between observed and expected values are simply due to chance. For example, we might want to know if there is an association between users' experience level and the usability of a website. Because that question involves two variables, it calls for a chi-square test of independence. When we test a single variable, such as whether users' experience levels (novice, intermediate, and expert) are distributed as expected, we can perform a chi-square goodness of fit test.

We asked novice, intermediate, and expert users to give ratings (scale of 1–5) on two design alternatives.
Excel functions:
Correlations: CORREL()
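Chi-square goodness of fit test: CHITEST(actual_range, expected_range), which returns the p-value (CHISQ.TEST() in newer versions; the same function also handles tests of independence)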
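
To make the goodness of fit calculation concrete, here is a standard-library Python sketch. The statistic is the sum of (observed - expected)² / expected over the categories, with degrees of freedom equal to the number of categories minus 1, as CHITEST() uses for a single row or column. Because the standard library has no chi-square distribution, the p-value uses a series expansion of the incomplete gamma function. The helper names, the counts, and the even-split expectation are all hypothetical.

```python
from math import exp, lgamma, log


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    # Series for the regularized lower incomplete gamma function P(a, z);
    # fine for the moderate statistics typical of small UX samples.
    term = total = 1.0 / a
    for n in range(1, 10_000):
        term *= z / (a + n)
        total += term
        if term < 1e-15 * total:
            break
    lower = total * exp(a * log(z) - z - lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_gof(observed, expected):
    """Chi-square goodness of fit: statistic, degrees of freedom, p-value."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical: 60 participants by experience level, checked against an even 1/3 split
observed = [28, 20, 12]  # novice, intermediate, expert
expected = [sum(observed) / 3] * 3
stat, df, p = chi_square_gof(observed, expected)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")  # chi2 = 6.40, df = 2, p = 0.0408
```

With p below 0.05, these made-up counts would suggest the experience levels are not evenly distributed. A test of independence would instead derive the expected counts from the row and column totals of a contingency table.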
