The goal of this project was to conduct a series of tests to demonstrate the value of quantitative and qualitative user tests for evaluating different designs. As a group, Elvis Zhang, Rachel Wang, Zachary Espiritu, and I collected and analyzed user behavior through quantitative data in A/B Testing and qualitative data in Eye Tracking. We designed the following website for Memphis Taxis in two slightly different ways: (https://cs1300-ab-testing.herokuapp.com)

The top of the page is the same in Versions A and B
The bottom of the page differs between Versions A and B in how the content is laid out: vertical vs. horizontal


Null Hypotheses

The end goal was to determine if we could reject the same null hypothesis that site A is no better than site B based on each of the following metrics:

  1. Click-through rate — the percentage of users that click on the website.
  2. Time to click — the average time it takes a user to click.
  3. Dwell time — the average time a user takes to return to the site after clicking.
  4. Return rate — the percentage of users that return to the site after clicking off.
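To make the four definitions concrete, here is a minimal Python sketch of how they might be computed from session logs. The `Session` record and its field names are hypothetical (our actual log format is not shown here), and it assumes that return rate is measured among users who clicked.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Session:
    # Hypothetical log record; field names are assumptions, not the project's schema.
    load_time: float              # when the page loaded (seconds)
    click_time: Optional[float]   # when the user first clicked a link, if ever
    return_time: Optional[float]  # when the user came back to the page, if ever


def average(values):
    """Mean of a list, or None when there is nothing to average."""
    return mean(values) if values else None


def summarize(sessions):
    clicked = [s for s in sessions if s.click_time is not None]
    returned = [s for s in clicked if s.return_time is not None]
    return {
        # Share of all users who clicked at least once.
        "click_through_rate": len(clicked) / len(sessions) if sessions else None,
        # Average seconds from page load to the click.
        "time_to_click": average([s.click_time - s.load_time for s in clicked]),
        # Average seconds between clicking away and coming back.
        "dwell_time": average([s.return_time - s.click_time for s in returned]),
        # Assumption: share of clicking users who came back.
        "return_rate": len(returned) / len(clicked) if clicked else None,
    }
```

Running `summarize` once on the Version A logs and once on the Version B logs would give the pairs of numbers that the statistical tests compare.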

Alternative Hypotheses

For each null hypothesis, we wrote an alternative hypothesis that contradicts it. We cannot prove an alternative hypothesis, but for each metric we can state what we expect to see and why we think we would be able to reject the null hypothesis.

  1. Click-through rate — A > B because most of the buttons are at the end of the page.
  2. Time to click — B < A because the first call to action (CTA) is closer to the top of the page, so users may be inclined to click on each CTA as they scroll down the page.
  3. Dwell time — B < A because the CTAs are in a vertical column, users will decide to review each website for themselves and then come back immediately to review the next website.
  4. Return rate — B < A because of the same reason that the dwell time for B will be less than A. The layout of Version A will encourage users to review all of the options before settling on a single decision.

Eye Tracking Hypothesis

Version B will have a greater proportion of eye-gazes toward the left side of the screen than Version A because the important data for each of the comparisons are located on the left side of the screen.

A/B Testing

For each described metric, we individually computed the measurements and conducted the appropriate statistical test to determine whether or not we could reject our null hypotheses. My task was to choose between using a chi-squared test and a t-test. Both tests essentially compare the values from A and B to see how different they are from each other, but there are subtle differences.

Computations: https://cs1300-stats-tests.herokuapp.com

I wrote a PHP script that explains and computes the analysis for each metric, which you can view here: https://cs1300-stats-tests.herokuapp.com. The site lets you choose which metric to analyze and walks you through how it performs the corresponding statistical test.

  • Click-through and return rate: Compute the metric from the data logs. Then calculate the sum of all the (O-E)²/E, where O is the observed value and E is the expected value, to get the chi-squared value. Compare this value with the critical value for 1 degree of freedom at a significance level of 0.05 (which is 3.84). If the chi-squared value is greater than the critical value, the difference is statistically significant, so we can reject the null hypothesis. Otherwise, we fail to reject it.
  • Average dwell time and click time: Compute the average time from the data logs. Then use the sample size, sample mean, and standard deviation of each version to compute a t value. Use a t-table to find the critical t value at the 95% confidence level for our degrees of freedom. Compare the absolute value of our t value with the critical t value. If it is greater than the critical t value, the difference is statistically significant, so we can reject the null hypothesis. Otherwise, we fail to reject it.
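The two procedures above can be sketched in a few lines of Python. This is not the PHP script from the site, only an illustration under assumptions: the chi-squared function treats each rate as a 2x2 table of "did it" vs. "did not" counts, and the t value uses Welch's unequal-variance form, which may differ from the variant the site uses. The function names are made up for this example.

```python
import math
from statistics import mean, variance

CHI2_CRITICAL_DF1 = 3.84  # critical value for 1 degree of freedom at 0.05


def chi_squared_2x2(success_a, total_a, success_b, total_b):
    """Sum of (O - E)^2 / E over the four cells of a 2x2 table."""
    observed = [
        [success_a, total_a - success_a],
        [success_b, total_b - success_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    grand_total = total_a + total_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if A and B shared the same underlying rate.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def welch_t(sample_a, sample_b):
    """t value and approximate degrees of freedom (Welch's t-test, an assumption)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Made-up counts: 30 of 100 users clicked on A, 50 of 100 on B.
chi2 = chi_squared_2x2(30, 100, 50, 100)
reject_null = chi2 > CHI2_CRITICAL_DF1
```

For the t-test, the returned degrees of freedom are what you would look up in a t-table to find the critical value, and the absolute value of `t` is what gets compared against it.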

Bayesian Probability

Another way to analyze these metrics is Bayesian inference with a Beta distribution. Essentially, it describes how certain we are about our hypotheses as a probability. Below is an example of how to use a Bayesian A/B test for click-through rate, followed by the analysis of my data.

How to compute Bayes’ Probability for click-through rate
My Bayesian computation of click-through rate
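As a rough illustration of the idea, the sketch below estimates the probability that Version A's true click-through rate is higher than Version B's. It assumes a uniform Beta(1, 1) prior for each version and uses Monte Carlo sampling from the two posteriors; both choices are assumptions for this example and may not match the computation shown above. The counts in the usage line are made up.

```python
import random


def prob_a_beats_b(clicks_a, visitors_a, clicks_b, visitors_b,
                   draws=100_000, seed=0):
    """Estimate P(rate_A > rate_B) from Beta posteriors by sampling."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    wins = 0
    for _ in range(draws):
        # Posterior under a Beta(1, 1) prior: Beta(1 + clicks, 1 + non-clicks).
        rate_a = rng.betavariate(1 + clicks_a, 1 + visitors_a - clicks_a)
        rate_b = rng.betavariate(1 + clicks_b, 1 + visitors_b - clicks_b)
        if rate_a > rate_b:
            wins += 1
    return wins / draws


# Made-up counts: 50 of 100 visitors clicked on A, 30 of 100 on B.
probability = prob_a_beats_b(50, 100, 30, 100)
```

A result close to 1 means we are fairly certain A has the higher click-through rate, a result close to 0 favors B, and a result near 0.5 means the data does not separate the two versions.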


As mentioned, during this process we also used eye-tracking equipment and software to watch two users' eyes as they explored our websites. After examining the eye-tracking logs, I created a script (viewable at the same website as above) to generate a heatmap and a replay of the users' eye-gazes:

Version A & Version B Eye-Tracking Heatmaps
Version A — Replay of User’s Eye-Gazes
Version B— Replay of User’s Eye-Gazes
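For readers curious about how a heatmap like this can be built, here is a minimal Python sketch, not the script from the site. It assumes the eye-tracking log can be reduced to a list of (x, y) gaze points in screen pixels, which is a guess about the format. It bins the points into a grid and also computes the share of gazes on the left half of the screen, the quantity our eye-tracking hypothesis is about.

```python
def gaze_heatmap(points, width, height, cols=8, rows=6):
    """Count gaze points per grid cell; grid[row][col] holds the count."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        # Skip samples that fall outside the screen (for example, looking away).
        if 0 <= x < width and 0 <= y < height:
            col = int(x * cols / width)
            row = int(y * rows / height)
            grid[row][col] += 1
    return grid


def left_share(points, width):
    """Fraction of on-screen gaze points that fall on the left half."""
    on_screen = [x for x, _ in points if 0 <= x < width]
    if not on_screen:
        return None
    return sum(1 for x in on_screen if x < width / 2) / len(on_screen)


# Made-up gaze points on an 800x600 screen.
sample_points = [(10, 10), (10, 10), (500, 300), (790, 590), (-5, 20)]
heatmap = gaze_heatmap(sample_points, 800, 600)
share = left_share(sample_points, 800)
```

Cells with higher counts would be drawn in warmer colors, and comparing `left_share` between the two versions is one simple way to check the hypothesis numerically.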

Interpretation of Data

Looking back at our eye-tracking hypothesis, it seems that both users were actually more attracted to the middle of the screen. For Version A of our website, this meant looking at just one or two car companies, whereas with Version B, the user read all the data displayed for every business. Although the data did not agree with our hypothesis, it still told us something valuable about how users viewed our websites: their attention went to the middle of the screen rather than the left.


If we were conducting this experiment for a real taxi business in Memphis, we would not have a concrete answer for which version of the website to choose. Since we were unable to reject all of the null hypotheses throughout our A/B tests, I would consider leaning towards the eye-tracking test data to help make a decision. The Memphis Taxi company would have to decide if they want to direct users to one or two car companies like in Version A, or if they want users to read all the data before choosing like in Version B.


