A/B Testing in Web Page Version

Weng Seng
4 min read · Aug 31, 2019

A/B testing is a method of comparing two versions of a web page against each other to determine which one performs better.

In this post I will explain how I run an analysis with A/B testing.

The goal of this project is to use A/B testing to decide whether we should implement the new page or keep the old page. The full details of the results can be found here.

Null and Alternative Hypotheses

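We assume that the old page is at least as good as the new page unless the new page proves to be better at a Type I error rate of 5%. In symbols, the null hypothesis is H0: p_new - p_old <= 0 and the alternative is H1: p_new - p_old > 0.

Under the null hypothesis, we assume that p_new and p_old are both equal to the overall conversion rate, regardless of page.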
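As a rough sketch of how these null-hypothesis quantities could be computed with pandas: the toy DataFrame below is made up, and the assumption that the group labels are 'control' (old page) and 'treatment' (new page) is mine, based on the dummy columns used later in the post.

```python
import pandas as pd

# Hypothetical toy data; the real df2 has one row per user.
df2 = pd.DataFrame({
    "group": ["control"] * 4 + ["treatment"] * 4,
    "converted": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Under the null, both pages share a single conversion rate: the pooled rate.
p_null = df2["converted"].mean()
p_new = p_old = p_null

# Number of users who landed on each page (assumed label mapping).
n_new = int((df2["group"] == "treatment").sum())
n_old = int((df2["group"] == "control").sum())

# Observed difference in conversion rates, new page minus old page.
rates = df2.groupby("group")["converted"].mean()
actual_diff = float(rates["treatment"] - rates["control"])
```

Here actual_diff is the quantity that the simulated differences are compared against.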

I will use two methods to calculate the p-value: the first without the statsmodels library, and the second with it.

Calculate p-value without Statsmodels Library

First we simulate the sampling distribution of the difference in conversion rates between the two pages over 10,000 iterations. We store all 10,000 values in p_diffs, where n_new is the total number of users who landed on the new page and n_old is the total number of users who landed on the old page.

import numpy as np

# Simulated differences in conversion rate under the null hypothesis
p_diffs = []

for _ in range(10000):
    # One Bernoulli draw (converted or not) per visitor to each page
    new_page_converted = np.random.binomial(1, p_new, n_new)
    new_page_converted_mean = new_page_converted.mean()
    old_page_converted = np.random.binomial(1, p_old, n_old)
    old_page_converted_mean = old_page_converted.mean()
    p_diffs.append(new_page_converted_mean - old_page_converted_mean)

Then we plot a histogram of p_diffs:

The red line marks the actual difference. We then calculate the proportion of p_diffs values that are greater than the actual difference, which gives 0.90349.

Note:
p_diffs holds the simulated differences between the conversion rates of the new page and the old page, based on 10,000 simulated samples.
The actual difference is the difference between the conversion rates of the new page and the old page, based on our data.

Explanation:

The value 0.90349 calculated here is the p-value. Because it is far above the 5% Type I error rate, we fail to reject the null hypothesis.
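For illustration, the whole simulation and the p-value step can be written as one vectorized function. This is only a sketch: the function name, the seed, and the numbers in the usage lines are my own assumptions, not values from the project. It draws the number of conversions per simulated sample directly, which is equivalent to averaging the individual 0/1 draws.

```python
import numpy as np

def simulate_p_value(p_null, n_new, n_old, actual_diff, n_sims=10_000, seed=0):
    """One-sided p-value: share of simulated differences above the observed one."""
    rng = np.random.default_rng(seed)
    # Conversions per simulated sample, divided by sample size to get rates.
    new_rates = rng.binomial(n_new, p_null, size=n_sims) / n_new
    old_rates = rng.binomial(n_old, p_null, size=n_sims) / n_old
    p_diffs = new_rates - old_rates
    return float((p_diffs > actual_diff).mean())

# Hypothetical inputs, for illustration only.
p_value = simulate_p_value(p_null=0.12, n_new=5000, n_old=5000, actual_diff=-0.002)
```

When the observed difference is negative (the new page converted slightly worse), most simulated differences fall above it, which is why the p-value comes out large.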

Calculate p-value with Statsmodels Library

Using the built-in function is easier to code, but the section above is a walkthrough of the ideas that are critical to thinking correctly about statistical significance.

import statsmodels.api as sm
z_score, p_value = sm.stats.proportions_ztest([convert_old, convert_new], [n_old, n_new], alternative='smaller')
z_score, p_value
Output: (1.3109241984234394, 0.90505831275902449)

Note:
convert_old and convert_new are the number of conversions for each page.

Explanation:

The p-value of 0.90505831275902449 is similar to the previous result, so again we fail to reject the null hypothesis.
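To make the built-in call less of a black box, here is a minimal sketch of the pooled two-proportion z-test using only the standard library. The function names are my own, and I am assuming the pooled-variance form of the statistic, which to my understanding is what proportions_ztest uses by default for two samples.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, P(Z <= z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def two_proportion_ztest_smaller(count1, nobs1, count2, nobs2):
    """z statistic and lower-tail p-value for H1: p1 < p2 (pooled variance)."""
    p1 = count1 / nobs1
    p2 = count2 / nobs2
    p_pool = (count1 + count2) / (nobs1 + nobs2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / nobs1 + 1 / nobs2))
    z = (p1 - p2) / se
    return z, normal_cdf(z)

# Plugging in the z-score reported above should give back the reported p-value.
p_from_z = normal_cdf(1.3109241984234394)
```

A positive z-score here means the first sample (the old page) converted at a higher rate than the second (the new page), so the lower-tail p-value is large.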

A regression approach

We will use logistic regression because the response has only two possible outcomes: converted or not converted.

First we need to create a column for the intercept, and create a dummy variable column for which page each user received. Our goal is to use statsmodels to fit the regression model.

df2[['control', 'treatment']] = pd.get_dummies(df2['group'])

df2['intercept'] = 1
log_mod = sm.Logit(df2['converted'], df2[['intercept','treatment']])
result = log_mod.fit()
result.summary()
Result
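To help read the summary table: with only an intercept and one 0/1 dummy, the fitted logistic model reproduces the two group conversion rates exactly, so the coefficients can be sketched by hand. The counts below are hypothetical and the function names are my own.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def saturated_logit_coefficients(conv_control, n_control, conv_treatment, n_treatment):
    """Intercept and dummy coefficient implied by the two group rates."""
    p_control = conv_control / n_control
    p_treatment = conv_treatment / n_treatment
    intercept = logit(p_control)                   # log-odds for the old page
    treatment_coef = logit(p_treatment) - intercept  # change in log-odds for the new page
    return intercept, treatment_coef

# Hypothetical counts, for illustration only.
intercept, treatment_coef = saturated_logit_coefficients(120, 1000, 110, 1000)
odds_ratio = math.exp(treatment_coef)
```

Exponentiating the dummy coefficient gives the odds ratio of converting on the new page relative to the old page. A value below 1 means the new page did worse in the sample.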

Explanation:

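The logistic regression tests a two-sided hypothesis: the null is that the conversion rate of the new page is equal to that of the old page, and the alternative is that they are not equal.

The p-value associated with the page variable (the treatment column in the code above) is 0.19, which is lower than the p-value calculated previously with proportions_ztest. The difference arises because the regression reports a two-sided p-value while the earlier test was one-sided: 2 × (1 − 0.905) ≈ 0.19.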
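One way to reconcile the two numbers, sketched below, is to note that the regression summary reports two-sided p-values while the z-test above used a one-sided alternative. Converting the one-sided value should land close to the regression's 0.19. The variable names are my own.

```python
# One-sided p-value reported by proportions_ztest above.
one_sided_p = 0.90505831275902449

# Two-sided p-value: twice the smaller tail probability.
two_sided_p = 2 * min(one_sided_p, 1 - one_sided_p)

print(round(two_sided_p, 2))  # prints 0.19
```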

Add New Variable — Country

We merge our dataset with the countries.csv dataset on user_id, so that each user's row gains a country.

df_country = pd.read_csv('countries.csv')
df_country[['CA', 'UK', 'US']] = pd.get_dummies(df_country['country'])
df2_country = df2.join(df_country.set_index('user_id'), on='user_id')
df2_country.head()
log_mod2 = sm.Logit(df2_country['converted'], df2_country[['intercept', 'UK', 'US']])
result2 = log_mod2.fit()
result2.summary()
result 2
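For readers who want to try the merge and dummy-encoding step in isolation, here is a small self-contained sketch. The two toy frames are made up and only stand in for df2 and countries.csv. It uses merge with a one-to-one check instead of join, which is my own choice rather than the approach above.

```python
import pandas as pd

# Hypothetical toy frames standing in for df2 and countries.csv.
users = pd.DataFrame({"user_id": [1, 2, 3, 4], "converted": [0, 1, 0, 1]})
countries = pd.DataFrame({"user_id": [1, 2, 3, 4], "country": ["US", "UK", "CA", "US"]})

# validate raises an error if a user_id is duplicated on either side.
merged = users.merge(countries, on="user_id", how="inner", validate="one_to_one")

# dtype=int gives 0/1 columns rather than booleans.
dummies = pd.get_dummies(merged["country"], dtype=int)
merged = pd.concat([merged, dummies], axis=1)
merged["intercept"] = 1

# Leaving CA out of the design matrix makes it the baseline country.
design = merged[["intercept", "UK", "US"]]
```

Because CA is left out, the UK and US coefficients in the regression are each read relative to users in CA.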

Conclusion:

In this case, we fail to reject the null hypothesis: there is not enough evidence that the new page is better than the old page, so we keep the old page.

From the second regression, UK has the lower p-value of the two country terms, which suggests that users in the UK may be more likely to convert than users in the baseline country (CA). The evidence is still not strong enough to reject the null hypothesis at an alpha level of 0.05, although we could reject it at an alpha level of 0.10.
