Measurement Problems Episode I: Hypothesis Testing (A/B Testing)
Hello everyone from a scorching summer day. This series is about measurement problems, and in this first article we focus on the A/B test, which is one of the hypothesis tests. The A/B test is a method for determining whether there is a statistically significant difference between two groups. In practice, it can reveal important effects when new developments are tested in science, and it can guide decisions about company spending in business. Decisions can then rest on solid foundations, and studies can be built on scientific evidence. For growing companies, the A/B test is a recommended alternative to primitive approaches such as deciding on investments and actions by trial and error.
Now let’s examine step by step how this method works.
1. Establishment of hypotheses
In the first stage of the A/B test, we establish hypotheses about the effect applied to the data. As an example, suppose an interface change was made to a mobile application. The hypotheses for this situation are:
H0: After the development, there was no change in the number of users using the application.
H1 (Alternative Hypothesis): After the development, there has been a change in the number of users using the application.
Expressed mathematically, these hypotheses are:
H0: M1 = M2
H1: M1 ≠ M2
Here M1 is the average number of users before the effect and M2 is the average number of users after the effect. In some settings the number of users may also be expressed as a multiplier rather than a raw count.
Another point is that hypotheses can be one-sided or two-sided. For a website, a one-sided hypothesis might be "Users spend more than 18 seconds on the home page." Here only a deviation in one direction counts as evidence. A two-sided hypothesis asks whether the value differs from a reference in either direction, for example "The average time users spend on the home page is different from 18 seconds." A related way to report a result is with a confidence interval, such as "With 95% confidence, our users spend between 72 and 88 seconds on our website."
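As a rough illustration of the difference between one-sided and two-sided hypotheses, here is a minimal Python sketch using scipy.stats.ttest_1samp. The data are simulated, and the names home_page_seconds and reference_seconds are hypothetical; only the 18-second reference value comes from the example above.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Simulated (hypothetical) home page durations in seconds.
rng = np.random.default_rng(0)
home_page_seconds = rng.normal(loc=20, scale=5, size=100)
reference_seconds = 18  # reference value taken from the example in the text

# Two-sided: is the mean different from 18 seconds in either direction?
two_sided = ttest_1samp(home_page_seconds, popmean=reference_seconds,
                        alternative="two-sided")

# One-sided: is the mean greater than 18 seconds?
one_sided = ttest_1samp(home_page_seconds, popmean=reference_seconds,
                        alternative="greater")

print(f"two-sided p-value: {two_sided.pvalue:.4f}")
print(f"one-sided p-value: {one_sided.pvalue:.4f}")
```

When the sample mean falls on the side named by the one-sided alternative, the one-sided p-value of a t-test is half of the two-sided one, which is why the direction of the hypothesis should be fixed before looking at the data.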
Let's clarify a concept from the last paragraph: the confidence level. The confidence level (95% in the example above) is chosen by the analyst before the study begins. Its complement is the significance level, which appears as alpha in the literature: a 95% confidence level corresponds to alpha = 0.05. Alpha is a measure of how strict the study is. The sector knowledge and the experience of the people carrying out the study are the two main factors in choosing it. Alpha also tells us the level of risk we accept of rejecting H0 when it is actually true.
The significance level is one of the key decisions in a study. For this reason, it should be communicated clearly and transparently to the decision makers while the study is being carried out. Otherwise, the result of the study may fall short of expectations and do more harm than good.
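To make the relationship between a confidence level and alpha concrete, here is a small Python sketch that computes a t-based confidence interval for a mean. It is only an illustration under stated assumptions: the session durations are simulated, and the helper mean_confidence_interval is a hypothetical name, not something from the original study.

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) bounds of a t-based confidence interval for the mean."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    mean = sample.mean()
    sem = stats.sem(sample)  # standard error of the mean (ddof=1)
    # Critical t value that leaves alpha / 2 in each tail.
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return mean - t_crit * sem, mean + t_crit * sem

# Simulated (hypothetical) time on site in seconds.
rng = np.random.default_rng(42)
session_seconds = rng.normal(loc=80, scale=20, size=100)

confidence = 0.95
alpha = 1 - confidence  # significance level paired with this confidence level
low, high = mean_confidence_interval(session_seconds, confidence)
print(f"alpha = {alpha:.2f}, interval = ({low:.1f}, {high:.1f}) seconds")
```

Raising the confidence level (lowering alpha) widens the interval, which reflects the trade-off between certainty and precision.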
2. Checking the assumptions
After the hypotheses are established, we check the assumptions that determine which test to use in the study. There are two assumptions to check.
1. Normality assumption
2. Assumption of homogeneity of variance
*Note: Before examining the assumption of normality, it may be useful to apply preprocessing that does not affect the general structure of the data, such as outlier analysis, missing value analysis, and data manipulation, if necessary.
The normality assumption states that the variable of interest is normally distributed in the groups we are comparing. Its null hypothesis is "There is no significant difference between the distribution of the data and the theoretical normal distribution". To check the assumption of normality, the Shapiro-Wilk test is performed in Python. If the p-value obtained from this test is less than 0.05, we reject this hypothesis and conclude that the data are not normally distributed.
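A minimal sketch of this check, assuming SciPy is the library in use (scipy.stats.shapiro). The helper name is_normal and the simulated groups are hypothetical and only serve the illustration.

```python
import numpy as np
from scipy.stats import shapiro

def is_normal(sample, alpha=0.05):
    """Shapiro-Wilk check: returns (normality not rejected?, p-value)."""
    result = shapiro(sample)
    # H0: the sample comes from a normal distribution.
    return bool(result.pvalue >= alpha), float(result.pvalue)

rng = np.random.default_rng(1)
bell_shaped = rng.normal(loc=50, scale=10, size=300)  # simulated, roughly normal
skewed = rng.exponential(scale=10, size=300)          # simulated, strongly skewed

for name, sample in [("bell_shaped", bell_shaped), ("skewed", skewed)]:
    ok, p_value = is_normal(sample)
    print(f"{name}: p-value = {p_value:.4f}, normality not rejected = {ok}")
```

Note that a p-value above 0.05 does not prove normality; it only means the test found no significant evidence against it.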
The second assumption is homogeneity of variance. We have two samples of the variable of interest, one taken before the effect and one taken after it. With the homogeneity check, we ask whether the variance (the spread) of the variable is similar in the two samples. Levene's test is used to test this assumption. Its null hypothesis is that the variances are equal, so a p-value below 0.05 means the assumption is not met.
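The variance check can be sketched in the same way with scipy.stats.levene. Again, the helper name has_equal_variance and the simulated groups are hypothetical.

```python
import numpy as np
from scipy.stats import levene

def has_equal_variance(group_a, group_b, alpha=0.05):
    """Levene check: returns (equal variances not rejected?, p-value)."""
    # H0: both groups have the same variance.
    # SciPy centers on the median by default (center="median").
    result = levene(group_a, group_b)
    return bool(result.pvalue >= alpha), float(result.pvalue)

rng = np.random.default_rng(2)
before = rng.normal(loc=100, scale=1, size=200)      # simulated, small spread
after_wide = rng.normal(loc=100, scale=5, size=200)  # simulated, large spread

ok, p_value = has_equal_variance(before, after_wide)
print(f"p-value = {p_value:.4g}, equal variances not rejected = {ok}")
```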
3. Applying the hypothesis test
At this stage, having performed the assumption checks, we choose the test to apply based on their results.
There are some situations that we consider when determining the test to be applied:
1. If both the normality and the homogeneity of variance assumptions are met, an independent two-sample t-test is performed.
**Note: If the normality assumption is met but homogeneity of variance is not, the t-test is used again, but the "equal_var=False" argument is passed to the test (this is Welch's t-test).
2. If the normality assumption is not met, the Mann-Whitney U test ("mannwhitneyu"), which is one of the non-parametric tests, is performed.
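The selection rules above can be put together in one function. This is a sketch under stated assumptions rather than the exact code of the study: it assumes SciPy, a fixed alpha of 0.05 for every check, and the hypothetical function name run_ab_test.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu

def run_ab_test(group_a, group_b, alpha=0.05):
    """Pick the test from the assumption checks; return (test name, p-value)."""
    # Normality must hold in both groups.
    normal = all(shapiro(g).pvalue >= alpha for g in (group_a, group_b))
    if not normal:
        # Non-parametric alternative when normality is rejected.
        result = mannwhitneyu(group_a, group_b, alternative="two-sided")
        return "Mann-Whitney U", float(result.pvalue)

    # With normal data, Levene decides between the classic and Welch t-test.
    equal_var = bool(levene(group_a, group_b).pvalue >= alpha)
    result = ttest_ind(group_a, group_b, equal_var=equal_var)
    name = "independent t-test" if equal_var else "Welch t-test"
    return name, float(result.pvalue)

rng = np.random.default_rng(3)
skewed_a = rng.exponential(scale=10, size=200)  # simulated, not normal
skewed_b = rng.exponential(scale=12, size=200)
bell_a = rng.normal(loc=50, scale=5, size=200)  # simulated, roughly normal
bell_b = rng.normal(loc=52, scale=5, size=200)

print(run_ab_test(skewed_a, skewed_b))
print(run_ab_test(bell_a, bell_b))
```

Chaining the checks this way keeps the decision logic in one place, at the cost of running several tests on the same data.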
4. Result
In the final stage of the study, having performed the assumption checks, we apply the test that suits our data (the t-test or the Mann-Whitney U test).
Then, based on the test output, we interpret the H0 and H1 hypotheses that we set at the beginning of our study. If the p-value of the test we chose from the assumption checks is lower than 0.05, which is the alpha value, we reject the H0 hypothesis; otherwise we fail to reject it. Let's remember the hypotheses that we set at the beginning of the study:
H0: After the development, there was no change in the number of users using the application.
H1 (Alternative Hypothesis): After the development, there has been a change in the number of users using the application.
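The final decision can be sketched as a comparison of the p-value with alpha. The user counts below are simulated, the helper name interpret is hypothetical, and the t-test is used only on the assumption that both checks passed.

```python
import numpy as np
from scipy.stats import ttest_ind

def interpret(p_value, alpha=0.05):
    """Translate a p-value into a decision about H0."""
    if p_value < alpha:
        return "reject H0: there has been a change in the number of users"
    return "fail to reject H0: no evidence of a change in the number of users"

# Simulated (hypothetical) daily user counts before and after the change.
rng = np.random.default_rng(7)
users_before = rng.normal(loc=1000, scale=50, size=60)
users_after = rng.normal(loc=1100, scale=50, size=60)

p_value = ttest_ind(users_before, users_after).pvalue
print(f"p-value = {p_value:.4g}")
print(interpret(p_value))
```

Failing to reject H0 is not the same as proving it: it only means the data did not show a significant change at the chosen alpha.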
Here we end the first article of our series. I hope it was an enjoyable reading session for you. See you in our next article :)
For a more detailed, applied version of this article, you can visit my GitHub profile via the link.