From Data science concepts you need to know! Part 1 by Michael Barber

…ting a histogram of p-values derived from the two tests after running a simulation many times. Recall that the p-value is *the probability of observing a difference between groups at least as large as the one seen, assuming there is no real difference between groups (i.e. the null hypothesis is true)*. The two populations here have the same mean, which means that we should fail to reject the null hypothesis a…
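The simulation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two tests used in the article are not named in this excerpt, so a simple two-sample z-test with known standard deviation stands in for them, and the function names, sample size, and number of runs are all hypothetical. When both groups are drawn from the same population, the p-values should be spread roughly evenly between 0 and 1, and about 5% of them should fall below 0.05.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(a, b, sigma):
    """Two-sided p-value for a difference in means, assuming known sigma."""
    standard_error = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_p_values(n_sims=2000, n=30, sigma=1.0, seed=0):
    """Draw both groups from the SAME normal population, n_sims times."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_sims):
        a = [rng.gauss(0, sigma) for _ in range(n)]
        b = [rng.gauss(0, sigma) for _ in range(n)]
        p_values.append(z_test_p_value(a, b, sigma))
    return p_values


def bin_counts(p_values, bins=10):
    """Count p-values in equal-width bins over [0, 1]."""
    counts = [0] * bins
    for p in p_values:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


p_values = simulate_p_values()
counts = bin_counts(p_values)

# Text histogram: one '#' per 10 simulated p-values in the bin.
for i, count in enumerate(counts):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f} {'#' * (count // 10)}")

false_positive_rate = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"share of p-values below 0.05: {false_positive_rate:.3f}")
```

A roughly flat histogram is what a well-behaved test looks like when there is no real difference; a pile-up near zero would mean the test rejects too often.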

…lue the more normal the distribution. Often we use a p-value cutoff of 0.1 to check for normality, so anything with a p-value < 0.1 we would class as non-normal. Note that a p-value above the cutoff only means the test found no evidence against normality; it does not prove the data are normal. This approach is complementary to the above, and I would recommend both plotting a QQ-plot (as above) **and** running a Shapiro test (and including both in any report).
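As a rough sketch of that workflow, the snippet below runs a Shapiro-Wilk test with SciPy's `scipy.stats.shapiro` and applies the 0.1 cutoff mentioned above. The helper name `shapiro_check`, the sample sizes, and the two example distributions are assumptions made for illustration; the article's own code is not part of this excerpt. `scipy.stats.probplot` is used to get the QQ-plot coordinates without drawing anything.

```python
import numpy as np
from scipy import stats


def shapiro_check(sample, cutoff=0.1):
    """Run a Shapiro-Wilk test and label the sample using the cutoff."""
    statistic, p_value = stats.shapiro(sample)
    if p_value < cutoff:
        label = "non-normal"
    else:
        label = "no evidence against normality"
    return p_value, label


rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200)
skewed_sample = rng.exponential(scale=1.0, size=200)

for name, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    p_value, label = shapiro_check(sample)
    print(f"{name}: p = {p_value:.4f} -> {label}")

# QQ-plot coordinates: theoretical quantiles vs. the ordered sample values.
# r is the correlation of the points with the fitted straight line.
(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(
    normal_sample, dist="norm"
)
print(f"QQ-plot straight-line correlation for the normal sample: {r:.3f}")
```

Plotting `theoretical_q` against `ordered_values` gives the QQ-plot; points hugging a straight line agree with a high Shapiro p-value, which is why reporting both is useful.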

Plot 2 **is both simple and true to the data**. It is therefore the best plot here. Note that the only way to figure out the best plot is to examine the underlying data. I would therefore recommend using either a bee swarm or a density plot with a small bin width (or h…