Understanding Statistical Tests: A Guide to Parametric and Non-Parametric Methods
In this article, we’re going to explore how to compare two or more groups using different statistical tests. There are two main types of tests: parametric and non-parametric.
Parametric tests are based on certain assumptions about the population from which our sample comes. Specifically, they assume that the data follow a normal distribution, which is a common pattern where most values cluster around a central point.
Non-parametric tests don’t make such assumptions. They don’t require the data to follow any particular pattern or distribution, hence they are often referred to as “distribution-free” tests.
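To make that normality assumption concrete, here is a minimal sketch in Python (using scipy, with invented sample values) of how you might check whether a sample looks plausibly normal before reaching for a parametric test. The 0.05 cut-off is only the usual convention, not a hard rule.

```python
# A minimal sketch: checking whether a small sample looks plausibly normal.
# The values below are invented purely for illustration.
from scipy import stats

sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.1, 4.8, 5.2]

# Shapiro-Wilk test: the null hypothesis is that the data come from a
# normal distribution, so a small p-value suggests non-normality.
stat, p = stats.shapiro(sample)
print(f"Shapiro-Wilk W = {stat:.3f}, p = {p:.3f}")

# Common rule of thumb: if p > 0.05 there is no strong evidence against
# normality and a parametric test may be reasonable; otherwise a
# non-parametric alternative is often the safer choice.
```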
We’ll break down this topic into three sections:
- How to choose the right test for your data.
- A closer look at Parametric tests and when they’re appropriate.
- An examination of Non-parametric tests and their uses.
Choosing a Test
When it comes to picking a statistical test, the key question to ask is, “What hypothesis am I testing?” Sometimes, you might not have a hypothesis; you’re simply looking to understand what your data is showing. For instance, in a study measuring how common a certain characteristic is within a population (a prevalence study), there’s typically no hypothesis being tested. The study’s size is just based on how precisely you want to measure this prevalence. If there’s no hypothesis, then you won’t need a statistical test.
It’s crucial to figure out beforehand which hypotheses are confirmatory (tested because they presuppose a relationship) and which are exploratory (suggested by the data you collect). A single study can’t support a whole series of hypotheses, so it’s wise to sharply limit the number of confirmatory hypotheses you’re testing.
While you can use statistical tests on hypotheses suggested by the data, any P values you get should be seen as indicators rather than definite proof. Think of these results as preliminary until they’re verified by further research. A handy tip here is the Bonferroni correction: if you’re testing multiple independent hypotheses, divide your significance level by the number of hypotheses. So, for two hypotheses, you’d only consider a result significant if P<0.025. Remember, this is a conservative approach, because it makes it harder to reject the null hypothesis.
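To make the arithmetic concrete, here is a minimal sketch in plain Python, with invented P values, of applying the Bonferroni correction described above.

```python
# A minimal sketch of the Bonferroni correction. P values are invented.
alpha = 0.05
p_values = [0.012, 0.030, 0.200]          # one P value per hypothesis tested
adjusted_alpha = alpha / len(p_values)     # 0.05 / 3 ~= 0.0167

for i, p in enumerate(p_values, start=1):
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(f"Hypothesis {i}: p = {p:.3f} -> {verdict} at corrected alpha {adjusted_alpha:.4f}")
```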
Lastly, ask yourself, “Is my data independent?” It’s not always straightforward to determine, but a good general rule is that results from the same individual or from individuals who are closely matched (like in age, sex, or socio-economic status) are not independent. For example, data from a crossover trial or a case-control study with matched controls would not be considered independent.
For a study’s analysis to be valid, it must directly correspond to the study’s design. Matched studies require matched analyses. And be cautious with data collected over time (serial measurements); it’s a mistake to treat such data as independent when in fact they’re related.
What types of data are to be measured?
The choice of test for matched or paired data is described in Table 1 and for independent data in Table 2.
Table 1: Choice of statistical test for paired or matched observations
When choosing a statistical test, it’s crucial to identify the type of data you have. Input variables are what you manipulate or observe (like the type of treatment in a clinical trial), and outcome variables are the results you measure (like a clinical outcome).
For instance, if you’re comparing treatments (a nominal input variable) and observing normally distributed clinical outcomes, you’d typically use a t-test. But if your input is a continuous variable (like a clinical score) and your outcome is nominal (like whether a patient is cured), logistic regression would be the way to go. A t-test might give some insights, but it won’t tell you the probability of cure at specific clinical score levels.
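The two situations above can be sketched in Python. The data below are invented, and scipy and statsmodels are assumed to be available, so treat this as an illustration rather than a recipe.

```python
# Minimal sketches of the two scenarios above, on invented data.
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(0)

# 1) Nominal input (treatment A vs B), normally distributed outcome -> t-test.
outcome_a = rng.normal(loc=10.0, scale=2.0, size=30)
outcome_b = rng.normal(loc=11.5, scale=2.0, size=30)
t_stat, p_val = stats.ttest_ind(outcome_a, outcome_b)
print(f"t = {t_stat:.2f}, p = {p_val:.3f}")

# 2) Continuous input (clinical score), nominal outcome (cured yes/no)
#    -> logistic regression, which gives the probability of cure at any score.
score = rng.normal(loc=50, scale=10, size=100)
cured = (rng.random(100) < 1 / (1 + np.exp(-(score - 50) / 5))).astype(int)
model = sm.Logit(cured, sm.add_constant(score)).fit(disp=False)
print(model.params)            # intercept and slope on the log-odds scale
print(model.predict([[1.0, 60.0]]))  # estimated probability of cure at score 60
```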
Let’s say you’re looking at whether there’s a gender difference in how people rate their general practitioner on a five-point scale. Gender is your nominal input variable, and the rating scale is your ordinal outcome variable. Here, since everyone’s response is independent, you could use a Chi-Square test for trend or a Mann-Whitney U test with a correction for ties (where respondents have given the same rating).
It’s important to remember that if people share a general practitioner, the data may not be independent, and you’d need a more complex analysis. Always use such guidelines as a starting point, and consider the specifics of your situation when choosing a test.
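As a rough illustration of the rating example, here is a minimal sketch using scipy with invented ratings; scipy’s Mann-Whitney implementation applies a tie correction automatically when tied ranks are present.

```python
# Minimal sketch: comparing ordinal ratings (1-5) between two independent groups.
# Ratings are invented for illustration.
from scipy import stats

ratings_men   = [3, 4, 4, 5, 2, 3, 4, 5, 3, 4]
ratings_women = [4, 5, 4, 5, 3, 4, 5, 5, 4, 3]

# With ties present, mannwhitneyu falls back to a normal approximation that
# includes a correction for tied ranks.
u_stat, p_val = stats.mannwhitneyu(ratings_men, ratings_women, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_val:.3f}")
```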
Table 2: Choice of statistical test for independent observations
When you encounter data that’s been censored (like in survival analysis where you don’t know the exact time of an event for some subjects), you’d use tests designed for this situation, such as the Mann-Whitney or Log-rank test.
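Here is a minimal sketch of a log-rank comparison using the third-party lifelines package (an assumption on my part; other survival libraries would work too), with invented follow-up times and censoring indicators.

```python
# A minimal sketch of a log-rank test on censored survival data, using the
# third-party lifelines package. Times and censoring flags are invented.
from lifelines.statistics import logrank_test

# Follow-up time in months; event_observed is 1 if the event happened,
# 0 if the subject was censored (exact event time unknown).
time_a  = [6, 7, 10, 15, 19, 25]
event_a = [1, 0, 1, 1, 0, 1]
time_b  = [5, 6, 8, 11, 13, 20]
event_b = [1, 1, 1, 0, 1, 1]

result = logrank_test(time_a, time_b, event_observed_A=event_a, event_observed_B=event_b)
print(result.p_value)
```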
For comparing three or more groups based on ordinal or non-normally distributed variables, you’d use the Kruskal-Wallis test. This is essentially an extension of the Mann-Whitney U test, which is for two groups.
When you’re dealing with normally distributed variables across more than two groups, Analysis of Variance (ANOVA), specifically one-way ANOVA, is appropriate. It’s the parametric counterpart to the Kruskal-Wallis test.
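Both tests are available in scipy. Here is a minimal sketch on three invented groups, showing the non-parametric and parametric versions side by side.

```python
# Minimal sketch: comparing three independent groups with the Kruskal-Wallis
# test (non-parametric) and one-way ANOVA (parametric). Data are invented.
from scipy import stats

group_1 = [12.1, 13.4, 11.8, 14.0, 12.9]
group_2 = [14.2, 15.1, 13.8, 16.0, 14.7]
group_3 = [11.0, 12.3, 10.9, 11.7, 12.0]

h_stat, p_kw = stats.kruskal(group_1, group_2, group_3)        # ordinal / non-normal data
f_stat, p_anova = stats.f_oneway(group_1, group_2, group_3)    # normally distributed data
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_kw:.3f}")
print(f"One-way ANOVA  F = {f_stat:.2f}, p = {p_anova:.3f}")
```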
If your outcome is the dependent variable in a regression, what matters is the distribution of the residuals (the differences between what you observe and what your model predicts). If those residuals look plausibly normal, the distribution of the independent variable matters much less.
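A minimal sketch of that residual check, with invented data, using statsmodels for the regression and scipy for the normality test:

```python
# Minimal sketch: fit a simple regression and check whether the residuals
# look plausibly normal. Data are invented for illustration.
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)                  # independent variable (any distribution)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=50)  # outcome with normal noise

model = sm.OLS(y, sm.add_constant(x)).fit()
residuals = model.resid                          # observed minus fitted values

# If the residuals pass a normality check (and a plot looks reasonable),
# the distribution of x itself is of little concern.
stat, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk on residuals: W = {stat:.3f}, p = {p:.3f}")
```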
There are more advanced methods, such as Poisson regression for count data, but in practice it is often easier to simplify the outcome variable into categories or to treat it as a continuous measure.
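For completeness, here is a minimal sketch of a Poisson regression on invented count data, using statsmodels; the variable names are purely illustrative.

```python
# Minimal sketch: Poisson regression for a count outcome, using statsmodels.
# Data and variable names are invented for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
exposure_years = rng.uniform(1, 5, size=80)                     # predictor
counts = rng.poisson(lam=np.exp(0.2 + 0.4 * exposure_years))    # count outcome

model = sm.GLM(counts, sm.add_constant(exposure_years),
               family=sm.families.Poisson()).fit()
print(model.params)   # coefficients on the log-rate scale
```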
In short, parametric tests assume a normal distribution in the population, whereas non-parametric tests don’t rely on that assumption and can be applied to data that doesn’t follow a normal distribution. Non-parametric counterparts exist for many parametric tests, and these are listed in Table 3.
Table 3: Parametric and Non-parametric tests for comparing two or more groups
Non-parametric tests work well for data that isn’t normally distributed, and they’re also valid for normally distributed data. You might wonder why we don’t just use them all the time to avoid the extra step of checking for normality. However, parametric tests are often preferred for a few reasons:
- Significance tests alone don’t tell us everything. We usually want to make broader inferences about the population our sample comes from, and parametric tests provide parameter estimates and confidence intervals that help us do that (see the sketch after this list).
- With non-parametric tests, it’s harder to build models that account for multiple variables or adjust for confounders, which is something you can do with multiple regression in parametric testing.
- Parametric tests are generally more powerful, meaning they’re better at detecting true differences when they actually exist.
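To illustrate the first point, the sketch below (invented data) contrasts a t-test, which yields an estimate of the difference in means with a confidence interval, against a Mann-Whitney U test on the same data, which returns only a P value. The interval is computed by hand with the standard pooled-variance formula.

```python
# Minimal sketch: the t-test comes with an effect estimate and confidence
# interval, while the Mann-Whitney U test gives only a P value. Data invented.
import numpy as np
from scipy import stats

a = np.array([10.2, 11.1, 9.8, 10.7, 11.5, 10.9, 9.9, 10.4])
b = np.array([11.8, 12.4, 11.1, 12.9, 12.2, 11.7, 12.5, 11.9])

# Parametric: difference in means with a 95% confidence interval (pooled variance).
diff = b.mean() - a.mean()
sp = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
             / (len(a) + len(b) - 2))
se = sp * np.sqrt(1 / len(a) + 1 / len(b))
t_crit = stats.t.ppf(0.975, df=len(a) + len(b) - 2)
print(f"difference = {diff:.2f}, 95% CI = ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f})")
print("t-test p =", stats.ttest_ind(a, b).pvalue)

# Non-parametric: only a P value, with no direct estimate of the size of the difference.
print("Mann-Whitney p =", stats.mannwhitneyu(a, b, alternative="two-sided").pvalue)
```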
The discussion on the strengths of parametric tests will be expanded in Part 2 of this series.