Quick Intro to Statistics — Power Your Stories with Data
Not a statistician? No problem! Learn the basics of statistical analysis in a few minutes
Storytelling with Statistics
When working in the tech world (or at any job, for that matter), knowing how to harness statistics empowers you to make data-driven decisions. Whether you’re a marketer, designer, or developer, it is absolutely critical that you understand statistical terminology, how to interpret findings, and when to transform those findings into action.
The most important takeaway is that statistics alone will not necessarily make your arguments better. Statistics are fuel for your stories, but they are not stories in themselves. Make sure that you frame your findings in a way that persuasively moves your audience, enriching your data with meaning and a call to action.
“Once something has occurred and we can put together a story to explain it, it starts to seem like the outcome was predestined. Statistics don’t appeal to our need to understand cause and effect, which is why they are so frequently ignored or misinterpreted. Stories, on the other hand, are a rich means to communicate precisely because they emphasize cause and effect.”
― Michael J. Mauboussin, The Success Equation
Population and Sample
A population is any large collection of objects or individuals, such as Americans, students, or trees, about which information is desired.
A parameter is any summary number, like an average or percentage, that describes the entire population.
A sample is a subset drawn from the population, ideally chosen so that it is representative of the whole.
A statistic is any summary number, like an average or percentage, that describes the sample.
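To make the parameter versus statistic distinction concrete, here is a minimal sketch in Python. The population of user ages, its size, and the sample size of 200 are all made up for illustration; only the standard library is assumed.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: the ages of 10,000 users (invented data).
population = [rng.randint(18, 80) for _ in range(10_000)]

# Parameter: a summary number computed from the entire population.
parameter = mean(population)

# Sample: 200 users drawn at random, without replacement.
sample = rng.sample(population, 200)

# Statistic: the same summary number, computed from the sample only.
statistic = mean(sample)

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

In practice you rarely have access to the whole population, so the statistic serves as your estimate of the parameter.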
Measures of Central Tendency
The mean of a set of numbers, sometimes simply called the average, is the sum of the values divided by the number of values.
The median of a set of numbers is the middle number in the set (after the numbers have been arranged from least to greatest). If there is an even number of values, the median is the average of the middle two numbers.
The mode of a set of numbers is the number which occurs most often.
The range is the difference between the lowest and highest values in a set.
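As a quick illustration, the sketch below computes these summary numbers with Python's built-in statistics module. The page load times are invented for the example.

```python
from statistics import mean, median, mode

# Hypothetical page load times in seconds (invented data).
load_times = [2, 3, 3, 4, 5, 7, 18]

average = mean(load_times)                  # 42 / 7 = 6
middle = median(load_times)                 # 4, the fourth of seven sorted values
most_common = mode(load_times)              # 3, which appears twice
spread = max(load_times) - min(load_times)  # 18 - 2 = 16

print(average, middle, most_common, spread)
```

Notice how the single slow load of 18 seconds pulls the mean up to 6 while the median stays at 4. When your data has outliers, the median is often the more honest summary.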
- When to use Mean, Median, or Mode — Laerd Statistics
- Measures of Central Tendency Formulas — Australian Bureau of Statistics
The general idea of hypothesis testing involves:
- Making an initial assumption.
- Collecting evidence (data).
- Based on the available evidence (data), deciding whether to reject or not reject the initial assumption.
A null hypothesis proposes that there is no real effect or relationship in a set of given observations, and that any apparent pattern is due to chance. It is the hypothesis that the researcher is trying to disprove.
An alternative hypothesis is simply the opposite of the null hypothesis: it proposes that there is indeed a statistically significant relationship between the variables being studied.
A Type 1 Error is the incorrect rejection of a true null hypothesis (also known as a “false positive” finding).
A Type 2 Error is incorrectly retaining a false null hypothesis (also known as a “false negative” finding).
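One way to get a feel for the two error types is to simulate them. The sketch below is only an illustration under stated assumptions: the null hypothesis is "this coin is fair", and the decision rule (reject when 100 flips give 61 or more heads, or 39 or fewer) is a hypothetical cutoff chosen for the example, as are the 60% biased coin and the number of trials.

```python
import random

FLIPS = 100    # flips per experiment
TRIALS = 2000  # number of simulated experiments

def rejects_fair_coin(heads):
    # Hypothetical decision rule: reject "the coin is fair" when the
    # number of heads is far from the 50 we would expect.
    return heads >= 61 or heads <= 39

def count_heads(rng, prob_heads):
    # Simulate one experiment and return how many flips came up heads.
    return sum(rng.random() < prob_heads for _ in range(FLIPS))

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Null hypothesis is TRUE (fair coin): every rejection is a Type 1 Error.
type_1_rate = sum(
    rejects_fair_coin(count_heads(rng, 0.5)) for _ in range(TRIALS)
) / TRIALS

# Null hypothesis is FALSE (coin lands heads 60% of the time):
# every failure to reject is a Type 2 Error.
type_2_rate = sum(
    not rejects_fair_coin(count_heads(rng, 0.6)) for _ in range(TRIALS)
) / TRIALS

print(f"Type 1 (false positive) rate: {type_1_rate:.1%}")
print(f"Type 2 (false negative) rate: {type_2_rate:.1%}")
```

A stricter cutoff would lower the false positive rate but raise the false negative rate, which is the trade-off behind every choice of threshold.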
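The p-value is the probability of seeing results at least as extreme as the ones observed, assuming the null hypothesis is true. It is a number between 0 and 1 that can be interpreted as follows: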
- A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject the null hypothesis.
- A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.
- p-values very close to the cutoff (0.05) are considered to be marginal (could go either way). Always report the p-value so your readers can draw their own conclusions.
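The three steps of hypothesis testing can be sketched with a coin-flip example. This is a minimal illustration, not a general-purpose test: the null hypothesis is assumed to be "the coin is fair", the function name and the 60-heads-in-100-flips data are invented, and the p-value is computed exactly by counting outcomes.

```python
from math import comb

def fair_coin_p_value(heads, flips):
    """Exact two-sided p-value for the null hypothesis 'the coin is fair'."""
    expected = flips / 2
    observed_distance = abs(heads - expected)
    # Count every outcome at least as far from the expected count as
    # the one we observed, in either direction.
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )
    return extreme_outcomes / 2 ** flips

ALPHA = 0.05  # the conventional cutoff mentioned above

# 1. Initial assumption: the coin is fair (null hypothesis).
# 2. Evidence: hypothetical data, 60 heads in 100 flips.
p_value = fair_coin_p_value(60, 100)

# 3. Decision based on the evidence.
decision = "reject" if p_value <= ALPHA else "fail to reject"
print(f"p-value = {p_value:.4f}, so we {decision} the null hypothesis")
```

With this invented data the p-value lands just above 0.05, which is exactly the marginal zone described above, and a good reason to report the number itself rather than only the verdict.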
Regression is a technique for determining the statistical relationship between two or more variables, where a change in a dependent variable is associated with, and depends on, a change in one or more independent variables. See also correlation.
Independent Variable — An independent variable is a variable that stands alone and isn’t changed by the other variables you are trying to measure. For example, someone’s age might be an independent variable.
Dependent Variable — A dependent variable is the variable being tested and measured in a scientific experiment. The dependent variable is ‘dependent’ on the independent variable. As the experimenter changes the independent variable, the effect on the dependent variable is observed and recorded.
Regression Analysis — Regression is a statistical method that estimates the relationship between one dependent variable and one or more other changing variables (the independent variables).
Simple Linear Regression — Regression that uses only one independent variable and describes the relationship between the independent and dependent variables as a straight line.
Correlation Coefficient (r) — the correlation coefficient r measures the strength and direction of a linear relationship between two variables. It ranges from -1.0 to +1.0. The closer r is to +1 or -1, the more closely the two variables follow a straight line. If r is close to 0, there is little or no linear relationship between the variables, although they may still be related in a non-linear way.
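For readers who want to see the arithmetic, here is a small sketch of the Pearson correlation coefficient written out by hand. The function name and the three tiny data sets are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(variation_x * variation_y)

rising = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])   # perfect upward line
falling = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])  # perfect downward line
curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x squared

print(rising, falling, curved)
```

The third data set is a useful caution: y is completely determined by x, yet r comes out as 0 because the pattern is a curve rather than a straight line.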
R-Squared — R-squared is a statistical measure of how close the data are to the fitted regression line. It is the percentage of the variation that can be explained by a linear model.
- R-squared = Explained variation / Total variation
- R-squared is always between 0 and 100%:
- 0% indicates that the model explains none of the variability of the response data.
- 100% indicates that the model explains all the variability of the response data around its mean.
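Putting simple linear regression and R-squared together, the sketch below fits a straight line by ordinary least squares and then applies the "explained variation / total variation" formula. The function names and the ad spend versus sign-ups data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Explained variation divided by total variation."""
    mean_y = sum(ys) / len(ys)
    predictions = [slope * x + intercept for x in xs]
    explained = sum((p - mean_y) ** 2 for p in predictions)
    total = sum((y - mean_y) ** 2 for y in ys)
    return explained / total

# Hypothetical data: ad spend (independent) and sign-ups (dependent).
ad_spend = [1, 2, 3, 4, 5]
sign_ups = [2, 4, 5, 4, 5]

slope, intercept = fit_line(ad_spend, sign_ups)
fit_quality = r_squared(ad_spend, sign_ups, slope, intercept)

print(f"sign_ups = {slope:.1f} * ad_spend + {intercept:.1f}")
print(f"R-squared = {fit_quality:.0%}")
```

For this invented data the fitted line has a slope of 0.6 and an intercept of 2.2, and it explains 60% of the variation in sign-ups.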
- Calculating a Simple Linear Regression — Statistics How To
- Introduction to Linear Regression — Online Stat Book
Calculate a Percent Increase
First: work out the difference (increase) between the two numbers you are comparing.
Increase = New Number − Original Number
Then: divide the increase by the original number and multiply the answer by 100.
% increase = Increase ÷ Original Number × 100.
If your answer is a negative number then this is a percentage decrease.
Calculate a Percent Decrease
First: work out the difference (decrease) between the two numbers you are comparing.
Decrease = Original Number − New Number
Then: divide the decrease by the original number and multiply the answer by 100.
% Decrease = Decrease ÷ Original Number × 100
If your answer is a negative number then this is a percentage increase.
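Both recipes translate directly into code. The sketch below is one possible way to write them in Python; the function names and the example numbers are made up.

```python
def percent_increase(original, new):
    """Percent increase from original to new (negative means a decrease)."""
    increase = new - original
    return increase / original * 100

def percent_decrease(original, new):
    """Percent decrease from original to new (negative means an increase)."""
    decrease = original - new
    return decrease / original * 100

# Hypothetical monthly visitor counts.
growth = percent_increase(200, 250)    # 50 / 200 * 100 = 25.0
drop = percent_decrease(200, 150)      # 50 / 200 * 100 = 25.0
same_drop = percent_increase(200, 150) # -25.0, a decrease shown as a negative increase

print(growth, drop, same_drop)
```

Both functions divide by the original number, so neither works when the original number is 0.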
Here are some great online statistics guides to help you.