# Quick Intro to Statistics — Power Your Stories with Data

## Not a statistician? No problem! Learn the basics of statistical analysis in a few minutes

# Storytelling with Statistics

When working in the tech world (or at any job, for that matter), knowing how to harness statistics empowers you to make data-driven decisions. Whether you’re a marketer, designer, or developer, it is absolutely critical that you understand statistical terminology, how to interpret findings, and when to transform those findings into action.

The most important takeaway is that statistics alone will not necessarily make your arguments better. **Statistics are fuel for your stories, but they are not stories in themselves**. Make sure that you frame your findings in a way that persuasively moves your audience, enriching your data with meaning and a call to action.

“Once something has occurred and we can put together a story to explain it, it starts to seem like the outcome was predestined. Statistics don’t appeal to our need to understand cause and effect, which is why they are so frequently ignored or misinterpreted. Stories, on the other hand, are a rich means to communicate precisely because they emphasize cause and effect.”

―Michael J. Mauboussin, The Success Equation

# Population and Sample

A **population** is any large collection of objects or individuals, such as Americans, students, or trees, about which information is desired.

A **parameter** is any summary number, like an average or percentage, that describes the entire population.

A **sample** is a subset of the population that is actually observed or measured, ideally chosen so that it is representative of the whole.

A **statistic** is any summary number, like an average or percentage, that describes the sample.
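To make the parameter/statistic distinction concrete, here is a minimal sketch using Python's standard library. The population of ages is made up for illustration, and the sample size of 200 is an arbitrary assumption.

```python
import random
from statistics import mean

# Seeded generator so the illustration is repeatable.
rng = random.Random(0)

# Hypothetical population: the ages of 10,000 people (invented data).
population = [rng.randint(18, 90) for _ in range(10_000)]

# A parameter summarizes the entire population.
parameter = mean(population)

# A sample is a subset drawn from the population (here, at random).
sample = rng.sample(population, 200)

# A statistic summarizes only the sample.
statistic = mean(sample)

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

In practice you rarely get to compute the parameter directly; the statistic is your estimate of it, and a random sample is one common way to keep that estimate honest.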

# Measures of Central Tendency

## Mean

The mean of a set of numbers, sometimes simply called **the average**, is the sum of the values divided by the number of values.

## Median

The median of a set of numbers is the middle number in the set after the numbers have been arranged from least to greatest. If the set has an even number of values, the median is the average of the middle two numbers.

## Mode

The mode of a set of numbers is the number that occurs most often. A set can have more than one mode if several values tie for most frequent.

## Range

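The range is the difference between the lowest and highest values in a set. Unlike the mean, median, and mode, it describes how spread out the data are rather than where their center lies.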
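Here is a short sketch of all four measures using Python's built-in `statistics` module. The page load times are invented for illustration.

```python
from statistics import mean, median, mode

# Hypothetical page load times in seconds (invented data).
page_load_times = [2, 3, 3, 4, 5, 7, 18]

average = mean(page_load_times)        # sum of the values / number of values
middle = median(page_load_times)       # middle value once sorted
most_common = mode(page_load_times)    # value that occurs most often
value_range = max(page_load_times) - min(page_load_times)  # highest - lowest

print(f"Mean:   {average}")
print(f"Median: {middle}")
print(f"Mode:   {most_common}")
print(f"Range:  {value_range}")
```

Notice that the single slow load of 18 seconds pulls the mean (6) above the median (4). When your data contain outliers like this, the median is often the more honest number to put in your story.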

## Further Learning

- When to use Mean, Median, or Mode — Laerd Statistics
- Measures of Central Tendency Formulas — Australian Bureau of Statistics

# Hypothesis Testing

The general idea of hypothesis testing involves:

- Making an initial assumption.
- Collecting evidence (data).
- Based on the available evidence (data), deciding whether to reject or not reject the initial assumption.

## Null Hypothesis

A null hypothesis proposes that there is no real effect or relationship in a set of given observations, and that any apparent pattern is due to chance. It is the hypothesis that the researcher is trying to disprove.

## Alternative Hypothesis

An alternative hypothesis is simply the opposite of the null hypothesis. It proposes that there IS indeed a real effect or relationship, for example a relationship between two or more variables.

## Errors

A **Type 1 Error** is the incorrect rejection of a true null hypothesis (also known as a “false positive” finding).

A **Type 2 Error** is incorrectly retaining a false null hypothesis (also known as a “false negative” finding).
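A quick simulation can make the two error types tangible. The sketch below assumes a coin-flip experiment where the null hypothesis is "the coin is fair." The decision rule (reject when the number of heads is more than 10 away from 50 out of 100 flips), the biased coin's 55% heads rate, and the function name `rejects_fair_coin` are all hypothetical choices made for illustration.

```python
import random

def rejects_fair_coin(true_heads_prob, rng, flips=100, cutoff=10):
    """Run one experiment and return True if we reject 'the coin is fair'."""
    heads = sum(rng.random() < true_heads_prob for _ in range(flips))
    # Hypothetical decision rule: reject when heads is far from flips / 2.
    return abs(heads - flips / 2) > cutoff

rng = random.Random(42)  # seeded so the illustration is repeatable
trials = 2000

# Type 1 error: the coin really is fair, but we reject the null anyway.
type1_rate = sum(rejects_fair_coin(0.5, rng) for _ in range(trials)) / trials

# Type 2 error: the coin is slightly biased, but we fail to reject the null.
type2_rate = sum(not rejects_fair_coin(0.55, rng) for _ in range(trials)) / trials

print(f"Type 1 (false positive) rate: {type1_rate:.3f}")
print(f"Type 2 (false negative) rate: {type2_rate:.3f}")
```

With a small bias and only 100 flips, false negatives are far more common than false positives under this rule. Tightening the cutoff lowers one error rate at the expense of the other, which is the trade-off behind every hypothesis test.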

## Further Learning

- Hypothesis Testing Explained — Statistics How To
- Hypothesis Testing Essentials — Laerd Statistics

# Statistical Significance

## P-Value

The *p*-value is a number between 0 and 1. It is the probability of seeing results at least as extreme as yours if the null hypothesis were true, and it can be interpreted as follows:

- A small *p*-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject the null hypothesis.
- A large *p*-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.
- *p*-values very close to the cutoff (0.05) are considered to be marginal (could go either way). Always report the *p*-value so your readers can draw their own conclusions.
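As a rough sketch of where a *p*-value comes from, the example below computes an exact two-sided *p*-value for a coin-flip experiment using only the standard library. The scenario (60 heads in 100 flips), the function name `binomial_two_sided_p_value`, and the 0.05 cutoff are assumptions for illustration. Real analyses typically rely on a statistics package rather than hand-rolled code.

```python
from math import comb

def binomial_two_sided_p_value(heads, flips, null_prob=0.5):
    """Probability, under the null, of an outcome at least as unlikely as the one observed."""
    def pmf(k):
        # Probability of exactly k heads if the null hypothesis is true.
        return comb(flips, k) * null_prob**k * (1 - null_prob) ** (flips - k)

    observed = pmf(heads)
    # Sum every outcome that is no more likely than what we observed.
    # The tiny tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(flips + 1) if pmf(k) <= observed * (1 + 1e-9))

# Null hypothesis: the coin is fair. Hypothetical observation: 60 heads in 100 flips.
p_value = binomial_two_sided_p_value(heads=60, flips=100)

alpha = 0.05  # conventional cutoff
decision = "reject" if p_value <= alpha else "fail to reject"

print(f"p-value: {p_value:.4f}")
print(f"Decision: {decision} the null hypothesis")
```

With these made-up numbers the *p*-value lands just above the cutoff, which is exactly the marginal case described above. Report the number itself and let your readers judge.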

## Further Learning

- How to Determine P-Value — Deborah J. Rumsey
- Calculate P-Value — How To Stats

# Regression

Regression is a technique for determining the statistical relationship between two or more variables, where a change in a dependent variable is associated with a change in one or more independent variables. It is closely related to correlation.

**Independent Variable**: A variable that stands alone and isn’t changed by the other variables you are trying to measure. For example, someone’s age might be an independent variable.

**Dependent Variable**: The variable being tested and measured in a scientific experiment. The dependent variable is ‘dependent’ on the independent variable. As the experimenter changes the independent variable, the effect on the dependent variable is observed and recorded.

**Regression Analysis**: A statistical method that attempts to determine the strength of the relationship between one dependent variable and a series of other changing variables (the independent variables).

**Simple Linear Regression**: Regression that uses only one independent variable and describes the relationship between the independent and dependent variables as a straight line.

**Correlation Coefficient** (*r*): The correlation coefficient *r* measures the strength and direction of a linear relationship between two variables. It ranges from -1.0 to +1.0. The closer *r* is to +1 or -1, the more closely the two variables are linearly related. If *r* is close to 0, there is little or no linear relationship between the variables, although they may still be related in a non-linear way.

**R-Squared**: R-squared is a statistical measure of how close the data are to the fitted regression line. It is the percentage of the variation that can be explained by a linear model.

- R-squared = Explained variation / Total variation
- R-squared is always between 0 and 100%:
  - 0% indicates that the model explains none of the variability of the response data around its mean.
  - 100% indicates that the model explains all the variability of the response data around its mean.
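The definitions above can be sketched in a few lines of Python. This is a minimal least-squares fit written by hand for illustration. The data set (ad spend versus sign-ups) and the function name `simple_linear_regression` are invented, and a real project would more likely use a statistics library.

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by least squares; also return r."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5  # correlation coefficient
    return slope, intercept, r

# Hypothetical data: ad spend (independent) and sign-ups (dependent).
ad_spend = [1, 2, 3, 4, 5]
signups = [2, 4, 5, 4, 5]

slope, intercept, r = simple_linear_regression(ad_spend, signups)

# R-squared = explained variation / total variation
predicted = [intercept + slope * x for x in ad_spend]
y_bar = mean(signups)
explained = sum((p - y_bar) ** 2 for p in predicted)
total = sum((y - y_bar) ** 2 for y in signups)
r_squared = explained / total

print(f"Fitted line: signups = {intercept:.2f} + {slope:.2f} * ad_spend")
print(f"r = {r:.3f}, R-squared = {r_squared:.0%}")
```

For simple linear regression, R-squared is just *r* multiplied by itself, so either number tells you how tightly the points hug the line.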

## Further Learning

- Calculating a Simple Linear Regression — Statistics How To
- Introduction to Linear Regression — Online Stat Book

# Percentage Change

## Calculate a Percent Increase

**First:** *work out the difference (increase) between the two numbers you are comparing.*

Increase = New Number - Original Number

**Then:** *divide the increase by the original number and multiply the answer by 100.*

% Increase = Increase ÷ Original Number × 100

If your answer is a negative number, then this is a percentage decrease.

## Calculate a Percent Decrease

**First:** *work out the difference (decrease) between the two numbers you are comparing.*

Decrease = Original Number - New Number

**Then:** *divide the decrease by the original number and multiply the answer by 100.*

% Decrease = Decrease ÷ Original Number × 100

If your answer is a negative number, then this is a percentage increase.
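Both calculations can be folded into one small helper, since a decrease is simply a negative increase. The function name `percent_change` and the example numbers are made up for illustration.

```python
def percent_change(original, new):
    """Return the percentage change from original to new.

    A positive result is an increase; a negative result is a decrease.
    """
    if original == 0:
        raise ValueError("Percentage change is undefined when the original number is 0.")
    return (new - original) / original * 100

# Hypothetical examples:
signups_change = percent_change(50, 75)   # sign-ups grew from 50 to 75
bounce_change = percent_change(80, 60)    # bounces fell from 80 to 60

print(f"Sign-ups changed by {signups_change:.1f}%")
print(f"Bounces changed by {bounce_change:.1f}%")
```

Always divide by the original number, not the new one. Going from 50 to 75 is a 50% increase, but going back from 75 to 50 is only a 33.3% decrease.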

# Learn More!

Here are some great online statistics guides to help you.

- Probability and Statistics — Self-Paced — Stanford
- Intro to Statistics Course Online — Udacity
- Data Analysis & Statistics — edX