Baby steps of statistics

Biraj Parikh
Published in GreyAtom
5 min read · Jul 16, 2017

What is a variable?

A variable is a characteristic that is measured or recorded for each observation and whose value varies from one observation to the next.

For example,

  • If I asked a group of people their eye color, the variable would be eye color. It varies because some people would say blue, some would say brown, and so on.
  • If I asked people how tall they are, they might say 60 inches, 5 foot 9, and so on. Here, height would be the variable.

We need to understand the kind of variable we are dealing with in order to know what type of statistics to use. One useful split is into independent variables and dependent variables. Once we know which is which, we can decide which statistical test suits a particular variable.

So what is the difference between an independent variable and a dependent variable?

  • Well, one of them is a cause and the other is an effect. The independent variable is the cause and the dependent variable is the effect.
Dependent and independent variables
  • For example, suppose my two variables are how many calories you eat a day and your weight. Your weight doesn’t cause how many calories you eat every day; it’s the other way around.
  • How many calories you eat affects your weight.
  • So, in this case, calories is the independent variable because it is the cause, and weight is the dependent variable because it is the outcome. It’s the effect of how you eat.
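To make the two roles concrete, here is a minimal sketch that treats calories as the input (x) and weight as the output (y) of a straight-line fit. The numbers are hypothetical toy values invented for illustration, and `fit_line` is a helper written for this example rather than a library function. A fitted line only describes the relationship; it does not prove which variable is the cause.

```python
# Hypothetical toy data, invented purely for illustration.
calories_per_day = [1800, 2000, 2200, 2400, 2600]  # independent variable (x)
weight_kg = [60.0, 64.0, 68.0, 72.0, 76.0]         # dependent variable (y)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(calories_per_day, weight_kg)

# Use the independent variable to predict the dependent one.
predicted_weight = slope * 2300 + intercept
print(f"slope={slope:.3f}, intercept={intercept:.1f}, "
      f"predicted weight at 2300 kcal={predicted_weight:.1f} kg")
```

Swapping the two lists would still produce a line, which is why deciding which variable is independent has to come from your understanding of the problem, not from the arithmetic.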

Types of Data:

Numeric data have meaning as a measurement, such as a person’s height, weight, IQ, or blood pressure; or they’re a count, such as the number of stock shares a person owns, how many teeth a dog has, or how many pages you can read of your favorite book before you fall asleep. (Statisticians also call numerical data quantitative data.)

It is further broken down into:

  1. Discrete: Discrete data represent items that can be counted; their possible values can be listed out. The list of possible values may be fixed (also known as finite), or it may run 0, 1, 2, and so on to infinity (countably infinite).
  2. Continuous: Continuous data represent measurements; their possible values cannot be counted and can only be described using intervals on the real number line.

For example, your shoe size is discrete and your foot size is continuous.

Discrete vs Continuous
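One practical consequence of the split, sketched below with hypothetical values: discrete data can be tallied value by value, while continuous data usually has to be grouped into intervals before counting tells you anything. The 1 cm bin width and the `bin_label` helper are assumptions made for this example.

```python
from collections import Counter

# Hypothetical sample values, invented for illustration.
shoe_sizes = [7, 8, 8, 9, 9, 9, 10]                                # discrete
foot_lengths_cm = [24.13, 25.4, 25.72, 26.35, 26.5, 26.98, 27.61]  # continuous

# Discrete: each possible value can be listed, so count occurrences directly.
size_counts = Counter(shoe_sizes)


def bin_label(value, width=1.0):
    """Return the half-open interval [lower, lower + width) containing value."""
    lower = (value // width) * width
    return f"[{lower:.0f}, {lower + width:.0f})"


# Continuous: exact values rarely repeat, so count per interval instead.
length_bins = Counter(bin_label(v) for v in foot_lengths_cm)

print(size_counts)
print(length_bins)
```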

Categorical data represent characteristics such as a person’s gender, marital status, hometown, or the types of movies they like.

The level of measurement is a key characteristic of any variable. There are four levels of measurement:

  • Nominal
  • Ordinal
  • Interval
  • Ratio

Nominal variables are organised into non-numeric categories that cannot be ranked or compared quantitatively. This type of data is often referred to as qualitative.

○ Appropriate mathematical operation: counting the number of cases per category.

  • Nominal means that the variable only tells us which category an observation belongs to.
  • There’s no ordering among the categories. Take eye color: your eyes are brown, blue, green or some other color, and no color is more or less than another.
  • Another example: jersey numbers for athletes. The numbers identify players but do not measure anything.
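As a small sketch of what counting cases per category looks like in practice, the snippet below tallies a hypothetical list of eye colors and picks out the most frequent one (the mode). The responses are invented for illustration.

```python
from collections import Counter

# Hypothetical survey responses, invented for illustration.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

# Counting cases per category is the operation nominal data supports.
counts = Counter(eye_colors)

# The mode (most frequent category) is a valid summary for nominal data.
most_common_color, most_common_count = counts.most_common(1)[0]

print(counts)
print(most_common_color, most_common_count)
```

Sorting these categories or averaging them would not mean anything, because the labels carry no order or magnitude.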

Ordinal variables are organised into categories that can be ranked.

○ Appropriate mathematical operations: counting and ranking.

  • For example: How was your service at the restaurant?
  • Poor, fair, good, very good, excellent. That’s an ordinal scale.
  • Another example: Rank order of winners.
Rank order of winners
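Ranking can be sketched in a few lines. The snippet below assumes the restaurant ratings run from "poor" (lowest) to "excellent" (highest), sorts a hypothetical set of responses by that order, and reads off the middle one as the median. The responses and the `RATING_ORDER` name are made up for this example.

```python
# Assumed ordering of the rating scale, lowest to highest.
RATING_ORDER = ["poor", "fair", "good", "very good", "excellent"]
rank = {label: position for position, label in enumerate(RATING_ORDER)}

# Hypothetical survey responses, invented for illustration.
responses = ["good", "excellent", "poor", "very good", "good", "fair", "good"]

# Ranking: sort by position on the scale, not alphabetically.
sorted_responses = sorted(responses, key=lambda label: rank[label])

# With an odd number of responses, the median is the middle one.
median_response = sorted_responses[len(sorted_responses) // 2]

print(sorted_responses)
print(median_response)
```

The positions 0 to 4 are used only for ordering. Averaging them would assume that the gap between "poor" and "fair" equals the gap between "good" and "very good", which an ordinal scale does not guarantee.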

Interval variables have equal, exact intervals between values, so values can be compared directly: the difference between any two adjacent points on the scale is exactly the same as the difference between any other two adjacent points.

○ Appropriate mathematical operations: counting, ordering, and addition, subtraction, multiplication and division of the interval between values (but not the values themselves).

  • Example: time of the day: 10:00 am, 10:20 am, noon, 4:00 pm, 8:00 pm, etc. In this example, we can say that 10:20 is exactly 20 minutes later than 10:00, but we can’t say that 8:00 is “twice as late” as 4:00, and it doesn’t make sense to add noon + 4:00.
  • Another example: The Fahrenheit and Celsius scales of temperature. You can talk about 30 degrees being 60 degrees less than 90 degrees, so differences do make sense. However, 0 degrees (on either scale), cold as it may be, does not represent the total absence of temperature.
Interval variable example
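Python's standard `datetime` module happens to mirror these rules for clock times, which makes for a handy sketch: subtracting two points in time gives an interval you can multiply or divide, while adding two points in time is rejected. The calendar date below is arbitrary and only there because `datetime` requires one.

```python
from datetime import datetime, timedelta

# Two clock times on an arbitrary date (the date itself is irrelevant here).
ten_am = datetime(2017, 7, 16, 10, 0)
ten_twenty = datetime(2017, 7, 16, 10, 20)

# Differences between interval values are meaningful...
gap = ten_twenty - ten_am
# ...and the intervals themselves can be multiplied and divided.
double_gap = gap * 2
gap_in_ten_minute_units = gap / timedelta(minutes=10)

# Adding two points in time is not meaningful, and datetime refuses to do it.
try:
    ten_am + ten_twenty
    can_add_times = True
except TypeError:
    can_add_times = False

print(gap, double_gap, gap_in_ten_minute_units, can_add_times)
```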

Ratio variables have all of the characteristics of nominal, ordinal and interval variables, but also have a meaningful zero point.

○ Appropriate mathematical operations: counting, ordering, and addition, subtraction, multiplication and division of the interval between values as well as the values themselves.

  • Because of the meaningful zero, it is logical to compare the ratios of measurements. Phrases like “four times” and “twice” are meaningful at the ratio level.
Ratio variable example
  • Distances, in any system of measurement, give us data at the ratio level. A measurement such as “0 feet” makes sense, as it represents no length. Furthermore, 2 feet is twice as long as 1 foot, so ratios can be formed from the data.
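One way to check whether a ratio is meaningful is to see if it survives a change of units. The sketch below compares a length ratio in feet and in meters, then does the same for 90 and 30 degrees on the Fahrenheit and Celsius scales. The variable names are chosen for this example; the conversion formulas are the standard ones.

```python
import math

FEET_TO_METERS = 0.3048

# Ratio scale: length. The ratio is the same in feet and in meters.
long_ft, short_ft = 2.0, 1.0
ratio_in_feet = long_ft / short_ft
ratio_in_meters = (long_ft * FEET_TO_METERS) / (short_ft * FEET_TO_METERS)


def fahrenheit_to_celsius(temp_f):
    """Standard Fahrenheit-to-Celsius conversion."""
    return (temp_f - 32) * 5 / 9


# Interval scale: temperature. The "ratio" changes with the unit,
# so a statement like "three times as hot" has no fixed meaning.
ratio_in_fahrenheit = 90 / 30
ratio_in_celsius = fahrenheit_to_celsius(90) / fahrenheit_to_celsius(30)

print(ratio_in_feet, ratio_in_meters)
print(ratio_in_fahrenheit, ratio_in_celsius)
```

The length ratio stays at 2 in both units, while the temperature ratio is 3 in Fahrenheit and a negative number in Celsius, because neither temperature scale has a true zero.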

I hope you now have a basic idea about variables. If you have any questions or additions, responses and comments are always welcome.

Stay tuned for Part II.

