Understanding Variables in Statistical Analysis

with real-world examples

Fatih Şen, PhD
teaching statistics
7 min read · Feb 23, 2023

Eye color: an example of a variable. Image source: Unsplash.

In this article, we explain variables in statistical analysis: categorical and numeric variables, discrete and continuous variables, and the scales used to measure them. We also look at how MagicStat, a browser-based statistical analysis platform, handles different types of variables.

A variable is anything that could take more than one possible value. For example, the type of operating system a smartphone uses (Android or iOS) is a variable. Similarly, the type of coffee someone is drinking at a coffee shop (regular or decaf) is another example of a variable.

Different types of coffee represent a variable. Image source: Unsplash

There are also variables that have many potential values, like a team’s score in a basketball game: it could range from zero (a rec team playing against an NBA team) to well over a hundred (maybe two hundred for that NBA team playing in the rec league).

Non-variables (or constants) take only one value. For example, the number of teams in a basketball game is always two, and every week has seven days. As you might imagine, statistics is much more interested in things that change than in things that stay the same.
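To make the distinction concrete, here is a minimal sketch that checks whether a list of observations behaves like a variable or a constant. The function name `is_variable` and the sample observations are made up for illustration. Note that a sample can only show that something varied, not prove that it never could.

```python
def is_variable(observations):
    """Return True if the observations contain more than one distinct value."""
    return len(set(observations)) > 1

# Hypothetical observations, invented for this example.
coffee_orders = ["regular", "decaf", "regular", "regular"]
teams_per_game = [2, 2, 2, 2]

print(is_variable(coffee_orders))   # True: two different values appear
print(is_variable(teams_per_game))  # False: always two teams
```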

Categorical vs. Numeric Variables

Variables can be either categorical or numeric. The values of a categorical variable are categories, while the values of a numeric variable fall on a numerical scale. For instance, the type of operating system and the type of coffee are both categorical variables, whereas the score in a basketball game is a numeric variable.

Categorical variables are usually described using words, like a control group versus an experimental group, or first, second, or third place in a race. Numeric variables, on the other hand, are described using numbers. It’s important to note that just because a variable uses numbers doesn’t mean it is numeric. For example, the numbers in the Japanese puzzle game Sudoku could just as easily be any group of nine unique symbols, because the puzzle only cares whether two symbols are the same or different.

A Sudoku game in which the numbers act as categorical values. Image source: Unsplash.
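The Sudoku point can be checked with a short sketch. The row below and the helper `is_valid_row` are invented for illustration. The check only asks whether the nine symbols are all different, so swapping digits for letters changes nothing.

```python
def is_valid_row(row):
    """A Sudoku row is valid when it holds nine distinct symbols."""
    return len(row) == 9 and len(set(row)) == 9

# A hypothetical completed row.
digits_row = [5, 3, 4, 6, 7, 8, 9, 1, 2]

# Replace each digit with a letter: 1 -> 'A', 2 -> 'B', ..., 9 -> 'I'.
letters_row = [chr(ord("A") + d - 1) for d in digits_row]

print(letters_row)                # ['E', 'C', 'D', 'F', 'G', 'H', 'I', 'A', 'B']
print(is_valid_row(digits_row))   # True
print(is_valid_row(letters_row))  # True: the rule never used the digits as amounts
```

A basketball score would not survive this kind of relabeling, because there the size of the number is the information.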

Discrete vs. Continuous Variables

In statistical analysis, we have discrete and continuous variables.

A discrete variable has no possible values between two adjacent values. For example, score is a discrete variable because a team can’t score half a point in basketball. The phone and coffee examples above are discrete variables as well: there is no phone that’s “in between” an Android and an iPhone.

In contrast, there are continuous variables, which allow for an infinite number of values between any two values. For example, when measuring the length of something, there are an infinite number of values between 70 inches and 71 inches, including 70 and a half, 70 and a quarter, 70 and an eighth, and so on. Other examples of continuous variables include temperature, time, and weight.

Temperature is a continuous variable. Image source: Unsplash
Both time and weight are continuous variables. Image source: Unsplash.
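As a rough illustration of the difference, the sketch below keeps halving the gap between 70 and 71 inches and always finds a new length, while a whole-point score has nothing strictly between 70 and 71. The helper `midpoint` is our own, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def midpoint(a, b):
    """Exact value halfway between a and b."""
    return (Fraction(a) + Fraction(b)) / 2

# Continuous: lengths in inches. There is always another value in between.
low, high = Fraction(70), Fraction(71)
for _ in range(3):
    high = midpoint(low, high)
    print(high)  # 141/2, then 281/4, then 561/8 (70.5, 70.25, 70.125)

# Discrete: scores are whole points, so nothing lies strictly between 70 and 71.
scores_between = [s for s in range(70, 72) if 70 < s < 71]
print(scores_between)  # []
```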

Scales of Measurement

There are four scales of measurement: nominal, ordinal, interval, and ratio. Understanding these scales is important because the scale of a variable helps you determine which inferential statistical test to perform.

Nominal vs. Ordinal Scale

“Nominal” comes from the Latin word for “name”, and fittingly, groups in a nominal scale are differentiated only by their name. The key feature of a nominal scale is that there is no ordering among the groups. For example, eye color, political party, and flavor of potato chips are all nominal scales. None of the categories are “more” or “better” than the others; they’re just different.

In contrast, the key feature of an ordinal scale is that there is an order to the values. Values in an ordinal scale can be ordered or ranked from first to last, most to least, etc. For example, year in school (freshman, sophomore, junior, senior) is an ordinal scale because seniors have more years in school than juniors, who have more than sophomores, who have more than freshmen.

Note that ordinal scales can sometimes take numerical values. For example, if participants receive either 10, 15, or 20 milligrams of a drug, the dose may look like a numeric variable, but it is actually an ordinal scale: the three conditions are separate categories that can be rank-ordered by how much of the drug participants received. Likert scales are ordinal scales, too, where “strongly agree” represents more agreement than “somewhat agree” or “disagree”.
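One way to picture an ordinal scale in code is a list of categories whose position carries the order. This is only a sketch: the names `YEARS_IN_ORDER`, `RANK`, and `is_further_along` are ours, and the student list is invented.

```python
# The order of this list is the information an ordinal scale adds.
YEARS_IN_ORDER = ["freshman", "sophomore", "junior", "senior"]
RANK = {year: position for position, year in enumerate(YEARS_IN_ORDER)}

def is_further_along(a, b):
    """True if school year a comes after school year b."""
    return RANK[a] > RANK[b]

students = ["junior", "freshman", "senior", "sophomore"]

print(sorted(students, key=RANK.get))  # ['freshman', 'sophomore', 'junior', 'senior']
print(sorted(students))                # alphabetical order, which ignores the ranking
print(is_further_along("senior", "junior"))  # True
```

The ranks say who is ahead of whom, but not by how much, which is what separates an ordinal scale from the interval and ratio scales.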

Interval vs. Ratio Scale

In a ratio scale, zero represents nothingness, or a lack of the variable in question. For instance, zero weight means there is no weight. In an interval scale, however, zero is just another number. For example, in the Fahrenheit and Celsius temperature scales, zero doesn’t mean a lack of temperature, so they are interval scales. In the Kelvin temperature scale, however, absolute zero does represent a lack of temperature, a lack of kinetic activity, so it is a ratio scale.
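A quick calculation shows why the meaning of zero matters. In the sketch below, the two temperatures and the two weights are made-up values, and the helper names are ours. The claim “twice as much” survives a change of units only on a ratio scale.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit (multiplies, then adds an offset)."""
    return celsius * 9 / 5 + 32

def c_to_k(celsius):
    """Convert Celsius to Kelvin (shifts zero to absolute zero)."""
    return celsius + 273.15

KG_TO_LB = 2.20462  # approximate conversion factor

warm_c, cool_c = 20.0, 10.0  # hypothetical temperatures

# Interval scales: the ratio depends on where zero happens to sit.
print(warm_c / cool_c)                             # 2.0 in Celsius
print(round(c_to_f(warm_c) / c_to_f(cool_c), 4))   # 1.36 in Fahrenheit
# Ratio scale: Kelvin has a true zero, so this is the physically meaningful ratio.
print(round(c_to_k(warm_c) / c_to_k(cool_c), 4))   # about 1.035

# Weight is a ratio scale: unit changes only multiply, so ratios are preserved.
heavy_kg, light_kg = 80.0, 40.0  # hypothetical weights
print(heavy_kg / light_kg)                             # 2.0 in kilograms
print((heavy_kg * KG_TO_LB) / (light_kg * KG_TO_LB))   # still 2.0 in pounds
```

So 20 °C is not “twice as hot” as 10 °C, while 80 kg really is twice as heavy as 40 kg.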

Let’s practice more

Here are other examples to further illustrate these scales.

Academic major: a nominal scale (one major isn’t better than another)

IQ score: an interval scale

Size of a t-shirt: an ordinal scale

Words typed per minute: a ratio scale

Favorite color: a nominal scale
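The practice examples above can be reproduced with a small decision function. This is a sketch under one added assumption: the article separates ordinal from interval scales only by example, so we use the common textbook criterion that interval scales have equally spaced steps. The function name and the yes/no answers for each example are ours.

```python
def scale_of_measurement(ordered, equal_steps, true_zero):
    """Pick a scale from three yes/no questions about a variable.

    ordered:     can the values be ranked?
    equal_steps: are the gaps between neighboring values the same size?
                 (assumed criterion, not stated in the article)
    true_zero:   does zero mean a complete lack of the thing measured?
    """
    if not ordered:
        return "nominal"
    if not equal_steps:
        return "ordinal"
    return "ratio" if true_zero else "interval"

# Answers to (ordered, equal_steps, true_zero) for the examples above.
examples = {
    "academic major": (False, False, False),
    "IQ score": (True, True, False),
    "size of a t-shirt": (True, False, False),
    "words typed per minute": (True, True, True),
    "favorite color": (False, False, False),
}

for name, answers in examples.items():
    print(f"{name}: {scale_of_measurement(*answers)}")
```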

Variables in MagicStat

MagicStat is a browser-based platform for performing statistical analysis quickly and effectively. First, let’s take a look at the data file we want to use on the platform.

The track_training.xlsx data file

The dataset is about people training to run a race. We have an auto-increment “ID” column that is a unique identifier for each person. The “training_location” column shows whether the runner trained inside or outside. The “temp” column represents the temperature when they trained. The “time” column shows how long it took them to run the race, and the “place” column shows their ranking from first to last in the race.

Let’s see how it looks inside MagicStat. First, we upload the Excel data file.

MagicStat data upload user interface, which supports Excel, comma-separated, and SPSS files.

On the right side, we see the first five and last five records, which give an idea of how the data looks overall, as shown in the screenshot below.

The right side panel in MagicStat

The ID variable, which is a unique identifier of a runner, represents a nominal scale. The training_location variable, with either inside or outside values, represents a nominal scale as well. As you can see below, MagicStat automatically identifies it as a categorical variable. The temp variable is an interval scale because zero isn’t a true zero; it’s just another number. The time variable is a ratio scale, and the place variable is an ordinal scale. Again, MagicStat classifies each of them as either categorical or numeric.

What if MagicStat’s automatic classification didn’t classify a variable correctly? Fortunately, the MagicStat platform is flexible enough to give users the option to switch back and forth between categorical and numeric variables. For example, the “ID” variable should be classified as a categorical variable instead of a numeric one. So, we just click the “edit” icon and a window pops up.

A window pops up when the edit icon is clicked in either the “Categorical variables” or the “Numeric variables” section.

There, we can choose the “Categorical” option for the ID variable, and MagicStat places it in the “Categorical variables” section once we click the “Apply” button.

The “ID” variable is modified correctly after clicking the “Apply” button.
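To see why an ID column can end up in the wrong group, here is a minimal sketch of a naive type-based guess followed by a manual override. This is not MagicStat’s actual algorithm, which the article does not describe. The function `guess_kind`, the `overrides` dictionary, and the data rows are all invented for illustration; only the column names come from the dataset above.

```python
def guess_kind(values):
    """Naive guess: 'numeric' if every value is an int or float, else 'categorical'."""
    all_numbers = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "numeric" if all_numbers else "categorical"

# Hypothetical rows; only the column names follow the track_training file.
columns = {
    "ID": [1, 2, 3, 4],
    "training_location": ["inside", "outside", "outside", "inside"],
    "temp": [68.0, 41.5, 55.0, 70.0],
    "time": [25.3, 22.1, 27.8, 24.0],
    "place": [3, 1, 4, 2],
}

guessed = {name: guess_kind(values) for name, values in columns.items()}
print(guessed["ID"])  # numeric: the guess only sees that IDs are written as numbers

# The analyst knows IDs are labels, so the guess is overridden by hand.
overrides = {"ID": "categorical"}
final = {**guessed, **overrides}
print(final["ID"])    # categorical
```

Any rule that looks only at how values are stored will make this kind of mistake, which is why a manual switch between the two groups is useful.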

In conclusion, understanding variables in statistical analysis is crucial in order to effectively analyze and interpret data. This article has provided an overview of categorical and numeric variables, discrete and continuous variables, and the different ways of measuring variables, as well as an overview of MagicStat’s handling of different types of variables.

Acknowledgement: This article was reviewed by Dr. Brent Morgan, a subject-matter expert in statistics.

Note: Here is a relevant video on this topic made by Dr. Brent Morgan.

Fatih is a computer scientist, entrepreneur and writer. His current focus is artificial intelligence, data science, startup and personal life experiences.