Understanding Variables in Statistical Analysis
with real-world examples
In this article, we’re going to explain variables in statistical analysis, particularly categorical and numeric variables, discrete and continuous variables, and the different ways of measuring variables. Furthermore, we will explore how MagicStat, a browser-based statistical analysis platform, handles different types of variables.
A variable is anything that could take more than one possible value. For example, the type of operating system a smartphone uses (Android or iOS) is a variable. Similarly, the type of coffee someone is drinking at a coffee shop (regular or decaf) is another example of a variable.
There are also variables that have many potential values, like a team’s score in a basketball game: it could range from zero (a rec team playing against an NBA team) to well over a hundred (maybe two hundred for that NBA team playing in the rec league).
Non-variables (or constants) take only one value. For example, the number of teams in a basketball game is always two, and there are always seven days in a week. As you might imagine, statistics is much more interested in things that change than in things that stay the same.
Categorical vs. Numeric Variables
Variables can be either categorical or numeric. Categorical variables differ based on categories, while numeric variables are distinguished based on a numerical scale. For instance, the type of operating system or the type of coffee are both categorical variables, whereas the score in a basketball game is a numeric variable.
Categorical variables are described using words, like a control group versus an experimental group, or first, second, or third place in a race. Numeric variables, on the other hand, are described using numbers. Note, however, that a variable is not necessarily numeric just because it uses numbers. For example, the digits in the Japanese puzzle game Sudoku serve only as labels; they could just as easily be any group of nine unique symbols.
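The Sudoku point can be sketched in a few lines of Python. This is only an illustration: the helper name is_valid_group and the digit-to-letter mapping are made up for this example. The check for a valid row never does arithmetic on the digits, so swapping them for letters changes nothing.

```python
# Hypothetical helper: checks one Sudoku row, column, or box.
def is_valid_group(cells):
    """A group is valid when it holds nine cells with no repeats."""
    return len(cells) == 9 and len(set(cells)) == 9

digits = [5, 3, 4, 6, 7, 8, 9, 1, 2]

# Relabel each digit with an arbitrary letter (the mapping is invented).
relabel = dict(zip(range(1, 10), "ABCDEFGHI"))
letters = [relabel[d] for d in digits]

digits_ok = is_valid_group(digits)    # validity never uses arithmetic...
letters_ok = is_valid_group(letters)  # ...so letters work just as well
print(digits_ok, letters_ok)
```

Because only "same or different" matters here, the digits behave like names, which is what makes the variable categorical.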
Discrete vs. Continuous Variables
Variables can also be either discrete or continuous.
A discrete variable can take only separate, distinct values, with nothing in between adjacent ones. For example, a basketball score is a discrete variable because a team can’t score half a point. The phone and coffee examples above are also discrete variables: there is no phone that’s “in between” an Android and an iPhone.
In contrast, continuous variables allow for infinitely many values between any two values. For example, when measuring the length of something, there are infinitely many values between 70 inches and 71 inches, including 70 and a half, 70 and a quarter, 70 and an eighth, and so on. Other examples of continuous variables include temperature, time, and weight.
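As a rough sketch of the difference, the Python snippet below keeps halving the gap above 70 inches and always finds a new length, while no whole-number score fits strictly between 70 and 71. The function name midpoints is an assumption made for this example, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

def midpoints(low, high, steps):
    """Repeatedly halve the gap above `low`, collecting each new value."""
    values = []
    for _ in range(steps):
        high = (low + high) / 2  # a value strictly between low and high
        values.append(high)
    return values

# Continuous: lengths between 70 and 71 inches never run out.
lengths = midpoints(Fraction(70), Fraction(71), 3)
print(lengths)  # 70 1/2, 70 1/4, and 70 1/8, stored as exact fractions

# Discrete: no whole-number score lies strictly between 70 and 71.
scores_between = [s for s in range(70, 72) if 70 < s < 71]
print(scores_between)
```

Raising steps produces as many in-between lengths as you like, whereas the list of in-between scores stays empty.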
Scales of Measurement
There are four scales of measurement: nominal, ordinal, interval, and ratio. Understanding these scales is important because they help you determine which inferential statistical test to perform.
Nominal vs. Ordinal Scale
“Nominal” comes from the Latin word for “name”, and fittingly, groups in a nominal scale are differentiated only by their name. The key feature of a nominal scale is that there is no ordering among the groups. For example, eye color, political party, and flavor of potato chips are all nominal scales. None of the categories is “more” or “better” than the others; they’re just different.
In contrast, the key feature of an ordinal scale is that there is an order to the values. Values in an ordinal scale can be ordered or ranked from first to last, most to least, etc. For example, year in school (freshman, sophomore, junior, senior) is an ordinal scale because seniors have more years in school than juniors, who have more than sophomores, who have more than freshmen.
Note that ordinal scales can sometimes take numerical values. For example, if participants receive either 10, 15, or 20 milligrams of a drug, the dose may look like a numeric variable, but as a set of study conditions it is actually an ordinal scale. The three conditions are separate categories that can be rank ordered by how much of the drug participants received. Likert scales are ordinal scales, too, where “strongly agree” represents more agreement than “somewhat agree” or “disagree”.
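To make the contrast concrete, here is a minimal Python sketch. The responses are invented for illustration, and YEAR_RANK is a hypothetical name; its ranks simply follow the freshman-to-senior order described above. Ordinal values can be sorted and have a meaningful middle value, while nominal values can only be counted.

```python
from collections import Counter

# Ordinal: the categories carry a rank, taken from the order in the text.
YEAR_RANK = {"freshman": 1, "sophomore": 2, "junior": 3, "senior": 4}

# Hypothetical responses, invented for illustration.
years = ["junior", "freshman", "senior", "sophomore", "junior"]
chips = ["barbecue", "plain", "barbecue", "sour cream", "plain", "barbecue"]

# Sorting by rank is meaningful for an ordinal scale...
ordered_years = sorted(years, key=YEAR_RANK.get)
# ...and so is the median: the middle value of the ranked list.
median_year = ordered_years[len(ordered_years) // 2]

# Nominal: no rank exists, so counting (the mode) is the usual summary.
most_common_chip = Counter(chips).most_common(1)[0][0]

print(ordered_years)
print(median_year, most_common_chip)
```

Note that the ranks 1 to 4 only encode order. Nothing here assumes that the gap between freshman and sophomore equals the gap between junior and senior.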
Interval vs. Ratio Scale
In a ratio scale, zero represents nothingness, or a lack of the variable in question. For instance, zero weight means there is no weight. In an interval scale, however, zero is just another number. For example, in the Fahrenheit and Celsius temperature scales, zero doesn’t mean a lack of temperature, so they are interval scales. In the Kelvin temperature scale, however, absolute zero does represent a lack of temperature, a lack of kinetic activity, so it is a ratio scale.
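One practical consequence is that ratios such as “twice as much” only make sense on a ratio scale. The sketch below, using the standard temperature conversion formulas, compares 10 and 20 degrees Celsius; the variable names and the chosen temperatures are just for illustration. The apparent ratio changes with the unit on the interval scales, while a weight ratio survives a change of units.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    return c + 273.15

cold_c, warm_c = 10.0, 20.0

# Interval scales: the "ratio" depends on where the scale puts its zero.
ratio_c = warm_c / cold_c  # 2.0
ratio_f = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cold_c)  # 68 / 50
# Ratio scale: Kelvin starts at absolute zero, so this ratio is meaningful.
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)  # about 1.035

# Weight is a ratio scale: "twice as heavy" holds in kilograms and pounds.
KG_TO_LB = 2.20462  # approximate conversion factor
weight_ratio_kg = 2.0 / 1.0
weight_ratio_lb = (2.0 * KG_TO_LB) / (1.0 * KG_TO_LB)

print(ratio_c, ratio_f, round(ratio_k, 3))
print(weight_ratio_kg, weight_ratio_lb)
```

So 20 degrees Celsius is not “twice as hot” as 10 degrees Celsius; measured from absolute zero it is only about 3.5 percent warmer.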
Let’s practice more
Here are other examples to further illustrate these scales.
Academic major: a nominal scale (one major isn’t better than another)
IQ score: an interval scale
Size of a t-shirt: an ordinal scale
Words typed per minute: a ratio scale
Favorite color: a nominal scale
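The same classifications can be reached with a small decision procedure. The Python sketch below is one way to write it, assuming the usual textbook criteria: whether the values are ordered, whether the steps between values are equal, and whether zero means “none”. The function name scale_of and the yes/no answers assigned to each example are this sketch’s own assumptions.

```python
def scale_of(ordered, equal_intervals, true_zero):
    """Return the scale implied by three yes/no properties."""
    if not ordered:
        return "nominal"
    if not equal_intervals:
        return "ordinal"
    return "ratio" if true_zero else "interval"

# (ordered, equal intervals, true zero) for the examples listed above
examples = {
    "academic major": (False, False, False),
    "IQ score": (True, True, False),
    "t-shirt size": (True, False, False),
    "words typed per minute": (True, True, True),
    "favorite color": (False, False, False),
}

scales = {name: scale_of(*props) for name, props in examples.items()}
for name, scale in scales.items():
    print(f"{name}: {scale}")
```

Asking the three questions in this order mirrors how the scales build on each other: each one adds a property to the one before it.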
Variables in MagicStat
MagicStat is a browser-based platform for performing statistical analysis quickly and effectively. First, let’s take a look at the data file we want to use on the platform.
The dataset is about people training to run a race. We have an auto-incrementing “ID” column that is a unique identifier for each person. The “training_location” column shows whether the runner trained inside or outside. The “temp” column represents the temperature when they trained. The “time” column shows how long it took them to run the race, and the “place” column shows their ranking from first to last in the race.
Let’s see how it looks inside MagicStat. First, we upload the Excel data file.
On the right side, we see the first five and last five records, which give an idea of how the data looks overall, as shown in the screenshot below.
The ID variable, which is a unique identifier of a runner, represents a nominal scale. The training_location variable, with either inside or outside values, represents a nominal scale as well. As you can see below, MagicStat automatically identifies it as a categorical variable. The temp variable is an interval scale because zero isn’t meaningful; it’s just another number. The time variable is a ratio scale and the place variable is an ordinal scale. Again, MagicStat classifies each of them as either categorical or numeric.
What if MagicStat’s automatic classification didn’t classify a variable correctly? Fortunately, the MagicStat platform is flexible enough to give users the option to switch back and forth between categorical and numeric variables. For example, the “ID” variable should be classified as a categorical variable instead of a numeric one. So, we just click the “edit” icon and a window pops up.
There, we can choose the “Categorical” option for the ID variable, and MagicStat places it in the “Categorical variables” section once we click the “Apply” button.
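To see why a numeric-looking ID column can be misread by any automatic classifier, consider the naive heuristic sketched below in Python. This is not MagicStat’s actual algorithm, which the article does not describe; the function guess_type, the sample values, and the overrides dictionary are all hypothetical. Only the column names come from the dataset described above.

```python
def guess_type(values):
    """Naive guess: numeric if every value parses as a number."""
    try:
        for v in values:
            float(v)
    except (TypeError, ValueError):
        return "categorical"
    return "numeric"

# Column names follow the article; the values are invented.
columns = {
    "ID": [1, 2, 3],
    "training_location": ["inside", "outside", "inside"],
    "temp": [68, 54, 71],
    "time": [25.4, 27.1, 24.9],
    "place": [2, 3, 1],
}

guessed = {name: guess_type(vals) for name, vals in columns.items()}

# A manual correction, analogous to the edit step described above.
overrides = {"ID": "categorical"}
final = {**guessed, **overrides}

print(guessed)
print(final)
```

A rule that only looks at the values cannot tell that ID numbers are labels rather than quantities, which is why a manual override remains useful.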
In conclusion, understanding variables is crucial for analyzing and interpreting data effectively. This article has provided an overview of categorical and numeric variables, discrete and continuous variables, and the four scales of measurement, as well as a look at how MagicStat handles different types of variables.
Acknowledgement: This article was reviewed by Dr. Brent Morgan, a subject-matter expert in statistics.
Note: Here is a relevant video on this topic made by Dr. Brent Morgan.