How to Become the Master of your Variables

Odeniyi Olawale
4 min read · Jul 16, 2022

Variables are the sugar in the tea of data: they are indispensable. They are essential ingredients that every data analyst must take into consideration. They are dynamic and flexible, designed to take on different values depending on the circumstance. According to the Collins Dictionary, a variable is a quantity that can have any one of a set of values.

As a data analyst or someone looking to work with data, understanding the types of variables you are working with is usually one of the first things you should do before carrying out any analysis. This is just one of the steps in the “understand your data” part of your analysis. You need to read up on every variable in your data to know how each was generated and what it represents, and to identify any that are not particularly useful for your analysis. Once you’ve done this, you can start to decide which variables to keep and which to drop.

Misinterpreting variables can cost you a lot. Taking the time to study your variables, asking the questions that need to be asked about them, and even looking up the meaning of some of them to understand the different contexts in which they could be used will go a long way toward saving you time, effort, and resources.

Some months ago, I worked on a health-related project (courtesy of Udacity) in which I analyzed several factors that could be responsible for patients not showing up on their appointment days. The dataset contained variables like “No show”, “Hypertension”, “Alcoholic”, “Age”, and so on. The first thing I did was to understand what “No show” means and how it is used in the health industry. I then made sure I understood the response coding for the variables “No show”, “Hypertension”, “Alcoholic”, “Handicapped”, and so on.

I needed to understand what 0 and 1 represented for each response. When you read my report here, you will see that I clearly defined each variable and its coding scheme. Though this took a bit of time, it helped me understand that the dataset is mainly composed of categorical variables, which later influenced my analytical approach.
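Here is a minimal sketch of that kind of codebook check. The variable names come from the dataset, but the meanings attached to 0 and 1 and the sample record are assumptions made up for illustration, not the dataset's documented coding.

```python
# Hypothetical codebook: the meanings below are assumed for illustration only.
CODEBOOK = {
    "Hypertension": {0: "no hypertension", 1: "has hypertension"},
    "Alcoholic": {0: "not alcoholic", 1: "alcoholic"},
}


def decode(record, codebook=CODEBOOK):
    """Return a copy of record with coded values replaced by readable labels."""
    decoded = {}
    for name, value in record.items():
        labels = codebook.get(name)
        # Variables without a codebook entry (e.g. Age) are left untouched.
        decoded[name] = labels[value] if labels else value
    return decoded


# Made-up patient record, not real data.
patient = {"Age": 42, "Hypertension": 1, "Alcoholic": 0}
print(decode(patient))
```

A code that is missing from the codebook raises a KeyError, which is a handy way to spot responses you have not yet understood.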

Alright! Now that you know why variables matter in every analysis, let’s take a quick dive into the different classes of variables.

Classification of Variables

Variables can be broadly classified into:

  • qualitative and,
  • quantitative.

PS: variables can also be classified as dependent or independent.

Qualitative (categorical) Variables

Qualitative (or categorical) variables describe features that, in their original forms, do not behave as numbers. These variables may represent objects, places, locations, names, identity, ranking/position, etc. They may be encoded with numbers, but the numbers function only as labels, so arithmetic operations should not be performed on them. These variables are limited to a specific set of values.

There are two types of qualitative variables:

• Nominal variables

• Ordinal variables

Nominal vs Ordinal variables

Nominal variables: These categorical variables store data that is used strictly to label things. They are pure labels without any inherent order (no label is intrinsically greater or less than any other). For instance, coding male participants as 1 and female participants as 2 does not imply that female participants are greater than male participants. Other examples include species names, colors, etc.
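To see why arithmetic on nominal codes is misleading, here is a small sketch using the 1 = male, 2 = female coding mentioned above. The responses are made up for illustration.

```python
from collections import Counter
from statistics import mean

# Made-up responses using the coding from the text: 1 = male, 2 = female.
gender_codes = [1, 2, 2, 1, 2, 2]
LABELS = {1: "male", 2: "female"}

# Python will happily average the codes, but the result describes no category.
misleading_average = mean(gender_codes)

# Counting how often each label occurs is the meaningful summary.
counts = Counter(LABELS[code] for code in gender_codes)
most_common_label, frequency = counts.most_common(1)[0]

print(misleading_average)
print(counts)
print(most_common_label, frequency)
```

The average lands somewhere between 1 and 2, which corresponds to no label at all, while the counts and the most frequent label (the mode) are summaries that make sense for a nominal variable.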

Ordinal variables: These variables are used to show rankings or positions. Note that they are also labels, but with an intrinsic order or ranking (comparison operations can be made between values, but the magnitude of the differences is not well-defined). The Likert scale used in questionnaires is a typical example of an ordinal variable. Representing Strongly Agree as “4” gives it a higher ranking or preference than Strongly Disagree, which is represented as “1”. Another common example is finishing place in a racing competition.
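The ordering of a Likert scale can be sketched as follows. The four-point scale is an assumption: only the codes for Strongly Agree (4) and Strongly Disagree (1) are given above, and the responses are made up.

```python
from statistics import median_low

# Assumed four-point scale; only the codes 1 and 4 are given in the text.
LIKERT = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Agree": 3,
    "Strongly Agree": 4,
}
RANK_TO_LABEL = {rank: label for label, rank in LIKERT.items()}

# Made-up survey responses.
responses = ["Agree", "Strongly Agree", "Disagree", "Agree", "Strongly Disagree"]
ranks = [LIKERT[response] for response in responses]

# Comparisons are valid because the labels have an order.
strongest = RANK_TO_LABEL[max(ranks)]

# median_low always returns an observed value, so it maps back to a label.
typical = RANK_TO_LABEL[median_low(ranks)]

print(strongest)
print(typical)
```

The median is used here instead of the mean because it relies only on the order of the responses, not on the size of the gaps between them.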

Quantitative (numeric) Variables

Quantitative (or numeric) variables, on the other hand, are the numeric types. They take numeric values that allow arithmetic operations and comparison. Common examples include heights, weights, counts of students, amount of sales, temperature, age, etc. For this type of variable, there is usually no predefined set of allowed values.

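Quantitative variables are also often grouped into two types. Strictly speaking, interval versus ratio describes the scale of measurement, while discrete versus continuous describes the values a variable can take; the two are paired here for simplicity: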

  • Interval variables (discrete)
  • Ratio variables (continuous)

Interval vs Ratio variables

Interval (discrete) variables: These variables take on numeric values whose absolute differences are meaningful. Addition and subtraction can be performed on them. Discrete variables typically record counts of individual items or values. Examples:

  • Number of pages in a book
  • Number of songs in a playlist
  • Number of students in a class

Ratio (continuous) variables: These variables take on numeric values whose relative differences are meaningful. Data stored as a continuous variable can take on any numeric value, e.g., fractions, decimals, negative values, etc. Multiplication and division can be performed on them. These variables can be split into smaller and smaller units and can (hypothetically) take on values to any level of precision. Examples include:

  • Age
  • Heights
  • Distance moved
  • Cost of goods sold

Differentiating between continuous and discrete variables can sometimes be a little tricky, but don’t worry, you’ve totally got this!!
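As a rough first check, you can test whether every value in a column is a whole, non-negative number. This is only a heuristic sketch with made-up values and a hypothetical helper name, not a definitive rule.

```python
def looks_discrete(values):
    """Heuristic: True if every value is a whole, non-negative number (a count)."""
    return all(float(value).is_integer() and value >= 0 for value in values)


# Made-up sample values for illustration.
pages_per_book = [120, 348, 96]
distance_km = [1.5, 3.25, 10.0]

print(looks_discrete(pages_per_book))
print(looks_discrete(distance_km))
```

A continuous variable that was recorded in rounded form, such as age in whole years, will also pass this check, so always confirm with how the variable was generated.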

If you’ve read till this point, you are amazing.😊 Show some love🧡 by sharing this value with others…

Connect with me on LinkedIn

Keep the data alive!

Data is Matter!!

Odeniyi Olawale

Hi, I’m Odeniyi Oluwasegun Olawale. I am a data analyst and I enjoy writing. Feel free to stick around and have a nice read... :)😉