Statistics 101: Understanding the different types of variables.
As we enter the latter part of 2020, it is safe to say that companies use data to help make business decisions. For example, a company may perform exploratory data analysis (EDA) to calculate statistics describing where the business stands today, or it may fit a simple linear regression model to predict product prices in 2021. Perhaps it uses neither and instead applies clustering to find groups of similar data points. Regardless of how data is used, a strong statistics background can only help in deciding which approach best extracts, tests hypotheses about, and interprets the data.
With that being said, let us start with the very basics of statistics: variables. Variables can be broken down into two categories: Quantitative (Numerical) and Qualitative (Categorical). Quantitative variables can be further broken down into two subcategories: Continuous and Discrete.
A continuous quantitative variable is a numerical variable that can take any value within a range, including every fractional value between two points, so for any single observation one may say “well, it could be anything.” Yes, I know that may not make sense yet, so let's use a few examples: age, weight, height, and BMI are all continuous quantitative variables. They are measured rather than counted, and in principle they can be recorded to ever finer precision. You may be asking, “Well, age does not seem like it could fall within a range; if someone asked me how old I am, I could answer with an exact number.” Is that true? Remember that age is a measure of time, which is always changing: you may say you are 30, but at this moment you are really 30 years plus some number of days, hours, and seconds. Therefore, age is considered a continuous quantitative variable as well.
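To see age behave like a continuous variable, here is a minimal Python sketch that measures age as elapsed time in fractional years. The function name age_in_years, the sample dates, and the 365.2425-day average year are my own assumptions for illustration, not a standard definition of age.

```python
from datetime import datetime

# Assumption: an average Gregorian year of 365.2425 days, expressed in seconds.
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60


def age_in_years(birth: datetime, now: datetime) -> float:
    """Return the time elapsed between birth and now as fractional years."""
    return (now - birth).total_seconds() / SECONDS_PER_YEAR


birth = datetime(1990, 1, 1)  # hypothetical birth date
print(age_in_years(birth, datetime(2020, 7, 1)))      # roughly 30.5, not exactly 30
print(age_in_years(birth, datetime(2020, 7, 1, 12)))  # twelve hours later: slightly larger
```

Asking the same question twelve hours later gives a slightly larger value, which is why age is treated as continuous even though we usually round it to whole years.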
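A discrete quantitative variable takes only distinct, separate values that can be counted, with nothing in between them. When I think of discrete, I think of distinct: an exact number. For example, if I were asked how much I spent today at the food truck, my response would be a distinct number, because money comes in a fixed smallest unit, the cent. I could have spent $12.50 or $12.51, but never an amount in between.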
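Here is a minimal Python sketch of that idea, assuming amounts are in US dollars with the cent as the smallest unit. The helper to_cents is hypothetical; it stores a dollar amount as a whole number of cents and rejects anything finer, which is exactly what makes the variable discrete.

```python
from decimal import Decimal


def to_cents(amount: str) -> int:
    """Convert a dollar amount such as '12.50' into a whole number of cents."""
    # Decimal avoids binary floating-point rounding surprises with money.
    cents = Decimal(amount) * 100
    # A discrete variable has no values between its steps, so fractions of a cent are rejected.
    if cents != cents.to_integral_value():
        raise ValueError(f"{amount} is not a whole number of cents")
    return int(cents)


spent = to_cents("12.50")  # hypothetical food-truck total
print(spent)               # 1250
```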
Now let us discuss qualitative (categorical) variables. These variables place each observation into one of a set of groups, or categories, which may or may not have a natural order or ranking. For example, high school class is a qualitative categorical variable: Freshman, Sophomore, Junior, and Senior may be represented as 1 through 4, respectively.
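A minimal Python sketch of that encoding follows. The CLASS_CODES mapping and the encode_classes helper are hypothetical names of my own; the point is that each number stands for a group, not for an amount of anything.

```python
# Hypothetical lookup table: each high school class label maps to a code from 1 to 4.
CLASS_CODES = {"Freshman": 1, "Sophomore": 2, "Junior": 3, "Senior": 4}


def encode_classes(labels: list[str]) -> list[int]:
    """Replace each class label with its numeric code."""
    return [CLASS_CODES[label] for label in labels]


students = ["Junior", "Freshman", "Senior", "Junior"]  # hypothetical survey responses
print(encode_classes(students))  # [3, 1, 4, 3]
```

Even though the codes are numbers, a Senior (4) is not “four times” a Freshman (1); the codes only label the groups.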
Similar to quantitative (numerical) variables, qualitative (categorical) variables also have two subtypes: Ordinal and Nominal. Remember that earlier I stated this type of data may be represented in an order or sequence. That describes ordinal categorical variables. A great example is a pain scale: “On a scale of 1–5, with 5 being the worst pain, rank how you feel.” Nominal is the opposite of ordinal in that it lacks order or ranking. For example, if an individual is 18 or older, mark 0, and if the individual is under 18, mark 1. The numbers 0 and 1 are only labels for the two groups; they do not rank one group above the other, so this is not an ordinal categorical variable.
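One way to make the ordinal/nominal distinction concrete is Python's standard enum module: an IntEnum supports comparisons such as <, while a plain Enum deliberately does not. This is a minimal sketch; the class and member names are hypothetical labels of my own.

```python
from enum import Enum, IntEnum


class PainLevel(IntEnum):
    """Ordinal: 1 is the least pain and 5 the worst, so ordering is meaningful."""
    ONE = 1
    TWO = 2
    THREE = 3
    FOUR = 4
    FIVE = 5


class AgeGroup(Enum):
    """Nominal: 0 and 1 are only labels, so no ordering is defined."""
    ADULT = 0  # 18 or older
    MINOR = 1  # under 18


# Ordinal values can be compared and sorted.
responses = [PainLevel(3), PainLevel(5), PainLevel(1)]
print(sorted(responses))               # lowest to highest pain
print(PainLevel.FOUR > PainLevel.TWO)  # True

# Nominal values can only be checked for equality; ranking them raises TypeError.
try:
    AgeGroup.ADULT < AgeGroup.MINOR
except TypeError:
    print("AgeGroup has no order")
```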
To recap: I covered the two categories of variables, quantitative (continuous and discrete) and qualitative (ordinal and nominal). Knowing which type you are working with is extremely important when using data science to form hypotheses and draw conclusions from data that improve business processes.
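As a compact recap, here is a hypothetical Python helper that asks the same two questions this article does: is the variable quantitative or qualitative, and then which subtype is it? The function name and its yes/no parameters are my own simplification, not a standard classification API.

```python
def variable_type(quantitative: bool, *, countable: bool = False, ordered: bool = False) -> str:
    """Classify a variable using the two categories and four subtypes described in the article."""
    if quantitative:
        # Quantitative: discrete if values come in separate, countable steps; otherwise continuous.
        return "quantitative: discrete" if countable else "quantitative: continuous"
    # Qualitative: ordinal if the categories have a natural order; otherwise nominal.
    return "qualitative: ordinal" if ordered else "qualitative: nominal"


print(variable_type(True))                  # height: prints "quantitative: continuous"
print(variable_type(True, countable=True))  # dollars spent: prints "quantitative: discrete"
print(variable_type(False, ordered=True))   # pain scale: prints "qualitative: ordinal"
print(variable_type(False))                 # 18+ vs. under 18: prints "qualitative: nominal"
```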
Thank you for reading!