Data Types Unplugged
From Numbers to Tags, Demystifying Data Categorization!
How useful can data be if we don’t even know how to categorize it? There are many ways to encode information, but some types are better suited to communicating results (and that’s most of a data professional’s job).
The most common types are numerical and categorical. Try to get these names tattooed or imprinted in your mind, since the name alone tells you a lot about what we are measuring.
Numerical Data
It’s mostly numbers. The End. Well, not really. Numerical data can use additional symbols to identify units. Numerical data also breaks down into two main categories: continuous data and discrete data.
Discrete Data
Discrete data focuses on whole numbers. Let’s say we’re counting people or cars. We turn into The Count from Sesame Street: One Person, Two Person, Threeeee Person. But never 1.2 person or 2.5 person (unless we’re talking about TV shows). Remember that the type of data we record reflects the reality of the thing being measured.
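Here is a minimal Python sketch of what counting looks like in practice. The list of sightings is made up for illustration; the point is only that a count comes out as a whole number (an int), never a fraction.

```python
from collections import Counter

# Hypothetical observations: one entry per vehicle we saw pass by.
sightings = ["car", "car", "bus", "car", "bike"]

# Counter tallies how many times each value appears.
counts = Counter(sightings)
car_count = counts["car"]

# A count is discrete: it is stored as an int, and "3.5 cars" cannot occur.
print(car_count, type(car_count).__name__)
```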
Continuous Data
Continuous data focuses on the “in-betweens”. With discrete data we see things as entire objects, but what if we needed more precision?
Let’s imagine we work at the Olympics and we are the ones responsible for measuring javelin throws. I guess we could use discrete data to measure the distance of the throws, but that would cause a lot of trouble. Athletes train their whole lives to gain extra centimeters in their throws, only for them to be erased by imprecise measurements. If two athletes got 20.5m and 20.9m throws, with discrete data they would be treated as the same (either 20m or 21m, since the field wouldn’t be treated as a continuous line).
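The javelin problem can be sketched in a few lines of Python. The athlete names are placeholders, and rounding down or up to whole meters is just one assumed way of forcing a continuous measurement into a discrete one.

```python
import math

# The two throws from the example, in meters (continuous measurements).
throws_m = {"athlete_a": 20.5, "athlete_b": 20.9}

# Forcing the distances into whole meters, either by rounding down or up.
rounded_down = {name: math.floor(d) for name, d in throws_m.items()}
rounded_up = {name: math.ceil(d) for name, d in throws_m.items()}

# As floats the throws differ; as whole meters they become a tie.
print(throws_m["athlete_a"] < throws_m["athlete_b"])
print(rounded_down)
print(rounded_up)
```

Either way the 40 centimeters between the two throws disappear, which is why distances are better stored as continuous values.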
Categorical Data
I like to think of categorical data as tags. The tags can be words, symbols and even numbers, but they don’t behave the way numerical data does. Categorical data is also broken down into two main types: ordinal data and nominal data.
Ordinal Data (Ordered)
When working with ordinal data, the order matters. Remember that categorical data can be seen as tags, but in this case, the order in which the tags are presented matters.
For example, shirt sizes are represented with categories (small, medium, large) in which the order represents a difference in size. The same applies to surveys in which you rate something from 1 to 10. In this case the categories are numbers, but they’re acting as tags: a 10 might represent “Very Good” and a 1 would be “Very Bad”.
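To see why the order has to be stated explicitly, here is a small Python sketch using the shirt sizes. The list of orders and the names SIZE_ORDER and rank are invented for this example.

```python
# The order of the tags is part of the data, so we write it down once.
SIZE_ORDER = ["small", "medium", "large"]
rank = {size: position for position, size in enumerate(SIZE_ORDER)}

# Hypothetical shirt orders.
orders = ["large", "small", "medium", "small"]

# Sorting as plain text gives alphabetical order, which ignores size.
alphabetical = sorted(orders)

# Sorting by rank respects the meaning of the tags.
by_size = sorted(orders, key=lambda size: rank[size])

print(alphabetical)
print(by_size)
print(rank["small"] < rank["large"])
```

The comparison only tells us that small comes before large. It doesn’t say how much bigger a large is, which is what separates ordinal data from numerical data.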
Nominal Data (Unordered)
Nominal data doesn’t care about an order. The categories or choices don’t follow a hierarchy. You can’t say that one option is better than the other or that an option is “greater than” another option.
For example, if a survey asks you a “Yes/No” question, the options provided aren’t really ordered in any way. If you’re choosing a color for your shirt, the order in which they’re presented doesn’t imply a hierarchy or importance.
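With nominal data, about all we can do is check whether two tags are equal and count how often each one appears. A minimal Python sketch, with made-up color choices:

```python
from collections import Counter

# Hypothetical shirt colors picked by six customers.
shirt_colors = ["red", "blue", "red", "green", "red", "blue"]

# Counting each tag is meaningful; ranking the tags themselves is not.
color_counts = Counter(shirt_colors)
most_common_color, times_chosen = color_counts.most_common(1)[0]

print(color_counts)
print(most_common_color, times_chosen)
```

Python will happily evaluate "blue" < "red" by comparing the text alphabetically, but that result says nothing about the colors themselves.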