One-Hot Encoding in Machine Learning

3 min read · May 7, 2019

When working with neural networks, and image data in particular, the term “one-hot encoded” comes up frequently in discussions of data preprocessing. You’ll hear, “Labels for images in Keras are one-hot encoded vectors.” But what does that mean? This blog post defines one-hot encoding and explains why it is used and how it is applied. It also covers the different types of categorical data and how to process them effectively.

Why do you need one-hot encoding?

Categorical Data (CD) contain label values rather than numeric values:

Each value in the above data frame represents a different image label. However, there is some categorical data whose values we might want to keep in order because they are related to one another, such as rankings:

This is an example of ordinal values. You’ll also come across nominal values, which have no inherent order.

The problem is that not all models can work with categorical data directly. Decision trees can, in some cases, learn from CD without any transformation. However, most machine learning models need all variables, both input and output, to be numeric.

One way to tackle this is with integer encoding. A mapping function converts each categorical value to an integer, which makes the values ordinal, meaning they are ranked. Like so:
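As a minimal sketch of integer encoding, a plain dictionary can serve as the mapping function. The ranking labels and the integers assigned to them below are hypothetical and chosen only for illustration; they are not the values from the original data frame.

```python
# Hypothetical ordinal labels (assumed for illustration only).
rankings = ["low", "medium", "high", "medium"]

# The mapping function: each category gets an integer that reflects its rank.
rank_to_int = {"low": 0, "medium": 1, "high": 2}

# Apply the mapping to every value in the column.
encoded = [rank_to_int[rank] for rank in rankings]

print(encoded)  # [0, 1, 2, 1]
```

Because these labels really are ranked, the order of the integers carries the same meaning as the order of the labels.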

This doesn’t always work. Why? Assigning a higher value to one category than to another when no such ranking exists can throw off the model, because the model may treat the integers as a meaningful order. Back to the food data:
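A short sketch of the problem, using the four food labels mentioned later in this post. The integers assigned here are an arbitrary assumption; any other assignment would create the same kind of false ordering.

```python
# Nominal labels: there is no natural ranking among these foods.
foods = ["banana", "hotdog", "hamburger", "ice cream"]

# Arbitrary integer encoding (assumed): banana=0, hotdog=1, hamburger=2, ice cream=3.
food_to_int = {food: index for index, food in enumerate(foods)}

# A model that treats these as numbers sees an order that does not exist...
ice_cream_outranks_banana = food_to_int["ice cream"] > food_to_int["banana"]

# ...and distances that mean nothing: the "average" of banana and hamburger is hotdog.
midpoint = (food_to_int["banana"] + food_to_int["hamburger"]) / 2
midpoint_is_hotdog = midpoint == food_to_int["hotdog"]

print(ice_cream_outranks_banana, midpoint_is_hotdog)  # True True
```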

Enter One-Hot Encoding

One-hot encoding gives each category its own position in a vector, and every position holds a value of 1 or 0. The length of these vectors is equal to the number of classes or categories the model is expected to classify. As a result, each label in a column is replaced by a vector with a 1 in the position of its category and a 0 everywhere else.
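The idea can be sketched in a few lines of plain Python. The helper name `one_hot` and the order of the classes are assumptions made for this example; libraries such as Keras provide their own utilities for the same job.

```python
def one_hot(label, classes):
    """Return a list of 0s with a single 1 at the position of `label`."""
    vector = [0] * len(classes)          # one slot per class
    vector[classes.index(label)] = 1     # mark the slot for this label
    return vector

# Assumed class order; the vector length equals the number of classes.
classes = ["banana", "hotdog", "hamburger", "ice cream"]
labels = ["hotdog", "banana", "ice cream"]

encoded = [one_hot(label, classes) for label in labels]

for label, vector in zip(labels, encoded):
    print(label, vector)
# hotdog [0, 1, 0, 0]
# banana [1, 0, 0, 0]
# ice cream [0, 0, 0, 1]
```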

Where does the term one-hot come from?

In a vector with 4 elements, every element is 0 EXCEPT the one that corresponds to the actual category, which is 1. That single 1 is the “hot” element. Notice how each 1 has its own spot in the vector.
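To make the “one hot element” property concrete, here is a small sketch that builds the four vectors for four classes, checks that each contains exactly one 1, and recovers the label from the position of that 1. The class order is again an assumption.

```python
# Assumed class order for a 4-class problem.
classes = ["banana", "hotdog", "hamburger", "ice cream"]
n = len(classes)

# Vector i has a 1 at position i and 0s everywhere else.
vectors = [[1 if j == i else 0 for j in range(n)] for i in range(n)]

# Each vector is "one-hot": exactly one element is 1.
exactly_one_hot = all(sum(vector) == 1 for vector in vectors)

# The position of the 1 identifies the category, so decoding is a lookup.
decoded = [classes[vector.index(1)] for vector in vectors]

print(exactly_one_hot)  # True
print(decoded)          # ['banana', 'hotdog', 'hamburger', 'ice cream']
```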

Let’s tie it all together:

Computers can’t tell the difference between the words banana, hotdog, hamburger or ice cream. The image recognition algorithms on your computer can, however, tell the difference between a 1 and a 0. By using one-hot encoding, we encode our labels as a vector of integers, each either a 1 or a 0, which gives our models data they can process.

More about me and links to my other blogs: https://orah82.github.io/

Written by Omar Raheem