TOP 3 ENCODING TECHNIQUES

Pushkar saini
3 min readApr 17, 2023

--

Encoding is a process of converting data from one form to another. In machine learning, encoding techniques are used to convert categorical data into numerical data. This is important because most machine learning algorithms work with numerical data, not categorical data. In this document, we’ll discuss different encoding techniques and when to use them.

Using the encoding technique is important but more important is to understand which one to use when?

One-Hot Encoding: One-hot encoding is used when we have categorical variables with no inherent order or ranking. For example, food items or makeup items. In one-hot encoding, each category is assigned a unique binary value, and a “1” is assigned to the corresponding category while a “0” is assigned to all other categories. One-hot encoding works well when there are a limited number of categories.

One-Hot Encoding

Label Encoding: Label encoding is used when we have a categorical variable with some inherent order. For example, low, medium, and high or small, medium, and large. In label encoding, each category is assigned a numerical value in the order of importance or order of occurrence. This technique is useful when we have a small number of categories.

Label Encoding

Binary Encoding: Binary encoding is used when we have a large number of categories, and one-hot encoding becomes computationally expensive. In binary encoding, each category is assigned a unique binary value, and the values are then combined to create a binary code for each category. This technique works well when we have a large number of categories.

Binary Encoding

Conclusion: encoding techniques are essential in machine learning when we have categorical data. One-hot encoding is useful for a limited number of categories, while label encoding is useful when we have a small number of categories with some inherent order. Binary encoding works well when we have a large number of categories, and count encoding is useful when we need to capture the frequency of each category. Target encoding captures the correlation between the categorical variable and the target variable, and feature hashing is useful when one-hot encoding is not feasible. Understanding these techniques is crucial in selecting the appropriate encoding technique for our machine-learning problem.

--

--

Pushkar saini

Experienced Analyst in data analysis, machine learning, statistics and analytics, data mining, data visualization, process automation, and business development.