What is One Hot Encoding and How to Do It

  • Imagine you have three categories of food: apples, chicken, and broccoli. Using label encoding, you would assign each of them a number: apples = 1, chicken = 2, and broccoli = 3. But if your model internally needs to calculate the average across categories, it might compute (1 + 3) / 2 = 2. According to your model, the average of apples and broccoli is chicken, which is meaningless.
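To make the arithmetic concrete, here is a small sketch using NumPy. The label values are the ones from the food example; the one hot vectors and their column order (apples, chicken, broccoli) are an assumption made for illustration.

```python
import numpy as np

# Label encoding from the food example
labels = {"apples": 1, "chicken": 2, "broccoli": 3}

# Averaging two labels lands exactly on a third, unrelated category
label_mean = np.mean([labels["apples"], labels["broccoli"]])
print(label_mean)  # 2.0, which is the label for chicken

# One hot encoding: one column per category (column order assumed here)
one_hot = {
    "apples":   np.array([1, 0, 0]),
    "chicken":  np.array([0, 1, 0]),
    "broccoli": np.array([0, 0, 1]),
}

# The average is half apples, half broccoli, and contains no chicken at all
one_hot_mean = (one_hot["apples"] + one_hot["broccoli"]) / 2
print(one_hot_mean)
```

With one column per category, no category sits numerically "between" two others, so the model cannot read an ordering into the encoding.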
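import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Load the data; every column except the last is a feature
dataset = pd.read_csv('made_up_thing.csv')
X = dataset.iloc[:, :-1].values

# Label encode the first column (recent scikit-learn versions can one hot encode strings directly)
le = LabelEncoder()
X[:, 0] = le.fit_transform(X[:, 0])

# One hot encode column 0 and keep the other columns as they are.
# OneHotEncoder's categorical_features argument was removed in scikit-learn 0.22,
# so the column is selected with ColumnTransformer instead.
ct = ColumnTransformer([('ohe', OneHotEncoder(), [0])], remainder='passthrough', sparse_threshold=0)
X = ct.fit_transform(X)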
# Label encode the first 10 columns, refitting the encoder on each one
le = LabelEncoder()
for i in range(10):
    X[:, i] = le.fit_transform(X[:, i])
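Since made_up_thing.csv is not a real file, here is a self-contained sketch you can run as is. The DataFrame, its column names, and its values are hypothetical stand-ins. It assumes a scikit-learn version in which OneHotEncoder accepts string columns directly, so no LabelEncoder step is used.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the CSV file
df = pd.DataFrame({
    "food": ["apples", "chicken", "broccoli", "apples"],
    "calories": [95, 335, 50, 95],
})

# fit_transform returns a sparse matrix by default; toarray() makes it dense
encoder = OneHotEncoder()
encoded = encoder.fit_transform(df[["food"]]).toarray()

# categories_ holds the learned categories, one array per input column,
# in the same order as the output columns
categories = list(encoder.categories_[0])
print(categories)
print(encoded)
```

Each row of `encoded` has a single 1 in the column matching that row's food and 0 everywhere else. The categories are sorted, so the columns come out as apples, broccoli, chicken.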
