K-Fold Target Encoding

Pourya
5 min read · Feb 4, 2019

Overview:

1. Concept

2. Code

1. Concept

One-hot encoding, label encoding, frequency encoding, target encoding, and similar techniques are well-known feature engineering methods commonly used to improve prediction accuracy when a dataset contains categorical features [1, 2].

Each feature engineering technique addresses a specific aspect of a feature. One-hot encoding (dummy encoding) is a robust technique when a feature has a limited number of categories. However, it becomes less useful as the number of categories grows, because it inflates the dimensionality of the dataset. Label encoding has its own limitation: it imposes an arbitrary order on the categories, and the resulting codes carry no correlation with the target [3].
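For reference, both of these baseline encodings take only a few lines in pandas. The data below is a hypothetical three-row example, not from the post:

```python
import pandas as pd

# hypothetical three-row example
df = pd.DataFrame({'Feature': ['A', 'B', 'A']})

# one-hot (dummy) encoding: one binary column per category
one_hot = pd.get_dummies(df['Feature'], prefix='Feature')

# label encoding: an arbitrary integer code per category
df['Feature_label'] = df['Feature'].astype('category').cat.codes
```

Note how the one-hot frame gains one column per category, which is exactly what makes it impractical for high-cardinality features.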

Target encoding is one of the most powerful techniques in feature engineering and has been widely applied and extended in different forms [4]. In this post, we discuss and implement k-fold target encoding for a sample dataset. In essence, k-fold target encoding reduces the overfitting of mean-target encoding by adding a form of regularization to the encoded means.

A sample of a train and a test dataset is shown in Fig.1. For simplicity, we assume the “Feature” column contains two categories, A and B, and the “Target” column is a binary variable, 0 or 1. The test dataset has a “Feature” column but no “Target” column. Note that if the target is a continuous variable, the approach does not change.

Fig.1) Train and test dataset. The train set has a “Feature” column and a “Target” column. The “Feature” column contains two categories, A and B.

The basic idea of k-fold target encoding originates from mean-target encoding. In mean-target encoding, each categorical value is replaced by the mean of the target over the rows where it occurs. As shown in Fig.2, the mean of the target is 0.6 for A and 0.3 for B; therefore, A and B are replaced by 0.6 and 0.3, respectively. This new feature may be more correlated with the target. However, the approach tends to overfit when the distribution of the categories in the train and test datasets differs considerably.

Fig.2) Target or mean encoding.
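Plain mean-target encoding is a one-liner with `groupby`/`transform`. The toy data below is hypothetical, constructed so the group means come out to 0.6 for A and 0.3 for B, matching the figure:

```python
import pandas as pd

# hypothetical toy data: 5 rows of A (target mean 0.6), 10 rows of B (target mean 0.3)
train = pd.DataFrame({
    'Feature': ['A'] * 5 + ['B'] * 10,
    'Target':  [1, 1, 1, 0, 0] + [0, 1, 0, 0, 0, 0, 1, 1, 0, 0],
})

# plain mean-target encoding: replace each category by the mean target of its group
train['Feature_mean_Enc'] = train.groupby('Feature')['Target'].transform('mean')
```

Because each row's own target contributes to the mean it receives, this version leaks target information, which is precisely the overfitting the k-fold variant addresses.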

Therefore, k-fold target encoding can be applied to reduce the overfitting. In this method, we divide the dataset into k folds; here we use 5. Fig.3 shows the first round of the 5-fold scheme: we calculate the mean target over folds 2, 3, 4, and 5 and use those values, mean_A = 0.556 and mean_B = 0.285, as the encoding for fold 1.

Fig.3) 5-fold target encoding. We use folds 2, 3, 4, and 5 to estimate the first fold.

After that, we calculate the encoding for the second fold in the same way, as shown in Fig.4.

Fig.4) 5-fold target encoding. We use folds 1, 3, 4, and 5 to estimate the second fold.
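The fold-by-fold loop in Figs.3 and 4 can be sketched in a few lines before we wrap it in a class below. This is a minimal sketch on hypothetical random data, assuming `sklearn`'s `KFold` for the splits:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# hypothetical toy data; each row's encoding uses only the other folds
train = pd.DataFrame({'Feature': ['A', 'B'] * 10,
                      'Target': np.random.randint(2, size=20)})

kf = KFold(n_splits=5, shuffle=False)
train['Feature_Kfold_Target_Enc'] = np.nan
enc_col = train.columns.get_loc('Feature_Kfold_Target_Enc')

for tr_ind, val_ind in kf.split(train):
    # target means computed on the other four folds only
    fold_means = train.iloc[tr_ind].groupby('Feature')['Target'].mean()
    train.iloc[val_ind, enc_col] = train.iloc[val_ind]['Feature'].map(fold_means).values
```

Each validation fold never sees its own targets, so the encoded value for a row is an out-of-fold estimate of the category mean.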

The remaining step is creating the “Feature_Kfold_Target_Enc” column in the test dataset. Its values are obtained by taking the mean of the train column “Feature_Kfold_Target_Enc” for each of the categories “A” and “B”.

Fig.5) We estimate “Feature_Kfold_Target_Enc” for the test set from the train set.
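The test-side lookup reduces to a `groupby` mean over the train's encoded column followed by a `map`. The frames below are hypothetical stand-ins for the encoded train output (their per-category means work out to 0.6 and 0.3, as in the figures):

```python
import pandas as pd

# hypothetical encoded train frame, as produced by the k-fold step
new_train = pd.DataFrame({'Feature': ['A', 'A', 'B', 'B'],
                          'Feature_Kfold_Target_Enc': [0.55, 0.65, 0.25, 0.35]})
test = pd.DataFrame({'Feature': ['B', 'A', 'B']})

# each test category receives the mean of the train's out-of-fold encodings
enc_map = new_train.groupby('Feature')['Feature_Kfold_Target_Enc'].mean()
test['Feature_Kfold_Target_Enc'] = test['Feature'].map(enc_map)
```

Averaging the out-of-fold encodings per category collapses the k slightly different fold estimates into a single value for the test set.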

Although k-fold target encoding is a robust feature engineering method, there is no guarantee that it is always the best way to improve accuracy. We need to try diverse feature engineering techniques and see which one gives better performance.

2. Code

Let’s write the code for k-fold target encoding. The KFoldTargetEncoderTrain class takes the name of the feature column, the name of the target column, and the number of folds, then fits and transforms the train data. It returns a data frame that includes a “Feature_Kfold_Target_Enc” column. Note that if the training folds do not contain a category, for example “B”, the result for that category is NaN, which we fill with the global mean of the target.

from sklearn import base
from sklearn.model_selection import KFold
import numpy as np
import pandas as pd


class KFoldTargetEncoderTrain(base.BaseEstimator, base.TransformerMixin):

    def __init__(self, colnames, targetName, n_fold=5,
                 verbosity=True, discardOriginal_col=False):
        self.colnames = colnames
        self.targetName = targetName
        self.n_fold = n_fold
        self.verbosity = verbosity
        self.discardOriginal_col = discardOriginal_col

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        assert isinstance(self.targetName, str)
        assert isinstance(self.colnames, str)
        assert self.colnames in X.columns
        assert self.targetName in X.columns

        mean_of_target = X[self.targetName].mean()
        # random_state has no effect when shuffle=False, so it is omitted
        kf = KFold(n_splits=self.n_fold, shuffle=False)

        col_mean_name = self.colnames + '_Kfold_Target_Enc'
        X[col_mean_name] = np.nan

        for tr_ind, val_ind in kf.split(X):
            X_tr, X_val = X.iloc[tr_ind], X.iloc[val_ind]
            # encode the validation fold with means computed on the other folds
            X.loc[X.index[val_ind], col_mean_name] = X_val[self.colnames].map(
                X_tr.groupby(self.colnames)[self.targetName].mean())

        # categories unseen in the training folds get the global target mean
        X[col_mean_name] = X[col_mean_name].fillna(mean_of_target)

        if self.verbosity:
            encoded_feature = X[col_mean_name].values
            print('Correlation between the new feature, {} and, {} is {}.'
                  .format(col_mean_name, self.targetName,
                          np.corrcoef(X[self.targetName].values,
                                      encoded_feature)[0][1]))
        if self.discardOriginal_col:
            X = X.drop(self.targetName, axis=1)
        return X

To run the code,

targetc = KFoldTargetEncoderTrain('Feature','Target',n_fold=5)
new_train = targetc.fit_transform(train)

Finally, we need to create “Feature_Kfold_Target_Enc” in the test dataset using the following class. The class takes the train dataset, the name of the feature column, and the name of the encoded column. After that, we fit and transform the test dataset.

class KFoldTargetEncoderTest(base.BaseEstimator, base.TransformerMixin):

    def __init__(self, train, colNames, encodedName):
        self.train = train
        self.colNames = colNames
        self.encodedName = encodedName

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # mean of the out-of-fold encodings per category in the train set
        mean = self.train[[self.colNames, self.encodedName]].groupby(
            self.colNames).mean().reset_index()

        dd = {}
        for index, row in mean.iterrows():
            dd[row[self.colNames]] = row[self.encodedName]

        X[self.encodedName] = X[self.colNames]
        X = X.replace({self.encodedName: dd})
        return X

To run it:

test_targetc = KFoldTargetEncoderTest(new_train,
'Feature',
'Feature_Kfold_Target_Enc')
new_test = test_targetc.fit_transform(test)

You can also download the code from:

https://github.com/pourya-ir/Medium/tree/master/K-fold-target-enc

Without a doubt, there are many ways to improve this post. Your feedback and suggestions are welcome!

Supplement

You can use this function to generate your own train and test datasets.

def getRandomDataFrame(data, numCol):

    if data == 'train':
        key = ['A' if x == 0 else 'B' for x in np.random.randint(2, size=(numCol,))]
        value = np.random.randint(2, size=(numCol,))
        df = pd.DataFrame({'Feature': key, 'Target': value})
        return df
    elif data == 'test':
        key = ['A' if x == 0 else 'B' for x in np.random.randint(2, size=(numCol,))]
        df = pd.DataFrame({'Feature': key})
        return df
    else:
        print(';)')
