How to one hot encode Sequence Data in Java

Learner1067
Published in Analytics Vidhya
Apr 4, 2020

Machine learning algorithms cannot work directly on data represented as text. Let's try to understand this in the context of a chatbot.

User: What is your Name 
ChatBot: My name is ChatBot.
User: Tell me about weather in Pune
ChatBot: Pleasant as always

As the example above shows, the input is text and its length varies: split into words and spaces, the first input contains 7 tokens (4 words and 3 spaces) and the second contains 11 tokens (6 words and 5 spaces). Recurrent neural networks are commonly used for natural language processing in chatbots. These algorithms only understand numerical data, because mathematical operations such as gradient descent cannot be performed on data in string format.
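The token counts above can be checked with a few lines of Java. This is only a sketch: the class name `TokenCountSketch` and the `tokenize` helper are my own illustrative names, and it assumes that every single space counts as one token.

```java
// Sketch: counts word and space tokens in the two chatbot inputs.
// TokenCountSketch and tokenize are illustrative names, not part of the article's code.
class TokenCountSketch {

    // Splits before and after every space, so each word and each space
    // becomes its own token.
    static String[] tokenize(String sentence) {
        return sentence.split("(?<= )|(?= )");
    }

    public static void main(String[] args) {
        System.out.println(tokenize("What is your Name").length);             // prints 7
        System.out.println(tokenize("Tell me about weather in Pune").length); // prints 11
    }
}
```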

To convert data in string format to equivalent numerical data, one option is one hot encoding. In today's article I will describe an implementation of one hot encoding in Java.

Let's work through an example to understand it in detail.

The alphabet has 26 characters, and if we add one more character to represent the empty space it becomes 27. Each character is mapped to an integer: 'a' is mapped to 0, 'b' to 1, and so on up to 'z' at 25, with the space mapped to 26. We have a fancy term for this mapping: integer encoding.
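The mapping can also be built in code instead of being written out by hand. Below is a minimal sketch; the class name `IntegerEncodingTable` and the `build` method are names I made up for illustration, and it assumes the mapping is exactly 'a' to 'z' followed by space.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: builds the character-to-integer table described above.
// IntegerEncodingTable and build are illustrative names, not from the article's code.
class IntegerEncodingTable {

    // 'a'..'z' map to 0..25 and the space character maps to 26.
    static Map<Character, Integer> build() {
        Map<Character, Integer> table = new LinkedHashMap<>();
        for (char c = 'a'; c <= 'z'; c++) {
            table.put(c, c - 'a');
        }
        table.put(' ', 26);
        return table;
    }

    public static void main(String[] args) {
        Map<Character, Integer> table = build();
        System.out.println("Number of symbols: " + table.size()); // 27
        System.out.println("'a' -> " + table.get('a'));           // 0
        System.out.println("'z' -> " + table.get('z'));           // 25
        System.out.println("' ' -> " + table.get(' '));           // 26
    }
}
```

A LinkedHashMap is used here only so that the entries keep their insertion order when printed; a plain HashMap would work for lookups.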

Now let's relate it to the actual input text supplied to the chatbot.

Going back to the initial example, the input "What is your Name" can be represented as an array of strings as below.
String[] input = { "what", " ", "is", " ", "your", " ", "name" };
The integer encoding for the above input can be represented as below.
Integer[] values = { 22, 7, 0, 19, 26, 8, 18, 26, 24, 14, 20, 17, 26, 13, 0, 12, 4 };
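The integer values above can be reproduced with a small sketch. The class name `IntegerEncodingExample` and the method `integerEncode` are illustrative names of my own, and the sketch assumes the input contains only letters and spaces; anything else is rejected.

```java
import java.util.Arrays;
import java.util.Locale;

// Sketch: integer-encodes a sentence with 'a'..'z' -> 0..25 and space -> 26.
// IntegerEncodingExample and integerEncode are illustrative names, not from the article's code.
class IntegerEncodingExample {

    static int[] integerEncode(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        int[] values = new int[lower.length()];
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            if (c == ' ') {
                values[i] = 26;
            } else if (c >= 'a' && c <= 'z') {
                values[i] = c - 'a';
            } else {
                // Assumption: characters outside the 27-symbol alphabet are an error.
                throw new IllegalArgumentException("Unsupported character: '" + c + "'");
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // prints [22, 7, 0, 19, 26, 8, 18, 26, 24, 14, 20, 17, 26, 13, 0, 12, 4]
        System.out.println(Arrays.toString(integerEncode("What is your Name")));
    }
}
```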

That was the initial step in processing text data. Now let's go a step further and perform one hot encoding, in which data is represented using only 1s and 0s.

Since we are using 27 characters (alphabet characters + space), each entry in the integer encoded array can be encoded as an array of 27 values that holds 1 at the index given by the integer encoding and 0 everywhere else. Let's look at the details.

Integer encoding of 'a' = 0
Integer encoding of 'b' = 1
Please note that 0 is the first index and 26 is the last index (alphabet + space). Now let's use an integer array of size 27 to represent the one hot encoding.
One hot encoding for 'a' = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
One hot encoding for 'b' = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
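Turning a single integer code into its one hot array takes only a few lines. Here is a minimal sketch of that step on its own; the class name `OneHotVectorExample`, the constant `ALPHABET_SIZE` and the method `oneHot` are illustrative names of my own, and the size of 27 is taken from the alphabet plus space described above.

```java
import java.util.Arrays;

// Sketch: converts one integer code into a one hot array of size 27.
// OneHotVectorExample, ALPHABET_SIZE and oneHot are illustrative names, not from the article's code.
class OneHotVectorExample {

    static final int ALPHABET_SIZE = 27; // 26 letters + space

    static int[] oneHot(int index) {
        if (index < 0 || index >= ALPHABET_SIZE) {
            throw new IllegalArgumentException("Index out of range: " + index);
        }
        int[] vector = new int[ALPHABET_SIZE]; // Java fills int arrays with 0
        vector[index] = 1;                     // mark the position of the character
        return vector;
    }

    public static void main(String[] args) {
        System.out.println("a -> " + Arrays.toString(oneHot(0)));
        System.out.println("b -> " + Arrays.toString(oneHot(1)));
        System.out.println("space -> " + Arrays.toString(oneHot(26)));
    }
}
```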

Finally, here is the implementation in Java, without using any external library.

package com.ai.algorithm.core;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class OneHotEncoder implements IEncoder {

    private final Map<Character, Integer> integerEncoder = new HashMap<>();

    // Supported symbols: index 0 is 'a', index 25 is 'z', index 26 is space.
    private final char[] chars = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q',
            'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' ' };

    // Populate the integer encoder
    public OneHotEncoder() {
        populateIntegerEncoder();
    }

    private void populateIntegerEncoder() {
        for (int i = 0; i < chars.length; i++) {
            integerEncoder.put(chars[i], i);
        }
    }

    @Override
    public int[][] encode(String input) {
        // Short-circuit || so that isEmpty() is never called on a null input.
        if (input == null || input.isEmpty()) {
            return null;
        }
        input = input.toLowerCase(Locale.ROOT);

        // Step 1: integer encoding
        int[] integerEncoding = new int[input.length()];
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            Integer code = integerEncoder.get(c);
            if (code == null) {
                // Only letters and space are mapped; fail with a clear message
                // instead of a NullPointerException.
                throw new IllegalArgumentException("Unsupported character: '" + c + "'");
            }
            integerEncoding[i] = code;
        }
        System.out.println("Integer Encoded Values " + Arrays.toString(integerEncoding));

        // Step 2: one hot encoding (every cell starts at 0)
        int[][] oneHotEncodedValues = new int[integerEncoding.length][chars.length];
        for (int i = 0; i < integerEncoding.length; i++) {
            oneHotEncodedValues[i][integerEncoding[i]] = 1;
        }
        printMatrix(oneHotEncodedValues);
        return oneHotEncodedValues;
    }

    private void printMatrix(int[][] matrix) {
        System.out.println("One hot encoded Values");
        for (int i = 0; i < matrix.length; i++) {
            System.out.println(Arrays.toString(matrix[i]));
        }
    }

    public static void main(String[] args) {
        String input = "What is your Name";
        System.out.println("Input String: " + input);
        OneHotEncoder encoder = new OneHotEncoder();
        encoder.encode(input);
    }
}

// IEncoder is not shown in this article; this minimal definition is an
// assumption so that the class compiles on its own.
interface IEncoder {
    int[][] encode(String input);
}
