Turn Words Into Numbers — The NLP Way
Learn to look at words as a sequence of numbers
3 min read · Sep 1, 2020
In this article, we will look at how to tokenize never-before-seen words. TensorFlow’s Tokenizer can easily convert known words into tokens, but what happens when you give it words that it hasn’t seen before?
The TensorFlow Tokenizer is a very powerful tool. As shown in the article below, it is very easy to get started with.
The tokenizer can be used to convert a set of training data (sentences) into a dictionary in which each unique word is assigned its own integer ID. Let’s look at how to create a dictionary out of words.
In TensorFlow, this dictionary is called a word index.
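To make the idea of a word index concrete, here is a minimal plain-Python sketch of how such a dictionary could be built. It is not TensorFlow's implementation: the function name `build_word_index` is hypothetical, and the second sentence is an assumed example added for illustration. The sketch only lowercases and splits on whitespace, and it gives the lowest IDs to the most frequent words, starting from 1.

```python
from collections import Counter


def build_word_index(sentences):
    """Map each unique word to an integer ID, most frequent word first.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    counts = Counter()
    for sentence in sentences:
        # Lowercase and split on whitespace; punctuation is not handled here.
        counts.update(sentence.lower().split())
    # most_common() sorts by descending count; ties keep first-seen order.
    # IDs start at 1 so that 0 stays free (for example, for padding).
    return {
        word: idx
        for idx, (word, _) in enumerate(counts.most_common(), start=1)
    }


# Assumed training sentences; only the first one comes from the article.
sentences = ["Apples are red", "Bananas are yellow"]
word_index = build_word_index(sentences)
print(word_index)
```

With these two sentences, `are` appears twice and therefore receives ID 1, while the remaining words are numbered in the order they were first seen.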
Training set
- Apples are red