Turn Words Into Numbers — The NLP Way

Learn to look at words as a sequence of numbers

Rishi Sidhu
AI Graduate


Photo by Juan Gomez on Unsplash

In this article, we will look at how to tokenize never-before-seen words. TensorFlow’s tokenizer can easily convert known words into tokens, but what happens when you throw words at it that it hasn’t seen before?

The TensorFlow tokenizer is a very powerful tool. As shown in the article below, it is very easy to get started with it.

The tokenizer converts a set of training sentences into a dictionary in which each unique word gets its own integer ID. Let’s look at how to create such a dictionary out of words.

In TensorFlow, this dictionary is called a word index.

Training set

  1. Apples are red
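To make the idea concrete, here is a minimal pure-Python sketch of how a word index could be built from the training sentence above and then used to turn sentences into sequences of numbers. It is not TensorFlow’s implementation: the helper names `build_word_index` and `to_sequence` are hypothetical, and the behavior (lowercasing, splitting on whitespace, IDs starting at 1, an optional placeholder for unseen words) is a simplified assumption modeled on how the Keras `Tokenizer` and its `oov_token` argument behave.

```python
from collections import Counter


def build_word_index(sentences, oov_token=None):
    """Map each unique lowercased word to an integer ID, starting at 1.

    Hypothetical helper: a simplified stand-in for a tokenizer's word index.
    """
    counts = Counter(word for s in sentences for word in s.lower().split())
    # Most frequent words get the smallest IDs; ties keep first-seen order.
    vocab = [word for word, _ in counts.most_common()]
    if oov_token is not None:
        vocab.insert(0, oov_token)  # reserve ID 1 for unseen words
    return {word: i for i, word in enumerate(vocab, start=1)}


def to_sequence(sentence, word_index, oov_token=None):
    """Convert a sentence into a list of IDs using an existing word index."""
    ids = []
    for word in sentence.lower().split():
        if word in word_index:
            ids.append(word_index[word])
        elif oov_token is not None:
            ids.append(word_index[oov_token])  # unseen word -> placeholder ID
        # With no placeholder configured, the unseen word is simply dropped.
    return ids


training = ["Apples are red"]

word_index = build_word_index(training)
print(word_index)                                    # {'apples': 1, 'are': 2, 'red': 3}
print(to_sequence("Apples are green", word_index))   # [1, 2]

oov_index = build_word_index(training, oov_token="<OOV>")
print(to_sequence("Apples are green", oov_index, oov_token="<OOV>"))  # [2, 3, 1]
```

Without a placeholder, the unseen word "green" silently disappears from the sequence; with one, it is kept as a reserved ID. With Keras itself, the analogous steps would be `Tokenizer(oov_token=...)`, `fit_on_texts`, the `word_index` attribute, and `texts_to_sequences`.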
