5 NLP Tasks for Word Parsing

Get a grip on the Natural Language Processing landscape! Start your NLP journey with this Periodic Table of 80+ NLP tasks

Rob van Zoest
innerdoc.com


Periodic Table of Natural Language Processing Tasks by www.innerdoc.com and created with the Periodic Table Creator

Russian chemist Dmitri Mendeleev published the first Periodic Table in 1869. Now it’s time for the NLP tasks to be organized in the Periodic Table style!

The variation in NLP tasks and their structure is endless. Still, you can build NLP pipelines from standard NLP tasks and divide those tasks into groups. But what do these tasks entail?

More than 80 frequently used NLP tasks are explained!

Group 3: Word Parsing

14. Tokenization

Tokenization is the task of splitting raw text into smaller fundamental units: word tokens. Almost every other NLP task depends on it. The goal is to build a vocabulary of word types (in spaCy: lexemes). A word token is a word (string) observed in a piece of text. A word type is a distinct word in the abstract, rather than a specific instance. In other words, a word type is a word token stripped of its context.
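To make the difference between word tokens and word types concrete, here is a minimal sketch in plain Python. The regex rule and the lowercasing step are assumptions made for illustration. They are much simpler than the rules a library such as spaCy applies, and the function name `tokenize` is hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split raw text into word tokens with a simple illustrative regex."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)


text = "The cat sat on the mat. The dog sat too."

# Word tokens: every occurrence observed in the text, in order
tokens = tokenize(text)

# Word types: the distinct words, ignoring where they occurred
# (lowercasing so that "The" and "the" count as one type is an assumption)
types = sorted({token.lower() for token in tokens})

print(len(tokens), tokens)
print(len(types), types)
```

The example text yields 12 word tokens but only 8 word types, because "the", "sat" and the full stop each occur more than once.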

How large your vocabulary should be is a trade-off between memory limitations and coverage of word tokens. Each word token is converted to a unique id per word…
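The trade-off between vocabulary size and coverage can be sketched as follows. This is an illustration under stated assumptions, not the author's method: the helper names `build_vocab`, `encode` and `coverage`, the `<unk>` placeholder, and the choice to reserve id 0 for unknown words are all hypothetical conventions picked for this example.

```python
from collections import Counter


def build_vocab(tokens: list[str], max_size: int) -> dict[str, int]:
    """Give the most frequent word types an integer id, up to max_size entries."""
    vocab = {"<unk>": 0}  # assumption: id 0 stands in for out-of-vocabulary words
    for word, _count in Counter(tokens).most_common(max_size - 1):
        vocab[word] = len(vocab)
    return vocab


def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Convert each word token to the id of its word type."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokens]


def coverage(tokens: list[str], vocab: dict[str, int]) -> float:
    """Fraction of word tokens that have their own id in the vocabulary."""
    return sum(token in vocab for token in tokens) / len(tokens)


tokens = "the cat sat on the mat the dog sat too".split()

small = build_vocab(tokens, max_size=3)  # less memory, lower coverage
large = build_vocab(tokens, max_size=8)  # more memory, full coverage

print(encode(tokens, small), coverage(tokens, small))
print(encode(tokens, large), coverage(tokens, large))
```

With only three entries, the small vocabulary keeps "the" and "sat" and covers half of the ten tokens. The eight-entry vocabulary covers all of them, at the cost of storing more word types.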


Founder @ innerdoc.com | NLP Expert-Engineer-Enthusiast | Writes about how to get value from textual data | linkedin.com/in/robvanzoest/