5 NLP Tasks for Word Parsing
Get a grip on the Natural Language Processing landscape! Start your NLP journey with this Periodic Table of 80+ NLP tasks
Russian chemist Dmitri Mendeleev published the first Periodic Table in 1869. Now it’s time for the NLP tasks to be organized in the Periodic Table style!
The variation and structure of NLP tasks is endless. Still, you can think about building NLP Pipelines based on standard NLP tasks and dividing them into groups. But what do these tasks entail?
More than 80 frequently used NLP tasks are explained!
Group 3: Word Parsing
14. Tokenization
Tokenization is the task of splitting raw text into smaller fundamental units; word tokens. This task is required for almost any NLP task. The goal is to build a vocabulary of word types (in spaCy: lexemes). A word type is a distinct word, in the abstract, rather than a specific instance. Word types are word tokens with no context. A word token is a word (string) observed in a piece of text.
How large your vocabulary should be, is a trade-off between memory limitations vs. coverage of word tokens. Each word token is converted to a unique id per word…