…ic metrics above reveal some quick takeaways about each tool for this specific extraction task. The NLTK Standard Chunker scores perfectly on accuracy and recall but falls short on precision. It successfully extracted all of the document's authors, but it also extracted 3 false entities. NLTK’s chunker would serve well in an entity extraction pipeline where the data scientist is concerned with identifying all possible entities
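To make the precision/recall trade-off concrete, here is a minimal sketch of how the two metrics could be computed from sets of extracted and true entities. The author names and the count of four true authors are hypothetical placeholders, not data from the evaluation above; only the "all authors found, plus 3 false entities" pattern comes from the text.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for a set of predicted entities vs. the true set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)    # correctly extracted entities
    false_positives = len(predicted - gold)   # extracted, but not real entities
    false_negatives = len(gold - predicted)   # real entities that were missed

    extracted_total = true_positives + false_positives
    gold_total = true_positives + false_negatives
    precision = true_positives / extracted_total if extracted_total else 0.0
    recall = true_positives / gold_total if gold_total else 0.0
    return precision, recall


# Hypothetical example: every true author is found, plus 3 false entities.
gold_authors = {"Author A", "Author B", "Author C", "Author D"}
extracted = gold_authors | {"False Entity 1", "False Entity 2", "False Entity 3"}

precision, recall = precision_recall(extracted, gold_authors)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these placeholder sets, recall stays at 1.0 because nothing is missed, while the three extra entities pull precision down to 4/7.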
To solve this problem, we need to capture the semantic meaning of words: we need to recognize that words like ‘good’ and ‘positive’ are closer to each other than ‘apricot’ and ‘continent’ are. The tool we will use to capture that meaning is Word2Vec.
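As a rough illustration of what "closer" means once words are represented as vectors, the sketch below compares word pairs with cosine similarity. The three-dimensional vectors are invented toy values chosen for the example, not output from a trained Word2Vec model, which would typically produce vectors with a hundred or more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors (assumed values for illustration only, not real embeddings).
toy_vectors = {
    "good": [0.9, 0.8, 0.1],
    "positive": [0.8, 0.9, 0.2],
    "apricot": [0.1, 0.2, 0.9],
    "continent": [0.9, -0.3, -0.4],
}

similar_pair = cosine_similarity(toy_vectors["good"], toy_vectors["positive"])
unrelated_pair = cosine_similarity(toy_vectors["apricot"], toy_vectors["continent"])

print(f"good vs positive:     {similar_pair:.3f}")
print(f"apricot vs continent: {unrelated_pair:.3f}")
```

In this toy setup the ‘good’/‘positive’ pair scores much higher than the ‘apricot’/‘continent’ pair, which is the kind of relationship learned word vectors are meant to expose.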