Building a Job Entity Recognizer Using Amazon Comprehend
Introduction
With the advent of Natural Language Processing (NLP), traditional job searches based on static keywords are becoming less desirable because of their inaccuracy, and they may eventually become obsolete. While a traditional search engine performs simple keyword matching, an NLP-based search engine extracts named entities, key phrases, sentiment, and so on to enrich the documents with metadata, and then runs search queries against that metadata. In this tutorial, we will build a model that extracts entities such as skills, diploma, and diploma major from job descriptions using Named Entity Recognition (NER).
Entity Annotation:
In this tutorial, we will use an Amazon Comprehend custom entity recognizer to extract entities from job descriptions. There are two ways to train the model (see documentation):
- Entity List: Provide a list of words with their associated entity type
- Annotation: Provide the location of the word in the document and its entity type so Amazon Comprehend can train on both the entity and its context
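To make the difference between the two training inputs concrete, here is a minimal sketch that builds both from a single example sentence. The column names (`Text, Type` for the entity list and `File, Line, Begin Offset, End Offset, Type` for annotations) follow the CSV layout described in the Amazon Comprehend documentation as we understand it; the sample sentence, the file name `jobs.txt`, the entity labels, and the helper functions are hypothetical, and the exact offset convention (whether the end offset is inclusive or exclusive) is an assumption that should be checked against the documentation before training.

```python
import csv
import io

# Hypothetical entity list: each entry pairs a surface form with an entity type.
ENTITY_LIST = [
    ("Bachelor", "DIPLOMA"),
    ("Computer Science", "DIPLOMA_MAJOR"),
    ("Python", "SKILLS"),
]

ANNOTATION_FIELDS = ["File", "Line", "Begin Offset", "End Offset", "Type"]


def find_annotations(line_text, line_no, file_name, entities):
    """Locate the first occurrence of each entity in one line of a document.

    Offsets are character positions within the line. The end offset here is
    exclusive (Python slice style); this is an assumption to verify.
    """
    rows = []
    for text, entity_type in entities:
        begin = line_text.find(text)
        if begin == -1:
            continue  # entity does not appear in this line
        rows.append({
            "File": file_name,
            "Line": line_no,
            "Begin Offset": begin,
            "End Offset": begin + len(text),
            "Type": entity_type,
        })
    return rows


def to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sentence = "Bachelor degree in Computer Science and 3 years of Python experience."

    # Entity list input: only the words and their types, no context.
    entity_rows = [{"Text": t, "Type": ty} for t, ty in ENTITY_LIST]
    print(to_csv(entity_rows, ["Text", "Type"]))

    # Annotation input: where each entity occurs in the training document.
    annotations = find_annotations(sentence, 0, "jobs.txt", ENTITY_LIST)
    print(to_csv(annotations, ANNOTATION_FIELDS))
```

The entity list carries no positional information, whereas each annotation row ties an entity to a specific line and character span, which is what lets the model learn from the surrounding context. In practice the annotation rows would come from a manual annotation tool rather than from string matching as in this sketch.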
Providing an entity list is usually the fastest way to train the model, but it results in lower accuracy. We decided to use the Annotation method to get the most accurate results. This step requires manually annotating hundreds of documents, which can be very time-consuming. Choosing the right annotation tool is therefore of the…