NLP — Text PreProcessing — Named Entity Recognition(NER) — Part 5.1

Chandu Aki
Published in The Deep Hub
5 min read · Feb 21, 2024

Let's consider a scenario: a Talent Acquisition Manager is responsible for sorting through a multitude of resumes to identify potential candidates for a job opening. In the traditional manual approach, the manager reviews each resume and notes down relevant information such as candidate names, skills, and experience. This process is not only time-consuming but also prone to oversight.

In the manual scenario, the Talent Acquisition Manager spends hours scrutinizing resumes and highlighting names, skills, and experience. Details get missed along the way, which makes it hard to identify the most suitable candidates efficiently.

With Named Entity Recognition (NER) for resume parsing: Enter Named Entity Recognition (NER), applied here to resume parsing. With NER, the Talent Acquisition Manager uses an automated system that scans resumes and extracts key details like candidate names, contact information, skills, and work experience. What once demanded meticulous manual effort now happens in a fraction of the time, significantly improving the efficiency of talent acquisition.
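To make the resume-parsing idea concrete, here is a minimal sketch that pulls contact details and skills out of plain text using only regular expressions. It is a pattern-based baseline, not a trained NER model: the resume text, the regular expressions, and the function names are all assumptions made for illustration, and names or free-form experience would need a real NER model.

```python
import re

# Hypothetical resume snippet; the name, address, and number are made up.
RESUME = """Jane Doe
Email: jane.doe@example.com | Phone: +1 555-010-9999
Skills: Python, SQL, Machine Learning
"""

# Simplified patterns; real-world e-mail and phone formats vary far more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")

def extract_contact_info(text):
    """Return the e-mail addresses and phone numbers found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }

def extract_skills(text):
    """Read a comma-separated 'Skills:' line, if one is present."""
    match = re.search(r"^Skills:\s*(.+)$", text, flags=re.MULTILINE)
    return [s.strip() for s in match.group(1).split(",")] if match else []

info = extract_contact_info(RESUME)
skills = extract_skills(RESUME)
print(info)
print(skills)
```

Patterns like these work for fields with a predictable shape. They break down for candidate names and job titles, which is where a statistical NER model earns its keep.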

So NER can automate this process, but what exactly is NER? 🤔

Named Entity Recognition (NER):

NER is an NLP task focused on identifying and classifying entities within text into predefined categories like persons, organizations, locations, dates, and more. It automates the extraction of specific information from unstructured text, providing a structured understanding of content.

Source: nlpcloud.com

NER is essentially a two-step process:

  • Detecting the entities from the text
  • Classifying them into different categories
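The two steps can be sketched with a toy example. This is a simplified illustration under stated assumptions: detection is approximated by "runs of capitalized words", classification by a small hand-written lookup table (a gazetteer), and every name in the code is hypothetical. Real systems learn both steps from annotated data.

```python
import re

# Toy lookup table (gazetteer); the entries are illustrative assumptions.
GAZETTEER = {
    "Alan Turing": "PERSON",
    "Google": "ORG",
    "London": "LOC",
}

def detect_candidates(text):
    """Step 1: detect candidate spans, here runs of capitalized words."""
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)

def classify(candidate):
    """Step 2: assign a category, or None if the candidate is unknown."""
    return GAZETTEER.get(candidate)

def recognize(text):
    """Run detection, then classification, and keep the labeled entities."""
    entities = []
    for candidate in detect_candidates(text):
        label = classify(candidate)
        if label is not None:
            entities.append((candidate, label))
    return entities

print(recognize("Alan Turing visited Google in London."))
# [('Alan Turing', 'PERSON'), ('Google', 'ORG'), ('London', 'LOC')]
```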

Use cases of Named Entity Recognition (NER):

The world is full of unstructured data, especially on the web. Being able to extract structured information from it gives access to a lot of valuable information. Here are a few use cases:

  • Resume Screening
  • Information Extraction: Automatically extracting names, locations, and dates from documents
  • News Summarization: Assisting in summarizing news articles by extracting key information.
  • Financial Analysis: Extracting company names, financial figures, and relevant data from financial reports
  • Customer Support: Analyzing customer queries becomes more efficient with NER.
  • Electronic Health Record (EHR) Entity Recognition

and more.

How does Named Entity Recognition (NER) work?

Let's learn through an example:

“Alan Turing is the father of Natural Language Processing. In his 1950 paper Computing Machinery and Intelligence, he described a test for an intelligent machine that could understand and respond to natural human conversation.”

Named Entity Recognition (NER) plays a crucial role in extracting meaningful information from text, as illustrated in the example sentence:

  1. Text Preprocessing: The process begins with text preprocessing. In the sentence, tasks like tokenization and part-of-speech tagging break down the raw text into manageable units and identify the grammatical roles of words.
  2. Entity Identification: NER algorithms then scan the preprocessed text to identify sequences of words representing entities. For instance, recognizing “Alan Turing” as a person.
  3. Entity Classification: After identification, NER classifies the recognized entities into predefined categories. In this case, “Alan Turing” is categorized as a person.
  4. Contextual Analysis: NER goes beyond mere recognition and considers the context surrounding entities. For example, understanding that in the sentence “Alan Turing is the father of Natural Language Processing,” “Alan Turing” is associated with the role of a person and not an organization or location.
  5. Disambiguation (if necessary): The sentence does not present significant ambiguity. However, in cases where entities may have multiple meanings, NER employs disambiguation techniques to ensure accurate classification.
  6. Named Entity Linking (Optional): Named Entity Linking can associate recognized entities with entries in external knowledge bases. This step enhances context by providing additional information about the entities, such as linking “Alan Turing” to his notable contributions.
  7. Output Generation: The final output includes the identified entities, their respective categories, and contextual information. For instance, output may consist of “Alan Turing” classified as a person, enriching the understanding of the sentence.
Source: displaCy

You can try it out with your own examples at https://demos.explosion.ai/displacy-ent by choosing different models.
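The core of the steps above (tokenize, identify, classify, generate output) can be sketched in a few lines. This is a minimal sketch, assuming a hand-written lexicon in place of a trained model and using the common BIO tagging scheme (B = beginning of an entity, I = inside, O = outside). The lexicon, the year pattern, and the function names are hypothetical, and the example text is a shortened version of the one above.

```python
import re

# Hand-written lexicon standing in for a trained model (an assumption).
LEXICON = {("Alan", "Turing"): "PERSON"}
YEAR_RE = re.compile(r"(?:1[5-9]|20)\d\d")  # rough four-digit year pattern

def tokenize(text):
    """Preprocessing: split raw text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def tag_bio(tokens):
    """Identification and classification: give each token a BIO tag."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        for phrase, label in LEXICON.items():
            n = len(phrase)
            if tuple(tokens[i:i + n]) == phrase:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            if YEAR_RE.fullmatch(tokens[i]):
                tags[i] = "B-DATE"
            i += 1
    return tags

def to_entities(tokens, tags):
    """Output generation: collapse BIO tags into (text, label) pairs."""
    entities, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities

text = ("Alan Turing is the father of Natural Language Processing. "
        "In his 1950 paper he described a test.")
tokens = tokenize(text)
tags = tag_bio(tokens)
entities = to_entities(tokens, tags)
print(entities)  # [('Alan Turing', 'PERSON'), ('1950', 'DATE')]
```

Contextual analysis, disambiguation, and entity linking are left out here; in practice they are where learned models do most of the work.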

Challenges

# Assumption 1: Do we have any ambiguity in NER? 🤔

Absolutely! Ambiguity in Named Entity Recognition (NER) is a common challenge due to the contextual nature of language. Let’s explore a scenario illustrating ambiguity:

Scenario: Imagine a news article discussing a breakthrough in cancer research:

Sentence 1: “John Hopkins University (Organization) leads groundbreaking cancer research.”

Sentence 2: “The renowned oncologist, Dr. John Hopkins (Person), shared insights on the latest cancer treatment.”

Ambiguity:

In Sentence 1, “John Hopkins” is referred to as an organization (the university leading the research). However, in Sentence 2, “John Hopkins” is mentioned as a person (the oncologist).

Explanation:

For a human reader, distinguishing between the organization and the person is straightforward based on the context. However, for a computer, this creates ambiguity. The name “John Hopkins” is used in different contexts, and without a deeper understanding of the broader narrative, a computer-based NER system might struggle to accurately categorize it.
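One simple way to see how context resolves this kind of ambiguity is to look at the words immediately around the mention. The sketch below is a rule-based illustration only: the cue-word lists and the function name are assumptions, and modern NER models learn such contextual signals from data rather than from hand-written lists.

```python
import re

# Cue words are illustrative assumptions, not an exhaustive list.
ORG_CUES_AFTER = {"University", "Hospital", "Institute", "Inc", "Corp"}
PERSON_CUES_BEFORE = {"Dr", "Mr", "Ms", "Mrs", "Prof"}

def classify_mention(sentence, mention):
    """Label the first occurrence of mention using its neighboring words."""
    tokens = re.findall(r"\w+", sentence)
    words = mention.split()
    n = len(words)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == words:
            before = tokens[i - 1] if i > 0 else ""
            after = tokens[i + n] if i + n < len(tokens) else ""
            if after in ORG_CUES_AFTER:
                return "ORG"
            if before in PERSON_CUES_BEFORE:
                return "PERSON"
            return "UNKNOWN"
    return None  # mention not found in the sentence

s1 = "John Hopkins University leads groundbreaking cancer research."
s2 = ("The renowned oncologist, Dr. John Hopkins, shared insights "
      "on the latest cancer treatment.")
print(classify_mention(s1, "John Hopkins"))  # ORG
print(classify_mention(s2, "John Hopkins"))  # PERSON
```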

# Assumption 2: Do we have any other challenges in NER? 🤔

Named Entity Overlap:

Example: “The brand Apple (Organization) introduced a new apple (Food) flavor.”

The term “Apple” overlaps as both the name of an organization and a common word for a fruit, making it challenging for NER to distinguish between the two.
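Capitalization is one weak signal for telling the two apart. The sketch below is a deliberately naive heuristic, with a hypothetical function name and label set: a capitalized “Apple” in the middle of a sentence is treated as an organization, a lowercase “apple” as a common noun, and a sentence-initial “Apple” is flagged as ambiguous because every sentence starts with a capital letter.

```python
import re

def label_apple_mentions(sentence):
    """Label each 'apple' token using capitalization and position only."""
    tokens = re.findall(r"\w+", sentence)
    labels = []
    for i, token in enumerate(tokens):
        if token.lower() != "apple":
            continue
        if token[0].isupper() and i > 0:
            labels.append((token, "ORG"))        # capitalized mid-sentence
        elif token[0].isupper():
            labels.append((token, "AMBIGUOUS"))  # sentence-initial capital
        else:
            labels.append((token, "O"))          # common noun, not an entity
    return labels

print(label_apple_mentions("The brand Apple introduced a new apple flavor."))
# [('Apple', 'ORG'), ('apple', 'O')]
```

The heuristic fails on lowercased or all-caps text such as social media posts and headlines, which is one reason surrounding context matters more than surface form.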

Variability in Entity Mentions:

Example: “Dr. Smith and Dr. J. Smith (Person) co-authored the research paper.”

Variability in how names are mentioned, including abbreviations, poses a challenge for consistent recognition of person entities.
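A common first step for handling such variants is to reduce each mention to a comparable key. The sketch below assumes a small hypothetical title list and compares surname plus first initial. It can only say that two mentions may refer to the same person; string matching alone cannot tell whether “Dr. Smith” and “Dr. J. Smith” are one author or two.

```python
TITLES = {"dr", "mr", "ms", "mrs", "prof"}  # illustrative, not exhaustive

def name_key(mention):
    """Reduce a person mention to (surname, first initial or None)."""
    parts = [p.strip(".") for p in mention.split()]
    parts = [p for p in parts if p and p.lower() not in TITLES]
    surname = parts[-1].lower()
    initial = parts[0][0].lower() if len(parts) > 1 else None
    return surname, initial

def may_corefer(a, b):
    """True when surnames match and the initials do not conflict."""
    (surname_a, initial_a), (surname_b, initial_b) = name_key(a), name_key(b)
    if surname_a != surname_b:
        return False
    return initial_a is None or initial_b is None or initial_a == initial_b

print(name_key("Dr. J. Smith"))                   # ('smith', 'j')
print(may_corefer("Dr. Smith", "Dr. J. Smith"))   # True
print(may_corefer("Dr. A. Smith", "Dr. J. Smith"))  # False
```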

Entity Synonyms:

Example: “The company acquired XYZ Corp (Organization) and XYZ Ltd (Organization).”

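Legal-form suffixes like “Corp” and “Ltd” create surface variants of an organization name. An NER system has to recognize both forms as organizations, and downstream logic has to decide whether “XYZ Corp” and “XYZ Ltd” are the same company or two distinct legal entities.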
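One way to group such variants is to strip the legal-form suffix before comparing names. This is a minimal sketch under the assumption that the suffix list below is adequate; it is illustrative and incomplete, and matching base names only suggests that two mentions might be the same company.

```python
import re

# Legal-form suffixes to ignore when comparing names (illustrative list).
LEGAL_SUFFIXES = {"corp", "corporation", "ltd", "limited", "inc", "llc"}

def normalize_org(name):
    """Lowercase the name and drop trailing legal-form suffixes."""
    tokens = re.findall(r"\w+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

print(normalize_org("XYZ Corp"))  # xyz
print(normalize_org("XYZ Ltd."))  # xyz
print(normalize_org("XYZ Corp") == normalize_org("XYZ Ltd"))  # True
```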

Contextual Ambiguity:

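Example: “The Bank of England (Organization) is located in England (Location).”

The word “England” appears twice: once inside the organization name “Bank of England” and once as a standalone location. The system has to use context to decide that the first occurrence is part of an organization name and should not be tagged as a location on its own.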
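When candidate spans overlap like this, a common convention in flat (non-nested) NER is to keep the longest span. The sketch below assumes candidate spans arrive as (start, end, label) tuples with character offsets; the function name and the candidate list are hypothetical.

```python
def resolve_overlaps(spans):
    """Keep the longest span among overlapping (start, end, label) candidates."""
    kept = []
    # Consider longer spans first; break ties by position in the text.
    for span in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        start, end = span[0], span[1]
        if all(end <= k[0] or start >= k[1] for k in kept):
            kept.append(span)
    return sorted(kept)

text = "The Bank of England is located in England."
org_start = text.index("Bank of England")
inner_start = text.index("England")    # "England" inside the bank's name
outer_start = text.rindex("England")   # standalone "England"

candidates = [
    (org_start, org_start + len("Bank of England"), "ORG"),
    (inner_start, inner_start + len("England"), "LOC"),
    (outer_start, outer_start + len("England"), "LOC"),
]
resolved = resolve_overlaps(candidates)
print([(text[s:e], label) for s, e, label in resolved])
# [('Bank of England', 'ORG'), ('England', 'LOC')]
```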

Multilingual Challenges:

Example: “The word ‘bâtiment’ (Building in French) is commonly used in architecture.”

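Multilingual challenges arise when text mixes languages or when the same string means different things in different languages. Here, an NER system trained only on English sees an unfamiliar token, “bâtiment,” and cannot tell that it is an ordinary French noun meaning “building” rather than a name.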

Entity Co-reference:

Example: “Microsoft (Organization) released its latest product. It is a game-changer.”

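Resolving co-references requires the system to understand the relationships between mentions: here “its” refers to “Microsoft,” while “It” refers to the latest product. Linking such pronouns back to the right entity goes beyond tagging the name itself.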

Rare or Unseen Entities:

Example: “XYZ Corp (Organization) is a startup that specializes in quantum computing.”

NER systems may struggle with entities like “XYZ Corp” if they are rare or not present in the training data.
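A typical fallback for unseen names is to combine a list of known entities with surface patterns. The sketch below is an illustration under stated assumptions: `KNOWN_ORGS` stands in for whatever the model saw during training, and the suffix pattern is a hypothetical rule that treats capitalized words followed by a legal-form suffix as an organization.

```python
import re

# Stand-in for organizations seen in training data (an assumption).
KNOWN_ORGS = {"Microsoft", "Apple"}

# Capitalized word(s) followed by a legal-form suffix, e.g. "XYZ Corp".
SUFFIX_PATTERN = re.compile(r"\b((?:[A-Z][\w&-]*\s)+(?:Corp|Ltd|Inc|LLC))\b")

def find_orgs(text):
    """Return known organizations plus unseen ones matched by the pattern."""
    found = {name for name in KNOWN_ORGS
             if re.search(r"\b" + re.escape(name) + r"\b", text)}
    found.update(SUFFIX_PATTERN.findall(text))
    return sorted(found)

print(find_orgs("XYZ Corp is a startup that specializes in quantum computing."))
# ['XYZ Corp']
```

The pattern catches “XYZ Corp” even though it is not in the known list, but it would miss a startup mentioned without a suffix, which is why models that use context generalize better than lookups and patterns.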

The upcoming article will feature a hands-on practical exercise focused on Named Entity Recognition (NER).
