Did You Know Your Virtual Assistant Has a Name Expert? SpaCy’s NER Explained

Prakash Ramu
Published in YavarTechWorks · 5 min read · Jan 29, 2024

Hey tech enthusiasts!

Have you ever wondered how your virtual assistant knows exactly which names, locations, or other specific details you’re talking about in your requests? It’s not by chance; it’s a technique called Named Entity Recognition (NER).

Imagine your assistant as a superhero whose special tool is spotting names. On a regular day with our virtual buddy, you ask it to find movies by director Christopher Nolan. It’s not just hearing words; it’s picking out the name, Christopher Nolan, like a pro. And that’s when the fun begins!

Curious about the latest movies by Christopher Nolan? No problem! Your assistant, armed with name extraction, quickly finds the details you want. It’s like having a friend who knows your favorite filmmakers and helps you discover new movies effortlessly.

Understanding Named Entity Recognition (NER)

Named Entity Recognition is a fascinating aspect of natural language processing (NLP) where machines, like your virtual assistant, learn to identify and categorize entities within text. Entities can be anything from names of people (like director Christopher Nolan) to locations, organizations, dates, and more. One of the leading tools for NER is SpaCy, a popular open-source library for advanced natural language processing in Python.

SpaCy NER

SpaCy’s Named Entity Recognition (NER) is like a language wizard that identifies and categorizes different types of information in text. SpaCy’s English pipelines are trained on the OntoNotes 5 corpus and use its entity label scheme, which includes:

  • PERSON: names of people, such as Christopher Nolan
  • ORG: organizations such as companies, agencies, and institutions
  • GPE: geopolitical entities such as countries, cities, and states
  • DATE and TIME: dates or periods, and times shorter than a day
  • MONEY: monetary values, including the unit
  • PERCENT: percentage expressions
  • QUANTITY: measurements such as weights or distances
  • ORDINAL and CARDINAL: ordinal numbers (“first”, “second”) and other numerals

The label set also covers NORP, FAC, LOC, PRODUCT, EVENT, WORK_OF_ART, LAW, and LANGUAGE. Because NER is a statistical model, its predictions are not perfect: unusual capitalization, misspellings, or domain-specific names can be missed or mislabeled.
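To look up what a label means without leaving Python, spaCy ships a small glossary that you can query with spacy.explain(). The sketch below prints a description for each label discussed above and does not need a trained pipeline; the helper name describe_labels is only for illustration. With a loaded pipeline, nlp.get_pipe("ner").labels lists the labels that particular model can actually predict.

```python
import spacy

# The entity labels discussed above (spaCy's English pipelines use this OntoNotes-style scheme)
LABELS = ["PERSON", "ORG", "GPE", "DATE", "TIME",
          "MONEY", "PERCENT", "CARDINAL", "QUANTITY", "ORDINAL"]


def describe_labels(labels):
    """Pair each label with spaCy's built-in glossary description (hypothetical helper)."""
    return [(label, spacy.explain(label)) for label in labels]


for label, description in describe_labels(LABELS):
    print(f"{label:<9} {description}")
```

spacy.explain() returns None for terms it does not know, so it is also a quick way to check a label’s spelling.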

SpaCy Essentials: Easy Steps to Install and Use for Natural Language Processing

Step 1: Open your terminal or command prompt and type pip install spacy to install SpaCy.

Step 2: Choose a language model that fits your needs. For instance, you can download the small English model by typing python -m spacy download en_core_web_sm.

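SpaCy offers English models in several sizes. The small, medium, and large versions share the same CNN-based architecture and differ mainly in how many word vectors they include:

  • en_core_web_sm (Small): Quick and lightweight, with no static word vectors; great for projects with limited resources.
  • en_core_web_md (Medium): Adds a reduced word-vector table, balancing size and performance.
  • en_core_web_lg (Large): Ships the full word-vector table, so it is the biggest download of the three; useful when word similarity matters.
  • en_core_web_trf (Transformer): Built on a transformer model; generally the most accurate English pipeline, but slower and best suited to a GPU.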

Step 3: In your Python script or Jupyter notebook, import SpaCy and load the language model:

import spacy
# Load the English language model
nlp = spacy.load("en_core_web_sm")
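If the model has not been downloaded yet, spacy.load() raises an OSError. Here is a minimal sketch, assuming you want a script to keep running anyway: try each candidate model in order and fall back to spacy.blank("en"), which only tokenizes and has no NER component. The function name load_pipeline and the order of candidates are illustrative choices, not part of spaCy’s API.

```python
import spacy


def load_pipeline(candidates=("en_core_web_sm",)):
    """Load the first installed pipeline from `candidates` (hypothetical helper).

    Falls back to a blank English pipeline, which tokenizes text but
    has no trained components such as NER.
    """
    for name in candidates:
        try:
            return spacy.load(name)
        except OSError:
            # spacy.load raises OSError when the package or path cannot be found
            continue
    return spacy.blank("en")


nlp = load_pipeline(("en_core_web_md", "en_core_web_sm"))
print("Components:", nlp.pipe_names)
```

Printing nlp.pipe_names makes the fallback visible: a blank pipeline reports an empty list, so you know no entities will be found.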

Step 4: Process your text using SpaCy. Here’s an example:

text = "Find movies with director cristoper nolan"
# Process the text with SpaCy
doc = nlp(text)
# Extract named entities
entities = [(ent.text, ent.label_) for ent in doc.ents]
# Print the extracted entities
print("Extracted Entities:", entities)

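This code takes a simple text string, processes it with SpaCy, and prints the recognized entities along with their labels. The output should look something like the following, although exact results depend on the model and its version (a small statistical model can miss names that are lowercase or misspelled, as in this example):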

Extracted Entities: [('cristoper nolan', 'PERSON')]
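Because a statistical model can miss lowercase or misspelled names, one common remedy is to add hand-written rules with spaCy’s entity_ruler component. The sketch below uses a blank English pipeline so it runs without a download, which means the result depends only on the rules. The two patterns are illustrative assumptions; the LOWER attribute makes them match regardless of capitalization. With a trained model you would typically add the ruler with nlp.add_pipe("entity_ruler", before="ner") so the rules take priority over the model’s predictions.

```python
import spacy

# Blank English pipeline: tokenizer only, so the result depends solely on the rules below
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Token patterns; "LOWER" compares the lowercased token text, so casing doesn't matter
ruler.add_patterns([
    {"label": "PERSON", "pattern": [{"LOWER": "christopher"}, {"LOWER": "nolan"}]},
    {"label": "PERSON", "pattern": [{"LOWER": "cristoper"}, {"LOWER": "nolan"}]},  # common misspelling
])

doc = nlp("Find movies with director cristoper nolan")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print("Extracted Entities:", entities)
# Extracted Entities: [('cristoper nolan', 'PERSON')]
```

Rules are predictable but only cover what you list, so they work best as a complement to the model for names that matter to your application.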

SpaCy Visualization

If you’re curious about turning text into an eye-catching visual display, SpaCy has got your back. In this short guide, we’ll explore how to use SpaCy’s visualization tools to bring your text to life.

Visualization provides a quick and intuitive way to understand which entities are present in your text. It’s a visual summary that can be more accessible than reading through the text or looking at raw data. Imagine you have the following text:

text = "Apple Inc. was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in April 1976. The company is known for its innovative products like the iPhone."

Now, let’s use SpaCy to analyze the text and create a visual display of the recognized named entities. The key addition is displacy, SpaCy’s built-in visualizer; the rest of the code builds on what we covered in the earlier example.

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)

# Visualize the named entities
displacy.serve(doc, style="ent")

This line uses displacy.serve() to create a visualization of the named entities in the processed text. The doc object, containing the linguistic analysis, is passed to this function. The style="ent" parameter specifies the visualization style for named entities.

Running this code starts a local web server (on port 5000 by default) and prints its address. Open http://localhost:5000 in your browser to see the text with color-coded highlights around the identified named entities, such as persons, organizations, and dates. Press Ctrl+C in the terminal to stop the server.
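displacy.serve() keeps a web server running until you stop it. If you would rather save the highlighted text to a file or embed it elsewhere, displacy.render() returns the markup as a string. To keep this sketch self-contained, it assigns the entity spans by hand on a blank pipeline instead of running a trained model; with en_core_web_sm you would simply pass the doc produced by nlp(text). The helper span_for and the output file name entities.html are illustrative.

```python
from pathlib import Path

import spacy
from spacy import displacy

nlp = spacy.blank("en")
doc = nlp("Apple Inc. was founded by Steve Jobs in April 1976.")


def span_for(doc, phrase, label):
    """Create a labeled Span covering the first occurrence of `phrase` (hypothetical helper)."""
    start = doc.text.index(phrase)
    return doc.char_span(start, start + len(phrase), label=label)


# Hand-assigned entities standing in for a trained NER component's output
doc.ents = [
    span_for(doc, "Apple Inc.", "ORG"),
    span_for(doc, "Steve Jobs", "PERSON"),
    span_for(doc, "April 1976", "DATE"),
]

# page=True wraps the markup in a complete HTML page; jupyter=False forces a string return
html = displacy.render(doc, style="ent", page=True, jupyter=False)
Path("entities.html").write_text(html, encoding="utf-8")
```

Opening entities.html in a browser shows the same kind of highlighting as the served version, without a server process.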

In the visualization, you’ll see color-coded highlights around specific words or phrases. For instance:

  • “Apple Inc.” might be highlighted as ORG (organization).
  • “Steve Jobs,” “Steve Wozniak,” and “Ronald Wayne” might be highlighted as PERSON.
  • “April 1976” might be highlighted as DATE.
  • “iPhone” might be highlighted as PRODUCT.

This visual summary gives you an immediate grasp of the key entities in the text without having to read through it meticulously. It’s like a snapshot that captures the essence of the information, making it more accessible and user-friendly, especially when dealing with larger datasets or complex texts.
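For larger collections, a count of entity labels can complement the visual display. The sketch below streams texts through nlp.pipe() and tallies labels with collections.Counter. To keep it runnable without a download, a blank pipeline with a few entity_ruler patterns stands in for a trained model, so the patterns and sample sentences are illustrative assumptions; with en_core_web_sm the counts would come from the model’s predictions instead.

```python
from collections import Counter

import spacy

# Rule-based stand-in for a trained NER component (patterns are illustrative)
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple Inc."},
    {"label": "PERSON", "pattern": "Steve Jobs"},
    {"label": "PERSON", "pattern": "Steve Wozniak"},
    {"label": "PRODUCT", "pattern": "iPhone"},
])

texts = [
    "Apple Inc. was founded by Steve Jobs and Steve Wozniak.",
    "The iPhone made Apple Inc. famous.",
]

# nlp.pipe processes texts as a stream, which is faster than calling nlp() in a loop
label_counts = Counter(ent.label_ for doc in nlp.pipe(texts) for ent in doc.ents)
print(label_counts.most_common())
```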

Syntactic Dependency Visualization: Illuminating Linguistic Connections

At its core, syntactic dependency visualization in SpaCy provides a graphical representation of the relationships between words in a sentence:

  • It highlights how each word is connected to others, indicating dependencies such as subject, object, modifier, and more.
  • By visualizing these dependencies, SpaCy offers a clearer understanding of the grammatical structure and flow of information within the text.
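Under the hood, each token stores its relation in token.dep_ and points to its parent word through token.head; the dependency diagram is drawn from these attributes. The sketch below prints them as plain text. So that it runs without downloading a model, the parse is written by hand using the Doc constructor’s heads and deps arguments. The labels follow the style of spaCy’s English pipelines, but this is an illustrative parse, not actual en_core_web_sm output.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

words = ["Find", "movies", "with", "director", "cristoper", "nolan"]
# heads[i] is the index of token i's parent; the root points to itself
heads = [0, 0, 1, 5, 5, 2]
deps = ["ROOT", "dobj", "prep", "compound", "compound", "pobj"]

# Hand-written parse (an assumption for illustration, not model output)
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

for token in doc:
    print(f"{token.text:<10} {token.dep_:<9} head: {token.head.text}")
```

Walking these head and child links (token.head, token.lefts, token.rights) is how you turn the picture into extracted information, for example by collecting the compound words attached to a name.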

To draw the dependency diagram for the movie query from earlier, pass style="dep" to displacy:

import spacy
from spacy import displacy

# Load the English language model
nlp = spacy.load("en_core_web_sm")

text = "Find movies with director cristoper nolan"
# Process the text with SpaCy; the pipeline's parser assigns the dependencies
doc = nlp(text)

# Visualize syntactic dependencies on a gray background ("bg" sets the background color)
displacy.serve(doc, style="dep", options={"bg": "#808080"})

The visualization draws an arrow from each head word to the word that depends on it and labels the arrow with the relation, such as dobj (direct object) or compound (a noun modifying another noun). Laying out the sentence’s grammatical structure this way makes it easier to spot linguistic patterns worth extracting from the text.
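Instead of starting a server, displacy.render() can return the dependency diagram as SVG markup, which is handy for saving it or embedding it in a report. The sketch below uses displaCy’s manual mode, where the words and arcs are passed as plain dictionaries, so it runs without a trained pipeline; the parse and tags are hand-written for illustration. With a real model you would pass the parsed doc instead of the dictionary. The compact and bg options control the layout and background color, and the file name dependencies.svg is arbitrary.

```python
from pathlib import Path

from spacy import displacy

# Manual-mode input: words with tags, and arcs between them (hand-written parse)
parse = {
    "words": [
        {"text": "Find", "tag": "VB"},
        {"text": "movies", "tag": "NNS"},
        {"text": "with", "tag": "IN"},
        {"text": "director", "tag": "NN"},
        {"text": "cristoper", "tag": "NNP"},
        {"text": "nolan", "tag": "NNP"},
    ],
    # "start" is always the smaller index; "dir" says which end the arrow points to
    "arcs": [
        {"start": 0, "end": 1, "label": "dobj", "dir": "right"},
        {"start": 1, "end": 2, "label": "prep", "dir": "right"},
        {"start": 3, "end": 5, "label": "compound", "dir": "left"},
        {"start": 4, "end": 5, "label": "compound", "dir": "left"},
        {"start": 2, "end": 5, "label": "pobj", "dir": "right"},
    ],
}

svg = displacy.render(parse, style="dep", manual=True, jupyter=False,
                      options={"compact": True, "bg": "#808080"})
Path("dependencies.svg").write_text(svg, encoding="utf-8")
```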

If you have any comments or feedback, please feel free to share. Your input is valued and appreciated. Thank you!
