NLP: Entity Grounding
When you extract entities (names, dates, places, etc.) or keywords from a document, each one carries surrounding context that gives us an awareness a machine often lacks: for example, knowing that 25th December is Christmas, that a Saturday falls on the weekend, or that the actor Keanu Reeves is male. Making this implicit knowledge explicit to a machine can significantly improve its ability to learn and identify patterns in the data.
Pandas
Since dates and times can be phrased in many different ways, let's start by normalising them into a standard format such as a datetime stamp.
from dateutil import parser

example_date = "dec 25th"
datetime = parser.parse(example_date)
datetime
Then we can use Pandas to expand the implicit features hidden within the datetime stamp (e.g. day, month, weekend, etc.)
import pandas as pd

dt = pd.to_datetime(datetime)

# Ordinal suffixes for days 1-31: 1st, 2nd, 3rd, 4th-20th, 21st, 22nd, 23rd, 24th-30th, 31st
suffixes = ["st", "nd", "rd"] + ["th"] * 17 + ["st", "nd", "rd"] + ["th"] * 7 + ["st"]

date_time = {
    "phrase": example_date,
    "date": f"{dt.day}/{dt.month}/{dt.year}",
    "time": f"{dt.hour}:{dt.minute}:{dt.second}",
    "month": ("January","February","March","April","May","June","July","August","September","October","November","December")[dt.month - 1],
    "day": ("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")[dt.dayofweek],
    "day of year": dt.dayofyear,
    "suffix": f"{dt.day}{suffixes[dt.day - 1]}",  # index by day of month, not month
    "is month end": dt.is_month_end,
    "is month start": dt.is_month_start,
    "is quarter end": dt.is_quarter_end,
    "is quarter start": dt.is_quarter_start,
    "is year start": dt.is_year_start,
    "is year end": dt.is_year_end,
    "is weekend": dt.dayofweek in (5, 6),  # Monday=0, so Saturday=5 and Sunday=6
}
date_time
This is also an important step to perform before embedding dates or times!
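As an aside, one common way to embed a cyclical feature such as the month or day of week (a sketch, not something shown above) is to map it onto a circle with sine and cosine, so that December ends up adjacent to January rather than numerically far away:

```python
import math

def cyclical_encode(value, period):
    """Map a cyclical feature (e.g. month 1-12) onto a point on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

dec = cyclical_encode(12, 12)  # December
jan = cyclical_encode(1, 12)   # January
# December and January are close together on the circle,
# which a plain integer encoding (12 vs 1) would not capture.
```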
Google Correlate
There are also ways to extract events commonly associated with times and dates (e.g. holidays). Google Correlate used an algorithm to find search terms whose search volumes followed a correlated pattern over the year (e.g. people usually search for "Christmas" and related events around 25th December).
Unfortunately it never provided an API (so the correlated results had to be scraped), and the service itself was discontinued by Google in December 2019. Let's see which search terms correlated with the term "25th December".
WordNet
WordNet is a curated graph of words and the relations between them.
It's free and easily accessible via Python's NLTK library, and it's an easy and fairly comprehensive way to find a word's synonyms automatically.
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def synonyms(word):
    synonyms = set()
    for synset in wn.synsets('_'.join(word.lower().split())):
        synonyms |= set(synset.lemma_names())
        for related_synset in (synset.hypernyms()
                               + synset.hyponyms()
                               + synset.part_meronyms()
                               + synset.substance_meronyms()
                               + synset.member_meronyms()
                               + synset.part_holonyms()
                               + synset.substance_holonyms()
                               + synset.member_holonyms()
                               + synset.topic_domains()
                               + synset.region_domains()
                               + synset.usage_domains()
                               + synset.entailments()
                               + synset.causes()
                               + synset.also_sees()
                               + synset.verb_groups()
                               + synset.similar_tos()):
            synonyms |= set(related_synset.lemma_names())
    return synonyms
synonyms("psoriasis")
{'acanthosis', 'disease_of_the_skin', 'psoriasis', 'skin_disease', 'skin_disorder'}
synonyms("robotics")
{'AI', 'animatronics', 'artificial_intelligence', 'robotics', 'telerobotics'}
synonyms("thanks")
{'acknowledge', 'acknowledgement', 'acknowledgment', 'aid', 'appreciation', 'assist', 'assistance', 'bow', 'convey', 'curtain_call', 'give_thanks', 'help', 'recognise', 'recognize', 'thank', 'thank_you', 'thanks'}
DBPedia
Using DBPedia, we can find facts related to names and places.
Just like WordNet, there are various types of relations (e.g. gender, deathPlace, children, etc)
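As a minimal sketch (not shown in the original), DBPedia can be queried through its public SPARQL endpoint. The entity `Keanu_Reeves` below is just an illustrative example; the helper builds a standard SPARQL query listing every (relation, value) pair attached to a resource:

```python
def dbpedia_facts_query(entity: str) -> str:
    """Build a SPARQL query listing (relation, value) pairs for a DBPedia resource."""
    return f"""
    SELECT ?relation ?value WHERE {{
        <http://dbpedia.org/resource/{entity}> ?relation ?value .
    }} LIMIT 50
    """

query = dbpedia_facts_query("Keanu_Reeves")  # illustrative example entity

# Sending the query (requires network access and the `requests` package):
# import requests
# response = requests.get(
#     "https://dbpedia.org/sparql",
#     params={"query": query, "format": "application/sparql-results+json"},
# )
# for row in response.json()["results"]["bindings"]:
#     print(row["relation"]["value"], "->", row["value"]["value"])
```

Among the returned relations you would expect properties like `gender`, `birthPlace`, and so on.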
Wikipedia
We could even use the links that direct to a Wikipedia page as implicit relations for an entity. The disadvantage of this approach is that the type of relation will not be named.
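To sketch this idea (the page title is an assumed example; the parameters are standard MediaWiki API fields), a page's outgoing links can be fetched and each linked title treated as an unnamed relation:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def wikipedia_links_url(title: str) -> str:
    """Build the MediaWiki API URL that lists a page's outgoing links."""
    params = urlencode({
        "action": "query",
        "prop": "links",
        "titles": title,
        "pllimit": "50",
        "format": "json",
    })
    return f"https://en.wikipedia.org/w/api.php?{params}"

def wikipedia_links(title: str):
    """Return linked page titles, each treated as an unnamed relation of the entity."""
    with urlopen(wikipedia_links_url(title)) as response:
        data = json.load(response)
    pages = data["query"]["pages"]
    return [link["title"] for page in pages.values() for link in page.get("links", [])]

# wikipedia_links("Keanu Reeves")  # requires network access
```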