Different ways of doing Relation Extraction from text

Andreas Herman

Relation Extraction (RE) is the task of extracting semantic relationships from text, which usually occur between two or more entities. These relations can be of different types. E.g. “Paris is in France” states an “is in” relationship from Paris to France. This can be denoted using triples, (Paris, is in, France).

Information Extraction (IE) is the field of extracting structured information from natural language text. This field is used for various NLP tasks, such as creating Knowledge Graphs, Question-Answering Systems, Text Summarization, etc. Relation extraction is in itself a subfield of IE.

There are five different methods of doing Relation Extraction:

  1. Rule-based RE
  2. Weakly Supervised RE
  3. Supervised RE
  4. Distantly Supervised RE
  5. Unsupervised RE

We will go through all of them at a high level, and discuss some pros and cons for each one.

Rule-based RE

Many instances of relations can be identified through hand-crafted patterns, looking for triples (X, α, Y) where X and Y are entities and α are the words in between. For the “Paris is in France” example, α = “is in”. This could be extracted with a regular expression.
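As a rough illustration of such a pattern, here is a minimal sketch. The regular expression and the function name `extract_is_in` are assumptions made for this example; they simply treat runs of capitalized words as entity candidates.

```python
import re

# Assumed pattern: capitalized word(s), the literal "is in", capitalized word(s).
ENTITY = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
IS_IN = re.compile(rf"\b({ENTITY}) is in ({ENTITY})")

def extract_is_in(text):
    """Return an (X, 'is in', Y) triple for every match of the pattern."""
    return [(x, "is in", y) for x, y in IS_IN.findall(text)]

triples = extract_is_in("Paris is in France. Oslo is in Norway.")
print(triples)
```

Note that a pattern like this has no idea what kind of thing X and Y are, so it will happily match any two capitalized phrases around “is in”.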

Figure: Named entities in the sentence
Figure: Part-of-speech tags in the sentence

Only looking at keyword matches will also retrieve many false positives. We can mitigate this by filtering on named entities, only retrieving (CITY, is in, COUNTRY). We can also take into account the part-of-speech (POS) tags to remove additional false positives.
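A sketch of the entity-type filter could look like the following. The `ENTITY_TYPES` dictionary is a hypothetical stand-in for the output of a real named entity recognizer, and `filter_by_types` is a name invented for this example.

```python
# Hypothetical entity-type lookup standing in for a real NER tagger.
ENTITY_TYPES = {
    "Paris": "CITY",
    "Oslo": "CITY",
    "France": "COUNTRY",
    "Norway": "COUNTRY",
    "Alice": "PERSON",
}

def filter_by_types(triples, subject_type, object_type):
    """Keep only triples whose two arguments carry the expected entity types."""
    kept = []
    for x, rel, y in triples:
        if ENTITY_TYPES.get(x) == subject_type and ENTITY_TYPES.get(y) == object_type:
            kept.append((x, rel, y))
    return kept

candidates = [
    ("Paris", "is in", "France"),
    ("Alice", "is in", "Paris"),   # keyword match, but not a city-country fact
    ("Oslo", "is in", "Norway"),
]
city_in_country = filter_by_types(candidates, "CITY", "COUNTRY")
print(city_in_country)
```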

These are examples of word sequence patterns, because the rule specifies a pattern following the order of the text. Unfortunately, these types of rules fall apart for longer-range patterns and sequences with greater variety. E.g. “Fred and Mary got married” cannot be handled successfully by a word sequence pattern.

Figure: Dependency paths in the sentence

Instead, we can make use of dependency paths in the sentences, which tell us which word has a grammatical dependency on which other word. This can greatly increase the coverage of a rule without extra effort.
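To make the idea concrete, here is a small sketch of a rule that works on dependency edges instead of word order. The parse below is hand-annotated for this example (a real system would get it from a dependency parser), and the labels `nsubj` and `obj` follow the Universal Dependencies naming.

```python
# Hand-annotated parse of "Harry , who lives nearby , baked a large cake".
# Each token: (index, text, dependency label, head index); head -1 marks the root.
PARSE = [
    (0, "Harry", "nsubj", 6),
    (1, ",", "punct", 0),
    (2, "who", "nsubj", 3),
    (3, "lives", "acl:relcl", 0),
    (4, "nearby", "advmod", 3),
    (5, ",", "punct", 0),
    (6, "baked", "root", -1),
    (7, "a", "det", 9),
    (8, "large", "amod", 9),
    (9, "cake", "obj", 6),
]

def subject_verb_object(parse):
    """Return (subject, head, object) for each token with both an nsubj and an obj child."""
    triples = []
    for idx, text, _dep, _head in parse:
        # Map dependency label -> child text for the direct children of this token.
        children = {d: t for _i, t, d, h in parse if h == idx}
        if "nsubj" in children and "obj" in children:
            triples.append((children["nsubj"], text, children["obj"]))
    return triples

triples = subject_verb_object(PARSE)
print(triples)
```

The relative clause and the modifiers sit between “Harry” and “cake” in the text, yet the rule still finds the triple because it only follows the edges attached to “baked”.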

We can also transform the sentences before applying the rule. E.g. “The cake was baked by Harry” or “The cake which Harry baked” can be transformed into “Harry baked the cake”. We are then changing the word order to fit our “linear rule”, while also removing redundant modifying words in between.
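The two rewrites mentioned above can be sketched with a pair of toy rules. These regular expressions are assumptions made for this example and only cover the exact sentence shapes shown; a realistic implementation would most likely work on a parse tree.

```python
import re

# Toy rewrite rules (assumed for illustration): (pattern, replacement template).
REWRITES = [
    # "The cake was baked by Harry" -> "Harry baked the cake"
    (re.compile(r"^[Tt]he (\w+) was (\w+) by (\w+)$"), r"\3 \2 the \1"),
    # "The cake which Harry baked" -> "Harry baked the cake"
    (re.compile(r"^[Tt]he (\w+) (?:which|that) (\w+) (\w+)$"), r"\2 \3 the \1"),
]

def normalize(sentence):
    """Rewrite a sentence into subject-verb-object order if a rule applies."""
    sentence = sentence.rstrip(".")
    for pattern, template in REWRITES:
        if pattern.match(sentence):
            return pattern.sub(template, sentence)
    return sentence  # no rule applies, leave the sentence unchanged

print(normalize("The cake was baked by Harry"))
print(normalize("The cake which Harry baked"))
```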

Pros

  • Humans can create patterns that tend to have high precision
  • Can be tailored to specific domains

Cons

  • Human patterns are still often low-recall (too much variety in languages)
  • A lot of manual work to create all possible rules
  • Have to create rules for every relation type

Weakly Supervised RE

The idea here is to start out with a set of hand-crafted rules and automatically find new ones from the unlabeled text data, through an iterative process (bootstrapping). Alternatively, one can start out with a set of seed tuples, describing entities with a specific relation. E.g. seed={(ORG:IBM, LOC:Armonk), (ORG:Microsoft, LOC:Redmond)} states entities having the relation “based in”.

Agichtein, Eugene, and Luis Gravano. “Snowball: Extracting relations from large plain-text collections.” Proceedings of the fifth ACM conference on Digital libraries. ACM, 2000.

Snowball is a fairly old example of an algorithm which does this:

  1. Start with a set of seed tuples (or extract a seed set from the unlabeled text with a few hand-crafted rules).
  2. Extract occurrences from the unlabeled text that match the tuples and tag them with a NER (named entity recognizer).
  3. Create patterns for these occurrences, e.g. “ORG is based in LOC”.
  4. Generate new tuples from the text, e.g. (ORG:Intel, LOC: Santa Clara), and add to the seed set.
  5. Go to step 2, or terminate and use the patterns that were created for further extraction.
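The loop above can be sketched in a few lines. This is a heavily simplified illustration and not the Snowball implementation: it uses the exact text between the two entities as the pattern and has no confidence scoring. The `ENTITIES` lookup (standing in for a NER), the toy `CORPUS`, and the function names are all assumptions made for this example.

```python
# Hypothetical NER output: entity string -> type.
ENTITIES = {
    "IBM": "ORG", "Intel": "ORG", "Microsoft": "ORG",
    "Armonk": "LOC", "Santa Clara": "LOC", "Redmond": "LOC",
}

CORPUS = [
    "IBM is based in Armonk",
    "Intel is based in Santa Clara",
    "Intel, headquartered in Santa Clara, makes chips",
    "Microsoft, headquartered in Redmond, makes software",
]

def org_loc_context(sentence):
    """Return (org, text between, loc) if an ORG is followed by a LOC in the sentence."""
    orgs = [e for e, t in ENTITIES.items() if t == "ORG" and e in sentence]
    locs = [e for e, t in ENTITIES.items() if t == "LOC" and e in sentence]
    for org in orgs:
        for loc in locs:
            start = sentence.find(org) + len(org)
            end = sentence.find(loc)
            if start <= end:
                return org, sentence[start:end].strip(), loc
    return None

def bootstrap(seeds, corpus, rounds=2):
    tuples, patterns = set(seeds), set()
    for _ in range(rounds):
        matches = [m for m in map(org_loc_context, corpus) if m]
        # Steps 2-3: contexts in which known tuples occur become patterns.
        patterns |= {ctx for org, ctx, loc in matches if (org, loc) in tuples}
        # Step 4: any ORG/LOC pair seen in a known pattern becomes a new tuple.
        tuples |= {(org, loc) for org, ctx, loc in matches if ctx in patterns}
    return tuples, patterns

tuples, patterns = bootstrap({("IBM", "Armonk")}, CORPUS)
print(sorted(tuples))
print(sorted(patterns))
```

Starting from a single seed, the first round learns “is based in” and picks up Intel, and the second round uses Intel to learn “, headquartered in” and picks up Microsoft. The same mechanism is what lets errors spread: one bad tuple can produce a bad pattern, which then produces more bad tuples.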

Pros

  • More relations can be discovered than for Rule-based RE (higher recall)
  • Less human effort required (only requires a high-quality seed)

Cons

  • The set of patterns becomes more error-prone with each iteration
  • Must be careful when generating new patterns from occurrences of tuples, e.g. “IBM shut down an office in Hursley” could easily be caught by mistake when generating patterns for the “based in” relation
  • New relation types require new seeds (which have to be manually provided)

Supervised RE

A common way to do Supervised Relation Extraction is to train a stacked binary classifier (or a regular binary classifier) to determine if there is a specific relation between two entities. These classifiers take features about the text as input, thus requiring the text to be annotated by other NLP modules first. Typical features are: context words, part-of-speech tags, dependency path between entities, NER tags, tokens, proximity distance between words, etc.

We could train and extract by:

  1. Manually label the text data according to whether a sentence is relevant or not for a specific relation type. E.g. for the “CEO” relation:
    “Apple CEO Steve Jobs said to Bill Gates.” is relevant
    “Bob, Pie Enthusiast, said to Bill Gates.” is not relevant
  2. Manually label the candidate entity pairs in the relevant sentences as positive/negative, depending on whether the sentence expresses the relation for that pair. E.g. “Apple CEO Steve Jobs said to Bill Gates.”:
    (Steve Jobs, CEO, Apple) is positive
    (Bill Gates, CEO, Apple) is negative
  3. Learn a binary classifier to determine if the sentence is relevant for the relation type
  4. Learn a binary classifier on the relevant sentences to determine if the sentence expresses the relation or not
  5. Use the classifiers to detect relations in new text data.

Some choose to not train a “relevance classifier”, and instead let a single binary classifier determine both things in one go.
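As a rough sketch of the single-classifier variant, the following trains a tiny perceptron on hand-labeled (person, organization) pairs. Everything here is an assumption made for illustration: the perceptron merely stands in for whatever classifier one would actually use, the features (words between the two mentions plus a “long gap” flag) are a small subset of the typical features listed above, and the training sentences are made up.

```python
import re
from collections import defaultdict

def features(sentence, person, org):
    """Words between the two mentions, a bias term, and a flag for a long gap."""
    p, o = sentence.find(person), sentence.find(org)
    between = sentence[p + len(person):o] if p < o else sentence[o + len(org):p]
    words = re.findall(r"\w+", between.lower())
    feats = {"between=" + w for w in words} | {"bias"}
    if len(words) > 2:
        feats.add("long_gap")
    return feats

def train_perceptron(examples, epochs=10):
    """examples: (sentence, person, org, label) with label +1 (positive) or -1 (negative)."""
    weights = defaultdict(int)
    for _ in range(epochs):
        for sentence, person, org, label in examples:
            feats = features(sentence, person, org)
            predicted = 1 if sum(weights[f] for f in feats) > 0 else -1
            if predicted != label:          # update only on mistakes
                for f in feats:
                    weights[f] += label
    return weights

def is_ceo(weights, sentence, person, org):
    return sum(weights[f] for f in features(sentence, person, org)) > 0

TRAIN = [
    ("Apple CEO Steve Jobs said to Bill Gates.", "Steve Jobs", "Apple", +1),
    ("Apple CEO Steve Jobs said to Bill Gates.", "Bill Gates", "Apple", -1),
    ("Microsoft CEO Satya Nadella met Tim Cook.", "Satya Nadella", "Microsoft", +1),
    ("Microsoft CEO Satya Nadella met Tim Cook.", "Tim Cook", "Microsoft", -1),
    ("Sundar Pichai visited Amazon.", "Sundar Pichai", "Amazon", -1),
]
weights = train_perceptron(TRAIN)

new_sentence = "Tesla CEO Elon Musk thanked Jeff Bezos."
print(is_ceo(weights, new_sentence, "Elon Musk", "Tesla"))
print(is_ceo(weights, new_sentence, "Jeff Bezos", "Tesla"))
```

The explicit negative pairs are what teach the model that a “CEO” keyword far away from the person is not enough, which is exactly the benefit of having labeled negatives.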

Pros

  • High quality supervision (ensuring that the relations that are extracted are relevant)
  • We have explicit negative examples

Cons

  • Expensive to label examples
  • Expensive/difficult to add new relations (need to train a new classifier)
  • Does not generalize well to new domains
  • Is only feasible for a small set of relation types

Distantly Supervised RE

We can combine the idea of using seed data, as for Weakly Supervised RE, with training a classifier, as for Supervised RE. However, instead of providing a set of seed tuples ourselves we can take it from an existing Knowledge Base (KB), such as Wikipedia, DBpedia, Wikidata, Freebase, Yago.

Distantly Supervised RE scheme:

  1. For each relation type we are interested in from the KB:
  2. For each tuple of this relation in the KB:
  3. Select sentences from our unlabeled text data that match the tuple (both entities of the tuple co-occur in the sentence), and assume that these sentences are positive examples for this relation type
  4. Extract features from these sentences (e.g. POS tags, context words, etc.)
  5. Train a supervised classifier on these features
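The labeling part of this scheme (steps 1 to 3) can be sketched as three nested loops. The `KB` excerpt, the relation name `born_in`, and the toy `CORPUS` are made up for this example; feature extraction and training would then follow as for Supervised RE.

```python
# Hypothetical KB excerpt: relation type -> list of (subject, object) tuples.
KB = {"born_in": [("Barack Obama", "Honolulu"), ("Marie Curie", "Warsaw")]}

CORPUS = [
    "Barack Obama was born in Honolulu in 1961.",
    "Barack Obama visited Honolulu last week.",   # co-occurrence without the relation
    "Marie Curie grew up in Warsaw.",
    "Albert Einstein moved to Princeton.",
]

def label_corpus(kb, corpus):
    """Return (sentence, subject, object, relation) for every co-occurring KB tuple."""
    examples = []
    for relation, tuples in kb.items():        # step 1: each relation type
        for subj, obj in tuples:               # step 2: each tuple of that relation
            for sentence in corpus:            # step 3: sentences mentioning both entities
                if subj in sentence and obj in sentence:
                    examples.append((sentence, subj, obj, relation))
    return examples

examples = label_corpus(KB, CORPUS)
for example in examples:
    print(example)
```

The second sentence ends up labeled as a positive `born_in` example even though it says nothing about a birthplace, which is the kind of annotation noise this approach has to live with.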

Pros

  • Less manual effort
  • Can scale to large amounts of labeled data and many relations
  • No iterations required (compared to Weakly Supervised RE)

Cons

  • Noisy annotation of training corpus (sentences that have both words in the tuple may actually not describe the relation)
  • There are no explicit negative examples (this can be tackled by matching unrelated entities)
  • Is restricted to the Knowledge Base
  • May require careful tuning to the task

Unsupervised RE

Here we extract relations from text without having to label any training data, provide a set of seed tuples, or write rules to capture different types of relations in the text. Instead we rely on a set of very general constraints and heuristics. It could be argued whether this is truly unsupervised, since we are still using “rules”, only at a more general level. In some cases, small sets of labeled text data are even leveraged to design and tweak the systems. Nevertheless, these systems tend to require less supervision in general. Open Information Extraction (Open IE) generally refers to this paradigm.

TextRunner algorithm. Bach, Nguyen, and Sameer Badaskar. “A review of relation extraction.” Literature review for Language and Statistics II 2 (2007).

TextRunner is one algorithm of this kind. It can be described as:

1. Train a self-supervised classifier on a small corpus

  • For each parsed sentence, find all pairs of noun phrases (X, Y) with a sequence of words r connecting them. Label them as positive examples if they meet all of the constraints, otherwise label them as negative examples.
  • Map each triple (X, r, Y) to a feature vector representation (e.g. incorporating POS tags, number of stop words in r, NER tag, etc.)
  • Train a binary classifier to identify trustworthy candidates

2. Pass over the entire corpus and extract possible relations

  • Fetch potential relations from the corpus
  • Keep/discard candidates according to whether the classifier considers them trustworthy or not

3. Rank-based assessment of relations based on text redundancy

  • Normalize (omit non-essential modifiers) and merge relations that are the same
  • Count the number of distinct sentences the relations are present in and assign probabilities to each relation
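Step 3 can be illustrated with a small sketch. The `MODIFIERS` list, the function names, and the scoring are assumptions made for this example: the score is a plain relative frequency, used here as a crude stand-in for a proper probability estimate.

```python
from collections import defaultdict

# Hypothetical list of non-essential modifiers to drop during normalization.
MODIFIERS = {"definitely", "originally", "really", "also"}

def normalize(relation):
    """Lowercase the relation phrase and omit non-essential modifiers."""
    return " ".join(w for w in relation.lower().split() if w not in MODIFIERS)

def rank_by_redundancy(extractions):
    """extractions: iterable of (sentence_id, X, r, Y). Returns (triple, score), best first."""
    support = defaultdict(set)
    for sentence_id, x, r, y in extractions:
        support[(x, normalize(r), y)].add(sentence_id)   # merge identical relations
    total = len({sid for sids in support.values() for sid in sids})
    scored = [(triple, len(sids) / total) for triple, sids in support.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

extractions = [
    (0, "Edison", "invented", "the phonograph"),
    (1, "Edison", "originally invented", "the phonograph"),
    (2, "Edison", "invented", "the phonograph"),
    (3, "Edison", "invented", "the telephone"),   # erroneous extraction seen only once
]
ranked = rank_by_redundancy(extractions)
print(ranked)
```

A relation that shows up in many distinct sentences ends up with a higher score than one that was extracted only once.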

OpenIE 5.0 and Stanford OpenIE are two open-source systems that do this. They are more modern than TextRunner (which was just used here to demonstrate the paradigm). We can expect a lot of different relationship types as output from systems like these (since we do not specify what kind of relations we are interested in).

Pros

  • No or almost no labeled training data required
  • Does not require us to manually pre-specify each relation of interest, instead it considers all possible relation types

Cons

  • Performance of the system depends a lot on how well constructed the constraints and heuristics are
  • Relations are not as normalized as pre-specified relation types

Ending

I hope you enjoyed this reading! Please comment if there is anything you think should be in this article, what RE method you prefer and why, or anything you want to share regarding this topic. Cheers!

References

Relation Extraction, Huck and Fraser 2016, (presentation)

Rule Extraction: Rule-based Approaches, Grishman 2017 (presentation)

Using Patterns to Extract Relations (youtube)

Extracting Information from Text, NLTK 2001 (online book)

Snowball: Extracting Relations from Large Plain-Text Collections (presentation)

Semi-Supervised and Unsupervised Relation Extraction (youtube)

Mining Knowledge Graphs from Text (tutorial)

Relation Extraction GATE

Banko, Michele, et al. “Open information extraction from the web.” IJCAI. Vol. 7. 2007.

Fader, Anthony, Stephen Soderland, and Oren Etzioni. “Identifying relations for open information extraction.” Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, 2011.

Andreas Herman

Senior Data Scientist, Mobility Accelerator Data Science Lead at Hitachi