Building Custom Relation Extraction (RE) Models — Part 1

dp · May 4, 2023

This aims to be a complete two-part walk-through where we start with a dataset, iteratively annotate/label it programmatically, and finish up with a transformer-based Relation Extraction (RE) model.

This first post only goes through the process of building out a custom labeled dataset, ultimately ending up with a Rule-Based Relation Extraction model. The second part will take the file created here and fine-tune a transformer model to classify the relationships.

Workflow / Process

  1. Identify Named-Entities
  2. Programmatically classify relationship
  3. Inspect classifications
  4. Save dataset
  5. Repeat

This entire process will be managed through the command line using the extr-ds library (Github Repository).

pip install extr-ds

Entities

Before we can classify a relationship, we need entities. Labeling/Extracting entities was previously covered in another post. That prior process allowed us to build a decent Rule-Based Named-Entity Recognition (NER) model that we will leverage here.

Define Relationships

A relationship is a labeled, ordered pair of entities: r(e1, e2) == <label>, where e1 and e2 are entities.

extr-config.json

  • Each instance we try to label can produce many relation candidates, so it is recommended to keep the amount we observe per round low.

{
    ...,
    "split": {
        "amount": 5
    },
    ...,
}

labels.py

In the same file where we specified our entity patterns, we will also set up our relationships.

  • relation_defaults — This list of tuples specifies which e1 and e2 labels go together and what label to apply when both exist but a relationship was not determined. Only the relationships in this list will be labeled. It may make sense to have only one active at a time, commenting out the ones you are not actively working on.
relation_defaults: List[Tuple[str, str, str]] = [
    ## (e1, e2, label)
    ('PERIOD', 'TIME', 'NO_RELATION'),
    ('TEAM', 'QUANTITY', 'NO_RELATION'),
]
  • relation_patterns — This list specifies the search patterns between e1 and e2, and what to call that relationship if found, i.e. r('PERIOD', 'TIME') = 'is_at'.
relation_patterns: List[RegExLabel] = [
    RegExRelationLabelBuilder('is_at') \
        .add_e2_to_e1(
            e2='TIME',
            relation_expressions=[
                r'(\s-\s)',
            ],
            e1='PERIOD'
        ) \
        .build(),
    RegExRelationLabelBuilder('is_spot_of_ball') \
        .add_e1_to_e2(
            e1='TEAM',
            relation_expressions=[
                r'\s+',
            ],
            e2='QUANTITY',
        ) \
        .build()
]

Classify Instances

Similar to building NER datasets, run the --split command to start. This will split, annotate, and label a small subset of data. All output can be found in the /3 directory.

extr-ds --split
  • dev-rels.json — JSON dataset of annotations and labels. e1 and e2 are annotated in the sentence to mark which entities we want to classify.

{
    "sentence": "(<e2:TIME>0:24</e2:TIME> - <e1:PERIOD>3rd</e1:PERIOD>) (No Huddle, Shotgun) PENALTY on ARZ - D.Williams, False Start, 5 yards, enforced at ARZ 30 - No Play.",
    "label": "is_at",
    "definition": "r(\"PERIOD\", \"TIME\")"
},
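For a quick sanity check between rounds, you can tally the labels in that output. A small sketch (the file path and exact schema are assumed from the snippet above; here an in-memory sample stands in for reading the file from the /3 directory):

```python
import json
from collections import Counter

# In practice: rows = json.load(open('3/dev-rels.json', encoding='utf-8'))
rows = json.loads('''[
  {"sentence": "...", "label": "is_at",       "definition": "r(\\"PERIOD\\", \\"TIME\\")"},
  {"sentence": "...", "label": "NO_RELATION", "definition": "r(\\"PERIOD\\", \\"TIME\\")"},
  {"sentence": "...", "label": "is_at",       "definition": "r(\\"PERIOD\\", \\"TIME\\")"}
]''')

counts = Counter(row['label'] for row in rows)
print(counts)
# -> Counter({'is_at': 2, 'NO_RELATION': 1})
```

A skewed count (e.g. everything landing in NO_RELATION) is usually the first hint that a relation expression needs work.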
  • dev-rels.html — html page for a more natural way to inspect the outcomes.

Inspect Classifications

The easiest way to do this is to view the dev-rels.html file in Visual Studio Code / browser, similar to entities in the previous post.

dev-rels.html

During inspection, you will likely come across mislabeled examples (see above). In the case above, you notice that rows #23 and #25 should be 'is_at' instead of 'NO_RELATION'. To fix this, we can either update our rules in labels.py and run the --annotate command, or we can update the label through the command line.

extr-ds --relate -label is_at=23,25
dev-rels.html after label fix

To ignore a row,

extr-ds --relate -delete 0,3,6

To undo the delete,

extr-ds --relate -recover 0,3,6

To reset after rule changes,

extr-ds --annotate -rels

Save Data

When everything looks fine,

extr-ds --save -rels

which will append what we just inspected to rels.json in the /4 directory. If the same instance comes in but is labeled differently, a message is logged and the instance is ignored.
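The conflict check described above can be sketched roughly like this (this is not the extr-ds internals, just the idea): when appending newly inspected rows, skip any instance that is already stored under a different label.

```python
def merge_rows(existing, incoming):
    """Append incoming rows to existing ones, ignoring duplicates and
    logging (then skipping) any instance stored with a different label."""
    seen = {(row['sentence'], row['definition']): row['label'] for row in existing}
    merged = list(existing)
    for row in incoming:
        key = (row['sentence'], row['definition'])
        if key in seen:
            if seen[key] != row['label']:
                print('conflicting label, skipping:', row['sentence'][:40])
            continue  # duplicate or conflict - never append twice
        merged.append(row)
        seen[key] = row['label']
    return merged
```

Keying on the sentence plus the relation definition means the same sentence can still appear once per relationship type.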

At this point, if you iteratively updated your labels.py file, you may have ended up with a pretty decent Rule-Based Relation Extraction model.

In the next post, we will go over fine-tuning a transformer model to classify the relationships between specific entities using the dataset we just built.
