Use Crosslingual Coreference, spaCy, Hugging Face and GPT-3 to Extract Relationships from Long Texts
Accelerate knowledge graph construction with an all-star NLP pipeline
In my previous article, Relationship Extraction with GPT-3, I demonstrated how we can use the powerful GPT-3 to extract subject-verb-object relationships, such as gene regulations and metabolic capacities, from research article excerpts. GPT-3 not only recognizes the named entities correctly, it also performs the noun-verb conversions and entity expansions needed to format the results. For example, “downregulation of A by B” is correctly transformed into B,downregulate,A, and “A and B are not utilized” is correctly turned into {"A utilization": "negative", "B utilization": "negative"}. These abilities make GPT-3 a very valuable tool in our biomedical NLP toolkit.
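To make the two output formats concrete, here is a minimal Python sketch of how such results could be read back into structured data. The names Triplet, parse_triplet and parse_capacities are hypothetical helpers introduced only for illustration; they are not part of GPT-3 or of any library, and the sketch assumes the model returns exactly the comma-separated and JSON shapes shown above.

```python
import json
from typing import Dict, NamedTuple


class Triplet(NamedTuple):
    """One subject-verb-object relationship (hypothetical container)."""
    subject: str
    verb: str
    obj: str


def parse_triplet(line: str) -> Triplet:
    """Parse a line such as 'B,downregulate,A' into a Triplet."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 comma-separated fields, got {len(parts)}: {line!r}")
    return Triplet(*parts)


def parse_capacities(text: str) -> Dict[str, str]:
    """Parse a JSON object such as {"A utilization": "negative"} into a dict."""
    return json.loads(text)


# The two examples from the text above.
regulation = parse_triplet("B,downregulate,A")
capacities = parse_capacities('{"A utilization": "negative", "B utilization": "negative"}')
```

Note that splitting on commas assumes that entity names themselves contain no commas; real model output may need a more defensive parser.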
However, this solution is not viable for full-length articles, because GPT-3’s prompt is limited to about 1,500 words. It is expensive, too. So we had better submit to GPT-3 only the sentences that actually contain the relationships. This means that we need a pipeline of four components that preprocess, split, filter and submit the sentences (Figure 1). The pipeline will generate subject-verb-object triplets in the right formats ready for…
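The four stages can be sketched as a chain of small functions. This is only a rough outline under stated assumptions, not the pipeline built in this article: every function name is hypothetical, the preprocessing step merely normalizes whitespace as a stand-in for coreference resolution, a naive regular expression stands in for a proper sentence splitter, a keyword match stands in for a real sentence filter, and the submit step is passed in as a callable so that no particular GPT-3 client call is assumed.

```python
import re
from typing import Callable, Iterable, List


def preprocess(text: str) -> str:
    """Stand-in for preprocessing: only collapses runs of whitespace."""
    return " ".join(text.split())


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def filter_sentences(sentences: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Keep sentences containing any keyword (case-insensitive substring match)."""
    lowered = [k.lower() for k in keywords]
    return [s for s in sentences if any(k in s.lower() for k in lowered)]


def run_pipeline(text: str, keywords: Iterable[str], submit: Callable[[str], str]) -> List[str]:
    """Preprocess, split, filter, then submit each surviving sentence."""
    sentences = filter_sentences(split_sentences(preprocess(text)), keywords)
    return [submit(sentence) for sentence in sentences]


# Usage with a fake submit function in place of a real model call.
article = "Gene A is studied here.  B downregulates A. The weather was fine."
results = run_pipeline(article, keywords=["regulat"], submit=lambda s: s.upper())
print(results)  # ['B DOWNREGULATES A.']
```

Passing submit as a parameter keeps the expensive step swappable, so the cheaper stages can be tested on their own before any sentence is sent to the model.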