Use Crosslingual Coreference, spaCy, Hugging Face and GPT-3 to Extract Relationships from Long Texts

Accelerate knowledge graph construction with an all-star NLP pipeline

Sixing Huang
Geek Culture


Photo by Dmitry Ratushny on Unsplash

In my previous article, Relationship Extraction with GPT-3, I demonstrated how we can use the powerful GPT-3 to extract subject-verb-object relationships, such as gene regulations and metabolic capacities, from research article excerpts. GPT-3 not only correctly recognizes the named entities, but it also performs the necessary noun-verb conversions and entity expansions to format the results. For example, “downregulation of A by B” can be correctly transformed into B,downregulate,A. Or “A and B are not utilized” can be correctly turned into {"A utilization": "negative", "B utilization": "negative"}. These abilities make GPT-3 a very valuable tool in our biomedical NLP toolkit.
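To make this concrete, here is a minimal sketch of such a relation-extraction call, assuming the legacy openai Python package (pre-1.0) and an illustrative few-shot prompt. The model name, the prompt wording, and the example sentences are assumptions for demonstration, not the exact ones from the earlier article.

```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# A hypothetical few-shot prompt that asks GPT-3 to emit subject,verb,object triplets.
PROMPT = (
    "Extract subject,verb,object triplets from the sentence.\n"
    "Sentence: Downregulation of geneA by proteinB was observed.\n"
    "Triplets: proteinB,downregulate,geneA\n"
    "Sentence: Glucose and lactose are not utilized by strain X.\n"
    "Triplets: strain X,not utilize,glucose; strain X,not utilize,lactose\n"
    "Sentence: {sentence}\n"
    "Triplets:"
)

def extract_triplets(sentence: str) -> str:
    """Send one sentence to a GPT-3 completion model and return the raw triplet string."""
    response = openai.Completion.create(
        model="text-davinci-003",              # illustrative GPT-3 completion model
        prompt=PROMPT.format(sentence=sentence),
        temperature=0.0,                       # deterministic output for extraction
        max_tokens=128,
    )
    return response["choices"][0]["text"].strip()

print(extract_triplets("Expression of geneC is repressed by regulatorD."))
```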

However, this solution is not viable for full-length articles, because GPT-3’s prompt is limited to roughly 1,500 words. It is expensive, too. So we’d better submit only the sentences that actually contain the relationships to GPT-3. That means we need a pipeline with four components that preprocess, split, filter, and submit the sentences (Figure 1). The pipeline will generate subject-verb-object triplets in the right formats ready for…
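As a rough illustration of how these four steps could be wired together, the sketch below resolves coreferences with crosslingual-coreference, splits the resolved text into sentences with spaCy, filters the sentences with a Hugging Face zero-shot classifier, and leaves the final GPT-3 submission to a call like the extract_triplets function above. The "xx_coref" component name, the resolved_text attribute, the classifier model, and the candidate labels follow the libraries’ documentation and are assumptions here, not the exact configuration of this pipeline.

```python
import spacy
import crosslingual_coreference  # registers the "xx_coref" spaCy component (assumed name)
from transformers import pipeline

# Step 1: preprocess - resolve pronouns so every sentence stands on its own.
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("xx_coref", config={"chunk_size": 2500, "chunk_overlap": 2, "device": -1})

# Step 3 helper: a zero-shot classifier decides whether a sentence likely states a relationship.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
LABELS = ["gene regulation", "metabolic capacity", "other"]  # illustrative labels

def relevant_sentences(text: str, threshold: float = 0.7):
    """Yield coreference-resolved sentences that likely contain a relationship."""
    doc = nlp(text)
    resolved = nlp(doc._.resolved_text)   # step 2: split the resolved text into sentences
    for sent in resolved.sents:
        result = classifier(sent.text, candidate_labels=LABELS)
        if result["labels"][0] != "other" and result["scores"][0] >= threshold:
            yield sent.text

article = "..."  # full-length article text goes here
for sentence in relevant_sentences(article):
    # Step 4: submit only the filtered sentences to GPT-3,
    # e.g. with the extract_triplets sketch shown earlier.
    print(sentence)
```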
