Building a Knowledge Graph Using BERT-Based NER and Neo4j, Then Predicting Unknown Links

Step 1. Named entity and relation extraction from the literature

Named entity extraction is one of the successful applications of BERT in NLP. After BERT is fine-tuned on biomedical literature, the resulting BioBERT model reaches about 90% precision on biomedical named entity recognition. Building on BioBERT, Kim et al. developed a tool called BERN that extracts biomedical named entities and identifies their types, i.e. classifies them into genes/proteins, diseases, drugs, etc.
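Step 1 also covers relation extraction, but the paragraph above only describes entity recognition. The following is a minimal sketch under one simple assumption: entities that appear in the same sentence are treated as related (a co-occurrence heuristic, not necessarily the relation-extraction method used for this article). It turns tagged entities into weighted entity pairs. The input format, a list of (name, type) tuples per sentence, is hypothetical and is not BERN's actual output schema. The example sentences are toy data.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

# Hypothetical input format: one list of (entity_name, entity_type) tuples per
# sentence, e.g. post-processed NER output. This is NOT BERN's real schema.
Entity = Tuple[str, str]


def cooccurrence_relations(sentences: Iterable[List[Entity]]) -> Counter:
    """Count how often each unordered pair of distinct entities shares a sentence.

    Sentence-level co-occurrence is an assumed stand-in for relation
    extraction: it links entities that appear together but does not
    say what kind of relation holds between them.
    """
    counts: Counter = Counter()
    for entities in sentences:
        # Deduplicate within the sentence and sort, so that (A, B) and (B, A)
        # are counted under the same key.
        unique = sorted(set(entities))
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
    return counts


# Illustrative toy input, not real extraction results.
example_sentences = [
    [("BRCA1", "gene"), ("breast cancer", "disease")],
    [("breast cancer", "disease"), ("tamoxifen", "drug"), ("BRCA1", "gene")],
]
relations = cooccurrence_relations(example_sentences)
for (a, b), n in relations.most_common():
    print(a, b, n)
```

The co-occurrence count can serve as a rough edge weight. A dedicated relation-extraction model would be needed to label what each relation actually means.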

Step 2. Build the knowledge graph in Neo4j

With the entity relation dataset we obtained from the previous step, we can quickly build a knowledge graph using the following Cypher commands.
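The original commands are not reproduced here. As a hedged sketch of one common pattern, the dataset can be loaded with a single parameterized Cypher query that UNWINDs a list of rows and MERGEs nodes and relationships. The `Entity` label, `RELATED_TO` relationship type, property names and example rows below are all hypothetical choices for illustration. The Python part only prepares the query string and its parameters, so it runs without a database.

```python
from typing import Dict, List, Tuple

# Illustrative Cypher: the `Entity` label, `RELATED_TO` relationship type and
# property names are hypothetical choices, not the article's original commands.
# MERGE only creates nodes/relationships that do not already exist, so the
# load can be re-run without duplicating data.
LOAD_QUERY = """
UNWIND $rows AS row
MERGE (a:Entity {name: row.source, type: row.source_type})
MERGE (b:Entity {name: row.target, type: row.target_type})
MERGE (a)-[r:RELATED_TO]->(b)
SET r.count = row.count
"""

# ((source_name, source_type), (target_name, target_type), co-occurrence count)
Relation = Tuple[Tuple[str, str], Tuple[str, str], int]


def to_rows(relations: List[Relation]) -> List[Dict[str, object]]:
    """Flatten relation triples into the parameter maps consumed by LOAD_QUERY."""
    return [
        {
            "source": src,
            "source_type": src_type,
            "target": tgt,
            "target_type": tgt_type,
            "count": count,
        }
        for (src, src_type), (tgt, tgt_type), count in relations
    ]


# Toy relations for illustration only.
example_relations = [
    (("BRCA1", "gene"), ("breast cancer", "disease"), 2),
    (("tamoxifen", "drug"), ("breast cancer", "disease"), 1),
]
rows = to_rows(example_relations)
print(rows)
```

With the official `neo4j` Python driver, the query and rows could be sent as `session.run(LOAD_QUERY, rows=rows)` inside a `with driver.session() as session:` block. Neo4j stores every relationship with a direction, but co-occurrence has none, so later queries can match it undirected with `(a)-[:RELATED_TO]-(b)`.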

Step 3. Predict unknown relations between entities

With the knowledge graph in place, we can then predict unknown relations between entities. There are many algorithms for predicting unknown connections; we will use the Adamic-Adar method as an example. The basic idea behind the algorithm is that the probability of two entities being directly connected (for example, A and C, or A and E) depends on their common neighbours (B, D). The more friends a common neighbour has, the lower the probability that it will introduce the two entities to each other, so common neighbours with many connections count for less.
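In formula form, the Adamic-Adar score of two nodes x and y is AA(x, y) = sum over every common neighbour u of 1 / log|N(u)|, where N(u) is the set of u's neighbours. A common neighbour with many friends therefore adds only a little to the score. The following is a minimal in-memory Python sketch of that formula. It runs outside Neo4j and is not meant as the article's own query. The toy edges are made up to mirror the A, C, E example: B (shared by A and C) has 2 neighbours, while D (shared by A and E) has 3. Libraries such as NetworkX also provide this score as `adamic_adar_index`.

```python
import math
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency map, ignoring self-loops."""
    graph: Graph = {}
    for a, b in edges:
        if a == b:
            continue
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def adamic_adar(graph: Graph, x: Hashable, y: Hashable) -> float:
    """Sum 1 / log(degree) over the common neighbours of distinct nodes x and y.

    For distinct x and y, every common neighbour has at least two neighbours
    (x and y), so log(degree) is positive and the division is safe.
    """
    common = graph[x] & graph[y]
    return sum(1.0 / math.log(len(graph[u])) for u in common)


def rank_missing_links(graph: Graph, top_k: int = 5) -> List[Tuple[Hashable, Hashable, float]]:
    """Score every non-adjacent pair and return the top_k highest-scoring pairs."""
    candidates = []
    # Sorting by str gives a deterministic pair order even for mixed node types.
    for x, y in combinations(sorted(graph, key=str), 2):
        if y in graph[x]:
            continue  # already linked, nothing to predict
        score = adamic_adar(graph, x, y)
        if score > 0:
            candidates.append((x, y, score))
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates[:top_k]


# Toy graph: B (common neighbour of A and C) has 2 neighbours,
# while D (common neighbour of A and E) has 3.
g = build_graph([("A", "B"), ("C", "B"), ("A", "D"), ("E", "D"), ("F", "D")])
print(adamic_adar(g, "A", "C"))  # 1 / ln 2, about 1.4427
print(adamic_adar(g, "A", "E"))  # 1 / ln 3, about 0.9102
print(rank_missing_links(g, top_k=3))
```

Only non-adjacent pairs are scored, because pairs that are already connected need no prediction. The highest-scoring remaining pairs are the candidate unknown relations. Here A and C outrank A and E because their shared neighbour B is less "busy" than D.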
