Visualizing Relationships between Chemicals and Patent Data

Aniruddha Chatterjee
Published in iReadRx
Jul 6, 2021

Using graphs, we can uncover relationships that help a subject-matter expert (SME) find similar patents, and possibly surface clusters of patents that define more niche categories or topics. Typically, the SME's process includes:

  1. Read the patent and extract the chemical.
  2. Use the chemical to find matches from hundreds of other patents.
  3. Read the other patents to confirm that they match.

Imagine you have a single patent and want to find other patents associated with similar chemicals. Or imagine you want second-degree patent recommendations starting from a single chemical.

Questions like these are tough to answer using a traditional database, as we already covered in the article Incompatibility of Conventional Database with Patent Search.

Here, we will see how exactly we can use Neo4j, a graph database, to answer our questions.

Data Ingestion

To see if we can use Neo4j effectively, we first need to find a source of data.

We turned to the data that powers http://ichemist.ireadrx.ai. We take the NER (Named Entity Recognition) results from the claims text and create relationships between each chemical compound name and the patent it appears in.

For each extracted chemical, the ingestion process creates two nodes (a Patent and a Chemical) and one relationship between them. One important point: two or more nodes with the same chemical name must be merged into a single node, because we do not want redundant nodes.
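The merge behavior described above can be sketched in plain Python. This is a toy in-memory model under stated assumptions, not the actual ingestion code: the `ingest` function, the `(patent_id, chemical)` input shape, and the patent IDs are hypothetical. The `MERGE_QUERY` string shows how the same idea might be written in Cypher, assuming the `Patent`, `Chemical`, and `PRESENT_IN` names used in the queries later in this article.

```python
# Toy in-memory model of the ingestion step (not the Neo4j driver API).
# Each NER hit is a (patent_id, chemical_name) pair; the IDs below are made up.

def ingest(ner_hits):
    """Build node sets and PRESENT_IN edges, merging repeated IDs and names."""
    patents, chemicals, present_in = set(), set(), set()
    for patent_id, chemical in ner_hits:
        patents.add(patent_id)                 # one node per patent ID
        chemicals.add(chemical)                # one node per chemical name
        present_in.add((chemical, patent_id))  # (Chemical)-[:PRESENT_IN]->(Patent)
    return patents, chemicals, present_in

# Assumed Cypher equivalent: MERGE reuses a node if it already exists.
MERGE_QUERY = """
UNWIND $rows AS row
MERGE (p:Patent {id: row.patent_id})
MERGE (c:Chemical {name: row.chemical})
MERGE (c)-[:PRESENT_IN]->(p)
"""

hits = [("P1", "benzene"), ("P2", "benzene"), ("P2", "carbazole"), ("P1", "benzene")]
patents, chemicals, present_in = ingest(hits)
print(len(patents), len(chemicals), len(present_in))  # prints: 2 2 3
```

Although "benzene" appears three times in the input, it ends up as a single chemical node connected to both patents.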

The result of the ingestion looks something like this.

The red nodes represent Patents; the khaki ones represent Chemicals.

Neo4j also gives us data in machine-understandable formats (such as JSON) to audit results and export to other systems for further processing and usage.

The same graph in the above picture is converted to JSON format.
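To give a rough idea of what a JSON view of such a graph could look like, here is a minimal sketch. The field names (`id`, `label`, `type`, `start`, `end`) and the sample records are assumptions made for illustration; they are not the exact schema that Neo4j exports.

```python
import json

# Hypothetical node and relationship records; the field names are an
# assumption, not the exact export schema of Neo4j.
nodes = [
    {"id": "P1", "label": "Patent"},
    {"id": "benzene", "label": "Chemical"},
]
relationships = [
    {"type": "PRESENT_IN", "start": "benzene", "end": "P1"},
]

def graph_to_json(nodes, relationships):
    """Serialize the graph to a JSON string for auditing or export."""
    return json.dumps({"nodes": nodes, "relationships": relationships}, indent=2)

exported = graph_to_json(nodes, relationships)
print(exported)
```

A downstream system can parse this string back with `json.loads` and process the nodes and relationships separately.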

We can now use the data to model our queries to respond to our questions.

Answering Patent Related Questions

Let us take a look at the possible scenarios we can respond to using the power of graphs.

Scenario 1: Given a Patent ID, can we find similar patents based on the same chemicals being present in other patents?

This scenario is exciting because users are expected to look for similar patents based on their current work or research. To simplify, suppose you are looking at a stand mixer on Amazon. Below the main product details, Amazon also shows you similar products as recommendations. The idea is to replicate that experience for patents.

MATCH (p:Patent {id: '10633410'})<-[r:PRESENT_IN]-(c:Chemical)-[s:PRESENT_IN]->(n:Patent) RETURN DISTINCT p, r, c, s, n LIMIT 10000

To show how it works, consider Patent ID “10633410”. We want to find recommendations using just this patent. We take the chemicals found in this patent and use them to search for other patents that contain them. The result looks like the screenshot below.

A single Patent was used to generate this big cluster.
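The traversal behind Scenario 1 (patent, then its chemicals, then other patents containing those chemicals) can be sketched in plain Python. The edge list and patent IDs below are made up, and the `similar_patents` helper is hypothetical. Ranking by the number of shared chemicals is an addition for illustration; the Cypher query above only returns the matching paths.

```python
from collections import Counter

# Made-up PRESENT_IN edges as (chemical, patent_id) pairs.
PRESENT_IN = [
    ("benzene", "P1"), ("carbazole", "P1"),
    ("benzene", "P2"), ("carbazole", "P2"),
    ("benzene", "P3"),
    ("toluene", "P4"),
]

def similar_patents(patent_id, edges):
    """Patents sharing at least one chemical with patent_id, most shared first."""
    # Step 1: chemicals present in the given patent.
    chemicals = {c for c, p in edges if p == patent_id}
    # Step 2: count, per other patent, how many of those chemicals it contains.
    shared = Counter(
        p for c, p in set(edges) if c in chemicals and p != patent_id
    )
    return shared.most_common()

print(similar_patents("P1", PRESENT_IN))  # prints: [('P2', 2), ('P3', 1)]
```

Here "P2" ranks first because it shares both chemicals with "P1", while "P4" shares none and is not returned.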

Scenario 2: Given a chemical, can we find the patents containing that chemical and derive second-degree recommendations from there?

With the power of graphs, we are not limited to a single level of query. We can efficiently run second-degree queries as well. This is useful when we have only a single chemical and want to use it to recommend other patents.

MATCH (givenChem:Chemical {name: 'benzene'})-[:PRESENT_IN]->(topPatent:Patent)
MATCH (p:Patent {id: topPatent.id})<-[r:PRESENT_IN]-(c:Chemical)-[s:PRESENT_IN]->(n:Patent)
RETURN DISTINCT p, r, c, s, n LIMIT 10000

Let us take “benzene” as the chemical name. We find the patents containing the chemical, then use those patents to find the chemicals they contain. We then use those chemicals to get the patents containing them, forming second-degree chains. The result is a large network of patents and chemicals.

A single chemical generated second-degree recommendation clusters
The same graph above is represented in JSON format.
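The second-degree chain described for Scenario 2 can be sketched in plain Python as three set lookups. The edges and patent IDs are made up for illustration, and `second_degree` is a hypothetical helper. Unlike the Cypher query, this sketch leaves out the starting chemical and the first-degree patents from the recommendations, which is a design choice to highlight only the newly discovered items.

```python
# Made-up PRESENT_IN edges as (chemical, patent_id) pairs.
PRESENT_IN = {
    ("benzene", "P1"), ("carbazole", "P1"),
    ("carbazole", "P2"),
    ("cyclopentyl", "P3"), ("benzene", "P3"),
    ("toluene", "P4"),
}

def second_degree(chemical, edges):
    """Follow chemical -> patents -> chemicals -> patents."""
    # First degree: patents that contain the given chemical.
    top_patents = {p for c, p in edges if c == chemical}
    # Other chemicals present in those patents.
    related_chemicals = {
        c for c, p in edges if p in top_patents and c != chemical
    }
    # Second degree: patents containing a related chemical, minus the first degree.
    recommended = {p for c, p in edges if c in related_chemicals} - top_patents
    return top_patents, related_chemicals, recommended

top, related, recommended = second_degree("benzene", PRESENT_IN)
print(sorted(top), sorted(related), sorted(recommended))
# prints: ['P1', 'P3'] ['carbazole', 'cyclopentyl'] ['P2']
```

In this toy data, "P2" does not mention benzene at all, yet it is reached through "carbazole", which co-occurs with benzene in "P1".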

The most interesting thing happens when we zoom into the graph and look at a narrower level. As we can see below, we started from “benzene” and ended up generating recommendation clusters of chemicals like “carbazole” and “cyclopentyl”, which might be of interest to the user. So the end result is a single chemical generating a cluster of interesting chemical chains.

Recommended chemical clusters formed from a single chemical.

Conclusion

These results can be represented both as a graph and as JSON, so both humans and machines can understand them. The clusters generated open up exciting ways to go through patent data and discover relationships between patents and chemicals. The graph can also be extended to include diseases, patent holders, and pharmaceutical companies.

This service acts as a powerful recommendation engine that enables you to leverage the power of graphs to get your required results within seconds. The results can also be fed into a pipeline, opening up even more potential processes to be sped up by using this service.
