Incompatibility of Conventional Databases with Patent Search

Aniruddha Chatterjee
iReadRx
Jun 21, 2021

Product Description

http://ichemist.ai is a search engine we developed to help researchers find chemistry information that appears in patents.

We need a database to help us manage this huge load of data. Here, we compare various database structures and explain why we chose a graph database for our use case.

Different database structures

  • RDBMS
  • NoSQL
    • Graph Databases

Advantages of using RDBMS

1. Fast query processing: Large amounts of data can usually be retrieved quickly and efficiently. Insertion, deletion, and manipulation of data are also fast.
2. Little coding required: Retrieving data does not take many lines of code. Basic keywords such as SELECT, INSERT INTO, and UPDATE cover most needs, and SQL's syntax rules are simple, which makes it a user-friendly language.
3. Standardized language: Thanks to thorough documentation and decades of established use, SQL gives users worldwide a uniform platform.
4. Portable: It can be used in programs on PCs, servers, and laptops, independent of the platform (operating system, etc.). It can also be embedded in other applications as needed.
5. Interactive language: SQL is easy to learn and understand, and answers to complex queries can be received in seconds.
6. Large community support: SQL has been around for a long time and, for the reasons above, is widely accepted and used, so community support is strong.
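
As a small illustration of how little code the basic SQL keywords need, here is a minimal sketch using Python's built-in sqlite3 module. The table name, columns, and sample rows are made up for the example and are not iChemist's actual schema.

```python
import sqlite3

# In-memory database; the table, columns, and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patents (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
)

# INSERT INTO: add two rows.
conn.executemany(
    "INSERT INTO patents (title, year) VALUES (?, ?)",
    [
        ("Catalyst for olefin polymerization", 2018),
        ("Solvent-free synthesis route", 2020),
    ],
)

# UPDATE: change a value on an existing row.
conn.execute(
    "UPDATE patents SET year = 2019 WHERE title = ?",
    ("Catalyst for olefin polymerization",),
)

# SELECT: read the rows back, newest first.
rows = conn.execute(
    "SELECT title, year FROM patents ORDER BY year DESC"
).fetchall()
for title, year in rows:
    print(year, title)
```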

Advantages of using NoSQL

  1. Support for unstructured data: Unlike SQL, where the database schema is rigid, NoSQL offers a flexible schema, so we can change fields and structure over time as the target audience, implementation, and use case change.
  2. Easy to update: Because the schema is easy to change, the database is easy to update in response to external changes. We can also query older data alongside new data with minimal changes to the query, since existing indexes remain valid, which makes it quick to test and deploy changes.
  3. Relationships: The absence of enforced relations makes NoSQL easy to understand and quick to get started with. We do not need to spend hours learning the complex relations of a typical SQL database to understand how the queries work. Most NoSQL solutions also come with their own querying libraries that can interpret the stored data without relying on the relations between entities.
  4. Inexpensive scaling: When provisioning database servers we might start off with a single server, but as usage and traffic grow, that server will eventually be unable to handle the incoming data. This is where scaling comes in: accommodating all users without compromising on quality. NoSQL databases let us scale horizontally, i.e. as traffic increases we can shard the database across additional inexpensive servers and use them as a single database. A SQL database is typically scaled vertically, which means provisioning more expensive hardware and makes it less sustainable.
  5. Developer-friendly structure: The reasons above make NoSQL a more developer-friendly solution, leading to easier work and greater efficiency in the end.
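
To make the flexible-schema and horizontal-scaling points concrete, here is a rough sketch in plain Python. It is not the API of any particular NoSQL product: the shards are ordinary dictionaries standing in for separate servers, and `NUM_SHARDS`, `shard_for`, `put`, and `get` are hypothetical names chosen for the example.

```python
import hashlib
from typing import Optional

# Two documents in the same collection with different fields:
# the newer record adds "assignee" without any schema migration.
documents = [
    {"_id": "p1", "title": "Old record"},
    {"_id": "p2", "title": "New record", "assignee": "Example Corp"},
]

NUM_SHARDS = 3  # hypothetical number of inexpensive servers
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for a server


def shard_for(key: str) -> int:
    """Map a document key to a shard index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def put(doc: dict) -> None:
    """Store a document on the shard that owns its key."""
    shards[shard_for(doc["_id"])][doc["_id"]] = doc


def get(key: str) -> Optional[dict]:
    """Look up a document by key; only one shard is consulted."""
    return shards[shard_for(key)].get(key)


for doc in documents:
    put(doc)

print(get("p2"))
```

Callers only ever use `put` and `get`, so the set of shards behaves like a single database from the application's point of view.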

Graph DB — What is it?

A graph DB is another NoSQL solution, but unlike popular NoSQL solutions such as MongoDB, where data is stored in a tree-like structure, graph DBs store data as graphs. A graph can contain cycles, i.e. we can cycle back to the root node, and any node can be connected to any other, which lets us model richer relations between nodes. This gives us the best of both worlds and is the main reason we chose Neo4J as the database for iChemist.

With Neo4J we get easy, fast queries that follow the relations between nodes, and we keep the flexibility to change and expand the data model as our use case evolves.
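
The difference between a tree and a graph can be sketched in a few lines of Python. This is only an in-memory illustration, not Neo4J code, and the node names and relationship types (`MENTIONS`, `APPEARS_IN`, `CITES`) are assumptions made up for the example rather than iChemist's real data model.

```python
from collections import defaultdict

# Adjacency list: node -> list of (relationship type, target node).
graph = defaultdict(list)


def connect(src: str, relation: str, dst: str) -> None:
    """Add a directed, typed relationship between two nodes."""
    graph[src].append((relation, dst))


connect("patent:A", "MENTIONS", "compound:X")
connect("compound:X", "APPEARS_IN", "patent:B")
connect("patent:B", "CITES", "patent:A")  # closes a cycle back to patent:A
connect("patent:B", "MENTIONS", "compound:Y")


def reaches_itself(start: str) -> bool:
    """Return True if following relationships from `start` leads back to it."""
    stack = [dst for _, dst in graph.get(start, [])]
    seen = set()
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(dst for _, dst in graph.get(node, []))
    return False


print(reaches_itself("patent:A"))    # patent:A lies on a cycle
print(reaches_itself("compound:Y"))  # compound:Y has no outgoing relationships
```

In a strict tree the first check could never succeed, because a child never links back to an ancestor.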

Advantages of using Graph DB

- Faster queries, since all nodes in a graph DB have the [index-free adjacency property](https://dmccreary.medium.com/how-to-explain-index-free-adjacency-to-your-manager-1a8e68ec664a).
- Object-oriented thinking lets us understand the data without needing a complete picture of all existing relations in a table.
- Real-time updates and easy support for simultaneous queries.
- Easy schema updates.
- Most important of all, support for easy recursive queries. Graph DBs let us find direct and indirect relations between two nodes, something which in PostgreSQL would require a recursive query or multiple queries comparing different columns separately. The following comparison illustrates this:

[Figure: A recursive query using any RDBMS solution]

The same query in a graph database, Neo4J in this instance, would look like this:

[Figure: The same query in a graph database]

The use case above is looking up the departments that Alice belongs to in an organization. In a SQL implementation, we would first look in the employee table for Alice's employee id, then use that id to find the ids of her departments, and finally query the department table with those ids to get the department names and details.
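
The relational lookup described above can be sketched with Python's built-in sqlite3 module. The original figures are not reproduced here, so the three tables (`employees`, `departments`, `department_members`) and their contents are assumptions chosen to match the description, and the three steps are expressed as joins in a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    -- Link table: which employee belongs to which department.
    CREATE TABLE department_members (
        employee_id INTEGER REFERENCES employees(id),
        department_id INTEGER REFERENCES departments(id)
    );
    INSERT INTO employees VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO departments VALUES (10, 'Research'), (20, 'Legal'), (30, 'Sales');
    INSERT INTO department_members VALUES (1, 10), (1, 20), (2, 30);
    """
)

# Employee name -> employee id -> department ids -> department names.
query = """
    SELECT d.name
    FROM employees AS e
    JOIN department_members AS m ON m.employee_id = e.id
    JOIN departments AS d ON d.id = m.department_id
    WHERE e.name = ?
    ORDER BY d.name
"""
alice_departments = [row[0] for row in conn.execute(query, ("Alice",))]
print(alice_departments)  # ['Legal', 'Research']
```

Each join matches ids between two tables, and that matching is the work a graph database avoids by following stored relationships directly.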

In the Neo4J implementation of the same use case, we simply query via the relation and get the departments back, because the nodes are directly connected, unlike in SQL. The result is a faster and simpler query. The gap widens as queries grow larger, and that is where this feature of graph DBs shows its value most.
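
The idea of following relations directly can be imitated in plain Python by letting each node hold references to its neighbours. This is a toy model, not Neo4J itself, and the `Node` class, the relationship types `BELONGS_TO` and `PART_OF`, and the sample data are all hypothetical.

```python
from collections import deque


class Node:
    """A graph node that stores direct references to its neighbours."""

    def __init__(self, label: str, name: str) -> None:
        self.label = label
        self.name = name
        self.out = []  # list of (relationship type, Node)

    def relate(self, rel_type: str, other: "Node") -> None:
        self.out.append((rel_type, other))


alice = Node("Employee", "Alice")
research = Node("Department", "Research")
legal = Node("Department", "Legal")
division = Node("Division", "R&D")

alice.relate("BELONGS_TO", research)
alice.relate("BELONGS_TO", legal)
research.relate("PART_OF", division)


def neighbours(node: Node, rel_type: str) -> list:
    """Direct relation: follow one relationship type from a node."""
    return [other.name for rel, other in node.out if rel == rel_type]


def is_connected(start: Node, goal: Node) -> bool:
    """Indirect relation: breadth-first search along any relationships."""
    queue, seen = deque([start]), {id(start)}
    while queue:
        node = queue.popleft()
        if node is goal:
            return True
        for _, other in node.out:
            if id(other) not in seen:
                seen.add(id(other))
                queue.append(other)
    return False


print(neighbours(alice, "BELONGS_TO"))  # ['Research', 'Legal']
print(is_connected(alice, division))    # True
```

With these assumed labels, the direct lookup would be a single Cypher pattern along the lines of `MATCH (e:Employee {name: 'Alice'})-[:BELONGS_TO]->(d:Department) RETURN d.name`.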

Conclusion

Having weighed all the points above and analyzed each solution, we concluded that Neo4J was the most appropriate database for our product, as it allows the fastest and most efficient queries for our chemical patent search engine. Its great community support was just the icing on the cake. Keep an eye out for our future blog posts to see how we went about implementing Neo4J, and follow us on our journey to build iChemist.
