Build a Chatbot for Clinical Trials Across Multiple Data Sources
Unify access to a knowledge graph, a vector database, and the web in a LangChain chatbot
Clinical research plays a crucial role in advancing medical knowledge, improving patient care, and developing innovative treatments and therapies. It evaluates the safety and efficacy of medical interventions, including pharmaceuticals, medical devices, and behavioral interventions, in human participants. Clinical research metadata is systematically uploaded to the widely recognized web portal clinicaltrials.gov. The platform serves as a hub for medical professionals, helping them keep up with the latest developments in the pharmaceutical industry.
However, navigating and searching clinicaltrials.gov can be cumbersome. To improve the user experience, I undertook a project called Clinical Trials as Graphs and Vectors, in which I restructured data from clinicaltrials.gov and SNOMED into a Neo4j knowledge graph and a Qdrant vector database (Figure 1). Users can analyze the graph with Cypher and search the vector database semantically. With these two databases, users can quickly see who their competitors are, how those competitors design their trials, and what the results are.
However, this dual-database setup can be improved. First, the data is limited to clinicaltrials.gov and SNOMED, so the setup cannot answer questions that fall outside those two sources. Second, each data source has its own interface. To use vector search, users have to submit their natural language questions through the vector database API. Similarly, they have to type Cypher into Neo4j to search the graph. This will become a bigger problem as we add more data sources to the stack.
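To make the interface problem concrete, here is a minimal sketch of what a single entry point over several sources could look like. It is plain Python, not LangChain code and not the implementation described in this article. The names `Source`, `query_graph`, `search_vectors`, `search_web`, `pick_source`, and `ask` are all hypothetical, the three source functions are stubs that touch no real database, and the keyword router is only a toy stand-in for the choice a language model would make.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Source:
    """One data source hidden behind a plain text-in, text-out function."""
    description: str
    run: Callable[[str], str]


# Hypothetical stubs: real versions would call Neo4j, Qdrant, and a web search API.
def query_graph(question: str) -> str:
    return f"[graph] would translate to Cypher and run: {question}"


def search_vectors(question: str) -> str:
    return f"[vector] would embed and search for: {question}"


def search_web(question: str) -> str:
    return f"[web] would search the web for: {question}"


# Adding another data source means adding one entry here.
SOURCES: Dict[str, Source] = {
    "graph": Source("Structured questions about trials and sponsors", query_graph),
    "vector": Source("Semantic similarity over trial descriptions", search_vectors),
    "web": Source("Anything outside the two databases", search_web),
}


def pick_source(question: str) -> str:
    """Toy keyword router (assumption): stands in for an LLM choosing a source."""
    q = question.lower()
    if any(word in q for word in ("how many", "count", "sponsor")):
        return "graph"
    if any(word in q for word in ("similar", "semantic")):
        return "vector"
    return "web"


def ask(question: str) -> str:
    """Single entry point: the user never addresses a source directly."""
    return SOURCES[pick_source(question)].run(question)


if __name__ == "__main__":
    print(ask("How many trials does each sponsor run?"))
    print(ask("Find trials similar to this one"))
    print(ask("Who won the latest Nobel Prize in Medicine?"))
```

The point of the sketch is the shape: users ask one function, and each source sits behind the same text-in, text-out signature, so the stack can grow without adding a new interface for every source.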
This article tackles these issues. The result is a chatbot inspired by Tomaz Bratanic’s article Integrating Neo4j into the LangChain ecosystem. Powered by LangChain, it integrates the two data sources and adds web search to the mix. It can also be extended to include more data sources, such as SQL and NoSQL databases. Under the hood, the chatbot uses GPT-4. Faced with a natural language question, the chatbot can search for answers from…