Clinical Trial Assistant — a RAG-based approach on Snowflake leveraging Cortex capabilities (Part 1 of 2)


This article is a sequel to our earlier life sciences blog, in which we fine-tuned a general-purpose Llama 2 7B model on Snowpark Container Services (SPCS) to answer questions on clinical trial protocols. We have now implemented the same use case with a RAG pattern leveraging new Snowflake capabilities branded as Cortex AI (in private preview at the time of writing). Cortex AI offers the llama2-70b-chat LLM as a model-as-a-service for completion, among other more specialized tasks such as extraction, translation, and sentiment analysis. Snowflake Cortex also offers a native vector data type, as well as functions to generate vector embeddings and perform similarity search for constructing the RAG pipeline, all of which are in private preview. More details about Cortex are available here.
This is a two-part series, with Part 1 providing the high-level overview and Part 2 covering the specifics of the implementation.

Photo by Louis Reed on Unsplash

Lifesciences and Large Language Models

As a recap, generative AI and LLMs in life sciences have varied and differentiated needs, requiring embeddings that are highly contextualized, such as protein sequences, chemical structures, and medical vocabularies. A high-level use case summary can be seen in Figure 1 below.

Figure 1: Overview of life science use cases leveraging Gen AI

From these, we picked clinical trial protocol summarization as an example to demonstrate the art of the possible for fine-tuning and RAG, for three reasons:

  1. there is a need for the model to understand biomedical verbiage, which meant fine-tuning or adding context was necessary
  2. there was public data with which we could build a solution, making it more effective to demonstrate both the complexities and the art of the possible
  3. we could use a general-purpose model like Llama 2 and add context or fine-tune it minimally to provide accurate results, instead of having to create or retrain a completely new foundational model, which would have been necessary for more complex use cases like protein folding.

However, when we demonstrated the example with fine-tuning, as noted in the earlier blog, we had two observations:

  1. Fine-tuning is not easy and may not be required for all tasks. It requires advanced data science skills as well as extensive GPU compute.
  2. There is hallucination when we use the general-purpose LLM, which makes attribution of the response difficult. The same question may not always yield the exact same response.

The above observations lent themselves to testing this use case with a RAG approach rather than fine-tuning, while still leveraging a Llama 2 model that is life sciences aware to some extent.

Defining RAG and its role in clinical text summarization

RAG, or retrieval augmented generation, is a way of adding context to a model at run time. A very good example of RAG can be found at this link. In general, it requires three key steps:

  1. Building a knowledge base of key terms and storing the associated embeddings in a vector database
  2. Extracting the prompt from the user's question and performing a similarity match against the database
  3. Dynamically adding the retrieved results from the similarity match as context to an LLM for summarization (see the sketch after this list)
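
To make steps 2 and 3 concrete, here is a minimal Snowpark for Python sketch. It assumes the preview Cortex function names (SNOWFLAKE.CORTEX.EMBED_TEXT_768, VECTOR_COSINE_SIMILARITY, SNOWFLAKE.CORTEX.COMPLETE), an illustrative protocol_knowledge_base table, and a recent Snowpark version that supports bind parameters in session.sql; the exact names may differ in your account, and step 1 (building the knowledge base) is sketched in the next section.

# A minimal sketch of RAG steps 2 and 3, not the exact code of our prototype.
from snowflake.snowpark import Session


def answer_question(session: Session, question: str, top_k: int = 1) -> str:
    # Step 2: embed the user's question and retrieve the closest protocol chunks.
    hits = session.sql(
        f"""
        SELECT study_id, chunk_text
        FROM protocol_knowledge_base
        ORDER BY VECTOR_COSINE_SIMILARITY(
            chunk_embedding,
            SNOWFLAKE.CORTEX.EMBED_TEXT_768('e5-base-v2', ?)
        ) DESC
        LIMIT {top_k}
        """,
        params=[question],
    ).collect()

    # Step 3: hand the retrieved chunks to the LLM as context for the final answer.
    context = "\n".join(f"[{r['STUDY_ID']}] {r['CHUNK_TEXT']}" for r in hits)
    prompt = (
        "Answer the question using only the clinical trial protocol context below, "
        f"and cite the study id.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
    return session.sql(
        "SELECT SNOWFLAKE.CORTEX.COMPLETE('llama2-70b-chat', ?) AS response",
        params=[prompt],
    ).collect()[0]["RESPONSE"]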

Our aim with this application was to leverage RAG to answer questions from clinical trial protocols of the nature seen below:

Snowflake and Cortex AI to build the RAG application for clinical trials

To achieve the above pattern, we ingested data from the clinical trials website to build a knowledge base containing the key elements of the protocol required to answer the questions. Snowflake recently announced capabilities, branded under Cortex AI, that provide hosted LLMs, vector data types, and LLM functions, including native embedding generation and similarity search under the hood. For now, Cortex makes available Llama 2 7B and Llama 2 70B, and the list of supported models will continue to grow. You can also see more details in this blog for understanding how Cortex works in general to create a RAG application. Please note that all of these are private preview features at the time of writing.
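
As an illustration of what such a knowledge base could look like, here is a hedged sketch that creates a table with a native VECTOR column and populates it with embeddings generated in-database. The staged_protocol_chunks source table, the column names, and the EMBED_TEXT_768 / VECTOR(FLOAT, 768) preview syntax are assumptions for illustration only; Part 2 covers the specifics of what we actually built.

# Illustrative knowledge-base build; object names and preview syntax are assumed.
from snowflake.snowpark import Session


def build_knowledge_base(session: Session) -> None:
    # A table holding one row per protocol chunk, with its vector embedding.
    session.sql("""
        CREATE OR REPLACE TABLE protocol_knowledge_base (
            study_id        STRING,
            chunk_text      STRING,
            chunk_embedding VECTOR(FLOAT, 768)
        )
    """).collect()

    # Embed each pre-chunked protocol section without the data leaving Snowflake.
    session.sql("""
        INSERT INTO protocol_knowledge_base
        SELECT study_id,
               chunk_text,
               SNOWFLAKE.CORTEX.EMBED_TEXT_768('e5-base-v2', chunk_text)
        FROM staged_protocol_chunks
    """).collect()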

We leveraged Cortex features to build an industry-specific RAG solution, and Figure 2 below illustrates how we stitched together, or chained, this LLM application.

Figure 2: The three steps involved in building a clinical protocol enquiry application

The detailed breakdown of each of these steps is described in the next article of the series.

The Outcome

Clinical Protocol Assistant on Snowflake with Streamlit and Cortex

The outcome was quite interesting, as depicted in the above screenshots, especially when compared to the original fine-tuned model.

The results for the enquiry questions were comparable, but the added advantage in this case was that the model was also able to provide an attribution, which helps keep it grounded in reality. Because the solution tackles both SQL and semantic types of questions, it is easy to follow up with a simpler lookup query that provides the details of a given study id.

We restricted the top results for semantic search to 1 for cost reasons, since this was an art-of-the-possible demonstration, but in real life it would be reasonable to provide a summarized view over at least the top 5 responses. Figure 3 below provides a view of the key differentiators of the solution, and the unique approach that LLM chaining provides is highlighted in the following section.

The differentiators

Figure 3: Differentiator snapshot of the RAG based clinical protocol assistant
  1. Adding the problem classification step
    Developing a copilot that can answer an enterprise question first requires classifying the problem posed by the user and deriving a few parameters from it. In the current scenario, we distinguish between problems that require retrieval augmented generation and those that do not. For problems requiring retrieval augmented generation, we determine further parameters that help define, dynamically, the SQL queries to run in order to retrieve the most specific corpus of data needed to answer the question (a sketch of this classification-and-routing chain follows this list).
  2. Supporting both semantic search and SQL
    In order to provide more accurate results, this framework is able to distinguish between lookup queries and semantic search queries. For semantic search, it can further narrow the scope of the search with additional row filtering based on SQL predicates generated from the user's question.
    In the current prototype, the SQL query is generated dynamically through Snowpark for Python code. However, once Snowflake Cortex text2sql, announced during Snowday, becomes available, this stage will become even simpler, with a low-code approach for generating the appropriate SQL queries.
  3. Consistency and accuracy of answers
    If asked the same question multiple times, the SQL-based RAG approach described above consistently generates the same response, as the LLM is always handed the same content on which to base its answer. In addition, the LLM always provides an identifier tag, the study id, along with its answer, so the end user can either look up further attributes of the study through the application or get additional information from another source. This helps limit hallucinations, and makes it quick to determine whether a hallucination has occurred.
    In the current framework, since it is a prototype, the answer is returned to the user directly. However, before returning the answer, we could have added an additional step to further evaluate the response and tweak it based on certain rules.
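
To tie these differentiators together, here is a hedged sketch of the classification-and-routing chain: the LLM first classifies the question and extracts filter parameters, then the application either runs a plain SQL lookup or a semantic search narrowed by the generated predicates. The prompt wording, the JSON schema, and all table and column names are illustrative assumptions, not the exact code of our prototype, and the Cortex function names reflect the preview interface.

# Hedged sketch of the two-stage chain: classify, then route to lookup or RAG.
import json

from snowflake.snowpark import Session


def classify_question(session: Session, question: str) -> dict:
    # Stage 1: the LLM classifies the question and extracts filter parameters.
    prompt = (
        "Classify this clinical trial question. Respond with JSON only, e.g. "
        '{"mode": "lookup" or "semantic", "study_id": null, "phase": null}. '
        f"Question: {question}"
    )
    raw = session.sql(
        "SELECT SNOWFLAKE.CORTEX.COMPLETE('llama2-70b-chat', ?) AS response",
        params=[prompt],
    ).collect()[0]["RESPONSE"]
    return json.loads(raw)


def route_question(session: Session, question: str) -> str:
    route = classify_question(session, question)

    if route.get("mode") == "lookup" and route.get("study_id"):
        # Stage 2a: plain SQL lookup, e.g. details of a specific study id.
        rows = session.sql(
            "SELECT chunk_text FROM protocol_knowledge_base WHERE study_id = ?",
            params=[route["study_id"]],
        ).collect()
        return "\n".join(r["CHUNK_TEXT"] for r in rows)

    # Stage 2b: semantic search, narrowed by a SQL predicate derived in stage 1.
    where_clause = "WHERE phase = ?" if route.get("phase") else ""
    binds = ([route["phase"]] if route.get("phase") else []) + [question]
    rows = session.sql(
        f"""
        SELECT study_id, chunk_text
        FROM protocol_knowledge_base
        {where_clause}
        ORDER BY VECTOR_COSINE_SIMILARITY(
            chunk_embedding,
            SNOWFLAKE.CORTEX.EMBED_TEXT_768('e5-base-v2', ?)
        ) DESC
        LIMIT 1
        """,
        params=binds,
    ).collect()

    # Hand the filtered, retrieved context to the LLM for the final answer.
    context = "\n".join(f"[{r['STUDY_ID']}] {r['CHUNK_TEXT']}" for r in rows)
    prompt = (
        "Answer using only the clinical trial protocol context below and cite "
        f"the study id.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
    return session.sql(
        "SELECT SNOWFLAKE.CORTEX.COMPLETE('llama2-70b-chat', ?) AS response",
        params=[prompt],
    ).collect()[0]["RESPONSE"]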

What’s next?

The current application sets up a solid framework for RAG applications built in Snowflake for a number of problems in the domain of life sciences. Upcoming features announced by Snowflake, such as text2sql, will improve this framework even further with a lower-code approach and extend the types of questions the application can address.

You could, for example, chain further models to enquire on associated patient data sets and surface any that are of interest and that you may want to access as well.

The idea here is to demonstrate how quickly natively hosted Snowflake features can be chained together to build an industry-specific solution in life sciences.

Key Takeaways

In summary, these are the key takeaways when it comes to building a RAG-based, life science specific solution:

  1. The RAG approach with OSS models such as Llama 2 offers a compelling way to quickly assemble an LLM application that addresses a problem in the domain of life sciences, without requiring advanced data science skills or expensive GPU compute for fine-tuning.
  2. A strong data foundation is key. Preparing the data through data engineering for your RAG pattern not only helps during retrieval, but also lets you feed your LLM disambiguated data for inference, limiting hallucinations and improving consistency and accuracy.
  3. The problem may require semantic search, SQL queries, or a combination of both. Depending on the class of problems your application needs to tackle, this can require a multi-step approach involving multiple inference points through LLM chaining to provide a suitable answer. It is not always a semantic-search-only problem.
  4. Finally, fine-tuning is relevant for sophisticated use cases like those in drug discovery. For those, we need to start with a domain-specific model and then fine-tune it on internal data sets. Combining fine-tuning with RAG will enhance the model's performance.

Snowflake provides all the building blocks to quickly assemble an LLM application while leveraging the platform's native security and services. Options include bringing your own LLM models and fine-tuning them further with Snowflake SPCS, or leveraging the Snowflake Cortex AI LLM models exposed through SQL functions. The idea is to choose the one that works best for your use case.

Do not hesitate to contact us with questions or to discuss such use cases. Please also read Part 2 for more details on the how.
