Real time RAG with Tools4AI, Google and OpenAI in Java

Apr 14, 2024

RAG

Retrieval-Augmented Generation (RAG) is a technique that extends large language models (LLMs) by retrieving external data in real time during response generation. It combines the generative ability of a pre-trained model with freshly fetched information to produce contextually relevant, up-to-date responses. This addresses a key limitation of LLMs, which otherwise rely on static training data, by giving the model access to current information at query time.

How RAG Works

Retrieve: When a query is received, the RAG system first identifies the need for external data to enhance the response. It then performs a real-time search to retrieve relevant information from databases, the internet, or other data sources. This step is crucial for ensuring the data integrated into the response is timely and pertinent.
Augment: The retrieved data is then used as an augmented context for the LLM. This means that the language model not only utilizes its pre-trained knowledge but also the newly fetched data to formulate its responses.
Generate: Finally, the LLM generates a response based on both its internal knowledge and the external data provided. This response is typically more accurate, detailed, and contextually appropriate than what would be possible using the LLM alone.
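The three steps above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Tools4AI's implementation: the RagSketch class, the needsExternalData check, and the stand-in retriever and model functions are hypothetical names introduced only for this example.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of retrieve -> augment -> generate; not a Tools4AI API.
final class RagSketch {
    private final Predicate<String> needsExternalData;      // decides whether to retrieve
    private final Function<String, List<String>> retriever; // e.g. a web search call
    private final Function<String, String> llm;             // e.g. a chat-completion call

    RagSketch(Predicate<String> needsExternalData,
              Function<String, List<String>> retriever,
              Function<String, String> llm) {
        this.needsExternalData = needsExternalData;
        this.retriever = retriever;
        this.llm = llm;
    }

    String answer(String query) {
        // Retrieve: fetch external data only when the query calls for it.
        List<String> documents = needsExternalData.test(query)
                ? retriever.apply(query)
                : List.of();

        // Augment: place the retrieved text in front of the question.
        String prompt = documents.isEmpty()
                ? query
                : "Context:\n" + String.join("\n", documents) + "\n\nQuestion: " + query;

        // Generate: the model sees both the context and the question.
        return llm.apply(prompt);
    }

    public static void main(String[] args) {
        RagSketch rag = new RagSketch(
                q -> q.contains("today"),                      // toy heuristic
                q -> List.of("Stub search result for: " + q),  // stand-in retriever
                prompt -> "MODEL INPUT >>> " + prompt);        // stand-in model that echoes its prompt
        System.out.println(rag.answer("What is in the news today?"));
    }
}
```

In a real system the retriever would call a search API and the model function would call an LLM; keeping them as plain functions makes each stage easy to swap or test in isolation.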

Real Time RAG

Real Time Retrieval-Augmented Generation (RAG) combines a language model with data retrieval performed at query time to provide contextually rich, up-to-date responses. It improves AI-driven systems by pulling in external information sources dynamically while the response is being generated.

How Real Time RAG Works

Retrieve: The process begins when a user submits a query. The system quickly searches through external databases or the internet to retrieve relevant information. This step is crucial as it gathers the most current data needed to answer the query.
Augment: The retrieved data is then combined with the pre-existing knowledge of the language model. This means the system doesn’t rely on its training data alone but also uses the new information to formulate a response.
Generate: Leveraging the combined data, the language model generates a response that is not only informed by its vast training but also by the most relevant and recent information available. This response is typically more accurate and detailed than one generated from static model knowledge alone.
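The augment step is where retrieved text becomes part of the model's input. The following is a rough sketch of one way to do that, assuming the retrieved data arrives as a list of text snippets; the PromptAugmenter class, the prompt wording, and the character budget are illustrative choices, not part of Tools4AI or any specific LLM API.

```java
import java.util.List;

// Hypothetical helper: turns retrieved snippets into context for the model.
final class PromptAugmenter {
    static String augment(String question, List<String> snippets, int maxContextChars) {
        StringBuilder context = new StringBuilder();
        int index = 1;
        for (String snippet : snippets) {
            String entry = "[" + index + "] " + snippet.trim() + "\n";
            // Stop adding snippets once the context budget would be exceeded.
            if (context.length() + entry.length() > maxContextChars) {
                break;
            }
            context.append(entry);
            index++;
        }
        return "Use the sources below to answer.\n"
                + context
                + "Question: " + question;
    }
}
```

Numbering the snippets lets the model refer back to individual sources, and the budget keeps the prompt within whatever input limit the chosen model has.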

Real Time RAG with Tools4AI

To add real-time Google search to an AI application using Tools4AI, you can use the @Predict annotation together with the JavaMethodAction interface to define actions that the AI should take based on natural language prompts. Here’s how to set up a class that performs a Google search through an API:

Steps to Implement Real-Time Google Search with Tools4AI

Define the Action Class:

Use the @Predict annotation to specify the action's name and a description of its purpose. This makes the action easily recognizable by Tools4AI when relevant prompts are received.

@Predict(actionName = "googleSearch", description = "search the web for information")
public class GoogleSearchAction implements JavaMethodAction {
    public String googleSearch(String searchString, boolean isNews) {
        // Implementation details are shown in the next step
        return null; // placeholder so the skeleton compiles
    }
}

Get the serperKey from https://serper.dev/ and configure it in tools4ai.properties, or pass it as a system parameter.
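To make the two configuration options concrete, here is a small sketch of a lookup that checks a JVM system property first and then a properties source. Tools4AI's PredictionLoader does its own loading, so this is only an illustration: the SerperKeyLookup class and the precedence order are assumptions, and only the serperKey name comes from the text above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical lookup, not Tools4AI code: system property first, then properties content.
final class SerperKeyLookup {
    static String resolve(Reader propertiesSource) {
        // Assumed precedence: -DserperKey=... on the command line wins over the file entry.
        String fromSystem = System.getProperty("serperKey");
        if (fromSystem != null && !fromSystem.trim().isEmpty()) {
            return fromSystem.trim();
        }
        Properties props = new Properties();
        try {
            props.load(propertiesSource);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String fromFile = props.getProperty("serperKey");
        // Return null when no key is configured so callers can fall back gracefully.
        return (fromFile == null || fromFile.trim().isEmpty()) ? null : fromFile.trim();
    }
}
```

With this sketch, a line such as serperKey=your-key-here in the properties content, or -DserperKey=your-key-here on the java command line, would both yield the key.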

We will use Unirest to perform the HTTP call:

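public String googleSearch(String searchString, boolean isNews) {
    // Without a Serper API key there is nothing to call, so return a fallback message.
    if (PredictionLoader.getInstance().getSerperKey() == null) {
        return "Default response if no API key is available";
    }
    // Note: isNews is only logged here; it does not change the request.
    log.info(searchString + " : " + isNews);
    // Escape backslashes and quotes so the query cannot break the JSON body.
    String query = searchString.replace("\\", "\\\\").replace("\"", "\\\"");
    HttpResponse<String> response = Unirest.post("https://google.serper.dev/search")
            .header("X-API-KEY", PredictionLoader.getInstance().getSerperKey())
            .header("Content-Type", "application/json")
            .body("{\"q\":\"" + query + "\"}")
            .asString();
    // The raw JSON returned by Serper is handed back to the caller.
    return response.getBody();
}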
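For readers who prefer not to add Unirest, the same request can be built with the HTTP client that ships with the JDK (Java 11 or later). This is a sketch under assumptions rather than code from Tools4AI: the SerperRequestSketch class and its jsonString helper are hypothetical, while the endpoint and headers are the ones used in the snippet above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical JDK-only variant of the search request; not part of Tools4AI.
final class SerperRequestSketch {
    // Wraps a value in quotes and escapes characters that are not allowed raw in a JSON string.
    static String jsonString(String value) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    // Builds (but does not send) the POST request for the given key and query.
    static HttpRequest build(String apiKey, String searchString) {
        String body = "{\"q\":" + jsonString(searchString) + "}";
        return HttpRequest.newBuilder()
                .uri(URI.create("https://google.serper.dev/search"))
                .header("X-API-KEY", apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), and the response body is the JSON that the action would return.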

The result from this action will be available in the action processor:

OpenAIActionProcessor processor = new OpenAIActionProcessor();
String realTimeResult = processor.processSingleAction("find me info about indian food on internet");
// Process the result: pass it back to the model as context for a follow-up question
processor.query("here is the information about indian food: " + realTimeResult + ", tell me the calories");

The full code and example are here