Enhancing Data Retrieval with Vector Databases and GPT-3.5 Reranking

Foad Kesheh
11 min read · Feb 1, 2024


A conceptual illustration of a re-ranker in information retrieval: arrows, filters, and spotlights single out the most relevant documents from a vast digital collection.

The integration of vector databases with GPT-3.5 for reranking signifies a transformative advancement in the realm of knowledge base management. This specialized approach marries the efficiency of vector databases in data retrieval with the nuanced understanding of GPT-3.5, creating a powerful tool for handling diverse and complex queries.

At its core, this methodology leverages vector databases for their ability to swiftly navigate through large datasets, identifying potentially relevant information based on the context and content of queries. While vector databases are adept at pinpointing information in a vast digital landscape, they may not always grasp the subtle intricacies of every query, especially those that are complex or ambiguous in nature. This is where GPT-3.5’s reranking capability becomes invaluable, as it fine-tunes the initial search results to ensure the most pertinent information is brought to the forefront.

The synergy between the rapid data processing of vector databases and the intelligent reranking by GPT-3.5 creates a more effective and efficient search process. This integrated approach not only elevates the accuracy of search results but also streamlines the retrieval process, catering to the specific needs of various queries and enhancing the overall user experience in navigating knowledge bases.

Throughout this article, we will delve into the mechanics of vector databases and how they are complemented by the reranking prowess of GPT-3.5. We will explore the practical applications of this approach, demonstrating its potential to revolutionize information retrieval in knowledge bases.

Vector Databases, Conventional Search, and GPT-3.5 Reranking

Vector Databases in Knowledge Base Queries

Vector databases play a crucial role in modern data retrieval systems, particularly in the context of knowledge bases. They work by transforming text data into vector form, which allows for efficient indexing and retrieval based on semantic similarity. This method is highly effective in quickly sifting through large datasets to find content that matches the query’s context.
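To make that concrete, here is a minimal sketch of the idea: embed a handful of documents and a query, then rank the documents by cosine similarity. It assumes the openai>=1.0 Python SDK and numpy; the embedding model name and the in-memory toy corpus are placeholders, and a real system would delegate the similarity search to a vector database.

```python
# Minimal sketch: embed documents and a query, then rank documents by cosine
# similarity. Assumes the openai>=1.0 Python SDK (OPENAI_API_KEY set in the
# environment) and numpy; the embedding model name is a placeholder.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

documents = [
    "LM Studio version 3.0.0, released on November 30, 2022, introduces 3D editing...",
    "LM Studio version 1.0.0, released on March 15, 2021, marks the official launch...",
]
doc_vectors = embed(documents)
query_vector = embed(["What is the date of the latest release notes for LM Studio?"])[0]

# Cosine similarity is the dot product of L2-normalized vectors.
doc_norm = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
query_norm = query_vector / np.linalg.norm(query_vector)
scores = doc_norm @ query_norm
ranked_indices = np.argsort(-scores)  # most similar document first
```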

Incorporating Conventional Search Methods

While vector databases are powerful, they are often complemented by conventional search methods. These traditional techniques, based on keyword matching and Boolean logic, provide a straightforward way to filter and retrieve data. When combined with vector databases, they offer a more rounded approach, ensuring that searches are both comprehensive and nuanced. This dual approach can handle a variety of queries, from simple keyword searches to complex inquiries requiring deep understanding.
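One common way to merge the keyword results with the vector results is reciprocal rank fusion. The snippet below is a generic sketch of that idea rather than a method prescribed by any particular database; the constant k=60 is a conventional default and the example ids are hypothetical.

```python
# Sketch of reciprocal rank fusion: merge two ranked lists of document ids
# (best match first) into a single ranking. Documents that rank high in either
# list, or appear in both, end up near the top.
def reciprocal_rank_fusion(keyword_hits: list[int], vector_hits: list[int], k: int = 60) -> list[int]:
    scores: dict[int, float] = {}
    for hits in (keyword_hits, vector_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical example: doc 3 ranks well in both lists, so it rises to the top.
merged = reciprocal_rank_fusion([3, 1, 5], [2, 3, 4])
```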

GPT-3.5 Reranking: Refining Search Results

After the initial retrieval using vector databases and conventional search methods, GPT-3.5’s reranking capabilities come into play. GPT-3.5, a sophisticated language model, analyzes the retrieved data chunks and rearranges them in order of relevance to the query. This reranking is based on a deeper understanding of the content, context, and subtleties of the query, which vector databases alone might not fully grasp.

Enhanced Precision and Relevance

The combination of vector database retrieval with GPT-3.5 reranking ensures a higher level of precision and relevance in search results. Vector databases provide a broad sweep of the data, capturing a wide array of potential matches. GPT-3.5 then finely tunes these results, bringing the most pertinent information to the forefront. This process is particularly beneficial for complex queries, where understanding the context and nuances is key to providing accurate and useful information.

Broadening the Scope of Applications

The integration of these methods broadens the scope of applications for knowledge bases. From academic research to customer support databases, this approach can significantly enhance the efficiency and accuracy of information retrieval. It’s especially powerful in scenarios where the quality of information is critical, such as in technical support, medical inquiries, or legal research.

Case Study: Implementing an Advanced Retrieval System

Designing a GPT-4 Powered Retrieval Agent

In this case study, we explore the development of a retrieval system powered by GPT-4, designed to handle complex queries in a knowledge database. The system integrates vector database retrieval with GPT-3.5 reranking for enhanced accuracy.

For this case study we will create an assistant, but the same can also be done with standard ChatGPT function calling through the API. To give the GPT-4 assistant the ability to query data from the knowledge database, we will give it a simple search function.

{
  "name": "search_on_knowledge_base",
  "description": "Retrieve and process data from the knowledge base using a combination of query, keywords, and a focus on recent content",
  "parameters": {
    "type": "object",
    "properties": {
      "question_with_context": {
        "type": "string",
        "description": "Combine the user question and the context in which the question was asked (including your system instructions) and create a verbose version of the question containing the context within it. Make sure the question identifies all the details required for the search agent to retrieve good answers, like names, versions, and any other detail."
      },
      "keywords": {
        "type": "array",
        "items": {
          "type": "string"
        },
        "description": "Supplementary keywords to enhance the conventional search results. DO NOT use time-relative terms like latest, most recent, etc."
      },
      "focus_on_recent": {
        "type": "boolean",
        "description": "A flag that, when enabled, prioritizes retrieval of the most recent content in response to the query"
      }
    },
    "required": [
      "question_with_context",
      "keywords",
      "focus_on_recent"
    ]
  }
}
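If you go the Assistants route, the schema above is registered as a function tool when the assistant is created. The sketch below uses the OpenAI Python SDK (openai>=1.0); the model name is a placeholder, and the parameters object is an abbreviated copy of the full schema shown above.

```python
# Sketch: create a GPT-4 assistant and attach the search function as a tool.
from openai import OpenAI

client = OpenAI()

search_tool = {
    "type": "function",
    "function": {
        "name": "search_on_knowledge_base",
        "description": "Retrieve and process data from the knowledge base",
        "parameters": {  # abbreviated; use the full schema shown above in practice
            "type": "object",
            "properties": {
                "question_with_context": {"type": "string"},
                "keywords": {"type": "array", "items": {"type": "string"}},
                "focus_on_recent": {"type": "boolean"},
            },
            "required": ["question_with_context", "keywords", "focus_on_recent"],
        },
    },
}

assistant = client.beta.assistants.create(
    model="gpt-4-turbo-preview",  # placeholder model name
    instructions="You are an assistant tasked with answering user queries about LM Studio...",
    tools=[search_tool],
)
```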

Now, let’s give it a try. I gave it a fictitious system prompt:

You are an assistant tasked with answering user queries about a software product named LM Studio. Utilize your knowledge base to access recent and accurate information and employ it as needed to provide helpful responses.

Example of RAG Powered Assistant

The output was pretty good, as we can see below. You can tune the function descriptions to improve it even further.

search_on_knowledge_base({
  "question_with_context": "What is the date of the latest release notes for LM Studio?",
  "keywords": ["latest release", "release notes", "LM Studio"],
  "focus_on_recent": true
})

Now, the standard RAG part takes place. I will not cover it in depth here, but typically you will create embeddings for the question (or enrich the query using other techniques, such as generating hypothetical answers) and retrieve results from your vector database based on vector similarity. You can also use the keywords to run a conventional search and combine the results. Make sure to apply the focus_on_recent flag to penalize older content when required.
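As a rough illustration of that retrieval step, the sketch below scores documents by cosine similarity to the query embedding, adds a small bonus for keyword matches, and applies a recency penalty when focus_on_recent is set. The corpus layout, the keyword bonus weight, and the one-year half-life are all assumptions chosen for illustration.

```python
# Sketch of hybrid retrieval with an optional recency penalty. Each corpus entry
# is assumed to look like {"id": ..., "date": "YYYY-MM-DD", "description": ...,
# "vector": np.ndarray} with the embedding precomputed.
from datetime import date
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recency_penalty(doc_date: str, half_life_days: int = 365) -> float:
    """Downweight older documents: 1.0 for today, ~0.5 after one half-life."""
    age_days = (date.today() - date.fromisoformat(doc_date)).days
    return 0.5 ** (age_days / half_life_days)

def retrieve(query_vec: np.ndarray, keywords: list[str], focus_on_recent: bool,
             corpus: list[dict], top_k: int = 10) -> list[dict]:
    scored = []
    for doc in corpus:
        score = cosine(query_vec, doc["vector"])
        # Small bonus per keyword found in the text (illustrative weight).
        score += 0.05 * sum(kw.lower() in doc["description"].lower() for kw in keywords)
        if focus_on_recent:
            score *= recency_penalty(doc["date"])
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```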

Re-ranking with GPT-3.5

Let’s now focus on the re-ranking piece. I will demonstrate re-ranking using a fictitious result set, consistent with what you would get from a vector search for a query like “What is the date of the latest release notes for LM Studio?”. The dataset contains several versions with different dates, and even different products whose names are similar and whose version numbers are higher than the one our question is about. Here we are working with 10 results, but that could be extended according to the context size of the model in use.

[
{
"id": 901256,
"date": "2022-09-10",
"description": "LM Studio Chrome Extension version 3.0.0, released on September 10, 2022, focuses on user feedback with the addition of a new feedback tool within the extension, allowing users to directly report issues or suggest improvements. This update also enhances the user interface for better accessibility and includes performance optimizations for a smoother editing experience. The LM Studio Chrome Extension 2.1.0 reaffirms our commitment to continuous improvement and user satisfaction."
},
{
"id": 836406,
"date": "2021-03-15",
"description": "LM Studio version 1.0.0, released on March 15, 2021, marks the official launch of our comprehensive media editing suite. This initial version introduces a user-friendly interface, basic video and audio editing capabilities, and support for a wide range of file formats. Designed for both amateurs and professionals, LM Studio 1.0.0 aims to streamline the creative process with efficient workflow tools and a customizable workspace."
},
{
"id": 728239,
"date": "2021-08-15",
"description": "LM Studio Chrome Extension version 2.0.0, released on Jul 15, 2021, adds new features including screen recording capabilities, a simplified interface for quicker navigation, and enhanced performance for faster editing and uploading. This update also introduces customizable keyboard shortcuts, making it easier for users to access their favorite tools. With version 1.1.0, the LM Studio Chrome Extension continues to improve user experience and functionality for digital content creators."
},
{
"id": 969534,
"date": "2022-01-10",
"description": "LM Studio version 2.0.0, released on January 10, 2022, is a major update that revolutionizes the platform with AI-driven editing features, such as automatic video stabilization, scene detection, and smart cropping. The update also introduces a collaborative project feature, enabling teams to work together in real-time, regardless of location. LM Studio 2.0.0 sets a new standard for efficient and intuitive video editing software."
},
{
"id": 572552,
"date": "2022-06-05",
"description": "LM Studio version 2.5.0, released on June 5, 2022, focuses on user experience improvements and bug fixes. Enhanced customization options for the workspace, improved audio waveform visualization, and the introduction of a quick export feature for social media platforms are key highlights. This version also sees the optimization of the software for newer hardware, ensuring faster and more reliable performance across all supported devices."
},
{
"id": 570954,
"date": "2022-11-30",
"description": "LM Studio version 3.0.0, released on November 30, 2022, introduces groundbreaking 3D editing capabilities, VR support, and an integrated motion graphics editor. The update further expands the software's asset library with thousands of new royalty-free media elements. Additionally, LM Studio 3.0.0 enhances the software's machine learning algorithms for smarter auto-editing features, making it the most powerful and versatile version to date."
},
{
"id": 613408,
"date": "2021-12-20",
"description": "LM Studio Chrome Extension version 3.0.0, released on December 20, 2021, brings advanced editing options such as image and video filters, text overlay, and audio mixing directly to your browser. Additionally, this update improves compatibility with various online platforms, ensuring smoother uploads and sharing. The LM Studio Chrome Extension 1.2.0 makes it easier than ever to create professional-looking content on the go."
},
{
"id": 924822,
"date": "2021-05-10",
"description": "LM Studio Chrome Extension version 1.0.0, released on May 10, 2021, introduces a seamless integration of LM Studio's media editing tools directly into your browser. This initial release offers basic video and image capture functionalities, easy-to-use editing features, and direct upload options to popular social media platforms. Designed to enhance productivity and streamline content creation, the LM Studio Chrome Extension 1.0.0 is the perfect tool for creators looking to quickly edit and share content without leaving their browser."
},
{
"id": 739496,
"date": "2021-07-22",
"description": "LM Studio version 1.2.0, released on July 22, 2021, brings significant enhancements including advanced color grading tools, improved rendering speeds, and expanded support for 4K video. This update also introduces a new library of visual effects and transitions, alongside the ability to import custom effects. With version 1.2.0, LM Studio enhances its editing precision and expands creative possibilities for users."
},
{
"id": 699217,
"date": "2022-04-30",
"description": "LM Studio Chrome Extension version 4.0.0, released on April 30, 2023, marks a significant upgrade with the introduction of AI-powered editing features, including automatic content enhancement, smart cropping, and background noise reduction for videos. This version also introduces cloud storage integration, allowing users to save their projects online and access them from any device. LM Studio Chrome Extension 2.0.0 is designed to make advanced editing tools more accessible and to enhance the content creation workflow."
}
]

The correct answer comes from id 570954:

LM Studio version 3.0.0, released on November 30, 2022, introduces groundbreaking 3D editing capabilities, VR support, and an integrated motion graphics editor. The update further expands the software’s asset library with thousands of new royalty-free media elements. Additionally, LM Studio 3.0.0 enhances the software’s machine learning algorithms for smarter auto-editing features, making it the most powerful and versatile version to date.

If we handed the top 3 items on this list back to the assistant as-is, it would generate an answer containing incorrect information for the user.

I’m using gpt-3.5-turbo-1106 with a temperature of 0.2 and top_p of 1 for this exercise. The system prompt is the following:

You are a re-ranker assistant tasked with evaluating a set of content items in relation to a specific question. Your role involves critically analyzing each content item to determine its relevance to the question and re-ranking them accordingly. This process includes assigning a relevance score from 0 to 10 to each content item based on how well it answers the question, its coverage of the topic, and the reliability of its information.

To achieve your goal, use the following guidelines:

# Scoring Criteria Definition:

- Relevance to the Question: How directly does the content item address the user’s question?
- Completeness of the Answer: Does the content item provide comprehensive information that answers the user’s question?
- Reliability of the Information: Is the content item from a credible and trustworthy source, or does it provide accurate and verified information?

# Re-ranking and Output

List the content items in descending order of their relevance scores in the requested format. This re-ranked list should start with the content item that is most relevant to the question and end with the least relevant. Output only the list.

The list format is:
```
{ID},{5-words Rationale},{Relevance},{Completeness},{Reliability},{Total}
```
One item per line.

Note the added rationale; even a five-word rationale makes a big difference in the output you get. Yes, LLMs are weird.

The user message is:

The question is: ```What is the date of the latest release notes for LM Studio?```

The contents are:
```json

{The contents}
```
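Putting the system prompt and the user message together, the re-ranking call looks roughly like the sketch below (openai>=1.0 SDK). The CSV parsing is deliberately simple; in practice you may want stricter validation of the model's output.

```python
# Sketch: send the re-ranker prompt to gpt-3.5-turbo-1106 and parse the
# {ID},{Rationale},{Relevance},{Completeness},{Reliability},{Total} lines.
import json
from openai import OpenAI

client = OpenAI()

def rerank(question: str, contents: list[dict], system_prompt: str) -> list[dict]:
    user_message = (
        f"The question is: ```{question}```\n\n"
        f"The contents are:\n```json\n{json.dumps(contents, indent=2)}\n```"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        temperature=0.2,
        top_p=1,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    )
    ranked = []
    for line in response.choices[0].message.content.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) >= 6 and parts[0].isdigit():
            ranked.append({"id": int(parts[0]), "rationale": parts[1], "total": parts[-1]})
    return ranked  # ids in the order the re-ranker returned them, best first
```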

After 10 runs, these are the results:

Results of re-ranking after 10 runs using gpt-3.5-turbo-1106, Temp: 0.2, TopP: 1

The re-ranker nailed it every single time, putting the right content in first place. You can also try it in the Playground using this link: https://platform.openai.com/playground/p/e6UHR3qBjQkhJJ32Or1zgRaX?model=gpt-3.5-turbo-1106&mode=chat

Because we are using GPT-3.5 and the output is very short (~20 tokens per content item), the re-rank query runs fast and at low cost.

At today's prices, the example above costs $0.002 per run, and the process takes about 3 seconds. A very large re-rank with about 15K input tokens and 600 output tokens costs about $0.017 and takes about 7 seconds to process. Both cost and latency will surely improve in the coming months. (Update: gpt-3.5-turbo-0125 just cut that cost in half.)
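For reference, the arithmetic behind those numbers is simple. The per-1K-token prices below are assumptions for gpt-3.5-turbo-1106 at the time of writing, so check the current pricing page before relying on them.

```python
# Back-of-the-envelope cost estimate for a re-rank call (prices are assumptions).
INPUT_PRICE_PER_1K = 0.001   # USD per 1K input tokens, assumed
OUTPUT_PRICE_PER_1K = 0.002  # USD per 1K output tokens, assumed

def rerank_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

print(rerank_cost(15_000, 600))  # the "very big re-rank" example: ~0.016 USD
```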

Another important point: without re-ranking, you might hand all 10 chunks back to the higher-level agent to process (hoping the one you really want is among them), which is itself costly. With efficient re-ranking you can reduce the number of chunks passed along, cutting the cost on GPT-4.

The final step is to pass the top results back to GPT-4 to generate the final answer.
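A minimal sketch of that last step is shown below, using a plain chat completion with the top-ranked chunks as context. If you are using the Assistants API, you would instead return this text through the run's tool-output mechanism; the model name here is a placeholder.

```python
# Sketch: compose the final answer from the top-ranked chunks.
from openai import OpenAI

client = OpenAI()

def answer_with_context(question: str, top_chunks: list[dict]) -> str:
    context = "\n\n".join(f"[{chunk['date']}] {chunk['description']}" for chunk in top_chunks)
    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",  # placeholder model name
        messages=[
            {"role": "system", "content": "Answer the user's question using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```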

Conclusion

Integrating vector databases with GPT-3.5 for data retrieval offers a practical and efficient solution for knowledge base management. This method improves the accuracy and relevance of search results while making the process more straightforward for users.

Through the combination of vector database retrieval and GPT-3.5 reranking, we can better meet the demands for precise and timely information across various domains. This approach showcases the potential of combining existing technologies to enhance information retrieval, making it more adaptable to different types of queries and capable of prioritizing the most recent content.

As technology evolves, we anticipate further advancements that will continue to refine and improve our approaches to managing and accessing vast data repositories. The future of knowledge management looks promising, with ongoing innovations set to make information retrieval even more efficient and user-friendly.

This development marks a step forward in our quest to make knowledge more accessible and easier to navigate, promising a future where we can leverage information more effectively in our decision-making processes.

