Multimodal Retrieval Augmented Generation

An implementation with Azure AI Search and GPT-4-turbo

Valentina Alto
Published in Microsoft Azure · 7 min read · Jan 28, 2024


Generative AI systems have demonstrated stunning capabilities in producing new content from users’ natural-language questions. However, generating high-quality and relevant content is not an easy task, especially when it requires factual or domain-specific knowledge. How can we ensure that generative models produce accurate and useful outputs across tasks and contexts?

One promising approach is retrieval augmented generation (RAG), which combines the power of large language models (LLMs) with the information retrieval capabilities of search engines.
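As a minimal sketch of this two-step flow, the snippet below retrieves relevant documents and then builds a grounded prompt for the LLM. The retriever here is a toy keyword-overlap function standing in for a real search service, and the helper names (`retrieve`, `build_prompt`) are illustrative, not part of any library:

```python
# Minimal RAG flow sketch: (1) retrieve context, (2) augment the prompt.
# The generation step is represented by the prompt we would send to an LLM.

def retrieve(query: str, knowledge_base: list[str], top_k: int = 2) -> list[str]:
    """Toy keyword-overlap retriever standing in for a real search service."""
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(doc.lower().split())), doc)
              for doc in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_prompt(query: str, context: list[str]) -> str:
    """Augment the user's question with the retrieved context."""
    context_block = "\n".join(f"- {doc}" for doc in context)
    return ("Answer the question using only the context below.\n"
            f"Context:\n{context_block}\n"
            f"Question: {query}")

kb = [
    "Azure AI Search supports keyword, vector, and hybrid retrieval.",
    "GPT-4-turbo accepts both text and image inputs.",
    "RAG grounds model answers in retrieved documents.",
]
context = retrieve("retrieval Azure AI Search", kb)
prompt = build_prompt("What retrieval modes does Azure AI Search support?", context)
print(prompt)
```

In a production system, `retrieve` would call a search index and the final prompt would be sent to a chat-completions endpoint; the structure of the flow stays the same.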

Note: in the retrieval stage, we can leverage different search methods depending on the application. Traditional search engines rely on keyword or semantic search. More recent approaches use vector search, which computes text embeddings for both the knowledge base and the user’s query, measures text similarity between them, and retrieves the most similar context for a given query. Finally, hybrid methods combine semantic and vector search to achieve better performance, as offered by the Azure AI Search service.
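To make the vector-search idea concrete, here is a small sketch that ranks documents by cosine similarity between embedding vectors. The `embed` function below is a toy bag-of-words stand-in; a real implementation would call an embedding model and let the search index compute similarity:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def embed(text: str, vocabulary: list[str]) -> list[float]:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]

vocab = ["azure", "search", "vector", "image", "model", "running"]
documents = [
    "Azure AI Search offers vector search",
    "image captioning with a vision model",
]
query = "vector search in Azure"

doc_vectors = [embed(d, vocab) for d in documents]
query_vector = embed(query, vocab)
best = max(range(len(documents)),
           key=lambda i: cosine_similarity(query_vector, doc_vectors[i]))
print(documents[best])  # the document most similar to the query
```

The ranking logic is the same whether the vectors come from a bag-of-words count or a learned embedding model; only the quality of the similarity signal changes.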

RAG is a technique that allows generative models to access external knowledge sources, such…
