Strategically Grounding Data: LLMs for Reliable AI Applications

muhil varnan
Feb 10, 2024


Many teams are now considering putting generative AI and LLMs into production services, and they run into common challenges along the way:

  1. “How can LLMs or AI chatbots be seamlessly integrated with existing IT systems, databases, and business data?”
  2. “With a vast array of products, how can I ensure LLMs accurately memorize all of them?”
  3. “How do I address hallucination in AI chatbots and build a dependable, trustworthy service?”

One quick solution is grounding with embeddings and vector search.

What is grounding?
Grounding is the process of using large language models (LLMs) with information that is use-case specific, relevant, and not available as part of the LLM’s trained knowledge. It is crucial for ensuring the quality, accuracy, and relevance of the generated output.
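In practice, grounding usually means retrieving use-case-specific facts and placing them in the model’s prompt. Here’s a minimal sketch of that idea; the function name and prompt wording are hypothetical, and any retrieval mechanism can supply the documents:

# A minimal sketch of grounding: retrieved, use-case-specific facts are
# injected into the prompt so the LLM answers from them, not from memory.
def build_grounded_prompt(question: str, retrieved_docs: list[str]) -> str:
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "Answer the question using ONLY the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )

docs = ["Product X supports up to 500 concurrent users."]
prompt = build_grounded_prompt("How many users does Product X support?", docs)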

In the contemporary landscape, data comes in many forms: text, images, videos, and structured and unstructured databases. To enable artificial intelligence to comprehend this diverse information, we use AI-powered services that transform the data into a simplified numeric structure commonly known as ‘embeddings’.

Content to Embedding conversion by AI

An embedding is essentially an array of numbers that represents the meaning or context of a piece of text. For instance, if we take the sentence ‘The cat sat on the mat,’ each word could have a corresponding set of numbers:

  • ‘cat’: [0.2, 0.8, -0.5]
  • ‘sat’: [0.6, -0.1, 0.4]
  • ‘mat’: [0.3, 0.7, -0.2]

In this simplified illustration, the sentence embedding, which represents the overall meaning of the sentence, could be formed by combining these word vectors:

[0.2, 0.8, -0.5, 0.6, -0.1, 0.4, 0.3, 0.7, -0.2]

(In practice, embedding models produce a single fixed-size vector for the whole sentence rather than a concatenation of word vectors.)

To create these embeddings, you can use pre-built models provided by companies like Google or OpenAI. Additionally, there are open-source options such as Sentence Transformers. Here’s a sample code snippet illustrating how to create text embeddings using Google’s text embedding models:

import vertexai
from vertexai.preview.language_models import TextEmbeddingModel

# PROJECT_ID and LOCATION are your Google Cloud project ID and region
vertexai.init(project=PROJECT_ID, location=LOCATION)

model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")

embeddings = model.get_embeddings(["The cat sat on the mat"])
vector = embeddings[0].values  # a list of 768 floats

# output: [0.1, 1.8, -1.5, ..., 2.1]

Each embedding model produces outputs in a consistent array format, but the specific dimensions or number of values in the array can vary. For instance, Google’s text embedding model might generate arrays with 768 dimensions, while another model, like Sentence Transformers, could yield arrays with 384 dimensions. These variations arise from factors such as the volume of training data the model has been exposed to and its intrinsic capacity to grasp contextual nuances within text.
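If you want to check this yourself, here’s a small sketch using the open-source Sentence Transformers library (assuming it’s installed via pip install sentence-transformers); the model name below is one common example:

from sentence_transformers import SentenceTransformer

# 'all-MiniLM-L6-v2' is a popular open-source model with 384-dimensional output
st_model = SentenceTransformer("all-MiniLM-L6-v2")
vector = st_model.encode("The cat sat on the mat")
print(vector.shape)  # (384,) -- compare with 768 for textembedding-gecko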

Now, let’s look at the understanding capability of Google’s text embedding model. With its 768-dimensional embeddings, it captures intricate meanings and relationships within text. This ability comes from extensive training on diverse datasets, which lets the model discern and encode nuanced contextual information, and the higher dimensionality allows a richer representation of semantic nuances within language and context.

librarian-level semantic understanding

The mechanism through which AI achieves understanding is a fascinating process. Once trained on specific content, be it text, images, or any other form of data, AI constructs a conceptual landscape known as the ‘embedding space.’ This space serves as a map, encapsulating the essence and meaning of the content.

In practical terms, AI can identify the location of each piece of content within this embedding space. The brilliance lies in how AI strategically places content with similar meanings in close proximity on the map.

Consider an example where the AI is exposed to a text discussing movies, music, and actors, with a distribution of 10%, 2%, and 30%, respectively. In this scenario, the AI can generate an embedding with three values — say 0.1, 0.02, and 0.3 — within a 3-dimensional space. These values effectively act as coordinates on the map, positioning the content in such a way that related topics, like movies and actors, are clustered together. This spatial arrangement reflects the semantic relationships inherent in the content, showcasing the AI’s ability to intuitively organize and comprehend diverse information within the embedding space.

Embedding Space
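To make this concrete, here’s a minimal sketch that measures how close two pieces of content sit in the embedding space using cosine similarity, reusing the illustrative 3-dimensional vector above (the second vector is a made-up music-heavy example):

import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

movies_and_actors = [0.1, 0.02, 0.3]  # the example content above
music_heavy = [0.01, 0.9, 0.05]       # hypothetical music-focused content

print(cosine_similarity(movies_and_actors, movies_and_actors))  # 1.0: identical
print(cosine_similarity(movies_and_actors, music_heavy))        # ~0.12: far apart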

Indeed, once embeddings are created, storing and efficiently retrieving them becomes crucial for search operations.

There are several options available, including PgVector, Google Vector Search, and Pinecone.

Here’s a sample code snippet illustrating how to insert a created embedding into PgVector:

# pip install pgvector

# Create the pgvector extension in the Postgres database via a Django migration
from django.db import migrations
from pgvector.django import VectorExtension

class Migration(migrations.Migration):
    operations = [
        VectorExtension()
    ]

# Define a model with a vector column (dimensions must match your embeddings)
from django.db import models
from pgvector.django import VectorField

class Item(models.Model):
    embedding = VectorField(dimensions=3)

item = Item(embedding=[0.1, -0.2, 3.1])
item.save()

We have stored our created embeddings in the database of our choice. How do we perform a semantic or similarity search for our input text? For that, we have to understand a few more concepts.

Let’s see how these embeddings are organized by meaning in the embedding space by calculating the similarities between them and sorting the results.

As embeddings are vectors, you can calculate the similarity between two embeddings using one of several popular metrics, such as the following:

Metrics to calculate similarity between two vectors: Euclidean (L2) distance, cosine similarity, and inner (dot) product

Which metric should we use? It usually depends on how each model was trained. For Google’s model, we need to use the inner product (dot product).
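For intuition, here’s a short sketch computing the three metrics with NumPy; which one to use depends on the model, as noted above:

import numpy as np

a = np.array([0.2, 0.8, -0.5])
b = np.array([0.6, -0.1, 0.4])

l2_distance = np.linalg.norm(a - b)  # Euclidean distance: smaller = more similar
dot_product = np.dot(a, b)           # inner product: larger = more similar
cosine_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))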

Here’s a sample code snippet to find the nearest vectors in PgVector:

from pgvector.django import L2Distance

# Order items by Euclidean (L2) distance to the query vector [3, 1, 2]
Item.objects.order_by(L2Distance('embedding', [3, 1, 2]))
# Also supports MaxInnerProduct and CosineDistance

The process of vector storage and similarity search can be implemented across various tools, including Google Vector Search and Pinecone, by following their respective official documentation. Both tools provide APIs and services designed for efficient vector similarity search.

Once you’ve retrieved similar vectors based on your semantic or similarity search, you can leverage them for various use cases with large language models (LLMs).

With Google services like text embeddings and Vector Search, we can apply the innovation of embeddings, combined with LLM capability, to various text processing tasks, such as:

LLM-enabled Semantic Search: text embeddings can be used to represent both the meaning and intent of a user’s query and of documents in the embedding space. Documents whose meaning is similar to the user’s query intent can be found quickly with vector search technology. The model is capable of generating text embeddings that capture the subtle nuances of each sentence and paragraph in a document.
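As a rough sketch of that flow, reusing the Item model and the Google embedding model from the earlier snippets (in a real setup the VectorField dimensions must match the model’s output, e.g. 768 for textembedding-gecko, and we use dot product per the note above):

from pgvector.django import MaxInnerProduct

# Embed the user's query with the same model used at indexing time
query_vector = model.get_embeddings(["stories about cats"])[0].values

# Retrieve the five closest documents by inner product
top_docs = Item.objects.order_by(MaxInnerProduct("embedding", query_vector))[:5]

# These documents can then be passed to the LLM as grounding context,
# e.g. via the build_grounded_prompt() sketch shown earlier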

LLM-enabled Text Classification: LLM text embeddings can be used for text classification with a deep understanding of different contexts, without any training or fine-tuning (so-called zero-shot learning). This wasn’t possible with past language models without task-specific training.
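A minimal sketch of that idea: embed the candidate labels and the input text with the same model, then pick the label whose embedding scores highest (the labels and input below are made-up examples):

import numpy as np

labels = ["sports", "politics", "technology"]
label_vectors = [e.values for e in model.get_embeddings(labels)]
text_vector = model.get_embeddings(["The new GPU doubles inference speed"])[0].values

scores = [np.dot(text_vector, lv) for lv in label_vectors]
print(labels[int(np.argmax(scores))])  # expected: "technology", with no training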

LLM-enabled Recommendation: text embeddings can be used in recommendation systems as a strong feature for training recommendation models such as the Two-Tower model. The model learns the relationship between query and candidate embeddings, resulting in a next-gen user experience with semantic product recommendations.
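As a simple stand-in for learned query and candidate towers, here’s a sketch that ranks hypothetical products by query-candidate dot product, the same scoring a trained Two-Tower model applies at serving time:

import numpy as np

catalog = {  # hypothetical product catalog
    "TrailMax Boots": "waterproof hiking boots with ankle support",
    "CityRun Sneakers": "lightweight running shoes for pavement",
}
query_vec = np.array(model.get_embeddings(["boots for rainy mountain hikes"])[0].values)

scores = {
    name: float(np.dot(query_vec, np.array(model.get_embeddings([desc])[0].values)))
    for name, desc in catalog.items()
}
print(max(scores, key=scores.get))  # expected: "TrailMax Boots"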

LLM-enabled Clustering, Anomaly Detection, Sentiment Analysis, and more can also be handled with LLM-level deep semantic understanding.
