Maximize Earnings with Minimal Effort: The AI Embedding Advantage (RAG: Retrieval-Augmented Generation)
This article explores how AI embeddings can enhance and potentially automate your user support system. We’ll discuss integrating these technologies to streamline operations and boost efficiency in your business.
Understanding Neural Networks: Easy Level
A neural network is a type of artificial intelligence model designed to mimic the way human brains operate. It consists of layers of interconnected nodes, or ‘neurons’, that process information. A network can be trained and, once trained, used to make predictions about new inputs, for example, predicting whether a picture contains a dog or a cat.
The first layer, known as the input layer, receives the initial data (text, images, audio, etc.). This data is then processed through one or more hidden layers in the middle, where the actual computation and transformation of the data occur through a series of weighted connections. The final layer, called the output or prediction layer, produces the results based on the processed information from the hidden layers.
Each layer’s output serves as the input for the next layer, creating a chain of data processing steps that lead to a final decision or prediction.
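To make the flow of data concrete, here is a toy sketch (not from any real model, with made-up sizes and random weights) of a tiny two-layer network in Python with NumPy, where an input vector passes through a hidden layer and then to an output layer:

import numpy as np

# Toy network: 3 inputs -> 4 hidden neurons -> 2 output scores (e.g. 'dog' vs 'cat')
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))  # weighted connections: input layer -> hidden layer
W2 = rng.normal(size=(4, 2))  # weighted connections: hidden layer -> output layer

def forward(x):
    hidden = np.maximum(0, x @ W1)  # hidden layer with a ReLU activation
    return hidden @ W2              # output layer: one raw score per class

print(forward(np.array([0.2, 0.7, 0.1])))  # two scores, e.g. [dog_score, cat_score]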
Relationship between neural networks and embeddings
Embeddings emerge from the “weighted connections” that develop during the training of a neural network. These connections represent the strength of the signal passed between neurons across different layers.
As the network learns from data, it adjusts these weights to better capture and represent complex patterns and relationships. The embeddings themselves are the distilled knowledge of the network, encapsulated in these weights, which can then be used to understand or predict new, similar types of data effectively.
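As a small, purely illustrative sketch of this idea (the vocabulary and random values below are made up; in a real network the values are learned), an embedding layer is essentially a weight matrix whose rows become the vectors for each word once training has adjusted them:

import numpy as np

# Hypothetical vocabulary; in a real network these weights are adjusted during training
vocab = {"cat": 0, "dog": 1, "key": 2}
embedding_weights = np.random.default_rng(1).normal(size=(len(vocab), 8))  # one 8-dimensional row per word

def embed(word):
    # The embedding of a word is simply its row of the learned weight matrix
    return embedding_weights[vocab[word]]

print(embed("cat"))  # an 8-dimensional vector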
Understanding Embeddings: How AI Groups Concepts by Semantic Relationships
Embeddings are incredibly useful for grouping concepts based on semantic meaning.
For example, in a trained model, the embedding vector for ‘cat’ would be positioned close to ‘dog’, because both are animals, mammals, and share similar traits and behaviors. Conversely, ‘cat’ would be positioned far from ‘key’ in the embedding space, because ‘key’ is an inanimate object, typically made of metal, and lacks biological properties.
By mapping words or entities into this multidimensional space, embeddings allow us to see and utilize these semantic relationships, enhancing the model’s ability to process and analyze data accurately.
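One quick way to see this in practice is to embed the three words and compare cosine similarities. The sketch below is illustrative: it reuses the get_embedding helper defined later in this article, requires an OpenAI API key, and the exact numbers will vary, but ‘cat’ should score noticeably closer to ‘dog’ than to ‘key’:

from sklearn.metrics.pairwise import cosine_similarity

# Embed the three words with the same OpenAI model used later in the article
cat, dog, key = (get_embedding(w) for w in ("cat", "dog", "key"))

print("cat vs dog:", cosine_similarity([cat], [dog])[0][0])
print("cat vs key:", cosine_similarity([cat], [key])[0][0])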
How Embeddings Can Boost Your Revenue by Automating User Support Tasks
In this section, we explore a practical application of embeddings to enhance and automate user support, thereby potentially increasing revenue. Here’s how it works:
- We first convert all past responses in our user support system into embeddings. This transformation allows the system to capture the semantic meaning of each answer we have given to users’ questions in the past.
- When a new question or ticket arrives, it is converted into an embedding. This new embedding is then compared against the embeddings of all existing past responses.
- The system identifies the closest matching embedding and returns the associated response. This method ensures that the answer provided is one that has been used successfully in the past, thus avoiding the generation of incorrect or ‘hallucinated’ responses. By automating response selection in this way, we not only enhance efficiency but also ensure consistent and accurate support to our users.
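Put together, the matching step described above can be sketched in a few lines. This is a simplified illustration with made-up example responses; it reuses the get_embedding helper defined later in this article:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical past support answers and their precomputed embeddings
past_responses = ["You can bring your dog in for a check-up any weekday.",
                  "We are open Monday to Friday, from 9am to 6pm."]
past_embeddings = [get_embedding(r) for r in past_responses]

def answer(ticket):
    # Embed the new ticket and return the past response whose embedding is closest
    similarities = cosine_similarity([get_embedding(ticket)], past_embeddings)[0]
    return past_responses[int(np.argmax(similarities))]

print(answer("What are your opening hours?"))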
Beyond Basics: Leveraging LLM Knowledge to Measure Complex Semantic Distances
As we delve deeper into the practical implementation of embeddings, it’s crucial to understand that measuring semantic distances goes beyond simple word associations like ‘dog’ and ‘cat’.
In a veterinary clinic’s user support system, for example, embeddings can capture and quantify the relationships between diverse veterinary terms and concepts. This capability stems from the embeddings being derived from a Large Language Model (LLM), which incorporates a vast array of language understanding.
Therefore, when we measure distances between embeddings, we’re leveraging the entire linguistic and contextual knowledge embedded within the LLM. This allows the system to accurately interpret and relate complex queries to the most relevant past responses, effectively enhancing the decision-making process in providing support.
Building a Veterinary Clinic Support Chatbot with Embeddings
In this section, we outline the steps to create an effective user support chatbot for a veterinary clinic using embeddings. Here’s the workflow:
- First, we receive a question through the chatbot interface.
- Next, we search for the closest matching embedding within our dataset of past questions and answers. This involves comparing the embedding of the incoming question to those in our dataset to find the highest semantic similarity.
- Once we identify the most similar past question, we use a Large Language Model (LLM) to generate an appropriate response. We do this by feeding the LLM the original chatbot question along with the response linked to the closest embedding. This ensures that the generated answer is both relevant and informed by previous successful interactions. This method leverages the LLM’s ability to understand and produce language in context, enhancing the chatbot’s ability to deliver precise and helpful responses.
Below you can see a full demo of how the chatbot works in the following video, in which we ask it several questions.
Implementing the Chatbot Code with AI Embeddings
In this section, we’ll walk through the Python code required to set up a veterinary clinic support chatbot using embeddings and OpenAI’s API. The process involves several key steps:
1. Embedding Creation:
- We start by creating embeddings for each question in our support dataset. This is done using the get_embedding function, which calls OpenAI's API to transform textual data into a numerical form that captures semantic meaning.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_embedding(text):
    # Turn a piece of text into a numerical embedding vector
    response = client.embeddings.create(input=text, model='text-embedding-ada-002', encoding_format='float')
    return response.data[0].embedding
2. Embedding the Support Dataset:
- The embed_text function reads a CSV file containing past questions and responses, generates embeddings for each question, and saves the new dataset with embeddings.
text_embeddings = embed_text("questions_vet.csv")
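The embed_text function itself isn't shown here; a minimal sketch of what it could look like, assuming the CSV uses the Question and Response column names expected by the search function below (the output file name is also just illustrative), would be:

import pandas as pd

def embed_text(csv_path):
    # Load the past questions and responses (assumed columns: Question, Response)
    data = pd.read_csv(csv_path)
    # Generate an embedding for every past question
    data["Embedding"] = data["Question"].apply(get_embedding)
    # Save the enriched dataset so the embeddings don't need to be recomputed
    data.to_csv("questions_vet_embeddings.csv", index=False)
    return data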
3. Finding Similar Questions:
- When a new query comes in, the search function finds the most similar existing question by comparing embeddings using cosine similarity.
from sklearn.metrics.pairwise import cosine_similarity

def search(query, data, n_results=1):
    # Embed the incoming query and score it against every stored question embedding
    query_embedding = get_embedding(query)
    data["Similarity"] = data['Embedding'].apply(lambda x: cosine_similarity([x], [query_embedding])[0][0])
    # Return the n_results most similar rows
    return data.sort_values("Similarity", ascending=False).iloc[:n_results][["Question", "Response", "Similarity"]]
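For example, a hypothetical query could be matched like this:

best = search("My dog is limping after our walk, what should I do?", text_embeddings)
print(best[["Question", "Similarity"]])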
4. Generating Custom Responses:
- Using the closest match found, the generate_custom_response function prompts the LLM to generate a tailored response, ensuring relevance and precision.
def generate_custom_response(user_question, closest_embedding):
    generated_response = client.chat.completions.create(
        model="gpt-4o", max_tokens=55, messages=[{
            "role": "system", "content": (
                f"We have received this question:\n{user_question}\n"
                f"And the most similar answer we have in our database is this:\n{closest_embedding}\n"
                f"Please generate an answer for this user only with the information that you have:"
            )
        }]
    )
    return generated_response.choices[0].message.content
5. Gradio Interface for Real-time Queries:
- Finally, a Gradio interface allows users to interact with the chatbot, inputting their questions and receiving both the most similar past responses and the newly generated answers in real time.
import gradio as gr

with gr.Blocks() as demo:
    query_input = gr.Textbox(label="Search")
    search_button = gr.Button("Search")
    output = gr.DataFrame(label="Most similar past question and response")
    response_output = gr.Textbox(label="Generated answer")
    search_button.click(fn=search_and_generate_response, inputs=[query_input, gr.DataFrame(text_embeddings)], outputs=[output, response_output])

demo.launch()
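The search_and_generate_response callback wired to the button isn't defined in the snippet above; a minimal sketch, assuming it simply chains the search and generate_custom_response functions and receives the embedded dataset as a pandas DataFrame, could be:

def search_and_generate_response(query, data):
    # Find the most similar past question/response pair in the embedded dataset
    best_match = search(query, data)
    # Ask the LLM to phrase a tailored answer grounded in that stored response
    custom_answer = generate_custom_response(query, best_match.iloc[0]["Response"])
    return best_match, custom_answer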
Here is all the code together, so you can run the example in any environment: Jupyter, Google Colab, or even plain Python.
And this is the dataset we used for this demo:
Conclusion: Expanding Context and Leveraging Knowledge with RAG Technique
- Utilizing GPT’s Knowledge Base: In this example, the embeddings are derived from the vast knowledge base of GPT models. This allows our chatbot to access a richer, more nuanced understanding of language and context.
- Beyond Token-Level Input: By using embeddings, we can extend the context available to the language model beyond the immediate tokens of input. Measuring similarity with GPT’s own engine enables a deeper contextual analysis than typical token-based processing.
- Leveraging RAG (Retrieval-Augmented Generation): This technique, known as Retrieval-Augmented Generation, combines the retrieval of relevant information (in this case, embeddings of past questions and answers) with the generative capabilities of LLMs. RAG helps in creating responses that are not only contextually appropriate but also informed by a comprehensive understanding of past interactions.
Useful references
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- OpenAI embedding models: New embedding models and API updates
- What Is Retrieval-Augmented Generation, aka RAG?
- What is a Vector Database & How Does it Work?
Final Disclaimer
AI applications aren’t here to replace humans but to empower them. By enhancing their capabilities, AI helps professionals serve customers better and more efficiently, granting them ‘superpowers’ in their daily tasks.