Selenium + Dynamic RAG + Serverless SageMaker Inference = Augmenting all possible actions?

Madhur Prashant
11 min read · Sep 28, 2023


Purpose

The purpose of this blog is to start ideating creatively with generative AI concepts such as retrieval augmented generation (RAG), dynamic RAG, and LangChain, building on everything we have discussed in the prior blogs here: https://medium.com/@madhur.prashant7. This is where we start combining new concepts with pre-existing ones to automate tasks that we used to think of as purely manual. In this blog, we will focus on dynamic RAG as discussed here: https://medium.com/@madhur.prashant7/some-dynamic-rag-implementation-non-hallucinating-fine-tuned-models-ed13b46f6a6d, then look at how we can design an AWS architecture that complements the dynamic RAG functionality, and finally combine it with the well-known Selenium framework to perform tasks that substantially reduce manual intervention. We will take a look at a quick use case, see how it relates back to generative AI (specifically RAG, or rather dynamic RAG), and use that use case to address some of its associated pain points. Once that is done, we will do a deep dive into the functionalities of dynamic RAG and Selenium and what their synergy can bring us, followed by a quick code walkthrough and, lastly, some future steps and ideas for what we can develop further in this realm.

Before diving into some of the main functionalities of this use case / code walkthrough, some of the prerequisites that would be good to know are the following:

  1. 200–300 level depth of machine learning understanding
  2. Familiarity with the generative AI lifecycle and its use cases (refer to my previous blogs to learn more)
  3. Some product roadmap knowledge

Note: I work at AWS, but the thoughts and ideas on these blogs are my own

Use Case + Pain Points to Be Addressed

In this blog, we will tackle an e-commerce use case. Fun fact: I will actually talk a little about the startup I worked on in my senior year of college. Using e-commerce platforms, specifically for beauty and fashion products (what my startup aimed to target), involves a lot of human decision making, selective browsing, and a fair amount of time. There are tons of beauty and fashion websites out there, including Sephora, Shein, Amazon itself, and many more. For this use case, my target audience is users who invest heavily in beauty and fashion products, spend a considerable amount of time “window shopping” online, and add items they want to purchase to their notes so they remember them for later or for when “the item is back in stock”. One of the pain points my startup aimed to solve was augmenting user journeys: adding items to the cart on behalf of the user, scheduling a delivery in the future, or ordering an item that is currently out of stock as soon as it becomes available. Tracking these user actions manually can be hectic, so we need to focus on automating them. For this blog, we will limit ourselves to the action where a user can schedule delivery or payment for an item that is out of stock today, to be completed once it is back in stock. We will look at how dynamic RAG comes into play and how a tool like Selenium can work hand in hand with it to automate this specific user action, so that users no longer have to worry about whether the item will be in stock later.

A major pain point for use cases with frequently changing product availability is the lack of trust it creates between the user and the platform: ordering an item before it goes out of stock becomes a game of luck. The process involves a lot of screen toggling and decisions, and several of those actions can be automated, especially when deciding on a product is already time consuming before you ever purchase it (well, at least for me). Let’s take a quick walkthrough of dynamic RAG and Selenium, how they complement each other, and how they can solve use cases where data shifts dynamically and automation can make things easier to attain.

Synergy between Dynamic RAG + Selenium

Dynamic RAG works by vectorizing and caching the data on the target website in real time, so the model always reasons over the latest content; chain-of-thought prompting over that information is what makes this possible. Only large, highly trained models handle this well, for example Anthropic Claude and GPT.

Some quick words on how GPT-4/Anthropic Claude assist with chain of thought to make dynamic RAG a possibility:
Chain of Thought: “chain-of-thought (CoT) prompting [1] is a recently-proposed technique that improves LLM performance on reasoning-based tasks via few-shot learning. Similar to standard prompting techniques, CoT prompting inserts several example solutions to reasoning problems into the LLM’s prompt.”

Usually, in the case of dynamic RAG, there is no data pre-processing. The client goes to the agent and prompts it using chain-of-thought prompts, so the reasoning the LLM needs is supplied directly in the context for that specific prompt. The chain of thought is then used to execute the program: the agent pulls the content from the dynamically changing data source, stores it in memory, has that memory vectorized, and you get an updated and accurate response.
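To make this concrete, here is a minimal sketch of what such a chain-of-thought prompt could look like, assuming the freshly scraped product data is available as a plain string (`scraped_context`, the few-shot example, and the product names are all hypothetical and only for illustration):

# Assemble a chain-of-thought style prompt over freshly scraped context.
scraped_context = "Nike Air Zoom — status: OUT OF STOCK, restock expected 10/05."

few_shot_example = (
    "Question: Is the Dyson Airwrap available?\n"
    "Reasoning: The scraped page lists the Dyson Airwrap with status IN STOCK, "
    "so it can be ordered right now.\n"
    "Answer: Yes, it is currently in stock.\n"
)

user_question = "Can I order the Nike Air Zoom today?"

prompt = (
    "Use the context below, which was scraped moments ago, to answer.\n"
    f"Context:\n{scraped_context}\n\n"
    f"{few_shot_example}\n"
    f"Question: {user_question}\n"
    "Reasoning:"
)

print(prompt)  # This prompt would then be sent to the LLM endpoint.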

This changes based on the data. The agent can help with querying, passing results to another model or tool, and you can build different pipelines and architectures on top of it. The point is: you need a solution like this if your business data is ever changing. For something like a medical use case, where the data remains static for a while, you would not need it.

Now, let’s start building a quick architecture as seen below:

Here, we have a couple of users accessing our product above, with a list of products that is ever changing and cannot really be static. In use cases like these, we can select an LLM that acts as the point of access for a dynamic RAG architecture, building vector stores and embeddings from the chunks of data stored in S3 that the LLM uses as its knowledge base. Based on the kind of product data we have, we can create lifecycle policies to archive or expire data, and even build predictions about products whose demand is rising compared to others, viewing or querying the results in Athena or QuickSight. The product data itself lives in a DynamoDB table; DynamoDB Streams triggers a Lambda function that keeps the S3 data in sync as products change, so the LLM always acts on an up-to-date source of memory and whatever is accessible to our users stays current.
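As a rough sketch of that sync step (assuming a DynamoDB Streams trigger and a hypothetical bucket name, product-catalog-demo), the Lambda function could look something like this:

import json
import boto3

s3 = boto3.client("s3")
BUCKET = "product-catalog-demo"  # hypothetical bucket holding the knowledge base

def handler(event, context):
    """Triggered by DynamoDB Streams; mirrors product changes into S3."""
    for record in event.get("Records", []):
        if record["eventName"] in ("INSERT", "MODIFY"):
            item = record["dynamodb"]["NewImage"]
            product_id = item["product_id"]["S"]
            # Write the latest product snapshot so the RAG pipeline re-embeds it.
            s3.put_object(
                Bucket=BUCKET,
                Key=f"products/{product_id}.json",
                Body=json.dumps(item),
            )
        elif record["eventName"] == "REMOVE":
            product_id = record["dynamodb"]["OldImage"]["product_id"]["S"]
            s3.delete_object(Bucket=BUCKET, Key=f"products/{product_id}.json")
    return {"statusCode": 200}

Now, talking about Selenium a little bit: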

“Web scraping is the process of extracting data from websites. It is a powerful technique that revolutionizes data collection and analysis. With vast online data, web scraping has become an essential tool for businesses and individuals.

Selenium is an open-source web development tool used to automate web browsing functions. It was developed in 2004 and is mainly used to automatically test websites and apps across various browsers, but it has now become a popular tool for web scraping. Selenium can be used with multiple programming languages, including Python, Java, and C#. It provides robust APIs for web page interaction, including navigating, clicking, typing, and scrolling.

Selenium web scraping refers to using the Selenium browser automation tool with Python to extract data from websites. Selenium allows developers to programmatically control a web browser, meaning they can interact with websites as if they were human users.”

Here is why Selenium combined with generative AI can be groundbreaking:

  1. It captures data well from dynamic websites, and that data can be stored in a knowledge base such as an Amazon S3 bucket (see the sketch after this list).
  2. There are higher chances of user interactivity.
  3. It is easy to debug.
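As a minimal sketch of point 1, assuming a hypothetical product page, CSS selectors, and bucket name, scraped product details can be pushed straight into S3, where the dynamic RAG pipeline later picks them up:

import json
import boto3
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.example.com/product/nike-air-zoom")  # hypothetical product page

# Pull the fields we care about from the (possibly dynamic) page.
product = {
    "name": driver.find_element(By.CSS_SELECTOR, "h1.product-title").text,
    "price": driver.find_element(By.CSS_SELECTOR, "span.price").text,
    "availability": driver.find_element(By.CSS_SELECTOR, "div.stock-status").text,
}
driver.quit()

# Store the snapshot in S3 so it can be chunked and embedded downstream.
boto3.client("s3").put_object(
    Bucket="product-catalog-demo",  # hypothetical bucket, same as in the Lambda sketch
    Key=f"products/{product['name']}.json",
    Body=json.dumps(product),
)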

Now the question: what is the strength of this synergy between Selenium and dynamic RAG?
Selenium can be highly proficient at simulating user actions but often depends on static datasets for testing, which may not adequately replicate real-world intricacies, like evaluating payment methods such as credit cards or inputting user details like a street address.

“To address this limitation, Generative Artificial Intelligence, or Generative AI, can assist in generating randomized yet realistic data for integration into your automated Selenium tests.”
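Here is a minimal sketch of that idea, assuming a hypothetical helper generate_fake_address() that would, in practice, call whatever LLM endpoint you have deployed and return a realistic-looking address; the checkout URL and form field names are made up for illustration:

from selenium import webdriver
from selenium.webdriver.common.by import By

def generate_fake_address() -> dict:
    """Hypothetical helper: ask an LLM for a realistic-looking shipping address.
    In practice this would invoke your deployed LLM endpoint; a static value is
    returned here so the sketch runs on its own."""
    return {"street": "123 Demo Lane", "city": "Seattle", "zip": "98101"}

address = generate_fake_address()

driver = webdriver.Chrome()
driver.get("https://www.example.com/checkout")  # hypothetical checkout page

# Feed the generated data into the form, just as a human user would.
driver.find_element(By.NAME, "street").send_keys(address["street"])
driver.find_element(By.NAME, "city").send_keys(address["city"])
driver.find_element(By.NAME, "zip").send_keys(address["zip"])
driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
driver.quit()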

So it is essential to note that with dynamic RAG and Selenium together, we can extract ever-changing product information from websites, display it to users, and perform user actions based on it, such as liking a product or filling out a product form. Once the large language model has context about the ever-changing data gathered by Selenium, task completion, like filling out forms and performing actions on the user’s behalf, becomes possible in the long run. Let’s take a look at designing an embeddings model alongside LLaMa-2 and how we can get started on it with Selenium.

Quick Code Walkthrough

  1. The first step is to go ahead and install Selenium Wire in your environment:
pip install selenium-wire
  2. Set up the proxy for your project:
from seleniumwire import webdriver as wiredriver
from selenium.webdriver.chrome.options import Options

PROXY_HOST = 'productDemohost'
PROXY_PORT = 'productDemoport'

chrome_options = Options()
# Route all browser traffic through the proxy.
chrome_options.add_argument('--proxy-server=http://{}:{}'.format(PROXY_HOST, PROXY_PORT))
driver = wiredriver.Chrome(options=chrome_options)

3. We can now use Selenium Wire to inspect and modify requests:

for request in driver.requests:
    # Only look at requests that actually received a response.
    if request.response:
        print(request.url, request.response.status_code, request.response.headers['Content-Type'])

In the code above, we loop over all requests made by the WebDriver during the web scraping session.

from selenium import webdriver
from selenium.webdriver.common.by import By

# Initialize the webdriver
driver = webdriver.Chrome()
# Navigate to the webpage
driver.get("https://www.example.com")
# Find all the title elements on the page
title_elements = driver.find_elements(By.TAG_NAME, "title")
# Extract the text from each title element
titles = [title.text for title in title_elements]
# Print the list of titles
print(titles)
# Close the webdriver
driver.quit()

In this example, we first import the webdriver module from Selenium and initialize a new Chrome WebDriver instance. We navigate to the webpage we want to scrape, and then use find_elements with By.TAG_NAME to find all the title elements on the page.

Now that we can extract this information using Selenium, how can we perform some dynamic RAG actions on it?

from decouple import config
from startup import utils
from startuplibrary.structures import Agent
from startuplibrary.tools import WebScraper, WebSearch

# Search tool configured against the target website's API keys.
web_search = WebSearch(
    linguistics_website=config("API_KEY"),
    linguistics_website_search=config("API_KEY_SEARCHID"),
)
# Scraper tool that pulls the content from the pages the search returns.
web_scraper = WebScraper()

agent_demo = Agent(
    tools=[web_search, web_scraper]
)

utils.Chat(agent_demo).start()

We can either perform dynamic RAG directly from the code above, or go with the more standard approach: deploy two models, wire up RAG and LangChain components around them, and use a dynamically changing S3 bucket as the source of context for our retrievals, which makes the solution dynamic as well:

Deploying RAG the standard way, using LLaMa-2-7b as the LLM plus an embeddings model, with ever-changing data loaded into S3 from the web information scraped off the server:

Retrieval Augmented Generation (RAG) with Langchain

  1. LangChain: framework for orchestrating the RAG workflow
  2. FAISS: in-memory vector database for storing document embeddings
  3. PyPDF: Python library for loading and processing the PDF documents
%pip install langchain==0.0.251 --quiet --root-user-action=ignore
%pip install faiss-cpu==1.7.4 --quiet --root-user-action=ignore
%pip install pypdf==3.15.1 --quiet --root-user-action=ignore

FETCHING AND PROCESSING THE EVER-CHANGING DATA FROM A FORM

data_root = "./data/"
filenames = [
    'ProductForm Info.data',
]

import numpy as np
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader

documents = []
for filename in filenames:
    loader = PyPDFLoader(data_root + filename)
    loaded_documents = loader.load()  # Use a variable to store loaded documents
    documents.extend(loaded_documents)  # Extend the list with loaded documents

# Split the documents into overlapping chunks for retrieval.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=100,
)
docs = text_splitter.split_documents(documents)

print(f'Number of Document Pages: {len(documents)}')
print(f'Number of Document Chunks: {len(docs)}')
Number of Document Pages: 28
Number of Document Chunks: 170

Here, we split the documents into chunks to perform RAG:

Deploying a Model for Embedding: all-MiniLM-L6-v2 as the embeddings model and LLaMa-2-7b-chat as our LLM

!pip install -qU \
sagemaker \
pinecone-client==2.2.1 \
ipywidgets==7.0.0
import sagemaker
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# LLM: LLaMa-2-7b chat from SageMaker JumpStart (deployed later in the walkthrough).
my_model = JumpStartModel(model_id="meta-textgeneration-llama-2-7b-f")

# Embeddings model: all-MiniLM-L6-v2 from SageMaker JumpStart.
embedding_model_id, embedding_model_version = "huggingface-textembedding-all-MiniLM-L6-v2", "*"
model = JumpStartModel(model_id=embedding_model_id, model_version=embedding_model_version)
embedding_predictor = model.deploy()
--------!
embedding_model_endpoint_name = embedding_predictor.endpoint_name
embedding_model_endpoint_name
Creating and Populating our Vector Database:
from typing import Dict, List
from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler
import json

class CustomEmbeddingsContentHandler(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, inputs: List[str], model_kwargs: Dict) -> bytes:
        # Serialize the batch of texts into the JSON payload the endpoint expects.
        input_str = json.dumps({"text_inputs": inputs, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> List[List[float]]:
        # Parse the endpoint response and pull out the embedding vectors.
        response_json = json.loads(output.read().decode("utf-8"))
        embeddings = response_json.get("embedding", [])  # Use get() with a default value
        return embeddings  # Make sure to return the embeddings

embeddings_content_handler = CustomEmbeddingsContentHandler()

aws_region = sagemaker.Session().boto_region_name  # region where the endpoint is deployed

embeddings = SagemakerEndpointEmbeddings(
    endpoint_name=embedding_model_endpoint_name,
    region_name=aws_region,
    content_handler=embeddings_content_handler,
)
Now, with our embeddings, we can turn our document chunks into vectors and actually store them somewhere. Our project will use FAISS, an in-memory vector database:
from langchain.schema import Document
from langchain.vectorstores import FAISS

# Build and populate the FAISS index from the chunked documents and their embeddings.
db = FAISS.from_documents(docs, embeddings)
NOW, RUNNING VECTOR QUERIES!!

Now, we can perform dynamic RAG by asking simple questions of our chunked and embedded data from S3, which holds the updated information from the website.

query = "What is the latest nike product in stock on XYZ.com?"
results_with_scores = db.similarity_search_with_score(query)
# Print each retrieved chunk together with its similarity score.
for doc, score in results_with_scores:
    print(f"Content: {doc.page_content}\nScore: {score}\n\n")

You can furthermore perform prompt engineering:

PROMPT ENGINEERING FOR CUSTOM DATA

from langchain.prompts import PromptTemplate

prompt_template = """
<s>[INST] <<SYS>>
Use the context provided below to answer the question at the end. If you don't know the answer, please state that you don't know and do not attempt to make up an answer.
<</SYS>>
Context:
----------------
{context}
----------------
Question: {question} [/INST]
"""

PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)

SageMaker LLaMa-2–7b-f LLM for our CUSTOM DATASET

# from sagemaker.jumpstart.model import JumpStartModel  # already imported above
llm_model_id, llm_model_version = "meta-textgeneration-llama-2-7b-f", "*"
llm_model = JumpStartModel(model_id=llm_model_id, model_version=llm_model_version)
llm_predictor = llm_model.deploy(
    initial_instance_count=1, instance_type="ml.g5.4xlarge")
---------------!
llm_model_endpoint_name = llm_predictor.endpoint_name
llm_model_endpoint_name

llm = SagemakerEndpoint(
    endpoint_name=llm_model_endpoint_name,
    region_name=aws_region,
    model_kwargs={"max_new_tokens": 1000, "top_p": 0.9, "temperature": 1e-11},
    endpoint_kwargs={"CustomAttributes": "accept_eula=true"},
    content_handler=qa_content_handler
)

query = "What is the latest nike product in stock on XYZ.com?"
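Note that SagemakerEndpoint and qa_content_handler are not defined in the snippet above; they would need to be set up before constructing the LLM wrapper. Here is a minimal sketch of what they might look like for the LLaMa-2 chat payload; the exact request/response shape depends on the JumpStart model version, so treat the payload format as an assumption:

import json
from langchain.llms.sagemaker_endpoint import SagemakerEndpoint, LLMContentHandler

class QAContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        # LLaMa-2 chat on JumpStart expects a dialog-style payload (assumed format).
        payload = {"inputs": [[{"role": "user", "content": prompt}]], "parameters": model_kwargs}
        return json.dumps(payload).encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        # Pull the generated text out of the endpoint response (assumed shape).
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json[0]["generation"]["content"]

qa_content_handler = QAContentHandler()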

Lastly, as a final touch, we can use chaining to get better responses:

from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    chain_type='stuff',
    retriever=db.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)
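To close the loop, here is a sketch of how the chain could be invoked with the query from earlier; the actual answer depends on whatever product data is in the index:

# Run the retrieval + generation chain end to end.
result = qa_chain({"query": query})
print(result["result"])            # The model's answer, grounded in the retrieved chunks
print(result["source_documents"])  # The document chunks that were retrieved as context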

Conclusion

The purpose of this blog was to open doors to combining concepts into ideas and solutions that can change the world. We can use any number of different solutions to keep track of ever-changing data using Selenium on something as simple as a Google Form, then use dynamic RAG to refer to that ever-changing information, produce answers to questions online, and fill out those forms without any manual intervention from users. In the next blog, I will take a specific use case and focus on an end-to-end code walkthrough.

LinkedIn: https://www.linkedin.com/in/madhur-prashant-781548179/
Github: https://github.com/madhurprash



Madhur Prashant

Learning is my passion, so is the intersection of technology & strategy. I am passionate about product. I work @ AWS but these are my own personal thoughts!