Stories by Bharathvajan G on Medium

Amazon Bedrock : Choosing the Right Architecture: RAG vs Agent vs Agentic RAG

Bharathvajan G — Sun, 08 Jun 2025 17:52:46 GMT

Introduction

In this article, we aim to explore the evolving landscape of intelligent system design through three key architecture patterns in the world of AI: Retrieval-Augmented Generation (RAG), Agents, and the emerging paradigm of Agentic RAG.

To ground our discussion in a practical context, we’ll illustrate how each of these patterns can be applied to a healthcare use case — specifically, enabling intelligent systems to assist patients in booking appointments and accessing clinical services.

Through this lens, we will:

Examine how RAG enhances AI responses by integrating external knowledge sources dynamically.
Understand how Agents allow large language models (LLMs) to take actions, plan tasks, and interact with external tools or APIs.
Explore Agentic RAG, which combines the strengths of both approaches to enable intelligent decision-making that is not only informed but also actionable.

Use-Case:

Learn about clinic services, doctors’ availability and specialties, and book an in-person appointment.

Architecture Style #1: Plain RAG (Bedrock Knowledge Base)

Use-Case: Here’s a sample use case focused on medical or clinical services.

What are the general consultation charges?
Is there a pharmacy inside the clinic?
How can I cancel or reschedule my appointment?
What payment methods are accepted?
Is there a consultation fee for follow-up visits?
Do you accept ABC Health Insurance?

These are FAQ-style user questions that a clinic assistant could handle effectively using a Bedrock Knowledge Base. Such questions are best answered by retrieving information from uploaded documents such as service brochures, doctor profiles, and clinic policy manuals.

Flow:

Design Architecture:

Architecture Style #2: Agent (Bedrock Agent)

Use-Case: Here’s a sample use case focused on booking appointments.

“Book an appointment with a cardiologist for Friday at 10 AM.”
“Reschedule my appointment to next Tuesday.”

A plain Bedrock Agent is well-suited for the appointment booking use case because it can understand user intent, manage multi-turn memory, and invoke APIs to perform real-time actions like checking availability and confirming bookings. It doesn’t require a knowledge base since all necessary data can be fetched dynamically via Lambda functions.

Flow:

Architecture Style #3: Agentic RAG (Bedrock Agent + Knowledge Base)

Use-Case:

“User Query : I need a blood test package and also want to meet someone for diabetes. Can you help me with both?”

Design Architecture:

Conclusion

To wrap up, we’ve explored how each of these architectural patterns — RAG, Agents, and Agentic RAG — offers unique strengths in building intelligent, goal-oriented AI systems.

To summarize the patterns we have seen:

For simple document Q&A: Bedrock Knowledge Bases (Plain RAG) offer the quickest path to value
For API-heavy workflows: Plain Agents provide the needed flexibility
For complex analytical tasks: Agentic RAG delivers superior results despite higher complexity

Building a Real-Time AI Assistant with Amazon Bedrock Agents — Part 01

Bharathvajan G — Tue, 03 Jun 2025 07:33:31 GMT

Intro to Bedrock Agent

Bedrock Agent Capability

Health-Care Use Case : Book Doctor’s Appointment via AI Assistant (Chat)

Here is an example conversation between a user and the agent. The diagram shows the action calls the agent makes at each step.

01. Bedrock Agent : Planning : Relevance to the Use Case

Why is the planning capability important for our appointment-booking use case?

The agent’s planning capability handles flexible or incomplete input. For example:

=> Users might say:
- “I want to see a doctor next Friday.” (or)
- “Book Dr. Steve again.” (or)
- “Is anyone free in the afternoon?”

=> The agent must:
01. Understand intent (booking)
- Extract known info (e.g., date = next Friday)
- Identify missing info (doctor, time)
- Ask the right questions to complete the task

02. Choose the right actions, in order.
- Get available doctors → getAllAvailableDoctors
- Check time slots → getDoctorAvailability
- Book the appointment → bookAppointment

03. Decide
- Pick which action to call.
- Decide when to call it (e.g., only after doctor and date are selected).
- Use the results to guide the next step.
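
The planning steps above can be sketched as a small decision function. This is only an illustration of the idea, not how a Bedrock Agent is implemented internally: the slot names (date, doctor, time) and the question wording are assumptions, while the action names come from the list above.

```python
# Illustrative sketch only: a Bedrock Agent does this reasoning with an LLM,
# not with hard-coded rules. Slot names and the question text are assumptions.

def next_step(slots):
    """Return ("ask", question) or ("call", action_name) for the known slots."""
    if not slots.get("date"):
        # Missing info the user must supply: ask for it.
        return ("ask", "Which date works for you?")
    if not slots.get("doctor"):
        # Date is known, doctor is not: list the doctors to choose from.
        return ("call", "getAllAvailableDoctors")
    if not slots.get("time"):
        # Doctor and date are known: look up free time slots.
        return ("call", "getDoctorAvailability")
    # Everything needed for a booking is present.
    return ("call", "bookAppointment")


# "I want to see a doctor next Friday." -> the date is known, the doctor is not.
step = next_step({"date": "next Friday"})
```

The point of the sketch is the ordering: bookAppointment is only reached once the doctor, date, and time are all known.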

02. Bedrock Agent : Memory : Relevance to the Use Case

Why is memory important in the appointment-booking use case above?

Memory enables the agent to:

Remember user-provided inputs (e.g., date, doctor type).
Avoid repeating questions (“You already said June 5th”).
Hold temporary selections (e.g., doctor selected, available time slot).
Personalize interactions (e.g., remember last doctor booked — if persisted via custom memory).

The agent remembers:

Date: June 5th
Specialty: General physician
Selected Doctor: Dr. Steve

These values are held in session memory, so the agent doesn’t have to ask again.
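
From the caller’s side, session memory is tied to the session ID sent on each turn. Below is a minimal sketch of building the arguments for the invoke_agent call of the boto3 bedrock-agent-runtime client. The agent ID, alias ID, and session ID are placeholders, and passing the remembered values as session attributes is one possible approach, not a requirement.

```python
def build_invoke_agent_request(agent_id, agent_alias_id, session_id, user_text, remembered):
    """Build keyword arguments for bedrock-agent-runtime invoke_agent."""
    return {
        "agentId": agent_id,
        "agentAliasId": agent_alias_id,
        # Reusing the same sessionId on every turn keeps the conversation context.
        "sessionId": session_id,
        "inputText": user_text,
        "sessionState": {
            # Session attributes are string-to-string pairs.
            "sessionAttributes": {k: str(v) for k, v in remembered.items()},
        },
    }


# Placeholder IDs; the remembered values mirror the example above.
request = build_invoke_agent_request(
    "AGENT_ID", "ALIAS_ID", "session-001",
    "Book the 10 AM slot",
    {"date": "June 5th", "specialty": "General physician", "doctor": "Dr. Steve"},
)
```

The request can then be sent with boto3.client("bedrock-agent-runtime").invoke_agent(**request).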

03. Bedrock Agents : Action & Action Group : Relevance to the Use Case

Why are actions and action groups important for the appointment-booking use case?

An Action Group in Amazon Bedrock Agents is a collection of tools (actions) the agent can call to achieve specific goals — like booking an appointment, checking availability, or sending an SMS for confirmation.
Each action is like a function the agent can invoke with input/output specifications.
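
To make this concrete, here is a minimal sketch of a Lambda function behind an action group. It assumes the action group is defined with function details. The event and response shapes reflect my understanding of that Lambda contract, so check them against the current Bedrock documentation. The parameter name "doctor" and the in-memory AVAILABILITY data are made up for illustration.

```python
import json

# Hypothetical data standing in for a real scheduling backend.
AVAILABILITY = {"Dr. Steve": ["10:00", "14:30"]}


def lambda_handler(event, context):
    """Handle one action call from the agent and return the result as text."""
    # Parameters arrive as a list of {"name", "type", "value"} entries.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    function = event.get("function")

    if function == "getDoctorAvailability":
        result = {"slots": AVAILABILITY.get(params.get("doctor"), [])}
    else:
        result = {"error": f"Unknown function: {function}"}

    # Echo back the action group and function so the agent can match the reply.
    return {
        "messageVersion": event.get("messageVersion", "1.0"),
        "response": {
            "actionGroup": event.get("actionGroup"),
            "function": function,
            "functionResponse": {
                "responseBody": {"TEXT": {"body": json.dumps(result)}}
            },
        },
    }
```

The agent fills in the parameters from the conversation, calls the function, and uses the returned body to decide its next step.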

How it works :

OpenAPI Spec vs JSON Schema :

04. Bedrock Agent : Guardrails : Relevance to the Use Case

1. Prevent Harmful or Unsafe Content

Users may enter:

Inappropriate language
Misinformation (e.g., about medical conditions)
Sensitive personal health details

Guardrails help detect and block:

Toxic or offensive messages
Unsafe outputs from the model

2. Control Topics & Boundaries

Block off-topic inputs

Benefits of Agents in the Book Appointment Use Case:

Agent can reason over actions: “Should I call bookAppointment now?"
Automatic input formatting: Agent formats user input to match the action schema.
Output interpretation: Agent reads and uses the API response intelligently.
No need for hard-coded logic: Don’t have to manually wire flow steps.

Steps for creating Agent for “Booking Doctor Appointment” Use Case

The following diagram covers the high-level implementation steps involved in creating an agent.

Agent Architecture — Patterns / Options

When building a conversational experience for booking doctor appointments using Amazon Bedrock Agents, you can choose between a Single-Agent or Multi-Agent architecture depending on the complexity of the use case and long-term scalability needs.

Here is a section that outlines the advantages and disadvantages of Single-Agent vs Multi-Agent architecture.

Conclusion

In this article, we explored the context, fundamentals of Bedrock Agents, their capabilities, and a sample use case. In the next part, we’ll dive deeper into the implementation of the doctor appointment booking use case.

API Documentation Assistant: Building an Intelligent Q&A System with Amazon Bedrock Knowledge Base…

Bharathvajan G — Tue, 11 Mar 2025 06:04:17 GMT

API Documentation Assistant: Building an Intelligent Q&A System with Amazon Bedrock Knowledge Base — Part 01

01. Introduction

In this article, we’ll dive into RAG (Retrieval-Augmented Generation) and Amazon Knowledge Base, and demonstrate how they can be leveraged to create an API documentation assistant. This assistant will empower developers to quickly grasp enterprise APIs through an interactive Q&A chat interface, enabling them to efficiently consume, understand, and explore the availability of APIs.

02. Context

Problem Statement:

Developers often struggle to quickly understand and consume enterprise APIs due to the complexity and volume of documentation. Traditional methods of navigating API docs can be time-consuming and inefficient.

Solution:

We will design and implement an interactive Q&A chat interface powered by RAG and Amazon Knowledge Base. This assistant will retrieve relevant information from API documentation, generate accurate responses, and provide developers with a seamless way to explore and understand APIs.

03. Compelling reasons why an API Documentation Q&A Assistant is valuable:

Answer questions about API usage
Provide relevant code examples
Explain error scenarios
Check version compatibility
Extract relevant documentation sections

04. What is RAG ?

RAG (Retrieval-Augmented Generation) is an AI technique that finds the right information from a database or documents (a vector store) and uses that information to give clear, accurate answers.

05. Amazon Bedrock Knowledge Base

Amazon Bedrock Knowledge Base is a fully managed service that connects generative AI models (like Claude) to your data sources (e.g., S3, databases) for retrieval-augmented generation (RAG). It simplifies building intelligent applications by automating data ingestion, embedding generation, and semantic search, enabling accurate, context-aware responses.

06. RAG Detail Design & Architecture — without Knowledge Base

The following diagram shows how to implement a RAG system, step by step, without a Knowledge Base.

Steps 01 to 04: These steps outline the process of reading API documents and storing the information in a vector data store, specifically Amazon OpenSearch in our scenario.

Step 01: Ensure that all API and design documents are stored in a storage layer, such as Amazon S3 in our case.

Step 02: Perform a read operation to extract the text, and then divide the documents into smaller segments / chunks.

Step 03: Apply vector embedding to these chunks / splits.

Step 04: Save the resulting vector data in a vector data store.

Steps 05 to 07: These steps describe the process of conducting a similarity search for a user’s query.

Step 05: Convert the user’s query into a vector using vector embedding.

Step 06: Use this vector to perform a similarity search within OpenSearch.

Step 07: Obtain the results of the similarity search from OpenSearch.

Steps 08 to 09: These steps involve utilizing the user’s query and the results from the similarity search to generate a response.

Step 08: Use the user’s query along with the results obtained from the similarity search as context.

Step 09: Generate and provide a response based on this context.
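
Steps 01 to 08 can be sketched end to end in plain Python. This is a toy version for illustration only: the embedding is a simple bag-of-words count (a real system would call an embedding model), the vector store is a Python list instead of OpenSearch, and the two sample documents are made up.

```python
import math
import re
from collections import Counter


def chunk(text, size=12):
    """Step 02: split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Steps 03 and 05: toy embedding, a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query, store, k=2):
    """Steps 06 and 07: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query, contexts):
    """Step 08: combine the user's query with the retrieved chunks."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Step 01: made-up API documentation standing in for files in S3.
docs = [
    "POST /orders creates a new order and returns the order id.",
    "GET /customers/{id} returns the customer profile.",
]
# Step 04: the "vector store" is a plain list of (chunk, vector) pairs.
store = [(c, embed(c)) for d in docs for c in chunk(d)]

question = "How do I create an order?"
top = search(question, store, k=1)
prompt = build_prompt(question, top)
```

Step 09 would send the prompt to a foundation model. That call is left out here so the sketch runs without any cloud access.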

RAG Detail Design & Architecture — without Knowledge Base

The following diagram illustrates how Steps 01 to 04 have been simplified using a Knowledge Base. There’s no longer a need for a data pipeline to read documents, split them, and store them in a Vector Store.

Step 01 — Step 04 : Simplified using Knowledge Base

07. Simplified RAG Design & Architecture — using Knowledge Base — Retrieval API

Knowledge Base Retrieval APIs —

Simplifies the retrieval process which involves indexing content, generating embeddings, performing similarity searches, and returning the most relevant results based on the query context
Enable efficient search and extraction of information from structured or unstructured data sources by using vector embeddings and semantic search capabilities to find relevant content.
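
As a rough sketch, a Retrieve call can be prepared as shown below. It assumes the retrieve operation of the boto3 bedrock-agent-runtime client. The knowledge base ID is a placeholder, and the response handling relies on my understanding of the response shape (a retrievalResults list with content text and a score).

```python
def build_retrieve_request(knowledge_base_id, query, top_k=5):
    """Build keyword arguments for bedrock-agent-runtime retrieve."""
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    }


def extract_passages(response):
    """Pull (text, score) pairs out of a Retrieve response."""
    return [
        (r["content"]["text"], r.get("score"))
        for r in response.get("retrievalResults", [])
    ]


# "KB_ID" is a placeholder for a real knowledge base ID.
request = build_retrieve_request("KB_ID", "How do I authenticate to the Orders API?", top_k=3)
```

The request can be sent with boto3.client("bedrock-agent-runtime").retrieve(**request), and the returned passages can then be placed into a prompt of your own.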

Knowledge Base — Retrieval API

08. Simplified RAG Design & Architecture — using Knowledge Base — Retrieval & Generate API

The following diagram simplifies the retrieval and generation phases even further for the end-to-end flow:

(User Query → Retrieve Relevant Docs → Augment Prompt with Context → Generate Response)

Retrieve Phase:

Takes user query and searches knowledge base
Uses semantic search and vector embeddings
Returns most relevant documents/passages

Generate Phase:

Takes retrieved content as context
Uses LLM to generate natural language response
Combines knowledge base facts with model capabilities
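
Both phases can be requested in a single call. The sketch below prepares the arguments for the retrieve_and_generate operation of the boto3 bedrock-agent-runtime client. The knowledge base ID and model ARN are placeholders, and reading the answer from the output text field is based on my understanding of the response shape.

```python
def build_rag_request(knowledge_base_id, model_arn, question):
    """Build keyword arguments for bedrock-agent-runtime retrieve_and_generate."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def answer_text(response):
    """Return the generated answer from a RetrieveAndGenerate response."""
    return response["output"]["text"]


# "KB_ID" and "MODEL_ARN" are placeholders for real identifiers.
rag_request = build_rag_request("KB_ID", "MODEL_ARN", "Which API returns the customer profile?")
```

Compared with the Retrieve API, there is no prompt assembly on the caller’s side: the service retrieves the passages, augments the prompt, and returns the generated answer.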

Knowledge Base with Retrieve & Generate API

09. Conclusion

The Amazon Bedrock Knowledge Base simplifies the entire process of building intelligent, knowledge-driven applications by:

Automating data integration and processing.
Providing built-in semantic search and RAG capabilities.
Handling scalability and maintenance.

Without Bedrock Knowledge Base:

You’d need to manually set up data pipelines, generate embeddings, manage a vector database, and integrate retrieval with a generative model.

With Bedrock Knowledge Base:

Upload your API documentation to an S3 bucket.
Bedrock automatically processes the data, generates embeddings, and enables semantic search.
Use the knowledge base with a foundation model (like Claude) to build a Q&A chat interface for developers.

In the next article (Part 02), we will delve into the detailed steps involved in creating a Knowledge Base, complete with code snippets and examples.

Intro to Amazon Q Business and Create simple business Q&A app for Enterprise Data

Bharathvajan G — Sun, 22 Sep 2024 17:57:22 GMT

Amazon Q Overview

Amazon Q is a generative AI-powered assistant for business use.

It is designed to help employees search for information, solve problems, and complete tasks within their enterprise systems and data.
It can also create, test, and fix code, and plan and reason through complex tasks step by step.

Amazon Q Products:

Amazon Q Developer : Helps developers and IT professionals during the SDLC (development and testing cycles)
Amazon Q Business : For employees and business analysts

Amazon Q Types & Integration

The following diagram shows the capabilities, features, and integrations of Amazon Q.

Amazon Q Business Overview

Amazon Q Business is a generative AI-powered assistant that can do the following, all based on the information in your enterprise:

Answer questions
Generate content
Create summaries
Complete tasks

Amazon Q Business : Subscriptions Plan & Index Type

Amazon Q Business offers multiple index types and user subscription tiers. We can choose any combination of index type and user subscription for an Amazon Q Business application environment.

There are two Amazon Q Business subscription plans. The following diagram depicts them.

There are two Amazon Q Business index types. The following diagram depicts them.

Amazon Q Business — DataSource Connector

A data source connector is a tool that brings data from different places into one main collection.

Amazon Q Business has several of these connectors that can link to your data sources, making it easy to create your AI solution with little setup.

Adobe Experience Manager
Alfresco
Aurora (MySQL + PostgreSQL)
Amazon FSx (Windows)
Amazon FSx (NetApp ONTAP)
Amazon RDS
Amazon S3
Amazon WorkDocs
Confluence
Dropbox

Create an Amazon Q Business application for Q&A

Here is the high-level architecture.

In this section, we will see how to create a basic Q&A application with the following steps.

Step-01: From AWS Management Console, navigate to Amazon Q Business.

Step-02: Let’s start with “Create Application”

Step-03: Key in application name and select identity provider

Step-04: Select Retriever and Index type

Step-05: Connect DataSource

5.1: Choose the “Upload file” section.

5.2: Ensure the uploaded file is indexed (FYI: here a PDF of Git commands is uploaded).

Step-06: Configure the users and groups, and the web experience settings.

Step-07: Navigate to the web experience link and log in. Then ask a question related to the content of the uploaded document. (Here we are asking about Git commands.)

If we ask for information that is not in the given data source, Amazon Q replies with “could not find any info in the provided datasource”.

Conclusion

We covered a high-level overview of Amazon Q and Amazon Q Business, and walked through the steps to build a basic Q&A application.

Amazon Event Bridge — Send & Receive Custom events

Bharathvajan G — Sun, 14 Jul 2024 18:50:17 GMT

Amazon Event Bridge — Send & Receive Custom application events

Event Bridge ?

EventBridge extends CloudWatch Events for event-driven architectures.
It is a serverless event bus that supports the publish/subscribe model.
EventBridge was introduced to address the problem of integrating SaaS platforms with AWS services.

CloudWatch Events vs Event Bridge

The original goal of CloudWatch Events was monitoring AWS services, and it supports only AWS services as event sources.
CloudWatch Events uses only the “default” event bus.
EventBridge gives the option to create custom event buses and SaaS event buses in addition to the default bus.
Over time, CloudWatch Events will be replaced by EventBridge.

Event Bridge Components

When we talk about EventBridge, there are four components which we need to understand.

Event Source : Generates the event
Event Bus : Carries the event
Event Rule : Incoming events need to match the given rule to be routed to a target.
Event Target : Processes the event

The following diagram talks about Event Bridge components in detail.

Sample EventBridge Event

Here is the sample event payload in JSON

{
    "version": "0",
    "id": "0d279340-135b-c8c6-95c3-41fb8f496c52",
    "detail-type": "ABC Event Generated",
    "source": "my-customapp",
    "account": "0123456789",
    "time": "2024-04-01T18:41:53Z",
    "region": "us-east-1",
    "detail": { ... } // actual details go here
}

Note:

AWS handles the following fields : version, account, time and region
We need to define the fields: source, detail & detail-type

Send & Receive Custom OrderCreated events through Custom Event Bus

In this article, let’s see how to send and receive a custom application event.

The following diagram represents the use-case scenario.

EventBridge Custom Event — UseCase Flow Diagram

Step-01: Create Custom Event Bus

Go to AWS Management Console > Select EventBridge Service
Select Event Bus > Create Custom Event Bus.

Step-02: (FulFillment Service) Target Lambda function to process OrderCreated event.

Create a Lambda function that processes the event whenever an order is created in the OrderService system.

import json

def lambda_handler(event, context):
    # Log the incoming EventBridge event so it shows up in CloudWatch Logs
    print(event)

    # Process the event
    # Custom code goes here...

    return {
        'statusCode': 200,
        'body': json.dumps('Successfully processed OrderCreated event from OrderService!')
    }

Step-03: Create Custom Event Rule

Go to the AWS Management Console > Select Event Bridge > Select Rule > Create Event Rule

Step: 03.1: Define the Rule details

Step: 03.2 : Choose Event Source Options

Step: 03.3: Define event pattern in JSON

{
 "source": ["OrderService"],
 "detail": {
    "event_type": ["order_created"]
 }
}
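
To see why this pattern matches the OrderCreated event, here is a simplified sketch of the matching idea. It only covers exact matching of scalar values against a list of allowed values. Real EventBridge patterns support more operators (prefix and numeric matching, for example), so this is not the service’s actual algorithm.

```python
def matches(pattern, event):
    """Return True if every field in the pattern is satisfied by the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event field must be an object that also matches.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of allowed values.
            return False
    return True


pattern = {"source": ["OrderService"], "detail": {"event_type": ["order_created"]}}

order_event = {
    "source": "OrderService",
    "detail-type": "Order Management Event",
    "detail": {"event_type": "order_created", "OrderId": "001-400-200"},
}
is_match = matches(pattern, order_event)
```

Fields that the pattern does not mention (detail-type and OrderId here) are ignored, so a rule only needs to list the fields it cares about.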

Step: 03.4 Select the Target

Step: 03.5 : Ensure Rule is created successfully within Custom Event Bus

Step-04: Publish event from OrderService Lambda function

Step 04.1 : Create Lambda function for OrderService

Step 04.2 : OrderService Lambda function to put event.

Sample Code snippet to show how to publish events in EventBridge.

import json
import boto3

def lambda_handler(event, context):
    
    client = boto3.client('events')
    
    detailJsonString = '{"event_type": "order_created", "OrderId" : "001-400-200"}'
    response = client.put_events(
        Entries=[
            {
                'Source':'OrderService',
                'Detail':  detailJsonString,
                'DetailType' : 'Order Management Event',
                'EventBusName':'custom-event-bus'
            }
        ]
    )
    
    print(response)

Step-05: Test Event from Receiver end (Fulfilment Service)

Go to CloudWatch Logs of FulFilment Service to see the event information.

{
 'version': '0', 
 'id': 'eec85261-2148-1514-4bb9-73bed5dc1f00', 
 'detail-type': 'Order Management Event', 
 'source': 'OrderService', 
 'account': '682960446509', 
 'time': '2024-07-14T18:34:06Z', 
 'region': 'us-east-1', 'resources': [], 
 'detail': {'event_type': 'order_created', 'OrderId': '001-400-200'}
}

Conclusion

In this article, we have seen how to publish and receive a custom event through a custom EventBridge event bus.

AWSAppSync Intro— Build the simple GraphQL API with DynamoDB as DataSource

Bharathvajan G — Thu, 04 Jul 2024 08:50:39 GMT

AWS AppSync Intro— Build the simple GraphQL API with DynamoDB as DataSource

Introduction

This article covers the basic concepts of GraphQL and the AWS AppSync service, and shows how to build a simple GraphQL API using AWS AppSync with DynamoDB as the data source.

What is GraphQL API ?

An open-source data query and manipulation language for APIs.
It is a modern query language and runtime for APIs, often seen as a next-generation alternative to REST APIs.
It introduces the concept of “fetch exactly what you require”, without any under-fetching or over-fetching of data.

GraphQL Operations

REST API vs GraphQL : API Request & Response

The following diagram shows the difference in making simple api request / response call with REST API vs GraphQL

AppSync Intro

It allows us to develop serverless GraphQL APIs.

AWS AppSync works primarily on the GraphQL protocol, which enables applications to request exactly what they need, and nothing more, from the following supported data sources:

Amazon DynamoDB
Relational Database (Aurora)
Lambda
Http Endpoint

AppSync Capability & Integration

Apart from the API endpoint, AppSync has rich capabilities for monitoring, securing, and caching the API endpoints. The following diagram covers AppSync capabilities in detail.

AppSync Key Components

The following diagram covers the basic building blocks of AppSync.

Step by Step in Creating Simple GraphQL Custom API

Step-01: Create DynamoDB “customer” table

Let's start with DataSource.

Go to AWS Management Console > Select DynamoDB Service
Create Table “customer”, with “id” as partition key

Let’s create an item with attributes (first_name, last_name, etc.) as shown in the screenshot below.

Step-02: Create “Customer”GraphQL API

2.1 Create the GraphQL API

Go to AWS Management Console > Select AppSync Service
Choose API > Click on Create API Button
Select API Type as GraphQL API
For the Data source, choose “Start with DynamoDB table”

2.2 Specify the API Details

Configure the api details with “API Name”, choose DynamoDB region & table from the dropdown as shown in the below screenshot.

2.3 Configure GraphQL resource

On the GraphQL resources screen, let’s add the fields with a field name and required flag. This creates the schema for the GraphQL API.
Then select JavaScript as the runtime.

2.4 Complete the configuration to create API.

Step-03: Verify the Schema & Resolver scripts.

3.1 Schema

Go to AppSync > Select APIs > Select “Customer API” > Choose “Schema” to view the auto generated schema.

The Query block defines the read operations on the Customer table.
The Mutation block defines the add / update / delete operations on Customer rows in the table.

type Customer {
 id: String!
 first_name: String!
 last_name: String!
 gender: String
 address: String
 zipcode: String
 active: Boolean
}

type Query {
 getCustomer(id: String!): Customer
 listCustomers(filter: TableCustomerFilterInput, limit: Int, nextToken: String): CustomerConnection
}

type Mutation {
 createCustomer(input: CreateCustomerInput!): Customer
 updateCustomer(input: UpdateCustomerInput!): Customer
 deleteCustomer(input: DeleteCustomerInput!): Customer
}

3.2 : Resolver Code (in Javascript)

From the Schema page, on the right side, we can find the resolver for getCustomer. Click on the resolver to open the resolver script.

A simple resolver always has two functions (request and response).

For the Query resolver:

request(ctx) : Requests the item from DynamoDB
response(ctx) : Returns the fetched DynamoDB item

import { util } from '@aws-appsync/utils';
import { get } from '@aws-appsync/utils/dynamodb';

/**
 * Sends a request to get an item with id `ctx.args.id` from the DynamoDB table.
 * @param {import('@aws-appsync/utils').Context<{id: unknown;}>} ctx the context
 * @returns {import('@aws-appsync/utils').DynamoDBGetItemRequest} the request
 */
export function request(ctx) {
    const { id } = ctx.args;
    const key = { id };
    return get({
        key,
    })
}

/**
 * Returns the fetched DynamoDB item.
 * @param {import('@aws-appsync/utils').Context} ctx the context
 * @returns {*} the DynamoDB item
 */
export function response(ctx) {
    const { error, result } = ctx;
    if (error) {
        return util.appendError(error.message, error.type, result);
    }
    return result;
}

Test the API Endpoint in Postman

Let’s find the API endpoint on the Settings screen.

Go to the Settings page > API details section > Find the GraphQL endpoint.

Then copy the API key from the “Primary Authorisation mode” section on the same page.

To test the API endpoint in Postman, use the following configuration.

Use HTTP method “POST”
Use the GraphQL endpoint.

Request Body Payload:

query MyQuery {
  getCustomer(id: "123") {
    first_name
    last_name
  }
}

Http Header:

Content-Type : application/graphql
x-api-key : <>

API Response

{
    "data": {
        "getCustomer": {
            "first_name": "John",
            "last_name": "Doe"
        }
    }
}
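
The same request can also be prepared from a script. Below is a minimal sketch using only the Python standard library. The endpoint URL and API key are placeholders, and wrapping the query in a JSON body with a "query" field is my assumption about what Postman sends in GraphQL mode.

```python
import json
import urllib.request


def build_graphql_request(endpoint, api_key, query, variables=None):
    """Prepare (but do not send) a POST request for a GraphQL endpoint."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/graphql", "x-api-key": api_key},
    )


QUERY = """
query MyQuery {
  getCustomer(id: "123") {
    first_name
    last_name
  }
}
"""

# Placeholder endpoint and key; use the values from the AppSync Settings page.
req = build_graphql_request("https://example.invalid/graphql", "API_KEY", QUERY)
```

Sending it is then a matter of calling urllib.request.urlopen(req) and parsing the JSON response.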

Conclusion

In this article, we have seen the basic concepts of GraphQL and AppSync, how to quickly build an API with DynamoDB as the data source, and how to test the API endpoint in Postman.

Building Movie Recommendation System with AWS ML Services.

Bharathvajan G — Wed, 05 Jun 2024 19:10:52 GMT

Introduction

Movie or product recommendation is one of the most popular applications of machine learning. Recommendation systems model users’ preferences for items based on their past interactions (e.g., click events, user ratings).

There are two ways to build a recommendation system on the AWS Cloud platform.

Amazon SageMaker — Factorization Machine Algorithm
Amazon Personalize

Amazon Personalize vs SageMaker: Factorization Machine

The following table compares Amazon Personalize with the SageMaker FM algorithm.

Amazon Personalize vs SageMaker Factorization Machine

In this article, let’s see a quick example of movie recommendation using Factorization Machines.

Factorization Machine — Intro

Deals with sparse data
It is an extension of a linear learning model designed to work on sparse data
Supervised algorithm
Can be used for classification or regression problems
It expects all categorical values to be one-hot encoded.
For model training, it expects the RecordIO-protobuf data format
It expects data in the float32 data type.
CSV files do not work for training
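
To illustrate the “extension of a linear model” point, here is a minimal sketch of the standard second-order factorization machine score: a bias, a linear term, and a pairwise term built from latent factor vectors. The weights below are made-up numbers, not trained values, and the SageMaker implementation is far more optimized than this loop.

```python
def fm_score(x, w0, w, v):
    """Second-order FM score: bias + linear term + pairwise interactions.

    x: feature values, w: linear weights, v: one latent factor vector per feature.
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            # The interaction weight is the dot product of the two factor vectors.
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


# One-hot input for "user 0 rated movie 0" with 2 users and 2 movies.
x = [1.0, 0.0, 1.0, 0.0]
w0 = 0.1                                              # made-up bias
w = [0.2, 0.0, 0.3, 0.0]                              # made-up linear weights
v = [[1.0, 0.0], [0.0, 0.0], [0.5, 0.5], [0.0, 0.0]]  # made-up latent factors
score = fm_score(x, w0, w, v)
```

Because one-hot vectors are mostly zeros, only the active user and movie pair contributes to the pairwise term, which is why the model copes well with sparse data. For a binary classifier, the score would additionally be passed through a sigmoid.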

What is RecordIO Protobuf ?

RecordIO is a file format used for efficient data storage and retrieval, particularly in the context of deep learning and data processing.
It allows to store large amounts of data in binary format, making it more compact and faster to read.
Protobuf (Protocol Buffers) is a language-neutral, platform-neutral, and extensible data interchange format
RecordIO + Protobuf, provides a way to store data efficiently in binary format

OneHotEncoding

One-hot encoding gives a categorical feature a numeric representation that does not imply any order or size difference between categories.
Binary values are assigned to each category. The zeros and ones form binary variables which show the presence or absence of a category.
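
A minimal sketch of the idea in plain Python follows. The genre values are made up for illustration, and in practice a library encoder such as scikit-learn’s OneHotEncoder does the same job on whole tables.

```python
def one_hot(values):
    """Return the sorted categories and one binary row per input value."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for value in values:
        row = [0.0] * len(categories)
        row[index[value]] = 1.0  # mark the presence of this category
        rows.append(row)
    return categories, rows


categories, rows = one_hot(["drama", "comedy", "drama"])
```

Each row has exactly one 1.0, in the column of its category, and every other column is 0.0.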

Example : OneHotEncoding

Building FM Model : High level steps

Steps involved in dataset preparation, preprocessing, training and deployment of Factorization Machine model.

Training & Test DataSet

In this article, we will use the MovieLens 100K dataset, which consists of 100,000 ratings (1–5) from 943 users on 1682 movies.

Here is the dataset link: https://grouplens.org/datasets/movielens/100k/

The dataset consists of the following attributes.

userId = unique identifier of the user. [Input Feature]
movieId = unique identifier of the movie. [Input Feature]
rating = User rating of the movie (1–5). [Target]
timestamp = date variable.

1. Download dataset.

# !wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
# !unzip -o ml-100k.zip

2. Create training & testing dataframe

import pandas as pd

ratings_train_df = pd.read_csv('ua.base', sep='\t', names=['userId','movieId','rating','timestamp'] )
ratings_test_df = pd.read_csv('ua.test', sep='\t', names=['userId','movieId','rating','timestamp'] )

3. Convert the target variable (rating) into binary

ratings_train_df['rating_bin'] = (ratings_train_df.rating>=4).astype('float32')
ratings_test_df['rating_bin'] = (ratings_test_df.rating>=4).astype('float32')

4. Use OneHotEncode strategy

from sklearn.preprocessing import OneHotEncoder

# Fit on the training IDs only; IDs unseen in training encode as all zeros
encoder = OneHotEncoder(handle_unknown='ignore')
encoder.fit(ratings_train_df[['userId','movieId']])

x_train = encoder.transform(ratings_train_df[['userId','movieId']]).astype('float32')
y_train = ratings_train_df['rating_bin']

x_test = encoder.transform(ratings_test_df[['userId','movieId']]).astype('float32')
y_test = ratings_test_df['rating_bin']

5. Prepare for Model training with instance count, instance type, max run time

import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.image_uris import retrieve

role = get_execution_role()
session = sagemaker.Session()

training_image = retrieve(region=boto3.Session().region_name, framework="factorization-machines", version='latest')

fm = sagemaker.estimator.Estimator(
    training_image, role,
    instance_count=1,
    instance_type='ml.c4.xlarge',
    volume_size=30,
    max_run=86400,
    output_path=output_path_prefix,
    sagemaker_session=session,
)

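6. Set hyperparameters and data channels, then start the training job.

# Assumption: feature_dim is the number of one-hot encoded columns
columns = x_train.shape[1]

fm.set_hyperparameters(
    feature_dim=columns,
    predictor_type='binary_classifier',
    mini_batch_size=200
)

# train_data and test_data are assumed to point to the RecordIO-protobuf
# files in S3 (creating them is not shown in this article)
data_channels = {
    "train": train_data,
    "test": test_data
}

fm.fit(inputs=data_channels, logs=True)

7. Deploy the model for prediction (this creates a real-time endpoint)

%%time
from sagemaker.deserializers import JSONDeserializer

fm_predictor = fm.deploy(
    initial_instance_count=1,
    instance_type="ml.c4.xlarge",
    deserializer=JSONDeserializer()
)

8. Cleanup (delete the endpoint)

fm_predictor.delete_endpoint()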

Conclusion

To summarize

Amazon Personalize is a fully-managed service focused on real-time recommendation systems, providing ease of use and scalability.
Amazon SageMaker is a more versatile and customizable service that enables developers to build, train, and deploy custom machine learning models for various tasks

We’ve seen a quick example of item recommendation using Factorization Machines; we’ll cover Amazon Personalize in the next article.

Semantic Search by Amazon OpenSearch Serverless & Amazon Bedrock

Bharathvajan G — Tue, 09 Apr 2024 19:04:17 GMT

Introduction

In this article, we will cover how to leverage OpenSearch Serverless to perform semantic search, which is different from typical full-text search.

Amazon OpenSearch

It is derived from a mature version of Elasticsearch
It is a highly scalable, fully managed search engine and analytics service

Amazon OpenSearch Serverless

It is a serverless deployment option for OpenSearch that eliminates the need to provision and manage infrastructure.

Full Text Search

It is a search technique that looks for matches of a search query within the entire text of a document or a set of documents.
It searches for all occurrences of the words or phrases that are specified in the search query, regardless of their location or context within the document.

Semantic Search

It is a more advanced search technique that understands the meaning of the search query and the context in which it is used.
It uses natural language processing (NLP) and machine learning to understand the intent behind the search query and provide more relevant results.

Vector Embeddings

A way to represent complex data, such as words, sentences, or even images, as points in a vector space, using vectors of real numbers
Vector embeddings are a way to convert words and sentences and other data (audio, image etc) into numbers that capture their meaning and relationships

Semantic Search & Vector Embedding

Semantic search is one of the most popular uses of vector embeddings. Search algorithms like KNN and ANN require us to calculate distance between vectors to determine similarity.
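
As a small illustration of the distance calculation, here is an exact k-nearest-neighbour search using cosine similarity. The three-dimensional vectors are made up for readability, since real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query, vectors, k=2):
    """Exact KNN: score every stored vector and keep the k most similar."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in vectors.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]


# Made-up embeddings: "cat" and "kitten" point in a similar direction.
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
nearest = knn([1.0, 0.0, 0.0], vectors, k=2)
```

Exact KNN compares the query with every stored vector, which becomes slow on large indexes. ANN methods trade a little accuracy for speed by searching only part of the index.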

Types of Vector Embedding

Word / Text Embedding
Document Embedding
Image Embedding
Audio Embedding

Create a Collection

A collection is similar to a database in the RDBMS world. Here are the steps to create a collection.

Go to AWS Management Console
From Services, select OpenSearch
Choose Serverless

Create Vector Index with JSON configuration

An index is similar to a table in a database. Here are the steps to create a vector index in a collection.

Refer to the following JSON configuration, which contains the “mappings” and “settings” needed to create the index.

Note: This index (table) has only two properties (fields or columns):

doc_text: data type is text
doc_vector: data type is vector with dimension 1536 (the output size of the amazon.titan-embed-text-v1 model used below; the index dimension must match the embedding size)

{
    "mappings": {
        "properties": {
            "doc_text": {"type": "text"},
            "doc_vector": {
                "type": "knn_vector",
                "dimension": 1536,
                "method": {
                    "engine": "nmslib",
                    "space_type": "cosinesimil",
                    "name": "hnsw",
                    "parameters": {"ef_construction": 512, "m": 16}
                }
            }
        }
    },
    "settings": {
        "index": {
            "number_of_shards": 1,
            "knn.algo_param": {"ef_search": 512},
            "knn": true
        }
    }
}

Ensure the index is created within the collection.
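
The same configuration can also be applied from code. The sketch below assumes an opensearch-py client and uses its `indices.exists` and `indices.create` calls; `build_index_body` and `create_vector_index` are hypothetical helper names. The vector dimension is a parameter so it can be taken from an actual embedding rather than hard-coded.

```python
def build_index_body(dimension):
    # Mirrors the JSON configuration above, with the vector size as a parameter.
    return {
        "mappings": {
            "properties": {
                "doc_text": {"type": "text"},
                "doc_vector": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "engine": "nmslib",
                        "space_type": "cosinesimil",
                        "name": "hnsw",
                        "parameters": {"ef_construction": 512, "m": 16},
                    },
                },
            }
        },
        "settings": {
            "index": {
                "number_of_shards": 1,
                "knn.algo_param": {"ef_search": 512},
                "knn": True,
            }
        },
    }

def create_vector_index(client, index_name, dimension):
    # `client` is assumed to be an opensearch-py OpenSearch instance.
    if client.indices.exists(index=index_name):
        return False  # nothing to do, the index is already there
    client.indices.create(index=index_name, body=build_index_body(dimension))
    return True

# Stand-in vector; in practice use one embedding returned by the model.
sample_embedding = [0.0] * 8
body = build_index_body(dimension=len(sample_embedding))
print(body["mappings"]["properties"]["doc_vector"]["dimension"])
```

Deriving the dimension from a real embedding avoids a mismatch between the index mapping and the vectors being ingested.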

Create Lambda function to ingest words & perform semantic search

Go to AWS Management Console
Choose Lambda service
Create Function & enter the function name
Select the latest Python runtime, 3.12. (Note: this runtime ships a recent boto3 that includes the Bedrock service.)
Ensure the Lambda function’s IAM role has the required permissions for Bedrock and OpenSearch Service.

Code snippet : Create Vector Embedding for text/words leveraging Amazon Titan model.

import json
import boto3
import logging
from opensearchpy import OpenSearch, RequestsHttpConnection

logger = logging.getLogger()
logger.setLevel(logging.INFO)

bedrock = boto3.client(service_name='bedrock-runtime')

def lambda_handler(event, context):

    # Convert each text to a vector embedding

    text_obj1 = "Titan Model: Text & Image generation, Summarization"
    text_obj2 = "Stable Diffusion: Generate high quality image"
    text_obj3 = "Claude: Content Creation and Complex Reasoning"

    vector_obj1 = word_embedding(text_obj1)
    vector_obj2 = word_embedding(text_obj2)
    vector_obj3 = word_embedding(text_obj3)

    # Ingest vector embedding to OpenSearch
    ingest_document(text_obj1, vector_obj1)
    ingest_document(text_obj2, vector_obj2)
    ingest_document(text_obj3, vector_obj3)

def word_embedding(text):
    body=json.dumps({"inputText": text})
    response = bedrock.invoke_model(body=body, modelId='amazon.titan-embed-text-v1', accept='application/json', contentType='application/json')
    response_body = json.loads(response.get('body').read())
    embedding = response_body.get('embedding')
    return embedding

Code Snippet: Ingest Vector Embedding to OpenSearch

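# One common way to define `auth`: SigV4 signing, with "aoss" as the service name for OpenSearch Serverless.
# Assumes boto3 is already imported, as in the first snippet.
from opensearchpy import AWSV4SignerAuth

session = boto3.Session()
auth = AWSV4SignerAuth(session.get_credentials(), session.region_name, "aoss")
opensearch_client = OpenSearch(
    hosts = [{"host": "opensearch_endpoint_placeholder", "port": 443}],
    http_auth = auth, use_ssl = True, verify_certs = True,
    connection_class = RequestsHttpConnection,
    pool_maxsize = 10
)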

def ingest_document(text_obj, vector_obj):
    document = {
      "doc_text": text_obj,
      "doc_vector": vector_obj
    }
    
    # Index the document with the client created above
    response = opensearch_client.index(
        index = 'doc_index1',
        body = document
    )
    return response

Code snippet : Perform Semantic Search using OpenSearch Query DSL

Let’s say we want to perform a semantic search for “Image Model” against the index. First we need to convert the search query text to a vector embedding, and then call the following method.

def perform_vector_search(vector):
    document = {
        "size": 15,
        "_source": {"excludes": ["doc_vector"]},
        "query": {
            "knn": {
                 "doc_vector": {
                     "vector": vector,
                     "k":10
                 }
            }
        }
    }
    # Run the k-NN query with the client created above
    response = opensearch_client.search(
        body = document,
        index = "doc_index1"
    )
    return response
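
The search call returns a nested response, so a small helper for reading it can be useful. This is a sketch under assumptions: `top_matches` is a hypothetical helper, and the sample response is hand-written in the shape OpenSearch returns (`hits.hits[]` with `_score` and `_source`); the scores are invented, not real output.

```python
def top_matches(response, min_score=0.0):
    # Each hit carries a relevance `_score` and the stored `_source` fields.
    hits = response.get("hits", {}).get("hits", [])
    return [
        (hit["_score"], hit["_source"]["doc_text"])
        for hit in hits
        if hit["_score"] >= min_score
    ]

# Made-up response for illustration only.
sample_response = {
    "hits": {
        "hits": [
            {"_score": 0.71, "_source": {"doc_text": "Stable Diffusion: Generate high quality image"}},
            {"_score": 0.52, "_source": {"doc_text": "Claude: Content Creation and Complex Reasoning"}},
        ]
    }
}

for score, text in top_matches(sample_response, min_score=0.6):
    print(score, text)
```

Because the query excludes `doc_vector` from `_source`, each hit only carries the original text, which keeps the response small.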

Conclusion

To conclude, we have seen how OpenSearch provides efficient vector similarity search through its specialized k-NN index.

In the next article, we will cover using OpenSearch Service’s vector database for Retrieval Augmented Generation (RAG) with LLMs, recommendation engines, and rich media search.

Amazon Bedrock : Generate Image using Stability AI Model from Lambda function.

Bharathvajan G — Sun, 14 Jan 2024 11:25:11 GMT

What is Amazon Bedrock ?

It is AWS’s managed service for generative AI.
It lets you build and scale generative AI applications that can generate text, images, audio, and synthetic data in response to prompts.
It lets you choose the AI models to build your apps on, and then customise them with your own private data.
It provides many of the leading AI models (such as Anthropic’s Claude and Meta’s Llama 2) under one roof, allowing businesses to easily pick the models they like and experiment with different use cases.

Available Generative AI Models in Amazon Bedrock

The following diagram covers the generative AI models supported in Amazon Bedrock and their respective use cases.

Request Model Access (Important Step before accessing Model APIs.)

By default, models are not enabled in an AWS account. To access them, we need to request model access (note: models are available only in certain regions).

Go to AWS Management Console
Choose Amazon Bedrock service
Select “Model Access” to manage
Choose models to request for access (as shown below)

Request for Model Access

Ensure model access is granted as shown in following picture.

Requested Model Access — granted

Examples & API Request in Amazon Bedrock

We can also refer to the examples provided in Amazon Bedrock, which can be browsed by search, provider, or use case.

Here is an example that creates an image from a text prompt using a Stability AI model.

Create Lambda function with the required IAM role & permissions

Go to AWS Management Console
Choose Lambda service
Create Function & enter the function name
Select the latest Python runtime, 3.12. (Note: this runtime ships a recent boto3 that includes the Bedrock service.)
Ensure the Lambda function’s IAM role has the required permissions for Bedrock and S3.
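
The permissions from the last step could look roughly like the policy below. It is a sketch, not a vetted policy: `build_lambda_policy` is a hypothetical helper, the region `us-east-1` is an assumption, and the model id and bucket name are the placeholders used later in this article. It grants `bedrock:InvokeModel` on one foundation model and `s3:PutObject` on one bucket.

```python
import json

def build_lambda_policy(region, model_id, bucket):
    # Least-privilege sketch: invoke one model, write objects to one bucket.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",
                "Resource": f"arn:aws:bedrock:{region}::foundation-model/{model_id}",
            },
            {
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }

# Region is an assumption; model id and bucket are this article's placeholders.
policy = build_lambda_policy("us-east-1", "stability.stable-diffusion-xl-v0", "s3-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into an inline policy on the Lambda execution role after replacing the placeholders.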

Coding: steps involved in text-to-image in the Lambda function.

Step-1: Import libraries

import base64
import io
import json
import os
import sys
import boto3

# Bedrock Runtime client used to invoke the models
bedrock_runtime = boto3.client(service_name='bedrock-runtime')

Step-2: Configure the Model id

model_id = "stability.stable-diffusion-xl-v0"

Step-3: Create Prompt & form the payload with steps

def lambda_handler(event, context):
    
    prompt = "a beautiful lake with cat and fish"    
    
    payload = {
        "text_prompts": [{"text": prompt}],
        "cfg_scale": 12,
        "seed": 452345,
        "steps": 80,
    } 
    body = json.dumps(payload)

Step-4: Invoke the Model

    
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId=model_id,
        accept="application/json",
        contentType="application/json",
    )

Step-5: Read the API response and save the image in an S3 bucket

    # Get the image from the API response. It is base64 encoded.
    response_body = json.loads(response.get("body").read())
    artifact = response_body.get("artifacts")[0]
    image_bytes = base64.b64decode(artifact.get("base64"))

    # Save the image to S3. Object() belongs to the boto3 S3 resource, not the client.
    s3 = boto3.resource('s3')
    s3.Object('s3-bucket', 'image-name.png').put(Body=image_bytes)
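
The decoding step can be checked locally without calling Bedrock. The sketch below is an illustration under assumptions: `extract_image_bytes` and `looks_like_png` are hypothetical helpers, the fake response only imitates the `artifacts[0].base64` field used above, and the PNG signature check assumes the model returns PNG data, as the `.png` file name suggests.

```python
import base64

# The first eight bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def extract_image_bytes(response_body):
    # Decode the first artifact, failing loudly if the model returned none.
    artifacts = response_body.get("artifacts") or []
    if not artifacts:
        raise ValueError("model response contains no artifacts")
    return base64.b64decode(artifacts[0]["base64"])

def looks_like_png(data):
    # Cheap sanity check before writing the bytes to S3.
    return data.startswith(PNG_SIGNATURE)

# Stand-in for a parsed model response; not real model output.
fake_png = PNG_SIGNATURE + b"placeholder image data"
fake_body = {"artifacts": [{"base64": base64.b64encode(fake_png).decode("utf-8")}]}

image_bytes = extract_image_bytes(fake_body)
print(looks_like_png(image_bytes))
```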

Step-6: Test the function with a sample event and check the generated image in the S3 bucket.

Here is a sample output from the Stable Diffusion model.

Conclusion

Amazon Bedrock provides developers access to a diverse array of foundation models through a serverless API.

It simplifies application development and deployment by offering access to foundation models, and it removes the need for developers to build their own infrastructure.

Amazon SageMaker AutoPilot 101: Create Experiment programmatically

Bharathvajan G — Sun, 19 Nov 2023 15:46:20 GMT

Amazon SageMaker

Amazon SageMaker is a fully managed service for machine learning in the cloud.
It lets you build and train machine learning models and deploy them directly into a production-ready, hosted environment.
SageMaker provides features such as the following:

Jupyter Notebooks — SageMaker provides an integrated Jupyter notebook authoring instance. It offers easy access to data sources for exploration and analysis. There is no need to manage servers.

Machine Learning Algorithms — SageMaker provides common machine learning algorithms optimized to run against large data in a distributed environment.

Amazon SageMaker AutoPilot

It automatically builds, trains, and tunes machine learning models based on your data, while giving you full control and visibility.
It prepares data, tests different algorithms, and optimizes model parameters to find the best approach for your data.
It takes a transparent approach to AutoML: developers can manually inspect every step taken by the AutoML algorithm, from feature engineering to model training and selection.

AutoPilot Experiment

An AutoPilot experiment is used to start an AutoPilot job in Amazon SageMaker.
An experiment can be created using the AWS SDK (programmatically) or in SageMaker Studio.

AutoPilot : Process Activity

The AutoML process involves the following steps:

Input data
Selecting the target column
Choosing the right algorithm and creating models automatically
Selecting the best model
Model deployment

The following diagram describes the activities involved in the AutoML process.

Activity Steps involved in Auto ML Process.

AutoPilot: Supported ML problem types

The following diagram shows the ML problem types supported by AutoPilot.

Steps : To Create an Experiment Programmatically

Step-1 : Add Import Statement

import boto3
import sagemaker

Step-2 : Create Session:

session = sagemaker.Session()
bucket = session.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

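Step-3 : Create SageMaker Client:

# Note: this rebinds the name `sagemaker` from the SDK module to a boto3 client,
# so sagemaker.Session() and sagemaker.get_execution_role() must run before this line (as in Step-2).
sagemaker = boto3.Session().client(service_name='sagemaker', 
                        region_name=region)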

Step-4 : Configure Job: Training time, Number of candidate models

autopilot_job_config = {
  'CompletionCriteria': {
    'MaxRuntimePerTrainingJobInSeconds': 900,
    'MaxCandidates': 5,
    'MaxAutoMLJobRuntimeInSeconds': 5000
  },
}

Step-5 : Configure the S3 input data location of the CSV & the label column name

job_input_data_config = [{
  'DataSource': {
    'S3DataSource': {
      'S3DataType': 'S3Prefix',
      'S3Uri': 's3://path-to-train-dataset/'
    }
  },
  # Set this to the name of the label (target) column in the CSV
  'TargetAttributeName': ''
}]

Step-6 : Configure output location for the Autopilot-Generated Assets

job_output_data_config = {
  'S3OutputPath': f's3://{bucket}/models/autopilot'
}

Step-7: Launch AutoPilot Experiment

# Job names may contain only letters, digits, and hyphens (no underscores), up to 32 characters.
sagemaker.create_auto_ml_job(
  AutoMLJobName    = 'sample-auto-mljob-1',
  InputDataConfig  = job_input_data_config,
  OutputDataConfig = job_output_data_config,
  AutoMLJobConfig  = autopilot_job_config,
  RoleArn          = role
)

Step-8: Get information about the job

job_details = sagemaker.describe_auto_ml_job(
  AutoMLJobName = 'sample-auto-mljob-1'
)
print(job_details)
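
An Autopilot job runs for a while, so a single describe call usually shows it still in progress. The sketch below polls `describe_auto_ml_job` until the `AutoMLJobStatus` field reaches a terminal value. `wait_for_auto_ml_job`, the polling interval, and the poll limit are assumptions for illustration; the `sleep` parameter exists only so the loop can be exercised without waiting.

```python
import time

# Statuses after which the job will not change any more.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}

def wait_for_auto_ml_job(sm_client, job_name, poll_seconds=60, max_polls=120, sleep=time.sleep):
    # Poll the job until it finishes, or give up after max_polls attempts.
    for _ in range(max_polls):
        details = sm_client.describe_auto_ml_job(AutoMLJobName=job_name)
        if details["AutoMLJobStatus"] in TERMINAL_STATUSES:
            return details
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_name} did not finish after {max_polls} polls")

def best_candidate_name(details):
    # BestCandidate is only present once the job has produced candidates.
    return details.get("BestCandidate", {}).get("CandidateName")
```

With the client from Step-3, this could be called with the same job name that was passed to create_auto_ml_job, and `best_candidate_name` applied to the returned details once the status is Completed.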

AutoML vs AutoPilot

As we have seen, an end-to-end machine learning pipeline has different stages: 1. Data Acquisition > 2. Data Exploration > 3. Data Preparation > 4. Feature Engineering > 5. Algorithm Selection > 6. Model Training > 7. Model Tuning > 8. Model Deployment

AutoML automates and simplifies the machine learning pipeline, whereas SageMaker Autopilot does the same with full visibility and transparency.

Conclusion

Amazon SageMaker Autopilot aims to empower developers to create sophisticated ML models without having to deal with all the phases of ML workflow.

Developers can bring a dataset, start a SageMaker Autopilot job and walk away with the best model.