Enhancing Generative AI Chatbot Observability: Debugging Conversations with Arize Phoenix
In the rapidly evolving landscape of Generative AI, chatbots and language models have become integral to businesses across industries. However, as these AI systems grow more sophisticated and widely adopted, the challenge of monitoring and debugging their performance becomes increasingly complex.
Mitigating issues such as hallucination, prompt injection, and high latency is a daunting task without an efficient way to pinpoint exactly where a problem resides. And just like in any other system, tracing is a useful method for solving this problem. It produces a series of structured logs tagged with an identifier called a trace ID, which allows you to group all the logs that flow through the system for a single request.
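To make the trace ID idea concrete, here is a minimal stdlib-only Python sketch (a toy illustration, not Phoenix itself) of structured logs tagged with a shared trace ID and grouped per request:

```python
import uuid
from collections import defaultdict

def new_trace_id() -> str:
    """Generate a unique identifier shared by all logs of one request."""
    return uuid.uuid4().hex

def log_event(logs: list, trace_id: str, component: str, message: str) -> None:
    """Append a structured log record tagged with the trace ID."""
    logs.append({"trace_id": trace_id, "component": component, "message": message})

def group_by_trace(logs: list) -> dict:
    """Group log records by trace ID so one request's flow can be read end to end."""
    grouped = defaultdict(list)
    for record in logs:
        grouped[record["trace_id"]].append(record)
    return dict(grouped)

logs = []

# One user request: both events share the same trace ID.
trace_id = new_trace_id()
log_event(logs, trace_id, "retriever", "fetched 4 chunks from vector store")
log_event(logs, trace_id, "llm", "generated answer in 1.2s")

# A different request gets its own trace ID.
other = new_trace_id()
log_event(logs, other, "retriever", "fetched 0 chunks")

grouped = group_by_trace(logs)
print(len(grouped[trace_id]))  # 2
```

Real tracing systems such as OpenTelemetry layer parent-child span relationships and timing information on top of this grouping idea.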
In this post, I have prepared a step-by-step guide for setting up tracing for a chatbot using Arize Phoenix, an open-source LLM observability solution that you can self-host in your own environment and use for auto-instrumentation of traces. You can find all of the code and resources used throughout this post in the associated Git repository. The concepts in this post are applicable to any situation where you want to set up LLM observability.
Solution overview
The above architecture demonstrates how traces are collected via Arize Phoenix and persisted in an Aurora PostgreSQL database. A demo application ("demo app") is hosted in an Amazon Elastic Container Service (Amazon ECS) cluster to process user queries. The demo app leverages LangChain to orchestrate components such as the Aurora PostgreSQL vector store and Amazon Bedrock, so LangChainInstrumentor is used to auto-instrument spans whenever a chain is invoked. LangChainInstrumentor can be set up in the backend as follows; it takes only four lines of code.
from openinference.instrumentation.langchain import LangChainInstrumentor
from phoenix.otel import register
trace_provider = register(endpoint="http://localhost:4317")
LangChainInstrumentor().instrument(trace_provider=trace_provider)
Prerequisites
Before you get started, make sure you have the following prerequisites:
- An AWS account
- An AWS Identity and Access Management (IAM) federation role with access to do the following:
- Create, edit, view, and delete VPC network resources
- Create, edit, view, and delete ECS resources
- Create, edit, view, and delete IAM roles and policies
- For this post, we use the us-east-1 Region
- Request foundation model access via the Amazon Bedrock console. For this post, I used the anthropic.claude-3-5-sonnet-20240620-v1:0 foundation model
Build and Push Chatbot image to ECR
Complete the following steps to build a container image for the chatbot. We use another open source project, Gradio, to build the chatbot UI.
1. Create an ECR repository named phoenix-demo-gradio
2. Build the Docker image
# Move to /gradio folder
cd gradio
# Build Gradio image
docker build -t <account_id>.dkr.ecr.us-east-1.amazonaws.com/phoenix-demo-gradio:latest .
3. Push the image to ECR
# Authenticate your Docker client
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account_id>.dkr.ecr.us-east-1.amazonaws.com
# Push Gradio image
docker push <account_id>.dkr.ecr.us-east-1.amazonaws.com/phoenix-demo-gradio:latest
4. Copy the URI of the image
Set up the private access infrastructure
In this section, we set up the infrastructure such as VPC, private subnets, security groups, and Aurora PostgreSQL database using an AWS Cloud Development Kit (AWS CDK) script. The script is available from the associated repository.
The CDK script creates the following resources on your behalf:
- A VPC with two private subnets in separate Availability Zones
- Security groups and routing tables
- IAM roles and policies for use by Amazon Bedrock and Amazon ECS
- An Aurora PostgreSQL Serverless v2 database
- Secrets for the database
- A SageMaker notebook instance
Before we execute the CDK script, go to line 138 of the file /infra/lib/phoenix-demo-stack.ts and replace the placeholder with the image URI copied from ECR.
// Before
taskDefinition.addContainer('Gradio', {
  containerName: 'gradio',
  image: ecs.ContainerImage.fromRegistry('PLACEHOLDER_PLEASE_REPLACE_WITH_YOUR_IMAGE'),
  ...
});

// After
taskDefinition.addContainer('Gradio', {
  containerName: 'gradio',
  image: ecs.ContainerImage.fromRegistry('<account_id>.dkr.ecr.us-east-1.amazonaws.com/phoenix-demo-gradio:latest'),
  ...
});
Once the change is saved and the correct AWS profile is configured in your command line or terminal, execute the following commands to provision the resources:
# Move to /infra folder
cd infra
# Run CDK script
cdk deploy
Once the deployment is successful, open the DNS name of the load balancer in a browser.
You should see the demo chatbot.
Now, open the DNS name of the load balancer on port 6006 in the browser. You should see the Phoenix dashboard.
The chatbot is not fully functional yet, because we have not loaded any documents for retrieval. As expected, when I submit a question, the model responds that it cannot provide the information because the provided context is empty, and I see a matching record in the Phoenix dashboard.
Load PostgreSQL table with embeddings of dataset
In this section, we use Amazon SageMaker notebook to generate embeddings for our corpus dataset and load them to the PostgreSQL vector store. Embeddings are a fundamental concept in natural language processing and machine learning. They’re essentially a way to represent words or other discrete items as continuous vectors of real numbers. This vector representation allows us to capture semantic relationships between words in a way that computers can understand and process.
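As a concrete toy illustration, the three-dimensional vectors below are hand-written stand-ins for real embeddings, which typically have hundreds or thousands of dimensions. Cosine similarity is a common way to measure how semantically close two embeddings are:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real embedding models output far more dimensions.
embeddings = {
    "cat":    [0.90, 0.80, 0.10],
    "kitten": [0.85, 0.82, 0.15],
    "car":    [0.10, 0.20, 0.95],
}

print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))  # close to 1.0
print(cosine_similarity(embeddings["cat"], embeddings["car"]))     # much lower
```

When the notebook loads the corpus into the PGVector store, a real embedding model plays the role of these hand-written vectors.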
1. On the Amazon SageMaker console, under Applications and IDEs in the navigation pane, choose Notebooks.
2. Locate the notebook instance phoenix-demo-notebook created in an earlier stage.
3. Once the notebook instance is ready, choose Open JupyterLab.
4. Upload the build_pgvector_db.ipynb file from the /notebook folder.
5. Open build_pgvector_db.ipynb and execute the cells.
6. Verify successful retrieval of relevant chunks from the PGVector vector store.
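Conceptually, the retrieval you just verified is a top-k nearest-neighbor search over embeddings. The snippet below is a stdlib-only stand-in for pgvector, with hypothetical hand-written embeddings and chunk texts:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy corpus: (chunk text, hand-written embedding) pairs standing in for pgvector rows.
corpus = [
    ("Phoenix traces LLM applications.", [0.9, 0.1, 0.2]),
    ("Aurora PostgreSQL stores vectors.", [0.2, 0.9, 0.1]),
    ("Bedrock hosts foundation models.", [0.1, 0.2, 0.9]),
]

def top_k(query_embedding, k=2):
    """Return the k chunks whose embeddings are closest to the query embedding."""
    ranked = sorted(corpus, key=lambda row: cosine_similarity(query_embedding, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# A query embedded near the "tracing" vector should rank the Phoenix chunk first.
print(top_k([0.95, 0.15, 0.1], k=2))
```

In the real application, LangChain's PGVector store performs this ranking inside PostgreSQL and hands the top chunks to the prompt as context.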
LLM Observability in Action
Now let’s go back to the chatbot and ask the same question.
In Phoenix, you can see spans describing each step that happens inside the chatbot.
You can open an individual span for more information. For example, the span named ChatBedrock shows how the prompt is populated with the chunks retrieved from pgvector as well as the user's question.
Note: In a real-world use case, access to the Phoenix dashboard should be strictly controlled through robust authentication and authorization measures. There are many ways to do this; one option leverages an ALB listener rule and Amazon Cognito - Authenticate users using an Application Load Balancer.
Clean up
Leaving resources that you no longer need in your AWS account may incur unwanted charges. After deploying the solution discussed in this post, consider running the provided cleanup command to delete the AWS infrastructure that was created. This helps prevent ongoing costs for cloud services that you are no longer using for this project.
cdk destroy --force
Conclusion
In this post, we demonstrated how to set up and operationalize LLM observability for a generative AI workload deployed on Amazon ECS. With the architecture discussed in this post, it becomes much easier to pinpoint problem areas in your generative AI application, which leads to higher productivity for your team.
I encourage you to experiment with various combinations and start building a more secure, responsive generative AI solution with improved LLM observability.