Enhancing Generative AI Chatbot Observability: Debugging Conversations with Arize Phoenix
In the rapidly evolving landscape of Generative AI, chatbots and language models have become integral to businesses across industries. However, as these AI systems grow more sophisticated and widely adopted, the challenge of monitoring and debugging their performance becomes increasingly complex.
Mitigating issues such as hallucination, prompt injection, and high latency is a daunting task without an efficient way to pinpoint exactly where a problem resides. And just like in any other system, tracing is a useful method for solving this problem. It produces a series of structured logs tagged with an identifier called a trace ID, which allows you to group all the logs that flow through the system for a single request.
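To make the trace ID idea concrete, here is a minimal stdlib-only Python sketch (a toy illustration, not Phoenix itself) of structured logs tagged with a shared trace ID and grouped per request:

```python
import uuid
from collections import defaultdict

def new_trace_id() -> str:
    """Generate a unique identifier shared by all logs of one request."""
    return uuid.uuid4().hex

def log_event(logs: list, trace_id: str, component: str, message: str) -> None:
    """Append a structured log record tagged with the trace ID."""
    logs.append({"trace_id": trace_id, "component": component, "message": message})

def group_by_trace(logs: list) -> dict:
    """Group log records by trace ID so one request's flow can be read end to end."""
    grouped = defaultdict(list)
    for record in logs:
        grouped[record["trace_id"]].append(record)
    return dict(grouped)

logs = []

# One user request: both events share the same trace ID.
trace_id = new_trace_id()
log_event(logs, trace_id, "retriever", "fetched 4 chunks from vector store")
log_event(logs, trace_id, "llm", "generated answer in 1.2s")

# A different request gets its own trace ID.
other = new_trace_id()
log_event(logs, other, "retriever", "fetched 0 chunks")

grouped = group_by_trace(logs)
print(len(grouped[trace_id]))  # 2
```

Real tracing systems such as OpenTelemetry layer parent-child span relationships and timing information on top of this grouping idea.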
In this post, I have prepared a step-by-step guide for setting up tracing for a chatbot using Arize Phoenix, an open-source LLM observability solution that you can self-host in your own environment and use for auto-instrumentation of traces. You can find all of the code and resources used throughout this post in the associated Git repository. The concepts in this post are applicable to any situation where you want to set up LLM observability.
Solution overview
The above architecture demonstrates how traces are collected via Arize Phoenix and persisted in an Aurora PostgreSQL database. A demo application ("demo app") is hosted in an Amazon Elastic Container Service (Amazon ECS) cluster to process user queries. The demo app leverages LangChain to orchestrate components such as the Aurora PostgreSQL vector store and Amazon Bedrock, so LangChainInstrumentor is used to auto-instrument spans whenever a chain is invoked. LangChainInstrumentor can be set up in the backend as follows; it takes only four lines of code.
from openinference.instrumentation.langchain import LangChainInstrumentor
from phoenix.otel import register
trace_provider = register(endpoint="http://localhost:4317")
LangChainInstrumentor().instrument(trace_provider=trace_provider)
Prerequisites
Before you get started, make sure you have the following prerequisites:
- An AWS account
- An AWS Identity and Access Management (IAM) federation role with access to do the following:
- Create, edit, view, and delete VPC network resources
- Create, edit, view, and delete ECS resources
- Create, edit, view, and delete IAM roles and policies
- For this post, we use the us-east-1 Region
- Request foundation model access via the Amazon Bedrock console. For this post, I used the anthropic.claude-3-5-sonnet-20240620-v1:0 foundation model
Build and Push Chatbot image to ECR
Complete the following steps to build a container image for the chatbot. We use another open source project, Gradio, to build the chatbot UI.
1. Create an ECR repository named phoenix-demo-gradio
2. Build the Docker image
# Move to /gradio folder
cd gradio
# Build Gradio image
docker build -t <account_id>.dkr.ecr.us-east-1.amazonaws.com/phoenix-demo-gradio:latest .
3. Push the image to ECR
# Authenticate your Docker client
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account_id>.dkr.ecr.us-east-1.amazonaws.com
# Push Gradio image
docker push <account_id>.dkr.ecr.us-east-1.amazonaws.com/phoenix-demo-gradio:latest
4. Copy the URI of the image
Set up the private access infrastructure
In this section, we set up the infrastructure such as VPC, private subnets, security groups, and Aurora PostgreSQL database using an AWS Cloud Development Kit (AWS CDK) script. The script is available from the associated repository.
The CDK script creates the following resources on your behalf:
- A VPC with two private subnets in separate Availability Zones
- Security groups and routing tables
- IAM roles and policies for use by Amazon Bedrock and Amazon ECS
- An Aurora PostgreSQL Serverless v2 database
- Secrets for the database
- A SageMaker notebook instance
Before we execute the CDK script, go to line 138 of the file /infra/lib/phoenix-demo-stack.ts and replace the placeholder with the image URI copied from ECR.
// Before
taskDefinition.addContainer('Gradio', {
  containerName: 'gradio',
  image: ecs.ContainerImage.fromRegistry('PLACEHOLDER_PLEASE_REPLACE_WITH_YOUR_IMAGE'),
  ...
});

// After
taskDefinition.addContainer('Gradio', {
  containerName: 'gradio',
  image: ecs.ContainerImage.fromRegistry('<account_id>.dkr.ecr.us-east-1.amazonaws.com/phoenix-demo-gradio:latest'),
  ...
});
Once the change is saved and the correct AWS profile is configured in your command line or terminal, execute the following commands to provision the resources:
# Move to /infra folder
cd infra
# Run CDK script
cdk deploy
Once the deployment is successful, open the DNS name of the load balancer in a browser.
You should see the demo chatbot.
Now, open the DNS name of the load balancer on port 6006 in the browser. You should see the Phoenix dashboard.
The chatbot is not fully functional yet, because we have not loaded any documents for retrieval. As expected, when I submit a question, the model responds that it cannot provide the information because the provided context is empty, and I see a matching record in the Phoenix dashboard.
Load PostgreSQL table with embeddings of dataset
In this section, we use Amazon SageMaker notebook to generate embeddings for our corpus dataset and load them to the PostgreSQL vector store. Embeddings are a fundamental concept in natural language processing and machine learning. They’re essentially a way to represent words or other discrete items as continuous vectors of real numbers. This vector representation allows us to capture semantic relationships between words in a way that computers can understand and process.
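As a concrete toy illustration, the three-dimensional vectors below are hand-written stand-ins for real embeddings, which typically have hundreds or thousands of dimensions. Cosine similarity is a common way to measure how semantically close two embeddings are:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real embedding models output far more dimensions.
embeddings = {
    "cat":    [0.90, 0.80, 0.10],
    "kitten": [0.85, 0.82, 0.15],
    "car":    [0.10, 0.20, 0.95],
}

print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))  # close to 1.0
print(cosine_similarity(embeddings["cat"], embeddings["car"]))     # much lower
```

When the notebook loads the corpus into the PGVector store, a real embedding model plays the role of these hand-written vectors.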
1. On the Amazon SageMaker console, under Applications and IDEs in the navigation pane, choose Notebooks.
2. Locate the notebook instance phoenix-demo-notebook created in an earlier stage.
3. Once the notebook instance is ready, choose Open JupyterLab.
4. Upload the build_pgvector_db.ipynb file from the /notebook folder.
5. Open build_pgvector_db.ipynb and execute the cells.
6. Verify successful retrieval of relevant chunks from the PGVector vector store.
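Conceptually, the retrieval you just verified is a top-k nearest-neighbor search over embeddings. The snippet below is a stdlib-only stand-in for pgvector, with hypothetical hand-written embeddings and chunk texts:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy corpus: (chunk text, hand-written embedding) pairs standing in for pgvector rows.
corpus = [
    ("Phoenix traces LLM applications.", [0.9, 0.1, 0.2]),
    ("Aurora PostgreSQL stores vectors.", [0.2, 0.9, 0.1]),
    ("Bedrock hosts foundation models.", [0.1, 0.2, 0.9]),
]

def top_k(query_embedding, k=2):
    """Return the k chunks whose embeddings are closest to the query embedding."""
    ranked = sorted(corpus, key=lambda row: cosine_similarity(query_embedding, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# A query embedded near the "tracing" vector should rank the Phoenix chunk first.
print(top_k([0.95, 0.15, 0.1], k=2))
```

In the real application, LangChain's PGVector store performs this ranking inside PostgreSQL and hands the top chunks to the prompt as context.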
LLM Observability in Action
Now let’s go back to the chatbot and ask the same question.
In Phoenix, you can see spans describing each step that happens inside the chatbot.
You can open an individual span for more information. For example, the span named ChatBedrock shows how the prompt is populated with the chunks retrieved from pgvector as well as the user's question.
Note: In a real-world use case, access to the Phoenix dashboard should be strictly controlled through robust authentication and authorization measures. There are many ways to do this; one option leverages an ALB listener rule and Amazon Cognito - Authenticate users using an Application Load Balancer.
Clean up
Leaving resources that you no longer need in your AWS account may incur unwanted charges. After deploying the solution discussed in this post, consider running the provided cleanup command to delete the AWS infrastructure that was created. This helps prevent ongoing costs for cloud services that you are no longer using for this project.
cdk destroy --force
Conclusion
In this post, we demonstrated how to set up and operationalize LLM observability for a generative AI workload deployed on Amazon ECS. With the architecture discussed in this post, it becomes much easier to pinpoint problem areas in your generative AI application, which leads to higher productivity for your team.
I encourage you to experiment with various combinations and start building a more secure, responsive generative AI solution with improved LLM observability.