Develop your custom AI on Azure using OpenAI and the Cheshire Cat

Saverio Proto
Published in Microsoft Azure
7 min read · May 5, 2023


I am originally a network engineer. Over the last 10 years, I have mainly focused on cloud networking, gaining hands-on experience with Kubernetes and Istio. I should note that AI is not my area of expertise; it wasn’t until the release of ChatGPT at the end of 2022 that I began to delve into the topic. I would like to thank Sofia Ferreira and Piero Savastano for helping me get started on this new adventure.

What is the Cheshire Cat?

I didn’t want to write a lot of code, so it took me some time to find an interesting entry point to gain hands-on experience with OpenAI. Browsing social media platforms, I found the Cheshire Cat project by Piero Savastano.

Citing the official project documentation:

The Cheshire Cat is an open-source framework that allows you to develop intelligent agents on top of many Large Language Models (LLM). You can develop your custom AI architecture to assist you in a wide range of tasks.

Once I gained access to OpenAI through my Azure subscription, I was eager to start experimenting with the product. However, while the Cheshire Cat implementation supported OpenAI from openai.com, I couldn’t use it to connect to the Azure OpenAI endpoint in my subscription. My first task was to propose a pull request (PR) to implement this feature. It involved some testing and a patch to the LangChain Python library, but I was able to get everything up and running in just a few days.

Just as a System Engineer with a Linux VM and an Apache web server can install WordPress to start a blog and even create a WordPress plugin using the existing framework, a Cloud Engineer with an Azure subscription can use the Cheshire Cat tool to develop an AI agent. Cheshire Cat provides the necessary framework and codebase to simplify the development process, much like WordPress does for creating a blog or plugin.

Learning some AI basics

While writing the patch for the Cheshire Cat integration with Azure OpenAI, I learned several things, including:

  • The Azure OpenAI resource offers an API endpoint that allows users to connect to Model Deployments.
  • There are models that provide a Completion API that generates natural language responses to a prompt. The Completion API takes a prompt as input and generates a response that continues the prompt in a natural way, making it ideal for tasks such as chatbots, virtual assistants, and automated customer support.
  • There are models that provide an Embeddings API that returns numerical representations of text. These embeddings are stored in a vector database for later use. Embeddings are dense vectors that represent the meaning of words or sentences in a high-dimensional space, and they can be used for a variety of natural language processing (NLP) tasks, such as text classification, clustering, and similarity comparison.
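
To make the similarity-comparison idea concrete, here is a toy sketch. The three-dimensional vectors below are made up for illustration; a real embeddings model such as text-embedding-ada-002 returns vectors with 1536 dimensions, but the cosine-similarity math is the same:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (invented values for illustration only).
embeddings = {
    "kitten":  [0.8, 0.6, 0.1],
    "cat":     [0.9, 0.5, 0.2],
    "tractor": [0.1, 0.2, 0.9],
}

# Semantically similar texts end up close together in the vector space,
# so "kitten" scores much higher against "cat" than against "tractor".
print(cosine_similarity(embeddings["kitten"], embeddings["cat"]))
print(cosine_similarity(embeddings["kitten"], embeddings["tractor"]))
```

This nearest-neighbor search over embedding vectors is exactly what the vector database does when the Cat looks up relevant memories.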

Deploy the Cheshire Cat on AKS

The Cheshire Cat provides a development environment based on docker-compose, consisting of three containers:

  • core: the Cheshire Cat core microservice that exposes a REST API implemented with FastAPI
  • admin: a Node.js web interface for testing and configuration
  • qdrant: an open-source vector database

The core and admin containers are intended for development and are not immutable: they provide the Linux environment needed to run the Python code, which is mounted as a Docker volume.

To run on Azure, I decided to write my own immutable containers, and I published the project kube-cheshire-cat.

Deploy OpenAI in your subscription

The Microsoft documentation has detailed instructions to create a resource and deploy a model using Azure OpenAI.

To begin, execute the following commands, which are the bare minimum required.

Create a resource group:

az group create --name cheshire-cat --location eastus

Deploy an Azure OpenAI resource

az cognitiveservices account create \
--name cheshire-cat \
--resource-group cheshire-cat \
--location eastus \
--kind OpenAI \
--sku s0

Create a deployment for the model used for completions. I am going to use the gpt-35-turbo model in this example. For simplicity, I will also use the string gpt-35-turbo as the deployment name.

az cognitiveservices account deployment create \
--name cheshire-cat \
--resource-group cheshire-cat \
--deployment-name gpt-35-turbo \
--model-name gpt-35-turbo \
--model-version "0301" \
--model-format OpenAI \
--scale-settings-scale-type "Standard"

Create a deployment for the embeddings model; this will be the text-embedding-ada-002 model.

az cognitiveservices account deployment create \
--name cheshire-cat \
--resource-group cheshire-cat \
--deployment-name text-embedding-ada-002 \
--model-name text-embedding-ada-002 \
--model-version "2" \
--model-format OpenAI \
--scale-settings-scale-type "Standard"

Start the Cheshire Cat

You can start the Cheshire Cat on your computer with docker-compose, or you can deploy it on Azure Kubernetes Service. Support for Azure OpenAI is already merged in the main branch.

Configure the Cheshire Cat

When the Cheshire Cat starts without configuration, it falls back to a default language model that always replies by asking the user to configure a real one.

Cheshire Cat asking to be configured

If you choose to configure the language model from the web interface, make sure to select the Azure OpenAI option and avoid unintentionally choosing the OpenAI options that use the openai.com APIs.

I find it quicker to configure the Cheshire Cat using an API call. Here is a short script that fetches my API endpoint and secret key and pushes the configuration to the core container.

#!/bin/bash
ENDPOINT=$(az cognitiveservices account show -o json \
--name cheshire-cat \
--resource-group cheshire-cat \
| jq -r .properties.endpoint)


KEY=$(az cognitiveservices account keys list -o json \
--name cheshire-cat \
--resource-group cheshire-cat \
| jq -r .key1)

# When using docker-compose
CORE_URL="http://127.0.0.1"

# When using AKS
# CORE_URL="https://mydnsname.eastus.cloudapp.azure.com"

curl -X 'PUT' \
"$CORE_URL:1865/settings/llm/LLMAzureOpenAIConfig" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"api_type": "azure",
"api_version": "2022-12-01",
"deployment_name": "text-davinci-003",
"model_name": "text-davinci-003",
"openai_api_base": "'"$ENDPOINT"'",
"openai_api_key": "'"$KEY"'"
}'

Edit 16.05.2023: I was wrongly using the model gpt-35-turbo with the Azure OpenAI Completion API, so I changed the script above to use the model text-davinci-003. Some models are completion models and should be used with the Completion API, while other models are chat models and should be used with a Chat Completion API. For more information, check PR195 and LangChain Issue 4246. The <|im_end|> characters printed in the chat screenshots are a result of this misconfiguration.

Interacting with the Cheshire Cat and the Rabbit Hole

How is the Cheshire Cat different from the ChatGPT service available on openai.com? The key difference is the usage of the vector database. Citing the official docs:

The Cat embeds two types of vector memories, namely the episodic and declarative memories. The formers are the things the human said in the past; the latter the documents sent down the Rabbit hole.

Let’s make a practical example. I asked the Cheshire Cat what Istio Ambient is. The gpt-35-turbo model was not able to provide an answer. Then I uploaded the PDF book “Istio Ambient Explained” into the Rabbit Hole and asked the same question again:

The Cheshire Cat is able to provide an answer after uploading additional data to the Rabbit Hole

This works because the Cheshire Cat tool doesn’t simply send the question “What is Istio Ambient?” to the completion API. Instead, it first searches the local vector database for additional context and enriches the prompt with relevant data about Istio Ambient. This allows the completion API to provide an answer that is informed by the context provided in the prompt.
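
The retrieval step can be sketched roughly as follows. This is a deliberate simplification with invented helper names: the real implementation ranks chunks by embedding similarity against the qdrant vector database, whereas this toy version ranks by naive word overlap:

```python
def retrieve_context(question, documents, top_k=1):
    """Pick the stored chunks that best match the question.
    Simplified: real retrieval ranks chunks by embedding similarity
    in the vector database, not by word overlap as done here."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(question, documents):
    # Enrich the prompt with retrieved context before calling the
    # completion API, instead of sending the bare question.
    context = "\n".join(retrieve_context(question, documents))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Chunks previously sent down the Rabbit Hole.
chunks = [
    "Istio Ambient is a sidecar-less data plane mode for Istio.",
    "Kubernetes schedules containers across a cluster of nodes.",
]

print(build_prompt("What is Istio Ambient?", chunks))
```

The completion API then receives the enriched prompt, so it can answer from the retrieved context rather than from its training data alone.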

Here are the debug logs to better understand what is going on:

The debug logs of the core container indicate that the declarative memory added to the prompt includes information that was sent to the Rabbit Hole as a PDF file.

Readers of this article who are familiar with Istio may notice that the prompt engineering process described here requires further tuning, as the model provided an incorrect answer. The entire purpose of Istio Ambient is to offer Istio functionality without requiring sidecars. Thus, the model’s response stating that Istio Ambient includes a control plane and sidecar proxies is incorrect.

Thanks to my colleague Martin Gjoshevski, I was able to debug this problem. Martin published a repository called llm-pfds that includes a minimal example demonstrating how to load a PDF file, embed its content, and query the model for answers.

Martin’s minimal example functions as anticipated, and the model accurately answers questions about the content of the PDF file submitted for embeddings. So why isn’t the Cheshire Cat working correctly?

I discovered that the Cheshire Cat code is more complex than I originally thought. After uploading the PDF file, the code does not directly store the content in the vector database. Instead, it generates several brief summaries to condense the information. Unfortunately, this approach results in information loss, and as a result, the model’s final answer is incorrect.
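
To see why condensing chunks can hurt retrieval, here is a deliberately crude sketch. The truncation below is a stand-in assumption, not the Cat's actual LLM-based summarizer, but the failure mode is the same: detail dropped at ingestion time can never be retrieved later.

```python
def summarize(chunk, max_words=6):
    # A crude stand-in for summarization: keep only the first few words.
    # The real summaries are LLM-generated, but they share the property
    # that some detail from the original chunk is discarded.
    return " ".join(chunk.split()[:max_words])

chunk = ("Istio Ambient provides mesh features without sidecars, "
         "using ztunnel node proxies instead.")

summary = summarize(chunk)
print(summary)
# The word "sidecars" never made it into the stored summary, so a later
# vector search over summaries cannot surface that detail, and the model
# answers without it.
```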

To test my theory, I re-ran the experiment using a patched version (PR197) of the Cheshire Cat code that disables the summarization feature entirely. The attached screenshot displays the outcome of my test.

This time, the answer is correct, and the Cheshire Cat promptly informs me that the sidecar-less data plane is the key feature of Istio Ambient.

Conclusion

Using Cheshire Cat on Azure to develop custom AI is straightforward. Azure OpenAI provides a powerful large language model, and running it on Kubernetes allows for easy scaling of individual components such as the qdrant vector database. Built on top of the LangChain Python library, Cheshire Cat can be further expanded with tools and plugins, making it a versatile and powerful tool for AI development.


Customer Experience Engineer @ Microsoft - Opinions and observations expressed in this blog post are my own.