Building a Stateless, OpenAI-Compliant Chatbot: A Guide to Seamless AI Integration

M Facci
Nethive Engineering
7 min read · Feb 1, 2024

Introduction

Nowadays, Artificial Intelligence (AI) development has become a popular trend for individuals and companies looking to create new projects. AI can power a project built from scratch or be integrated into an existing one by connecting it with the latest AI models.

Integrating Large Language Models (LLMs) into your projects is relatively easy, thanks to the vast amount of resources published over the last year.

Python and TypeScript are the recommended languages, as both offer a long list of AI- and LLM-compatible libraries and frameworks.

In this article, I will guide you through how we managed to create an OpenAI-powered, API-compliant chatbot.

The Needs:

You may be wondering why you should create a chatbot right now. There are several reasons, but I want to highlight the main ones below:

1. Responding to FAQs: Finding answers to your questions on a documentation page can be tedious. Even the most common FAQs at the end of the page may not be helpful. Moreover, to find the right question, you must navigate to the correct section. Wouldn’t it be easier to ask a friendly assistant to clear your doubts?

2. Getting rid of first-level support: In an enterprise context, customers often have basic problems with an application and need to contact support every time. It is a waste of time for both the customer and the support employee who has to respond to every request. Moreover, first-level support may not be aware of recently developed features, leading to longer resolution times. By using a chatbot, we can simulate first-level support that is always available and has the right knowledge to respond to every request.

3. Helping support employees: One of the real challenges is to reduce the amount of repetitive work done by users in the app. This can be done by instructing the chatbot to make some decisions during the user's help process. The chatbot can not only help the end-user find the information they need but also assist them with the most common operations in the app. An AI-powered chatbot is a non-deterministic solution, so we need to handle false positives by asking the user to confirm the actions the chatbot takes during these operations.

TL;DR
To sum up, you would need a chatbot for:

- Task automation
- Continuous support availability
- Cost and time reduction

Why should you stick to the OpenAI API standard right now?

At the moment, there are many ways to build an AI chatbot. You can build your chatbot by choosing one of the many SaaS options on the market, such as Amazon Lex, IBM Watson Assistant, and Azure Bot Service.

So, why should you use the OpenAI API to build your chatbot?

Rapidly spreading OpenAI popularity: OpenAI gained popularity when they released ChatGPT in November 2022, and since then, the adoption of the model has rapidly increased. As a result, many compatible libraries have been implemented by the AI community.

OpenAI API features: Conversational APIs allow you to interact with a trained LLM in natural language, simulating a chat between an assistant and a user.

When it comes to conversational APIs, OpenAI mostly dictates feature implementation. Therefore, sticking to this standard means that you have the possibility of being the first to use a new feature when it is introduced, giving your chatbot a significant advantage over chatbots that do not use OpenAI APIs.

Two game-changing features are Function Calling and Streamable Responses.

Function Calling: Function Calling allows you to provide one or more function signatures in an API request to both `gpt-3.5-turbo-1106` and `gpt-4-1106-preview` (the latest models at the time of writing).

These function signatures are analyzed and translated into an understandable syntax for the LLM, enabling the model to understand the user’s question and respond with a function call that has the correct parameters.

This feature is particularly useful when you need to interact with external APIs, as it enables you to create a chatbot that can answer your questions by calling external APIs or converting natural language into API calls.

Check the example below:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# Describe the function the model is allowed to "call" (pre-1.0 openai SDK syntax).
functions = [
    {
        "name": "get_users",
        "description": "Get all users in a system or get user filtered by a given username",
        "parameters": {
            "type": "object",
            "properties": {
                "username": {
                    "type": "string",
                    "description": "The username provided, e.g. 'Leeroy Jenkins'",
                },
            },
        },
    }
]

messages = [{"role": "user", "content": "Does the user 'Jack' exist?"}]

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
    functions=functions,
)
print(completion)

The response from the OpenAI API will be the following:

{
  "id": "chatcmpl-8GVDlP8bnKlehiMLi3oRumLyfxJ8g",
  "object": "chat.completion",
  "created": 1698943953,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "get_users",
          "arguments": "{\n \"username\": \"Jack\"\n}"
        }
      },
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 76,
    "completion_tokens": 15,
    "total_tokens": 91
  }
}

As expected, it responds with the correct function to invoke for the given question. If the question does not match the function signature, the request behaves like a plain chat completion request and the model answers using its general trained knowledge.
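To actually act on that response, one possible follow-up (continuing the example above, with get_users as a hypothetical local helper and the same pre-1.0 openai SDK) could look like this:

import json

# Hypothetical local implementation of the function described to the model.
def get_users(username=None):
    # In a real system this would query our user API or database.
    return [{"username": "Jack"}] if username == "Jack" else []

message = completion.choices[0].message
if message.get("function_call"):
    # Parse the arguments produced by the model and call the real function.
    args = json.loads(message["function_call"]["arguments"])
    result = get_users(**args)
    # Append the assistant's function call and the function result to the history,
    # then ask the model to phrase the final answer for the user.
    messages.append(message)
    messages.append({"role": "function", "name": "get_users", "content": json.dumps(result)})
    followup = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    print(followup.choices[0].message["content"])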

Streamable mode

If the stream boolean parameter is set to true in the chat completion POST request, the response data will be output in chunks using server-sent events. This mode is really useful (especially for long responses), as it provides a better UX: the user gets the impression that the response is delivered faster.

Check the example below:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# stream=True makes the API return the answer incrementally via server-sent events.
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Respond with \"Supercalifragilisticexpialidocious\""}
    ],
    stream=True
)

# Each chunk carries a small delta of the final message.
for chunk in completion:
    print(chunk.choices[0].delta)
The printed deltas will be the following:

{"role": "assistant", "content": ""}
{"content": "Sup"}
{"content": "erc"}
{"content": "al"}
{"content": "if"}
{"content": "rag"}
{"content": "il"}
{"content": "istic"}
{"content": "exp"}
{"content": "ial"}
{"content": "id"}
{"content": "ocious"}
{}
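In a real application you usually do not print the raw deltas; instead, you accumulate them into the full assistant message. A minimal sketch, using the same pre-1.0 openai SDK as above:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Respond with \"Supercalifragilisticexpialidocious\""}
    ],
    stream=True
)

# Concatenate the content of each delta; chunks without content (role, stop) add nothing.
full_response = ""
for chunk in completion:
    full_response += chunk.choices[0].delta.get("content", "")
print(full_response)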

Analyzing the architecture

To begin analyzing the architecture, we must first establish the properties we wish to achieve:

- OpenAI API key anonymization: our chatbot’s logic must hide the OpenAI API key from the client. Otherwise, any client could use our OpenAI API key to access all OpenAI premium features, and of course this is not what we want.

- Agnostic business logic: our chatbot should be independent of the backend context we choose to implement. This means that it should be easy for us to switch to another chatbot context if needed.

- Multi-tenancy logic: our chatbot must implement a multi-tenancy logic, allowing multiple clients to interact with the same instance of the chatbot and (potentially) access different personal information.
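To make these properties concrete, here is a minimal sketch (not our actual implementation) of a proxy endpoint that keeps the OpenAI key server-side and reads a hypothetical X-Tenant-Id header for multi-tenancy, assuming Flask and the requests library:

import os
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # kept server-side, never exposed to the client

@app.post("/v1/chat/completions")
def chat_completions():
    # Hypothetical header used to select tenant-specific context or prompts.
    tenant_id = request.headers.get("X-Tenant-Id", "default")
    app.logger.info("Serving chat completion for tenant %s", tenant_id)
    payload = request.get_json()
    # The backend could enrich the payload here based on tenant_id before forwarding.
    upstream = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
        json=payload,
        timeout=60,
    )
    return jsonify(upstream.json()), upstream.status_code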

Fig.1 Communication flow between our chatbot and OpenAI.

Our chatbot’s architecture is composed of three main entities. In order:

- Frontend
- Proxy layer
- Backend

Communication flow

In Fig. 1, the communication flow is described as follows:

1) The user sends a request to the frontend side of the chatbot by simply sending a message.
2) The request lands on the proxy layer, which is responsible for forwarding the message to the backend business logic.
3) The backend translates the user request into an OpenAI API-compatible request.
4) Next, given the prompt and any previously generated context, the backend is ready to send the request to OpenAI.
5) After OpenAI generates a response, it is sent back to our backend, without the frontend being contacted directly.
6) The backend receives the response from OpenAI and performs the necessary parsing.
7) Once the response has been correctly parsed (and possibly other actions have been triggered), it is ready to be sent back to the frontend.
8) The response is now returned to the user as the chatbot's answer.

Frontend

This is the entry point of our chatbot. The frontend defines the UI/UX that lets the user communicate with the underlying system. Like every chatbot, communication is bi-directional and synchronous.

This means a user cannot send another request while a previous one is still being processed. More on how the frontend is designed and architected will come in the next articles.

Proxy layer

The proxy layer is essential to keep our chatbot OpenAI API-compliant. This is accomplished by using the same request and response formats used by OpenAI. But how do we get these?
What we did was take a look at the OpenAI API Reference and rebuild the models and the APIs needed to communicate with OpenAI through the chat conversational API.

We defined all these APIs and the related models in a swagger.yaml file, which acts as the contract that lets client and server communicate. Using this swagger file, we can then generate both the client and the server with a code-gen library.
With this approach, we only need to update our API definition if OpenAI changes its API version, and we will remain compliant.
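As an illustration, an excerpt of such a swagger.yaml could look like the following (field names mirror the OpenAI API reference, everything else is a simplified sketch):

openapi: 3.0.3
info:
  title: Chatbot proxy API
  version: 1.0.0
paths:
  /v1/chat/completions:
    post:
      summary: OpenAI-compatible chat completion
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [model, messages]
              properties:
                model:
                  type: string
                messages:
                  type: array
                  items:
                    type: object
                    properties:
                      role:
                        type: string
                      content:
                        type: string
                stream:
                  type: boolean
      responses:
        "200":
          description: Chat completion (or stream of chunks) in OpenAI response format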

The biggest advantage of this proxy architecture is that any library that interfaces with the OpenAI chat completion API is automatically compatible with our chatbot too, as they use the exact same API format.
As a practical scenario, imagine that you need to integrate your chatbot with a frontend library that communicates with OpenAI. Obviously, the frontend library (which could be written in any programming language!) will expect to communicate with OpenAI according to its API reference.

Instead of letting it talk to OpenAI directly, we put our proxy layer in the middle of the communication stack, so the library interacts with our proxy instead.
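In practice, with the pre-1.0 openai Python SDK this is just a matter of overriding the base URL (the proxy URL below is of course illustrative):

import openai

openai.api_key = "not-needed-here"                    # the real OpenAI key stays behind the proxy
openai.api_base = "https://chatbot.example.com/v1"    # hypothetical proxy endpoint

# The request format is exactly the one expected by OpenAI, so any compatible
# library can be pointed at the proxy without code changes.
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)
print(completion.choices[0].message["content"])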

Backend

The actual implementation of the context-based chat handles the last point of our architecture: it is responsible for asking OpenAI for the response to a given user question. It is important to mention that, thanks to code generation, we can rapidly implement our backend in any language we want. Again, more on the backend will come in the next articles.

Conclusions

As a result, we ended up building a chatbot architecture that simplifies the integration of any context-based chatbot with any OpenAI-compatible library. Furthermore, the logic is decoupled from your backend or frontend implementation.

This means you don't have to re-implement your custom logic on the backend or frontend every time just to specify how the chatbot should behave. If this article caught your interest, stay tuned for more on the frontend and backend technical details!
