Building a Conversational AI with Memory on AWS Series: AWS Overview

Yinzhou Wang
4 min read · Nov 26, 2023


#1 in the series: backend architecture and services overview

[Architecture diagram, created with Lucidchart]

The diagram above illustrates a simplified backend architecture. An application deployed at scale would need a more elaborate setup.

The flow of a message looks like this: a user sends a message from the frontend; the API hosted in API Gateway receives it and passes it along to a Lambda function; the Lambda function retrieves and updates conversation data in DynamoDB, and also sends a request to a SageMaker endpoint. Inside that endpoint, a Large Language Model (LLM) is served by a deep learning container, Hugging Face's Text Generation Inference (TGI). Once the LLM has processed the request, it returns a response that eventually makes its way back to the user. Below are introductions to the different AWS services; for each of them, I will publish a detailed tutorial.

Frontend

This is the user interface. How you build it is entirely up to you: it can be a mobile application (React Native, Flutter) or a web application (Streamlit). I personally think Streamlit is clean and easy to use. Check out their LLM examples!
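To make this concrete, here is a minimal Streamlit chat sketch. The backend call is left as a hypothetical placeholder (`send_to_backend` does not exist; you would wire in your own WebSocket client there):

```python
# Minimal Streamlit chat UI sketch. send_to_backend is a hypothetical
# placeholder for whatever call reaches your WebSocket API.
import streamlit as st

st.title("Conversational AI Demo")

# Keep the conversation in session state so it survives Streamlit reruns.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

if prompt := st.chat_input("Say something"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

    # reply = send_to_backend(prompt)  # hypothetical: call your backend here
    reply = "placeholder response"
    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.write(reply)
```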

API Gateway

This is the service that manages your APIs. You can create, deploy, maintain, and monitor your APIs there, and integrate them with other AWS services. It is especially helpful if you maintain multiple APIs.

If you are not familiar with the concept of API, I recommend reading a great article by Jeffrey Chiu.

In my case, I need to create a WebSocket API in API Gateway, because it performs better in real-time applications (lower latency): the connection stays open, so the server can push responses to the client without repeated polling. It is not necessary to understand the details (I don’t), but if you are interested, you can check out this article by Gulgina Arkin.
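As a rough sketch, a WebSocket API can be created programmatically with boto3. The API name here is a placeholder, and in practice you would also add `$connect`/`$disconnect`/`$default` routes and a Lambda integration (easier through the console or an infrastructure-as-code tool):

```python
# Sketch: create a WebSocket API with boto3 (apigatewayv2).
import boto3

apigw = boto3.client("apigatewayv2")

api = apigw.create_api(
    Name="chat-websocket-api",  # hypothetical name
    ProtocolType="WEBSOCKET",
    # Tells API Gateway which field of each message selects the route.
    RouteSelectionExpression="$request.body.action",
)
print(api["ApiEndpoint"])  # the wss://... URL your frontend connects to
```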

Lambda Function

This is a serverless computing service, meaning you don’t need to provision a server to run your code (Python, JavaScript, …). It can perform a wide range of tasks, including serving a webpage, communicating with a database, or preprocessing text for LLMs. Here is an excellent introduction by Sam Williams.

In my case, it is part of the API and contains the function that handles incoming requests. I use it for two purposes: 1) since I am building a multi-turn conversational AI, the model needs to be fed the conversation history (the LLM itself has no memory, so we have to add memory to the prompt). I therefore integrated Lambda with DynamoDB so that, for each user message, the past conversation is pulled from DynamoDB. 2) Once the conversation history is retrieved, it is combined with the most recent user message and preprocessed into a SageMaker-endpoint-compatible format. Lambda then sends this processed message to the endpoint, receives a response, and returns it to the user. A sketch of such a handler follows below.
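Here is what that handler could look like, as a sketch under assumptions: the table name `ChatHistory`, its key `conversation_id`, the endpoint name `my-llm-endpoint`, and the simple `role: content` prompt format are all placeholders I made up for illustration.

```python
# Sketch of a Lambda handler for the $default WebSocket route.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ChatHistory")            # hypothetical table
sm_runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    connection_id = event["requestContext"]["connectionId"]
    user_message = json.loads(event["body"])["message"]

    # 1) Pull past turns for this conversation from DynamoDB.
    item = table.get_item(Key={"conversation_id": connection_id}).get("Item", {})
    history = item.get("history", [])

    # 2) Fold the history into the prompt (the LLM itself is stateless).
    prompt = "".join(f"{t['role']}: {t['content']}\n" for t in history)
    prompt += f"user: {user_message}\nassistant:"

    # 3) Ask the TGI-served model on SageMaker for a completion.
    resp = sm_runtime.invoke_endpoint(
        EndpointName="my-llm-endpoint",          # hypothetical endpoint
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt,
                         "parameters": {"max_new_tokens": 256}}),
    )
    answer = json.loads(resp["Body"].read())[0]["generated_text"]

    # 4) Persist both turns, then push the answer back over the socket.
    history += [{"role": "user", "content": user_message},
                {"role": "assistant", "content": answer}]
    table.put_item(Item={"conversation_id": connection_id, "history": history})

    domain = event["requestContext"]["domainName"]
    stage = event["requestContext"]["stage"]
    apigw = boto3.client("apigatewaymanagementapi",
                         endpoint_url=f"https://{domain}/{stage}")
    apigw.post_to_connection(ConnectionId=connection_id,
                             Data=json.dumps({"reply": answer}).encode())
    return {"statusCode": 200}
```

Keying the history on the WebSocket connection id, as above, resets memory each time the user reconnects; a persistent user id would be the more realistic key.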

DynamoDB

This is a NoSQL database. As mentioned above, I use it to store conversation history. In my case, I don’t think choosing SQL or NoSQL makes much difference, but if your data is more structured, SQL may be a better choice. If you want to learn more about DynamoDB, Collin Smith wrote a great introduction.
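For completeness, a table like the hypothetical `ChatHistory` used in the Lambda sketch above could be created like this:

```python
# Sketch: create the (hypothetical) ChatHistory table with boto3.
import boto3

dynamodb = boto3.client("dynamodb")
dynamodb.create_table(
    TableName="ChatHistory",
    # One string partition key identifying each conversation.
    AttributeDefinitions=[{"AttributeName": "conversation_id",
                           "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "conversation_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",  # no capacity planning needed
)
```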

SageMaker

This is a fully managed service that allows you to fine-tune and deploy LLMs on AWS instances, and it offers many more features, as described in the official documentation.

To host your LLM on SageMaker, you need to specify a deep learning container, which makes text generation more efficient. I mostly use models from Hugging Face, so Hugging Face’s Text Generation Inference (TGI) is the natural choice. Check out the documentation from Hugging Face.
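As a sketch, deploying a Hugging Face model behind the TGI container with the SageMaker Python SDK could look like the following; the model id, instance type, and endpoint name are assumptions you would adapt to your own account:

```python
# Sketch: deploy a Hugging Face model behind TGI on a SageMaker endpoint.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN
image_uri = get_huggingface_llm_image_uri("huggingface")  # the TGI container

model = HuggingFaceModel(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "tiiuae/falcon-7b-instruct",  # example model id
        "SM_NUM_GPUS": "1",
    },
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",          # assumed GPU instance
    endpoint_name="my-llm-endpoint",        # matches the Lambda sketch above
)
print(predictor.predict({"inputs": "Hello!"}))
```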

Conclusion

That covers the backend architecture and brief introductions to each service. The next article in the series is a tutorial on preparing to fine-tune LLMs on SageMaker.

I am not an expert in software development, so any critiques and suggestions are welcome!
