Building a Conversational AI with Memory on AWS Series: AWS Overview

Yinzhou Wang
4 min read · Nov 26, 2023


#1 in the series: backend architecture and services overview

[Architecture diagram, created with Lucidchart]

The diagram above illustrates a simplified backend architecture. An application deployed at scale would need a more elaborate setup.

The flow of a message looks like this: a user sends a message from the frontend; the API hosted in API Gateway receives it and passes it along to a Lambda function; the Lambda function retrieves and updates conversation data in DynamoDB, and also sends a request to a SageMaker endpoint. Inside that endpoint, a Large Language Model (LLM) is served by a deep learning container, Hugging Face's Text Generation Inference (TGI). Once the LLM has processed the request, it returns a response that eventually makes its way back to the user. Below are introductions to the different AWS services; for each of them, I will publish a detailed tutorial.

Frontend

This is the user interface. How you build it is entirely up to you: it can be a mobile application (React Native, Flutter) or a web application (Streamlit). I personally think Streamlit is clean and easy to use. Check out their LLM examples!
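To make this concrete, here is a minimal Streamlit chat sketch. The backend call is left as a hypothetical placeholder (`send_to_backend` does not exist; you would wire in your own WebSocket client there):

```python
# Minimal Streamlit chat UI sketch. send_to_backend is a hypothetical
# placeholder for whatever call reaches your WebSocket API.
import streamlit as st

st.title("Conversational AI Demo")

# Keep the conversation in session state so it survives Streamlit reruns.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

if prompt := st.chat_input("Say something"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

    # reply = send_to_backend(prompt)  # hypothetical: call your backend here
    reply = "placeholder response"
    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.write(reply)
```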

API Gateway

This is the service that manages your APIs. You can create, deploy, maintain, and monitor your APIs there, and integrate them with other AWS services. It is especially helpful if you maintain multiple APIs.

If you are not familiar with the concept of API, I recommend reading a great article by Jeffrey Chiu.

In my case, I need to create a WebSocket API in API Gateway, because it performs better in real-time applications (lower latency): the connection stays open, so the server can push responses to the client without repeated polling. It is not necessary to understand the details (I don’t), but if you are interested, you can check out this article by Gulgina Arkin.
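As a rough sketch, a WebSocket API can be created programmatically with boto3. The API name here is a placeholder, and in practice you would also add `$connect`/`$disconnect`/`$default` routes and a Lambda integration (easier through the console or an infrastructure-as-code tool):

```python
# Sketch: create a WebSocket API with boto3 (apigatewayv2).
import boto3

apigw = boto3.client("apigatewayv2")

api = apigw.create_api(
    Name="chat-websocket-api",  # hypothetical name
    ProtocolType="WEBSOCKET",
    # Tells API Gateway which field of each message selects the route.
    RouteSelectionExpression="$request.body.action",
)
print(api["ApiEndpoint"])  # the wss://... URL your frontend connects to
```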

Lambda Function

This is a serverless computing service, meaning you don’t need to provision a server to run your code (Python, JavaScript, …). It can perform a wide range of tasks, including serving a webpage, communicating with a database, or preprocessing text for LLMs. Here is an excellent introduction by Sam Williams.

In my case, it is part of the API and contains the function that handles incoming requests. I use it for two purposes: 1) since I am building a multi-turn conversational AI, the model needs to be fed the conversation history (the LLM itself has no memory, so we have to add memory to the prompt). I therefore integrated Lambda with DynamoDB so that, for each user message, the past conversation is pulled from DynamoDB. 2) Once the conversation history is retrieved, it is combined with the most recent user message and preprocessed into a SageMaker-endpoint-compatible format. Lambda then sends this processed message to the endpoint, receives a response, and returns it to the user. A sketch of such a handler follows below.
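Here is what that handler could look like, as a sketch under assumptions: the table name `ChatHistory`, its key `conversation_id`, the endpoint name `my-llm-endpoint`, and the simple `role: content` prompt format are all placeholders I made up for illustration.

```python
# Sketch of a Lambda handler for the $default WebSocket route.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ChatHistory")            # hypothetical table
sm_runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    connection_id = event["requestContext"]["connectionId"]
    user_message = json.loads(event["body"])["message"]

    # 1) Pull past turns for this conversation from DynamoDB.
    item = table.get_item(Key={"conversation_id": connection_id}).get("Item", {})
    history = item.get("history", [])

    # 2) Fold the history into the prompt (the LLM itself is stateless).
    prompt = "".join(f"{t['role']}: {t['content']}\n" for t in history)
    prompt += f"user: {user_message}\nassistant:"

    # 3) Ask the TGI-served model on SageMaker for a completion.
    resp = sm_runtime.invoke_endpoint(
        EndpointName="my-llm-endpoint",          # hypothetical endpoint
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt,
                         "parameters": {"max_new_tokens": 256}}),
    )
    answer = json.loads(resp["Body"].read())[0]["generated_text"]

    # 4) Persist both turns, then push the answer back over the socket.
    history += [{"role": "user", "content": user_message},
                {"role": "assistant", "content": answer}]
    table.put_item(Item={"conversation_id": connection_id, "history": history})

    domain = event["requestContext"]["domainName"]
    stage = event["requestContext"]["stage"]
    apigw = boto3.client("apigatewaymanagementapi",
                         endpoint_url=f"https://{domain}/{stage}")
    apigw.post_to_connection(ConnectionId=connection_id,
                             Data=json.dumps({"reply": answer}).encode())
    return {"statusCode": 200}
```

Keying the history on the WebSocket connection id, as above, resets memory each time the user reconnects; a persistent user id would be the more realistic key.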

DynamoDB

This is a NoSQL database. As mentioned above, I use it to store conversation history. In my case, I don’t think choosing SQL or NoSQL makes much difference, but if your data is more structured, SQL may be a better choice. If you want to learn more about DynamoDB, Collin Smith wrote a great introduction.
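For completeness, a table like the hypothetical `ChatHistory` used in the Lambda sketch above could be created like this:

```python
# Sketch: create the (hypothetical) ChatHistory table with boto3.
import boto3

dynamodb = boto3.client("dynamodb")
dynamodb.create_table(
    TableName="ChatHistory",
    # One string partition key identifying each conversation.
    AttributeDefinitions=[{"AttributeName": "conversation_id",
                           "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "conversation_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",  # no capacity planning needed
)
```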

SageMaker

This is a fully managed service that allows you to fine-tune and deploy LLMs on AWS instances, and it offers many more features, as described in the official documentation.

To host your LLM on SageMaker, you need to specify a deep learning container, which makes text generation more efficient. I mostly use models from Hugging Face, so Hugging Face’s Text Generation Inference (TGI) is the natural choice. Check out the documentation from Hugging Face.
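As a sketch, deploying a Hugging Face model behind the TGI container with the SageMaker Python SDK could look like the following; the model id, instance type, and endpoint name are assumptions you would adapt to your own account:

```python
# Sketch: deploy a Hugging Face model behind TGI on a SageMaker endpoint.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN
image_uri = get_huggingface_llm_image_uri("huggingface")  # the TGI container

model = HuggingFaceModel(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "tiiuae/falcon-7b-instruct",  # example model id
        "SM_NUM_GPUS": "1",
    },
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",          # assumed GPU instance
    endpoint_name="my-llm-endpoint",        # matches the Lambda sketch above
)
print(predictor.predict({"inputs": "Hello!"}))
```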

Conclusion

That covers the backend architecture and brief introductions to each service. The next article in the series is a tutorial on preparing to fine-tune LLMs on SageMaker.

I am not an expert in software development, so any critiques and suggestions are welcome!
