Generative-AI Application Architecture — 2

Ali Khan
AI monks.io
Sep 14, 2023

This is the second part of the series Generative-AI-based Application Architecture; you can refer to the first part here.

Let’s now dive deep into the reference architecture. First, to understand the architecture's components, let’s talk about the “Generative-AI based Application Stack”.

Our reference application architecture is based on the concepts detailed in the article “Emerging Architectures for LLM Applications” by Matt Bornstein and Rajko Radovanovic.

The application stack consists of three main parts:

  1. A Single Page Application using a modern UI framework for both the mobile and web interfaces
  2. A serverless, microservices-based API
  3. A Chat Engine using a Python web server framework and Generative-AI tools

In this series, we shall not discuss the first two parts, as they are out of scope; however, in our implementation of the reference architecture for the job-board use case, we shall cover the Next.js and AWS Lambda-based implementation.

Chat Engine Architecture

At the heart of our Generative-AI-based application lies the Chat Engine framework. This framework seamlessly integrates the power of Generative-AI with the usability of modern web platforms. The Chat Engine consists of four main components: the Chat API Server, the Intent Classifier, the Prompt Selection Service, and the Prompt Execution Service.
Let’s delve deeper into the underlying components that constitute this engine.

  1. Chat API Server: The Chat API Server is the gateway to the Chat Engine. It acts as the intermediary, exposing the API endpoints that the UI and other client-facing modules use to communicate with the Generative-AI components. The server not only ensures that requests are routed correctly but also manages load, ensuring scalability and high availability for user interactions. In the reference implementation, the Chat API Server is built with the Python Flask web framework and deployed on AWS using Elastic Beanstalk and Terraform.
  2. Intent Classifier: A foundational piece in understanding user input is the Intent Classifier. Harnessing the power of RASA NLU, as detailed in this article, the classifier discerns the user’s intent with precision. It analyses user queries, deciphers their underlying objective, and paves the way for the subsequent services to respond aptly. The intent classifier model is trained on application-specific context data. In our job-board implementation, we train the model on a fixed set of user intents such as “Register as a user”, “Improve my Profile”, “Search for Job”, “Career Guidance” and “Apply for Job”. These intents are determined based on the user profile, user interaction history and application context.
  3. Prompt Selection Service: After intent classification, it’s crucial to determine the right system prompt that aligns with the user’s intent. This is where the Prompt Selection Service comes into play. Utilizing the LlamaIndex framework, this service efficiently queries the best-fitting prompt. The pairing of user intent with the right prompt is critical, as it sets the stage for a meaningful AI-generated response. We use a large number of prompts, each designed with modern prompt-engineering techniques to implement a specific application use case.
  4. Prompt Execution Service: The culmination of the chat process lies with the Prompt Execution Service. Leveraging advanced LLM engines (Llama 2 and the ChatGPT API) and incorporating state-of-the-art prompt engineering techniques, as elucidated in this guide, this service crafts contextually rich responses. It is here that the power of Generative-AI truly shines, producing responses that are not just accurate but imbued with human-like nuance.
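To make the Chat API Server concrete, here is a minimal sketch of a Flask endpoint that a UI could call. The route name `/chat` and the echo placeholder are illustrative assumptions, not the reference implementation; in the full engine the handler would invoke the classifier, selection, and execution services in turn.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/chat", methods=["POST"])
def chat():
    payload = request.get_json(force=True)
    message = payload.get("message", "")
    # Placeholder: the real handler would call the Intent Classifier,
    # Prompt Selection Service and Prompt Execution Service in sequence.
    reply = f"Echo: {message}"
    return jsonify({"reply": reply})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

A client would POST JSON such as `{"message": "Find me a job"}` and receive the engine's reply in the response body.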
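For the Intent Classifier, a running Rasa server exposes a `POST /model/parse` endpoint that returns the detected intent for a message. The sketch below assumes a locally running Rasa server and uses snake_case intent names corresponding to the five job-board intents; the URL, intent names, and confidence threshold are illustrative assumptions.

```python
import requests

# Assumed endpoint of a locally running Rasa NLU server
RASA_PARSE_URL = "http://localhost:5005/model/parse"

# The fixed job-board intents (snake_case names are an assumption)
KNOWN_INTENTS = {
    "register_user", "improve_profile", "search_job",
    "career_guidance", "apply_job",
}

def top_intent(parse_result: dict, threshold: float = 0.6) -> str:
    """Extract the winning intent from a Rasa parse result,
    falling back to 'fallback' below the confidence threshold."""
    intent = parse_result.get("intent", {})
    name = intent.get("name")
    confidence = intent.get("confidence", 0.0)
    if name in KNOWN_INTENTS and confidence >= threshold:
        return name
    return "fallback"

def classify(text: str) -> str:
    """Ask the Rasa server to parse a user message and return its intent."""
    resp = requests.post(RASA_PARSE_URL, json={"text": text}, timeout=5)
    resp.raise_for_status()
    return top_intent(resp.json())
```

Routing low-confidence results to a `fallback` intent keeps the engine from acting on guesses when the classifier is unsure.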
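Finally, the Prompt Execution Service can be sketched as assembling the selected system prompt, the conversation history, and the new user message into a chat-completion request. The OpenAI call below assumes the 2023-era `ChatCompletion` interface and the `gpt-3.5-turbo` model; model choice, temperature, and the environment-variable API key are all assumptions, not details from the reference implementation.

```python
def build_messages(system_prompt: str, history: list, user_message: str) -> list:
    """Assemble the chat payload: system prompt first, then prior
    turns, then the new user message."""
    return [
        {"role": "system", "content": system_prompt},
        *history,
        {"role": "user", "content": user_message},
    ]

def execute_prompt(system_prompt: str, history: list, user_message: str) -> str:
    """Send the assembled messages to the LLM and return its reply.
    Requires `pip install openai` and OPENAI_API_KEY in the environment."""
    import openai  # 2023-era ChatCompletion interface assumed
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=build_messages(system_prompt, history, user_message),
        temperature=0.7,
    )
    return response["choices"][0]["message"]["content"]
```

The same `build_messages` payload shape also fits a self-hosted Llama 2 endpoint that speaks the chat-completion format, which is what lets the service swap between engines.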

In sum, the Chat Engine Architecture encapsulates a streamlined process: from understanding user intent to producing finely crafted responses, ensuring users experience a dialogue that is both intuitive and insightful.

In the next article in the series, we shall discuss the implementation of the Intent Classifier.


Ali Khan

Experienced DevOps advocate & AWS-certified Technical Architect. Innovator in cloud solutions, microservices, and ML applications.