The new LLM App stack

Furqan Shaikh
2 min read · Aug 15, 2023


It seems that there is a new application platform emerging with the rise in popularity of LLMs (e.g. the GPT models). Like the web, mobile, and cloud waves before it, this new paradigm looks promising and worth looking into. For example, in the web world we have the MERN stack, in the mobile world we have the Android/iOS stacks, and so on. The new generation of applications will be built on top of LLMs, and that too requires a stack that lets developers build applications. Based on my limited understanding, the new app stack looks something like this:

LLM App Stack

As this is a large landscape, for now I am starting backwards, with understanding the persistent stores (vector databases, to be more precise).
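Before diving in, it helps to see the core operation a vector database performs: nearest-neighbour search over embeddings, most commonly ranked by cosine similarity. Here is a minimal sketch in plain Python; the vectors and IDs are made up for illustration, and real embeddings have hundreds of dimensions rather than three.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" keyed by a made-up question ID.
query = [1.0, 0.0, 1.0]
stored = {
    "q1": [0.9, 0.1, 0.8],   # points in nearly the same direction as the query
    "q2": [-1.0, 0.5, 0.0],  # points elsewhere
}

# A vector database answers: which stored vector is most similar to the query?
best = max(stored, key=lambda k: cosine_similarity(query, stored[k]))
print(best)  # q1
```

Systems like Pinecone do essentially this, but over millions of vectors, using approximate nearest-neighbour indexes so the search stays fast.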

Use case — Quora Similar Questions Search Engine

What we will build here is a simple application that uses some of the layers of the LLM app stack above. The user experience is as follows:

  1. The app builds an index of Quora questions and stores it in a vector database. We use Pinecone for this purpose.
  2. The user enters a question as free-form text and receives a list of semantically similar questions.
Architecture for a Quora Similar Questions Search Engine

Let’s understand what’s happening in the diagram above.

  1. Get hold of the Quora questions dataset. We use the HuggingFace datasets library for this.
  2. Pass each question through a Transformer model. The input is free-form question text and the output is a high-dimensional vector embedding. We use the HuggingFace transformers library for this.
  3. Persist each vector embedding in a vector database index. We use Pinecone for this.
  4. The user submits free-form question text for semantic search.
  5. The question is passed through the same Transformer model to produce a vector embedding.
  6. The vector embedding is sent to Pinecone for a vector similarity search.
  7. Pinecone returns a list of semantically similar questions.
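The seven steps above can be sketched end-to-end in miniature. To keep the sketch runnable offline, a bag-of-words counter stands in for the Transformer model and an in-memory dict with brute-force cosine search stands in for Pinecone; in the real app you would swap in a HuggingFace embedding model and Pinecone's upsert/query calls, but the shape of the pipeline is the same. The sample questions are made up for illustration.

```python
import math
from collections import Counter

# Stand-in for the Transformer model (step 2): a bag-of-words count vector.
# A real embedding model captures semantics; this only captures word overlap.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-3: embed each question and build the index
# (stand-in for upserting vectors into Pinecone).
questions = [
    "How do I learn Python?",
    "What is the best way to learn Python programming?",
    "Why is the sky blue?",
]
index = {q: embed(q) for q in questions}

# Steps 4-7: embed the user's query the same way, run a similarity
# search over the index, and return the closest questions.
def search(query: str, top_k: int = 2) -> list[str]:
    qv = embed(query)
    ranked = sorted(index, key=lambda q: cosine(qv, index[q]), reverse=True)
    return ranked[:top_k]

print(search("How can I learn Python?"))
```

The key design point the sketch preserves is that the index-build path and the query path must use the *same* embedding function; if they disagree, the vectors live in different spaces and the similarity scores are meaningless.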
