Building End-to-End Generative AI application — Day 2

Uva rani Jagadeesan
3 min read · Jun 4, 2024


Welcome to day two of ‘Building End-to-End Generative AI application’.

We will have to circle back to my Day 1 learnings, because we are going to build from there and fix the problems in that code.

Wait, there was a problem?!

Intro card (AI generated Image)

Yeah, so remember we created a simple prompt and asked the LLM to answer it?

Well, if we ask that exact code what the previous question was, it won’t be able to answer.

Yeah, try it… No Chat History!

Response with no memory

So the focus of this article is simply to bring conversation buffer memory into the code.

What we are trying to do is save a copy of our past conversation in a buffer variable, the conversation memory buffer.

And the flowchart will look something like this:

So, all the history between you and your LLM is stored in a variable and passed to the LLM with your next request. This enables the LLM to get the context.

You must be thinking: isn’t this like cheating? We are storing chats in a variable and passing them along, so won’t it feel like a fresh conversation to the LLM every time? Technically… yes, it is.

Which really brings us to the question: is it even an intelligent system, or just a regular input/output chatbot? In truth, think of it as an external hippocampus for the LLM, letting it remember the conversation and answer effectively.

Now, let’s see how it can be done programmatically.

We have to import the library (of course) and then call ConversationBufferWindowMemory. The parameter k defines how many of the most recent conversation exchanges between you and your chatbot are stored.
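Here is a minimal sketch of that step (the import path follows LangChain’s classic memory module; the k value is just an example):

```python
from langchain.memory import ConversationBufferWindowMemory

# Keep only the last k=2 question/answer exchanges in the buffer
memory = ConversationBufferWindowMemory(k=2, memory_key="history")
```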

Let’s think about something: if we keep k=2 and ask my chatbot what my name is, which was mentioned three conversations before, will it remember?

No, it won’t.

To solve that problem, you can increase the k value or go for more effective memory classes like ConversationSummaryMemory.

There are multiple memory classes in LangChain, like buffer, window, summary, etc., for us to explore and use based on what suits our use case.
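For instance, a summary memory condenses the whole conversation with an LLM instead of keeping the raw last k turns. A rough sketch (the model and settings here are my assumptions, not the article’s code):

```python
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI

# Summary memory needs an LLM of its own to write the running summary
summary_memory = ConversationSummaryMemory(llm=ChatOpenAI(temperature=0))
```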

We have to pass the memory variable along the chain, so that every time we invoke it, the conversation history is passed inside the chain.

pass memory to chain
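Continuing from the memory object above, one way to wire it into a Day 1-style chain looks like this. The prompt wording, model name, and variable names are assumptions for illustration, not the exact code from the repo:

```python
from operator import itemgetter

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

prompt = ChatPromptTemplate.from_template(
    "Previous conversation:\n{history}\n\nAnswer the question: {question}"
)

# On every invoke, load the stored history and feed it into the prompt
chain = (
    RunnablePassthrough.assign(
        history=RunnableLambda(memory.load_memory_variables) | itemgetter("history")
    )
    | prompt
    | llm
)
```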

Once we invoke the chain, remember to save both the user input and the LLM response:

save Context on each QA cycle
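Something like this, with a hypothetical question standing in for the one in the screenshot:

```python
# Invoke the chain, then store both sides of the exchange in memory
inputs = {"question": "Hi, my name is Uva. What is LangChain?"}
response = chain.invoke(inputs)
memory.save_context(inputs, {"output": response.content})
```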

You can also see what is stored in this memory using load_memory_variables().
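For example:

```python
# Peek at everything currently held in the window buffer
print(memory.load_memory_variables({}))
```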

With the next invoke, let’s ask a question about our previous question to see if it’s working:
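For instance, a hypothetical follow-up that only works if the history is being passed along:

```python
follow_up = {"question": "What was my previous question?"}
print(chain.invoke(follow_up).content)
```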

Response with history

Voilà, it remembers! Now your conversation with the LLM will be more relevant and precise.

You can find the entire code in my Git repo → here.

We will continue building on our LLM application in my next article… with RAG!

I would love to hear your comments and feedback. Thank you!
