Sirens of The Age: Private LLM(s) | LangChain 101

Chat with your documents! — LangChain, LLAMA3, OLLAMA, External Knowledge Injection

9 min read · Jun 17, 2024


Hi, @mebaysan here. It’s been a while since I last wrote. I just completed the first semester of my master’s degree. Nowadays, I am studying LLM concepts to develop my own solutions for my projects, and I started using LangChain to get my hands dirty. In this article, I am going to show you how you can chat with your documents.

Image is generated by me | “Sirens (Σειρῆνες) in data ocean.”

While writing these lines, this technology reminded me of the Sirens (Σειρῆνες). We are sailors in the data ocean, and AI is a Siren in that ocean. I am not the only one who calls this technology a weapon. If you wonder about one of the reasons why, check the link.

101 — Basic Concepts, Tools & Resources

Tools & Resources

I use LangChain as the framework for developing LLM applications. It makes it easy to build applications by chaining components together. I use OLLAMA to run LLAMA-3 on my local machine. You can use HuggingFace to run different models too, but I didn’t want to set up access tokens and configure external APIs. FAISS is the vector db for our second implementation, which utilizes external data from our documents.

You can also check the article below to find the most appropriate vector db for your needs.

Some resources that I can recommend with peace of mind are below:

Basic Concepts

We can think of LangChain as an API over a collection of APIs. It has integrations with more than 50 vendors and platforms. It helps us develop our own solutions by combining its components.

Figure from Building LLM Powered Applications

Some of the built-in components are:

  • Prompt templates
  • Document Loaders (Parsers)
  • Text Embedding Models
  • Vector Stores
  • Memory
  • Chains
  • Agents
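To make "chaining components" concrete, here is a minimal sketch in plain Python. It is not LangChain code: `prompt_template`, `fake_model`, `output_parser`, and `chain` are hypothetical names made up for illustration, and the "model" is a stub that only echoes its input. The idea it shows is simply that each component's output becomes the next component's input.

```python
from typing import Callable

# Hypothetical stand-ins for three kinds of components; none of these are LangChain classes.
def prompt_template(city: str) -> str:
    """Fill a fixed template with the user's variable."""
    return f"List three local foods to try in {city}."

def fake_model(prompt: str) -> str:
    """A real chat model would generate text here; this stub just echoes the prompt."""
    return f"ANSWER TO: {prompt}"

def output_parser(text: str) -> str:
    """Strip the stub's prefix so only the payload remains."""
    return text.removeprefix("ANSWER TO: ").strip()

def chain(*steps: Callable) -> Callable:
    """Compose steps left to right: the output of one is the input of the next."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

pipeline = chain(prompt_template, fake_model, output_parser)
result = pipeline("Istanbul")
print(result)
```

Swapping the stub for a real model call would leave the rest of the pipeline unchanged, which is the appeal of building from small, composable parts.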


You can easily pull and run LLAMA-3 by using OLLAMA. After installing OLLAMA, you just need to run ollama run llama3, which downloads the model on first use (ollama serve only starts the server and does not take a model name). The API will be accessible on your local machine on port 11434.

You should also create a virtualenv and install the libraries below.
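Before running any script, it can help to confirm that the local server is actually listening. The snippet below is a small sketch using only the standard library; the helper name `ollama_is_up` is my own, and it assumes the default host and port mentioned above.

```python
import urllib.request

def ollama_is_up(host: str = "localhost", port: int = 11434, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at the given host and port."""
    url = f"http://{host}:{port}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and urllib's URLError/HTTPError.
        return False

print("Ollama reachable:", ollama_is_up())
```

If this prints False, start the server first and try again before debugging your LangChain code.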

# I use Python3.11 on my setup
pip install langchain langchain-community langchain-huggingface langchain-text-splitters pypdf faiss-cpu

First Interaction on Local

After installing the requirements and running LLAMA-3 by OLLAMA, you can run the script below.

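from langchain.schema import SystemMessage, HumanMessage
from langchain_community.chat_models import ChatOllama

# Connects to the local OLLAMA server (default port 11434)
chat_model = ChatOllama(model="llama3")

messages = [
    # The system message sets the assistant's role
    SystemMessage(content="You are a helpful assistant that helps the user to plan an optimized itinerary."),
    # The human message is the actual question
    HumanMessage(content="I want to go to Istanbul. Please give me a list of local foods to try."),
]

output = chat_model.invoke(messages)
print(output.content)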


It will give you something like this:

# Obviously, the output will differ on your machine.
# Remember that these models are stochastic.
Istanbul, the culinary heaven! You're in for a treat! Here's a list of local foods you absolutely must try when visiting Istanbul:

1. **Doner Kebab**: Thinly sliced lamb or beef cooked on a vertical spit and served in a crispy sesame-topped bun with salad, vegetables, and spices.
2. **Lahmacun** (Turkish Pizza): A thin crust topped with minced meat, onions, and spices, often served with lemon juice and herbs.
3. **Baklava**: Layered pastry filled with honey, nuts, and spices. A classic Turkish dessert that's sweet and indulgent.
4. **Dolma**: Stuffed vegetables (such as bell peppers, eggplants, or zucchini) with a mixture of rice, herbs, and spices.
5. **Menemen**: A Turkish-style omelette filled with onions, tomatoes, and spices, often served with bread or pita.
6. **Kebabs** (Köfte): Meatballs made from ground meat, breadcrumbs, and spices, grilled to perfection and often served with a side of yogurt sauce.
7. **Simit**: A crusty, sesame-topped bread that's perfect for snacking or serving as a base for other dishes.
8. **Manti** (Turkish Ravioli): Steamed dumplings filled with meat and onions, topped with yogurt, garlic, and paprika.
9. **Kadaif**: Shredded phyllo pastry filled with cheese, nuts, or meat, served with syrup or honey.
10. **Salep**: A traditional winter dessert made from ground orchid root, milk, sugar, and spices. You might find it at local cafes or bakeries.

Some popular eateries and markets to explore:

* **Mısır Çarşısı** (Eggs Market): Try some street food and local specialties like grilled corn, egg sandwiches, and kebabs.
* **Çiya Sofrası**: A cozy eatery serving traditional Turkish dishes with a focus on organic and locally sourced ingredients.
* **Karaköy Güllüoğlu**: This famous pastry shop has been around since 1895! Try their iconic baklava or other sweet treats.

Remember to also try some of the local drinks, like:

* **Ayran** (Yogurt Drink): A refreshing yogurt-based beverage with salt and water.
* **Şarap** (Turkish Wine): Enjoy a glass at a local winery or restaurant.
* **Kahve** (Turkish Coffee): Strong and rich coffee made from finely ground coffee beans.

Enjoy your culinary adventure in Istanbul!

Adding Memory

It was really nice to play with one of the most powerful toys of the age on our local machine. Although it is powerful, there is something we should point out: it has no memory and can’t recall what we talked about! Just imagine for a moment that my AI darling can’t recall what it said to me last night; what a terrible relationship…

You can choose one of the tools below to add memory to your solution.

We are going to use ConversationBufferMemory, which is the plain vanilla memory type in LangChain. It holds our chat history in a variable.
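Conceptually, a buffer memory just keeps every turn and replays the whole history in front of each new question. The sketch below illustrates that idea in plain Python; `BufferMemory` is a hypothetical toy class, not LangChain's implementation, and the "Human:" / "AI:" labels are an assumption about how the history might be rendered.

```python
class BufferMemory:
    """Toy conversation memory: stores every turn and replays it as text."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def save(self, user_text: str, bot_text: str) -> None:
        """Append one finished exchange to the history."""
        self.turns.append((user_text, bot_text))

    def as_prompt(self, new_input: str) -> str:
        """Render the full history followed by the new, unanswered input."""
        history = "\n".join(f"Human: {u}\nAI: {b}" for u, b in self.turns)
        prefix = f"{history}\n" if history else ""
        return f"{prefix}Human: {new_input}\nAI:"

memory = BufferMemory()
memory.save("I like Pakistani food.", "Noted!")
prompt = memory.as_prompt("Why did you suggest Mujaddara?")
print(prompt)
```

Because the whole history is resent on every turn, the prompt grows with the conversation, which is worth keeping in mind for long chats.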

Now we can use our chat history by running the script below. We created our first Conversation Chain.

from langchain_community.chat_models import ChatOllama
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

chat_model = ChatOllama(model="llama3")
memory = ConversationBufferMemory()
# verbose=True prints the full prompt (including history) sent on each turn
conversation = ConversationChain(llm=chat_model, memory=memory, verbose=True)

while True:
    query = input("You: ")
    if query == "q":
        print("Goodbye! It was nice to chat with you.")
        break  # leave the loop instead of sending "q" to the model
    output = conversation.invoke({"input": query})
    print(f"User: {query}")
    print(f"Bot: {output['response']}")

A sample conversation looks like this:

User: I am planning to go to Beirut. Please suggest me some local food to try. I also like Pakistani food.
Bot: Beirut! What a vibrant city! You're in for a treat when it comes to trying the local cuisine. Lebanese food is a fusion of Mediterranean, Arabic, and Middle Eastern flavors, and there are plenty of dishes that you might enjoy........<OTHER OUTPUTS>
User: Why you suggested me Mujaddara?
Bot: I suggested Mujaddara because, as I mentioned earlier, it's a dish that shares similarities with Pakistani cuisine! In fact, Mujaddara is a popular comfort food in many Middle Eastern countries, including Pakistan. The combination of sautéed onions and lentils with rice or bulgur is a classic flavor profile that you might find familiar. Plus, the hearty, comforting nature of the dish makes it a great option for a satisfying meal. I thought it was worth highlighting the connection between Lebanese and Pakistani cuisines, especially since you mentioned enjoying Pakistani food!

It says that it suggested Mujaddara because I said that I like Pakistani food.


Now it’s time to inject our external data that does not exist in the training corpus of the LLM. The basic workflow is shown below. To keep it simple, I won’t use an external database to store my embeddings.

Figure 5.2: Incorporating user-specific knowledge into LLMs (source:

You should definitely read about RAG (Retrieval Augmented Generation) if you haven’t. Its main concept is very similar to what we do in this section:

  • Parse the documents into chunks
  • Transform the chunks into embeddings
  • Store the embeddings
  • Retrieve the K chunks whose embeddings are most similar to the user’s query
  • Generate a response by using the retrieved chunks
Step-1 of RAG | Image by LangChain
Step-2 of RAG | Image by LangChain
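The retrieve-then-generate steps above can be sketched without any external library. In this toy version the "embedding" is only a bag-of-words count vector, which is an assumption made to keep the example self-contained; a real setup uses a neural embedding model and a vector store. The function names (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (a real system uses a neural model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved chunks in front of the question for the model to use."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

chunks = [
    "the walrus operator assigns inside an expression",
    "list comprehensions build lists concisely",
    "generators produce values lazily",
]
top = retrieve("what does the walrus operator do", chunks, k=1)
prompt = build_prompt("what does the walrus operator do", top)
print(prompt)
```

The final prompt is what would be sent to the LLM, so the model answers from the retrieved text rather than only from its training data.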

I downloaded the book Effective Python: 90 Specific Ways to Write Better Python, 2nd Edition as a PDF to my local machine. I parsed it with PyPDFLoader from langchain_community.document_loaders, split it into chunks, transformed the chunks into embeddings with OllamaEmbeddings, and stored them (kept in memory, in our case) with FAISS.from_documents.
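The script in this section splits the text with chunk_size=1000 and chunk_overlap=200. As a rough illustration of what overlap means, here is a simplified character-window splitter. It is only a sketch: `split_with_overlap` is a hypothetical helper, and RecursiveCharacterTextSplitter is smarter than this because it also tries to break on separators such as paragraphs and sentences.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

pieces = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(pieces)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.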

Now I can ask my LLM how to write effective Python code by running the script below.

from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from datetime import datetime

start_time = datetime.now()

# Split the PDF into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
raw_documents = PyPDFLoader("./resources/pdf/effective_python.pdf").load()
documents = text_splitter.split_documents(raw_documents)
# Embed every chunk and keep the vectors in an in-memory FAISS index
db = FAISS.from_documents(documents, OllamaEmbeddings(model="llama3"))
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
llm = ChatOllama(model="llama3")
qa_chain = ConversationalRetrievalChain.from_llm(
    llm, retriever=db.as_retriever(), memory=memory, verbose=True
)
wip_time_in_seconds = (datetime.now() - start_time).total_seconds()
print(f"Time taken: {wip_time_in_seconds} seconds")

while True:
    query = input("You: ")
    if query == "q":
        print("Goodbye! It was nice to chat with you.")
        break
    output = qa_chain.invoke({"question": query})
    print(f"User: {query}")
    print(f"Bot: {output['answer']}")  # the chain returns its reply under "answer"

User: Why do I need to use Walrus operator and what does it do?
Bot: According to the provided context, assignment expressions (also known as the Walrus operator) use the `:=` operator to both assign and evaluate variable names in a single expression, thus reducing repetition.

In other words, the Walrus operator allows you to perform an assignment within a larger expression, which can improve readability by avoiding repeated assignments. This feature is particularly useful when you find yourself repeating the same assignment multiple times within a grouping of lines.

User: Give me some example please.
Bot: According to the provided context, assignment expressions (using the walrus operator :=) can be used to reduce repetition. Here's an example:

Instead of writing:
if x := y // 2 > 0:

You could write:
while (x := y // 2) > 0:
In this case, the walrus operator is used to assign the result of `y // 2` to `x` and evaluate whether the condition is true

User: Generate an example of usage of walrus operator in if-else block.
Bot: You can use the walrus operator (:=) in an if-else block to assign and evaluate a variable within a single expression. Here is an example:

if (new_value := input("Enter a number:")):
    print(f"The entered value is {new_value}.")
else:
    print("No value was entered.")

In this code, the `input()` function asks for user input and assigns it to the variable `new_value`. The if statement then checks if `new_value` is truthy (i.e., not zero, an empty string, or None). If the input is valid, the program prints a message with the entered value. Otherwise, it prints a message stating that no value was entered.

This code demonstrates how you can use the walrus operator in an if-else block to simplify your code and improve readability.

It generated the outputs by using our Effective Python book, fantastic!


Hopefully, this was helpful for getting a bird’s-eye view of the main tools and concepts of custom LLMs. I’ll be writing more about these topics in the coming days.

You can find my links below to follow me on other platforms.

Kind regards


