DATA STORIES | LOCAL LLMS & VECTOR STORES | KNIME ANALYTICS PLATFORM

Llama3 and KNIME — Build your local Vector Store from PDFs and other Documents

… it also runs on your KNIME 4 installation — and on Python

Markus Lauber
Low Code for Data Science


Recently I wrote about how to use the Llama3 large language model by Meta with the help of basic KNIME nodes and Ollama on your local machine. This time I would like to expand on that and also build a vector store from your own PDFs, CSVs or log files with an LLM, and let it run batch processes, all without the content ‘leaking’ to the internet.

“A happy yellow robot and a white llama working together on some old scrolls in the style of Johannes Vermeer. The scene features soft, natural light.” (DALL·E).

An LLM Framework in KNIME

Ollama provides an environment for running LLMs locally. You can access the local server through API calls, use Python code, or use your favourite low-code data analytics tool KNIME to integrate the AI functions into your workflows.
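
All three routes talk to the same local Ollama server, which listens on http://localhost:11434 by default. As a minimal sketch (assuming the requests package is available; the model name and prompt are just placeholders), a call from a Python script or a KNIME Python node could look like this:

import requests

# the local Ollama server listens on port 11434 by default
url = "http://localhost:11434/api/generate"

payload = {
    "model": "llama3:instruct",  # any model you have pulled locally
    "prompt": "Explain in one sentence what a vector store is.",
    "stream": False,             # return the complete answer as one JSON object
}

response = requests.post(url, json=payload, timeout=300)
print(response.json()["response"])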

To demonstrate the basic usage, I have also set up a repository on GitHub so you can test and toy around with the different ways to load documents in Python before integrating them into KNIME:

Using KNIME 4 Nodes and Python

There are dedicated nodes for the use of local and online LLMs with KNIME 5 and I have also written about the use of GPT4All with local models in other articles:

Developments are moving fast in this area, and sometimes KNIME seems to struggle to keep all the new nodes up to date. So for the sake of compatibility and stability I stick with Python nodes (or the REST API), yes, somewhat defeating the whole low-code idea … but anyway.

Install Python and the necessary packages

We start by setting up a Python environment. Please see this article about KNIME and Python/Conda in general if you are not familiar:

You can use the specific YAML file with all the packages you need. All files will also be in the LLM workflow group.
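
For orientation, here is a stripped-down sketch of what such an environment file could contain; the package list is my assumption, the YAML file in the workflow group is the authoritative version:

name: py3_knime_llama
channels:
  - conda-forge
dependencies:
  - python=3.10              # assumption: a recent Python 3.x supported by KNIME
  - pandas                   # used by the KNIME Python nodes to exchange tables
  - pip
  - pip:
    - langchain              # text splitter, chains
    - langchain-community    # document loaders, embeddings, vector stores
    - chromadb               # assumption: local (SQLite-based) vector store backend
    - pypdf                  # assumption: backend for the PDF loader
    - sentence-transformers  # for the all-MiniLM-L6-v2 embeddings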

If you have these basics covered, you should now have the environment py3_knime_llama working and we can continue.

Install Ollama and Llama3 — and an embedding model

You will now have to install Ollama and Llama3 (or any other supported model) and also an embedding model to bring your documents into a vector store.

If you are behind a company firewall, you might have to set the proxies (in the terminal), kill all running Ollama instances (in the task manager) and try again.

set HTTP_PROXY=http://proxy.my-company.com:8080
set HTTPS_PROXY=http://proxy.my-company.com:8080

ollama run llama3:instruct
ollama pull llama3:instruct

Success will look something like this (yes, in this screenshot it is the mistral model):

Terminal window with the download of llama3 files

The last thing you need is an embedding model to convert your documents. You can use Llama3, but it might be faster to use, for example, the new “mxbai-embed-large” or the popular “all-MiniLM-L6-v2” (you can download this in Python or let this code run once in KNIME). You will only need the proxy (if any) once, to download the models. Later you could even deactivate your Wi-Fi. The fastest way may be to let Ollama pull a model and just use that in your code.

ollama pull mxbai-embed-large
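
In Python, getting embeddings from either of these models could look roughly like this (a sketch using the LangChain wrappers; depending on your langchain version the import paths may differ):

# Option 1: let the local Ollama server compute the embeddings
from langchain_community.embeddings import OllamaEmbeddings
embeddings = OllamaEmbeddings(model="mxbai-embed-large")

# Option 2: use a local sentence-transformers model instead
# from langchain_community.embeddings import HuggingFaceEmbeddings
# embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# quick test: embed one sentence and check the vector length
vector = embeddings.embed_query("How do I descale the coffee machine?")
print(len(vector))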

Find your PDFs and turn them into a Vector Store

Now that you are all set up, you can list your PDF files (other examples come later) and turn them into a vector store that will be persisted in a local SQLite database. You will only have to do this once and can then reuse the persisted vector store. In this case, we use the PDF manual of a coffee machine.

KNIME Workflow with Python code creating the Vector Store from PDFs (https://hub.knime.com/-/spaces/-/~5s39Yth4NbkUIj0q/current-state/).

One interesting setting is chunk_size and chunk_overlap. The text in the PDF will be split into chunks of 500 characters with an overlap of 50, so that context does not get lost between the pieces.

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

Later we can tell the retriever how many chunks/documents to fetch as possible matches for our question. The strategies chosen here will influence the performance of the approach.

In the workflow, there are two ways to create the vector store. You can use a standard embedding model, or you can use the Llama3 model itself. On Apple Silicon this is quite fast.
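
Put together, the Python part of that step could look roughly like this; a sketch assuming LangChain's PyPDFLoader and a Chroma store (which keeps its data in local SQLite files). The folder path is a placeholder, and the exact store used in the workflow may differ:

import glob
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# collect the PDF files you want to index
documents = []
for pdf_path in glob.glob("/data/manuals/*.pdf"):  # adjust to your folder
    documents.extend(PyPDFLoader(pdf_path).load())

# split the text into overlapping chunks so context is not lost at the boundaries
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)

# embed with mxbai-embed-large via Ollama (or swap in "llama3:instruct" itself)
embeddings = OllamaEmbeddings(model="mxbai-embed-large")

# persist the store locally so this step only has to run once
vector_store = Chroma.from_documents(chunks, embeddings, persist_directory="vectorstore_pdf")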

You can now chat with your PDFs

To a certain extent still amazing, but by now quite a standard procedure: you can chat locally with your (PDF) documents using the newly created vector store, and this also works with KNIME 4.7:

Chat with your PDFs — retrieving 3 additional documents from the Vector Store. Also give the model instructions (https://hub.knime.com/-/spaces/-/~5s39Yth4NbkUIj0q/current-state/).
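
Behind such a chat component sits a plain retrieval-augmented generation pattern: fetch a few matching chunks from the store and hand them to the model together with the instructions and the question. A minimal sketch (the prompt wording and the names are mine, not the exact component code):

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama

embeddings = OllamaEmbeddings(model="mxbai-embed-large")
vector_store = Chroma(persist_directory="vectorstore_pdf", embedding_function=embeddings)
llm = Ollama(model="llama3:instruct")

question = "How do I descale the coffee machine?"

# retrieve the 3 best-matching chunks from the vector store
matches = vector_store.similarity_search(question, k=3)
context = "\n\n".join(doc.page_content for doc in matches)

prompt = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say so.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(llm.invoke(prompt))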

The results of your chat will be stored in a local KNIME table for further use and reference.

Use the Vector Store in Batch mode

Maybe more interesting than chat is the option to let KNIME answer several questions at once in a loop you can run on your local machine. You can construct a prompt made from (fixed) instructions and a series of questions that will then be run through the LLM. The text will also be escaped so that it can be handled better by KNIME and Python.

Let KNIME handle the loop of your questions sent to the LLM in a batch (https://hub.knime.com/-/spaces/-/~5s39Yth4NbkUIj0q/current-state/).
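
In Python terms the batch is just a loop over the question table that KNIME passes into the node. A rough sketch (the stand-in DataFrame replaces the KNIME input table; column names are placeholders):

import pandas as pd
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama

embeddings = OllamaEmbeddings(model="mxbai-embed-large")
vector_store = Chroma(persist_directory="vectorstore_pdf", embedding_function=embeddings)
llm = Ollama(model="llama3:instruct")

# stand-in for the KNIME input table with one question per row
input_table = pd.DataFrame({"question": [
    "How do I descale the machine?",
    "Which water hardness settings does the machine support?",
]})

instructions = "Answer briefly and only from the provided context."
answers = []
for question in input_table["question"]:
    matches = vector_store.similarity_search(question, k=3)
    context = "\n\n".join(doc.page_content for doc in matches)
    prompt = f"{instructions}\n\nContext:\n{context}\n\nQuestion: {question}"
    answers.append(llm.invoke(prompt))

# hand the result back to KNIME as a new table column
output_table = input_table.assign(answer=answers)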

So you might even be able to process a fair amount of data without sending the information through the internet. I have seen quite good performance on an Apple Silicon machine.

Use other Document Loaders

You can also use other document loaders to work with your files:

In the KNIME workflow, I have included an example of loading a list of log files into a vector store, each of them as one whole document (UnstructuredFileLoader). The idea is to let Llama3 suggest a JSON structure to extract from such logs (cf. also “Chat with local Llama 3 Model via Ollama in KNIME Analytics Platform — Also extract Logs into structured JSON Files”).

We use the vector store and the Llama3 model to provide a sample of the data and let the LLM write the prompt for us, including the JSON structure. For a loop, one could then try to skip the vector store and just use the prompt and instructions.
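
A sketch of that log-loading step (the paths and the sample query are placeholders; UnstructuredFileLoader comes from langchain_community and needs the unstructured package):

import glob
from langchain_community.document_loaders import UnstructuredFileLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama

# load each log file as one whole document (no chunking here)
log_docs = []
for log_path in glob.glob("/data/logs/*.log"):
    log_docs.extend(UnstructuredFileLoader(log_path).load())

embeddings = OllamaEmbeddings(model="mxbai-embed-large")
log_store = Chroma.from_documents(log_docs, embeddings, persist_directory="vectorstore_logs")

# let the model propose a JSON structure (and a prompt) for extracting the logs
llm = Ollama(model="llama3:instruct")
sample = "\n\n".join(d.page_content for d in log_store.similarity_search("error entries", k=2))
print(llm.invoke(
    "Here are sample log entries:\n" + sample +
    "\n\nSuggest a JSON structure and a prompt I could use to extract these fields from similar logs."
))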

Use Llama3 to let it write a prompt for itself to extract information from your Log files into a JSON structure (https://hub.knime.com/-/spaces/-/~5s39Yth4NbkUIj0q/current-state/).

CSV to Vector Store — and Chat with your Database

The last example in the workflow loads data from CSV files that belong to a relational database (the famous Northwind MS Access database). Each line will become a document, and you can then ask questions like “are there customers from Austria in the DB”. You can also ask for specific customer numbers and check them.
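
That step can be sketched with LangChain's CSVLoader, which turns every row into its own document (the file name is a placeholder):

from langchain_community.document_loaders import CSVLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# every row of the CSV becomes one document in the store
customer_docs = CSVLoader(file_path="northwind_customers.csv").load()

embeddings = OllamaEmbeddings(model="mxbai-embed-large")
db_store = Chroma.from_documents(customer_docs, embeddings, persist_directory="vectorstore_csv")

# example lookup: which retrieved rows mention Austria?
for doc in db_store.similarity_search("customers from Austria", k=5):
    print(doc.page_content)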

Ask Llama3 if it can find anyone from Austria in your database (https://hub.knime.com/-/spaces/-/~5s39Yth4NbkUIj0q/current-state/).

Please note: depending on how many documents you allow to be retrieved, the answers may differ. This feature is there to explore your data and maybe ask specific questions about single entries (give the model a customer ID or something else that identifies a data row). It will *not* provide you with a complete overview of the database.

So you might be able to ask the model what a specific customer has purchased but not how many purchases were made overall.

You might use this to let the model write SQL code for you. There are examples of LLMs being used to query relational databases with natural language when you provide them with a schema of your DB.

One LLM to explore for querying databases might be “sqlcoder”. That might be a good idea for another article …
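
Just to illustrate the idea, a sketch of such a schema prompt (the schema here is made up, and sqlcoder would first need an ollama pull sqlcoder):

from langchain_community.llms import Ollama

schema = """
CREATE TABLE customers (customer_id TEXT, company TEXT, country TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id TEXT, order_date TEXT);
"""

# llama3 works too; sqlcoder is a model specialised on SQL generation
llm = Ollama(model="sqlcoder")
print(llm.invoke(
    "Given this schema:\n" + schema +
    "\nWrite a SQL query that counts the orders per country."
))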

My take is that you can use this local setup to toy around with your data on your local machine. You will have to think about what kind of data to load and what number of documents might be suitable to get an answer. You might also be able to build a prototype to extract data in a systematic way (think JSON).

More to read on KNIME and LLMs …

If you cannot get enough of AI (hype) …

Chat with local Llama 3 Model via Ollama in KNIME Analytics Platform
Also extract Logs into structured JSON Files
https://medium.com/p/aca61e4a690a

Creating a Local LLM Vector Store from PDFs with KNIME and GPT4All
https://medium.com/p/311bf61dd20e

KNIME, AI Extension and local Large Language Models (LLM)
https://medium.com/p/cef650fc142b

[2024–07–11] This article has been edited and amended to reflect some changes and include new options for embeddings and the handling of CSV files.

A side note: to automate the management of Python environments between Windows and macOS, I have introduced a component that will detect your OS and then load the necessary Conda Environment Propagation node:

The component will choose the right conda environment for your operating system.

If you enjoyed this article, you can follow me on Medium (https://medium.com/@mlxl) and on the KNIME forum (https://forum.knime.com/u/mlauber71/summary) and hub (https://hub.knime.com/mlauber71).


Markus Lauber
Low Code for Data Science

Senior Data Scientist working with KNIME, Python, R and Big Data Systems in the telco industry