Ollama and Llama3 — A Streamlit App to convert your files into local Vector Stores and chat with them using the latest LLMs
Chatting with the content of your files like PDFs, CSVs or plain text is one of the latest hypes of the LLM wave (just check out all the articles on Medium). To demonstrate how to do this locally with the latest models like Llama3 or Mistral, I put together a Streamlit app in Python that uses Ollama to convert PDFs, CSVs and plain text documents into vector stores, which are then fed to the model. I tried both Chroma and Meta's FAISS as vector stores.
An Overview of what the Streamlit App does
The interface should be pretty self-explanatory: you provide an instruction, choose your embedding model (to process the files) and then the LLM itself.
You can choose between three types of files
You can select one or more documents of one type. Keep in mind they will be processed locally and the app is only there as a demonstration, so if you intend to crash it, you might well succeed :-)
- PDFs — the text will be extracted. You can opt to chunk the text into portions and define an overlap, so the context around a match has a chance to be included in both directions
- CSVs — by default they will be imported in a structured way, one row per document. From experience, the models can detect, for example, that you have several databases and make connections between them. You might have to increase the number of documents to be retrieved
- Text — these files can be pretty unstructured and each file becomes one document. This can help when working with log files (a short loading sketch follows this list)
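To give an idea of the loading and chunking step, here is a minimal sketch assuming the LangChain community loaders; the file names, chunk size and overlap are only illustrative and the app's exact code may differ:

```python
# Minimal sketch, not the app's exact code: load the three file types and
# chunk the PDF text with an overlap. File names and sizes are just examples.
from langchain_community.document_loaders import PyPDFLoader, CSVLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# PDFs: extract the text and split it into overlapping chunks
pdf_docs = PyPDFLoader("report.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(pdf_docs)

# CSVs: one document per row, column names are kept in the text
csv_docs = CSVLoader(file_path="sales.csv").load()

# Plain text or log files: one document per file
txt_docs = TextLoader("server.log").load()
```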
Not everything has to go into a vector store. For CSV files or databases you might just use SQL or let a model write the SQL code. For log files there might be dedicated log parsers, or you can use regular expressions (or let the LLM write the regex).
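For the SQL route, a quick way to try it without any vector store is to load the CSV into an in-memory SQLite database; the file name and the query below are just examples:

```python
# Minimal sketch of the SQL alternative for CSV data, using pandas and the
# standard library only; the file name and the query are just examples.
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")
pd.read_csv("sales.csv").to_sql("sales", con, index=False)

# Write the query yourself or let an LLM generate it from a natural-language question
result = pd.read_sql_query("SELECT region, SUM(amount) AS total FROM sales GROUP BY region", con)
print(result)
```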
Obviously, since this is running on a local machine, the power will be limited. Apple Silicon (M1, M2, … processors) seems to do quite well. But then, this whole setup is there to keep your data private and not send it to the cloud. If you have access to a powerful LLM like ChatGPT, it will most likely beat this little machine.
But the initial results with Mistral or Llama3 are quite encouraging, considering you are not using a gigantic fleet of GPUs but maybe just your laptop's CPU.
Save the Vector Stores locally
You will only have to process the files once. The results will be stored in a local SQLite database or other local files and can be reused later just by providing the path; the app will then pick up the stored information and use it. You can choose how many documents to retrieve at once: more documents mean more information, but also more overhead and processing.
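With FAISS, for example, saving a store and later reloading it by path can look like this minimal sketch (reusing the `chunks` from the loading sketch above; the path, embedding model and `k` are only illustrative):

```python
# Minimal sketch: build a FAISS store once, save it locally and reload it later.
# Path, embedding model and k are illustrative, not the app's fixed settings.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OllamaEmbeddings(model="mxbai-embed-large")

# First run: build the store from the chunked documents and write it to disk
store = FAISS.from_documents(chunks, embeddings)
store.save_local("vector_stores/my_pdfs")

# Later runs: just point to the saved path and reuse it
# (the deserialization flag is required by newer langchain-community versions)
store = FAISS.load_local("vector_stores/my_pdfs", embeddings,
                         allow_dangerous_deserialization=True)
retriever = store.as_retriever(search_kwargs={"k": 4})  # how many documents to retrieve
```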
Ask questions in the chat
You can ask questions in the chat just like you are used to. In the demo there is a strict instruction to only use information from the documents provided, but you can experiment with that. You also have all the knowledge of the LLM itself ready to be used (all the internet compiled into one file …). The chat is also aware of what has happened before; one will have to see how far back this 'memory' lasts.
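A minimal sketch of what such a restricted, history-aware chat turn could look like with the Ollama Python package (the system prompt and the function are illustrative, not the app's exact code):

```python
# Minimal sketch of one chat turn: restrict the answer to retrieved documents
# and pass the earlier turns along as history. Not the app's exact code.
import ollama  # the official Ollama Python client


def answer(question, retrieved_docs, history):
    # retrieved_docs: LangChain documents coming from the retriever
    context = "\n\n".join(doc.page_content for doc in retrieved_docs)
    messages = [
        {"role": "system",
         "content": "Answer only with information from the documents below. "
                    "If the answer is not in them, say so.\n\n" + context},
        *history,  # earlier {"role": "user"/"assistant", "content": ...} turns
        {"role": "user", "content": question},
    ]
    reply = ollama.chat(model="llama3", messages=messages)
    return reply["message"]["content"]
```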
Results will be stored in a JSON file
The results will be stored in a JSON file in the vector store's folder, along with additional information like a timestamp, which model was used and some metadata about your documents, such as the name, page or line that was used, so you can reference it later or extract information from the JSON file (again):
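Such a result record could look roughly like the following minimal sketch; the app's exact field names may differ:

```python
# Minimal sketch of one result record; the app's exact field names may differ.
import json
from datetime import datetime

answer_text = "..."  # the LLM's reply would go here

result = {
    "timestamp": datetime.now().isoformat(),
    "model": "llama3",
    "question": "What is the total on page 3?",
    "answer": answer_text,
    "sources": [{"file": "report.pdf", "page": 3}],  # metadata of the retrieved chunks
}

with open("vector_stores/my_pdfs/chat_results.json", "a", encoding="utf-8") as f:
    f.write(json.dumps(result) + "\n")
```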
You can, for example, instruct the LLM to give back structured results (from a log file) and then extract them later into a regular table:
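Reading such a file back and turning JSON-formatted answers into a table could look like this minimal sketch with pandas, assuming the one-record-per-line layout from the sketch above:

```python
# Minimal sketch: read the stored results back and turn JSON answers into a table.
# Assumes one JSON record per line and that the answers are JSON lists of dicts.
import json
import pandas as pd

rows = []
with open("vector_stores/my_pdfs/chat_results.json", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        try:
            rows.extend(json.loads(record["answer"]))  # structured answer, e.g. log entries
        except json.JSONDecodeError:
            pass  # skip answers that are plain text

table = pd.DataFrame(rows)
print(table.head())
```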
So you can just start exploring once you have installed all the necessary things:
How to set up the Streamlit App
You will find the code and descriptions on GitHub:
https://github.com/ml-score/ollama
- You will need Python installed; ideally create a conda environment with the packages you need (py_ollama.yml)
- Then install Ollama and make sure the model you want to use is downloaded locally. Currently the app offers Llama3 and Mistral (instruct), but you could modify the code to add more
- You will also need an embedding model (like mxbai-embed-large). You could use the LLMs themselves for the embeddings, but this will take more time. A short sketch of wiring this up in Python follows below
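Once the models are pulled, connecting to them from Python can look like this minimal sketch (assuming the LangChain Ollama integrations; the model names are just the ones mentioned above):

```python
# Minimal sketch: connect to the locally pulled Ollama models.
# Assumes the LangChain Ollama integrations; model names are examples.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.chat_models import ChatOllama

embeddings = OllamaEmbeddings(model="mxbai-embed-large")  # encodes the documents
llm = ChatOllama(model="llama3", temperature=0)           # answers the questions

print(llm.invoke("Reply with one short sentence to confirm you are running.").content)
```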
More details on how to do all this and start the app here:
As an example, you can check out how to extract information from a bank statement into a JSON file. You could build a loop around this and bring the information back into KNIME (or any other system):
If you want more AI and LLM content, I can offer these articles dealing with the low-code analytics platform KNIME and (local) LLMs. KNIME also supports the use of platforms like ChatGPT.
- Creating a Local LLM Vector Store from PDFs with KNIME and GPT4All (https://medium.com/p/311bf61dd20e)
- KNIME, AI Extension and local Large Language Models (LLM) (https://medium.com/p/cef650fc142b)
- Llama3 and KNIME — Build your local Vector Store from PDFs and other Documents (also runs on KNIME 4.8) (https://medium.com/p/237eda761c1c)
- Chat with local Llama 3 Model via Ollama in KNIME Analytics Platform — Also extract Logs into structured JSON Files (also runs on KNIME 4.8) (https://medium.com/p/aca61e4a690a)
You can find the examples for all these use cases on my KNIME Hub:
If you want to continue with LLMs and KNIME you can check out the latest KNIME Space for Generative AI:
If you enjoyed this story, you can follow me on Medium (https://medium.com/@mlxl), on the KNIME Hub (https://hub.knime.com/mlauber71) or in the KNIME Forum (https://forum.knime.com/u/mlauber71/summary).
This app has been inspired by the writings of Paras Madan here on Medium; make sure to check out his articles.