Create a ChatGPT CLI Bot for Q&A on PDFs Using knowledge-gpt

Eren Akbulut
Published in Geeks-of-Data · 4 min read · Apr 8, 2023

Before starting the article, I want to mention our “Geeks of Data” Discord channel. You can join to say hello and exchange ideas about data science, engineering, or analysis. 🚀 Link

In today’s world, information is just a few clicks away. With the vast amount of data available on the internet, it’s no wonder that people turn to search engines to find answers to their questions. But what if we could automate this process and make it more efficient?

That’s where knowledgegpt comes in. It’s a library designed to gather information from various sources and create prompts that can be used by OpenAI’s GPT-3 model to generate answers. With this library, you can build a chatbot that can answer questions on any topic.

Today we’ll explain how to create a CLI bot using knowledgegpt in just a couple of lines. Our example covers two cases: the target PDF is already available locally, or it’s an arXiv paper online waiting to be downloaded.

We can start with the installation. You can either use the command

pip install knowledgegpt

to install from PyPI, or you can visit our GitHub repository to install from source.

Once everything is set, we can create a Python file and start importing the necessary libraries and configs.

Imports

The only thing that doesn’t come from our installation step is the SECRET_KEY. To get one, create an OpenAI account, generate an API key, and place it in a file called “local_example_config” in a variable called SECRET_KEY.
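As a rough sketch of that import block (the knowledgegpt module path and class name here are assumptions based on the library’s examples, so double-check them against the project README):

import os
import openai

# SECRET_KEY lives in your own local_example_config.py; it is not part of the package
from local_example_config import SECRET_KEY

# module path and class name assumed; verify against the knowledgegpt README
from knowledgegpt.extractors.pdf_extractor import PDFExtractor

openai.api_key = SECRET_KEY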

Main Function Start

Then, inside a function called main(), we can add this code. This part reads in the PDF name or the arXiv id and sets up a condition for quitting. It also takes in the index_path (only needed when you want to save your indexes after the initial calculation) and the load_index variable, which tells the extract method (we’ll see it in a minute) whether to load an existing index.
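A minimal sketch of that part of main(), with illustrative variable names rather than the article’s exact code:

def main():
    # either a local PDF path or an arXiv id (the numeric id from an arxiv.org URL)
    pdf_name = input("Enter a local PDF path or an arXiv id ('quit' to exit): ").strip()
    if pdf_name.lower() == "quit":
        return

    # only needed if you want to save the index after the initial calculation
    index_path = input("Index path (leave empty to skip saving): ").strip()

    # reuse a previously saved index if one already exists at that path
    load_index = bool(index_path) and os.path.exists(index_path)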

Download

The code above is pretty trivial: if the file is given directly, we use it as-is; if not, we download it from arXiv and use that.
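A hedged sketch of that branch: the arXiv PDF URL pattern is standard, but the download code below is my own illustration rather than the library’s, and it relies on the requests package being imported at the top of the file.

    if os.path.exists(pdf_name):
        pdf_path = pdf_name  # a local file was given, use it directly
    else:
        # treat the input as an arXiv id and download the PDF from arxiv.org
        url = f"https://arxiv.org/pdf/{pdf_name}.pdf"
        pdf_path = f"{pdf_name.replace('/', '_')}.pdf"
        with open(pdf_path, "wb") as f:
            f.write(requests.get(url, timeout=60).content)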

Last Variables and Loop Start

Then we set our maximum context length and finally initialize our class. We want paragraph-based extraction, an English model, and the turbo model as our main engine :) (turbo is the one behind ChatGPT).
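In code, the initialization might look like this; the constructor arguments are assumptions based on the library’s examples and may be named differently in the current release:

    max_tokens = 1000  # maximum context/answer length passed to the extract calls

    extractor = PDFExtractor(
        pdf_file_path=pdf_path,       # the local or freshly downloaded PDF
        extraction_type="paragraph",  # paragraph-based extraction
        model_lang="en",              # English embedding model
        is_turbo=True,                # gpt-3.5-turbo, the model behind ChatGPT
    )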

Context Restart and Q&A

If the context we encapsulated at the beginning is not good enough, we can reset it by using the key restart_context in our message; the message sent along with it is then used to create another context.

Then, if it’s our initial load we make one type of query, and otherwise the other. The difference is that the initial query will certainly have a fresh context, so there’s no need to call the context restarter. A rough sketch of this loop follows below.
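Tying the restart behaviour and the two query paths together, the loop body might look roughly like this (the extract() signature and its return values are assumptions based on the library’s examples; only the restart_context convention comes from the article):

    first_query = True
    while True:
        message = input("Ask a question ('quit' to exit): ").strip()
        if message.lower() == "quit":
            break

        # including the key "restart_context" in the message tells the bot to drop
        # the current context and build a fresh one from the rest of the text
        if first_query:
            # the first call always starts from a fresh context; load_index decides
            # whether a previously saved index is reused
            answer, prompt, messages = extractor.extract(
                message, max_tokens=max_tokens, load_index=load_index)
            first_query = False
        else:
            answer, prompt, messages = extractor.extract(message, max_tokens=max_tokens)

        print(answer)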

Finally, we print our answer, and the loop goes on until we break it. Now let’s look at a real use case. Below we download a PDF from arXiv: first we give the file name or the link, then the index path; if the path is empty, knowledgegpt will calculate and fill the index, and if it’s already populated, it will be used directly. Then we specify the maximum number of tokens and finally enter our prompt.

Settings and Query

Finally, we have our answer ready. We can keep asking questions about VOS’s internals and how it works; if we want to move on, we can use the restart_context flag and ask another question to update the context. Below we see our answer.

Our Answer

That mostly concludes what we were aiming to cover today. If you need the source code, please see the link.

Okay, that’s pretty much it. Thank you very much for reading and following along, friends. If you want to access content like this and spend time with curious, intelligent, and hardworking colleagues, we also welcome you to our Discord server. 🚀 Link
