Create your Chat GPT Plugin with a Custom knowledge base

Dmytro Sazonov
𝐀𝐈 𝐦𝐨𝐧𝐤𝐬.𝐢𝐨
9 min read · Sep 21, 2023

Build your own Chat GPT Plugin which is able to read an external Wiki-page and talk with you based on the context retrieved from the custom data

Photo by Emiliano Vittoriosi on Unsplash

If you are a regular reader, you have probably already read my other article about Chat GPT Plugin development, “Build your own Chat GPT Plugin”, and learned how to build a Plugin for the well-known chatbot from scratch.

ADVERTISEMENT: If you or someone you know is interested in collaborating with an expert in AI and Chatbot development, my friends from cosmith.io and I offer top-notch services for Chat GPT Plugin development and integrations.

Last time our Plugin accessed an Internet service to provide customers with a weather forecast. This time, I want to give Chat GPT the ability to read an article from Wikipedia and hold a conversation with the customer based on that external data context.

Why might we need an External data context?

The OpenAI Model has a knowledge cutoff (its last update was in September 2021). If we need information or context after this date, external data can supply the model with the most recent events or news. Another reason is using information which is not publicly available, for example a private database or files.

In our case we will simply limit the private user context to a specific web page on Wikipedia.

Architecture

The major component of our Plugin is Llama_index (llamaindex.ai), a data framework for LLM (Large Language Model) applications. It lets us create a Custom Knowledge Base, a kind of data context which the Chat GPT Model can use to answer customer questions.

Generally speaking, Llama_index creates special index files on the localhost which are then used by the Chat GPT Model to operate on your data.

With this tool you can use different data providers, ranging from PDFs to databases. In our case we will use SimpleWebPageReader to read and index a web page from Wikipedia (https://wikipedia.org).

The dialogue with the user consists of two parts.

Part 1. Feeding Plugin with the Wiki-page Url

Initiating the dialogue, the customer provides our Plugin with a Wiki-page URL, which is used to download the web page and create an index for the LLM. The index is then stored in the Plugin’s memory.
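Conceptually, this step boils down to a tiny in-memory registry: generate an opaque id, build the index, and file it under that id. A minimal sketch of the pattern (`register_index` is a helper name I introduce here; a plain string stands in for the real Llama_index object):

```python
import uuid

# In-memory store: wiki_page_id -> index object.
# Note: everything here is lost when the service restarts.
wiki_pages = {}

def register_index(index):
    """File an index under a fresh opaque id and return that id."""
    wiki_page_id = uuid.uuid4().hex  # 32-char hex string
    wiki_pages[wiki_page_id] = index
    return wiki_page_id

# Usage: the string stands in for a real Llama_index object.
page_id = register_index("index-for-some-wiki-page")
```

The opaque id keeps the download step and the question step decoupled: the customer only ever carries the id between the two endpoints.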

Part 2. Answering from Custom Context (Llama_index)

In the second step, the customer asks any question related to the data context downloaded from the Wiki-page. The Plugin’s code extracts the answer from the Llama_index and returns it to the Chat GPT UI.

Tools and libraries

Although OpenAI suggests Python for Plugin development in their official documentation, we were previously free to choose any language and tooling to develop our own simple plugin. Our case is not that simple, however, and this time we have to use Python, because of the dependency we have here: Llama_index.

For the purposes of our experiment I am going to use:

  • Visual Studio Code
  • Python
  • Quart microframework
  • Llama_index
  • General packages — os, json and uuid

Implementation

This time I am going to walk through the solution not from the perspective of structure but of functionality.

First of all, in order to let our code work with the Chat GPT API we have to create and set an appropriate variable with OPENAI_API_KEY. See the official page for how to do that.

OPENAI_API_KEY = '[OPENAI_API_KEY]'
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

Then, the first endpoint ‘@app.post(“/post-wiki-page”)’ retrieves the Wiki-page URL from the customer, downloads the page, calls the function ‘post_wiki(wiki_page_url)’ to create the Llama_index for the web page, and stores it in the wiki_pages dictionary under the key wiki_id for further use.

wiki_pages = {}

@app.post("/post-wiki-page")
async def post_wiki_page():
    request = await quart.request.get_json(force=True)
    wiki_page_url = request["wiki_page_url"]
    wiki_id, res = post_wiki(wiki_page_url)
    if res is not None:
        wiki_pages[wiki_id] = res
    else:
        return quart.Response(response=json.dumps("Not found"), status=404)
    answer = {
        "wiki_page_id": wiki_id
    }
    return quart.Response(response=json.dumps(answer), status=200)

The ‘post_wiki(url)’ function is defined for downloading the Wiki-page and creating the Llama_index based on that custom context.

def post_wiki(url):
    try:
        wiki_page_id = uuid.uuid4().hex
        pages = SimpleWebPageReader(html_to_text=True).load_data([url])
        index = GPTListIndex.from_documents(pages)
        return wiki_page_id, index
    except Exception:
        # Return a pair so the caller's tuple unpacking still works.
        return None, None

As you may see in the listing above, it uses SimpleWebPageReader and GPTListIndex components to do its work.

The second important part is the function ‘get_talk_to_wiki(wiki_page_id, question)’, which is intended to answer the customer’s questions about the data from the Wiki-page we downloaded and indexed in the first step.

@app.get("/talk-to-wiki/<string:wiki_page_id>/<string:question>")
async def get_talk_to_wiki(wiki_page_id, question):
    try:
        if wiki_page_id in wiki_pages:
            R = get_answer(wiki_pages[wiki_page_id], question)
        else:
            R = "Answer is not available"
    except Exception:
        R = "The plugin 'Talk to Wiki page' is in error."
    return quart.Response(response=json.dumps(R), status=200)

def get_answer(index, question):
    query_engine = index.as_query_engine()
    res = query_engine.query(question)
    return res.response

As you can see from the listing above, it simply queries the Llama_index stored in wiki_pages with the customer’s question and returns the answer as the Response to the Chat GPT UI.

main.py

For consistency I am including the whole program from ‘main.py’ here, to exclude any discrepancies.

import os
import json
import uuid

import quart
import quart_cors

from llama_index import GPTListIndex, SimpleWebPageReader

app = quart_cors.cors(quart.Quart(__name__),
                      allow_origin="https://chat.openai.com")

OPENAI_API_KEY = '[OPENAI_API_KEY]'
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

wiki_pages = {}

@app.post("/post-wiki-page")
async def post_wiki_page():
    request = await quart.request.get_json(force=True)
    wiki_page_url = request["wiki_page_url"]
    wiki_id, res = post_wiki(wiki_page_url)
    if res is not None:
        wiki_pages[wiki_id] = res
    else:
        return quart.Response(response=json.dumps("Not found"), status=404)
    answer = {
        "wiki_page_id": wiki_id
    }
    return quart.Response(response=json.dumps(answer), status=200)

@app.get("/talk-to-wiki/<string:wiki_page_id>/<string:question>")
async def get_talk_to_wiki(wiki_page_id, question):
    try:
        if wiki_page_id in wiki_pages:
            R = get_answer(wiki_pages[wiki_page_id], question)
        else:
            R = "Answer is not available"
    except Exception:
        R = "The plugin 'Talk to Wiki page' is in error."
    return quart.Response(response=json.dumps(R), status=200)

@app.get("/logo.png")
async def plugin_logo():
    filename = 'logo.png'
    return await quart.send_file(filename, mimetype='image/png')

@app.get("/.well-known/ai-plugin.json")
async def plugin_manifest():
    with open("./.well-known/ai-plugin.json") as f:
        text = f.read()
    return quart.Response(text, mimetype="text/json")

@app.get("/openapi.yaml")
async def openapi_spec():
    with open("openapi.yaml") as f:
        text = f.read()
    return quart.Response(text, mimetype="text/yaml")

def post_wiki(url):
    try:
        wiki_page_id = uuid.uuid4().hex
        pages = SimpleWebPageReader(html_to_text=True).load_data([url])
        index = GPTListIndex.from_documents(pages)
        return wiki_page_id, index
    except Exception:
        # Return a pair so the caller's tuple unpacking still works.
        return None, None

def get_answer(index, question):
    query_engine = index.as_query_engine()
    res = query_engine.query(question)
    return res.response

def main():
    app.run(debug=True, host="0.0.0.0", port=5001)

if __name__ == "__main__":
    main()

As you can see from the listing above, we are using the llama_index package and the variable OPENAI_API_KEY, which should be set to a valid API key from OpenAI. Also please note that we are running our quart web service on port 5001 this time.
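Once the service is up, you can exercise both endpoints without Chat GPT. One detail worth noting: the question travels as a URL path segment, so it must be percent-encoded. Below is a client sketch using only the standard library; `build_talk_url` and `demo` are helper names I introduce here, and `demo()` assumes the service from main.py is already running on port 5001.

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:5001"

def build_talk_url(base, wiki_page_id, question):
    # The question is a path segment: encode spaces, '?', '/' and friends.
    return f"{base}/talk-to-wiki/{wiki_page_id}/{urllib.parse.quote(question, safe='')}"

def demo():
    # Step 1: register a Wiki page and receive its id.
    body = json.dumps({"wiki_page_url": "https://en.wikipedia.org/wiki/Kakhovka_Dam"}).encode()
    req = urllib.request.Request(f"{BASE}/post-wiki-page", data=body,
                                 headers={"Content-Type": "application/json"},
                                 method="POST")
    with urllib.request.urlopen(req) as resp:
        wiki_page_id = json.loads(resp.read())["wiki_page_id"]
    # Step 2: ask a question against the stored index.
    with urllib.request.urlopen(build_talk_url(BASE, wiki_page_id, "When was the dam destroyed?")) as resp:
        print(json.loads(resp.read()))

# Call demo() only with the plugin service already started.
```

Chat GPT does this encoding for you when it calls the endpoint, but for manual testing it is easy to forget.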

openapi.yaml

As you may know, this file is the service endpoint definition. The Model in Chat GPT discovers this OpenAPI YAML specification during the installation process in order to set up the Plugin in an appropriate way.

openapi: 3.0.1
info:
  title: Talk to Wiki Plugin
  description: A plugin that allows the user to talk with Wiki-page using ChatGPT and llama-index.
  version: 'v1'
servers:
  - url: http://localhost:5001
paths:
  /post-wiki-page:
    post:
      operationId: postWikiPage
      summary: Post the Wiki-page from the public link.
      requestBody:
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/PostRequest"
        required: true
      responses:
        "200":
          description: Success
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/PostResponse"
  /talk-to-wiki/{wiki_page_id}/{question}:
    get:
      operationId: getTalkToWiki
      summary: Talk to the data from the Wiki page.
      parameters:
        - in: path
          name: wiki_page_id
          schema:
            type: string
          required: true
          description: The Wiki-page identifier.
        - in: path
          name: question
          schema:
            type: string
          required: true
          description: The question to the document.
      responses:
        "200":
          description: OK

components:
  schemas:
    PostRequest:
      type: object
      required:
        - wiki_page_url
      properties:
        wiki_page_url:
          type: string
          description: The public Url to the Wiki-page.
    PostResponse:
      title: PostResponse
      required:
        - wiki_page_id
      type: object
      properties:
        wiki_page_id:
          title: wiki_page_id
          type: string
Please note that in the file above we describe not only our two endpoints (post-wiki-page, talk-to-wiki) but also their data schemas (PostRequest and PostResponse).
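In practice the two schemas correspond to small JSON bodies. Here is what they look like, along with a minimal structural check (`matches_schema` is a hypothetical helper for illustration, not part of the plugin):

```python
# Example bodies matching the PostRequest and PostResponse schemas above.
post_request = {"wiki_page_url": "https://en.wikipedia.org/wiki/Kakhovka_Dam"}
post_response = {"wiki_page_id": "0f8fad5bd9cb469fa165708067b1a2a3"}

def matches_schema(body, required_fields):
    """Minimal structural check: every required field is present and a string."""
    return all(isinstance(body.get(field), str) for field in required_fields)

print(matches_schema(post_request, ["wiki_page_url"]))   # True
print(matches_schema(post_response, ["wiki_page_id"]))   # True
```

Chat GPT validates requests against the spec on its side, but keeping an example body next to the schema makes debugging malformed requests much quicker.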

ai-plugin.json

Don’t forget to properly fill in the Plugin manifest inside the ‘.well-known’ folder. Chat GPT uses this file to let the model know what purposes our Plugin can be used for.

{
  "schema_version": "v1",
  "name_for_human": "Wiki-page Talker",
  "name_for_model": "Wiki_page_talker",
  "description_for_human": "Plugin for conversations with data from Wiki-page.",
  "description_for_model": "Plugin for conversations with data from Wiki-page.",
  "auth": {
    "type": "none"
  },
  "api": {
    "type": "openapi",
    "url": "http://localhost:5001/openapi.yaml",
    "is_user_authenticated": false
  },
  "logo_url": "http://localhost:5001/logo.png",
  "contact_email": "misterd793@gmail.com",
  "legal_info_url": "https://medium.com/@dmytrosazonov"
}

Check the URLs for the logo and the API in the file above. Don’t forget to confirm that logo.png exists, because a missing file can cause an issue during the installation process. As you can see, we are using port 5001 for the Plugin service.
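Before installing, it is also worth sanity-checking the manifest programmatically. The sketch below parses ai-plugin.json and reports any missing top-level fields (`check_manifest` is a helper I introduce here; the field list mirrors the manifest above):

```python
import json

# Top-level fields our manifest above provides.
REQUIRED_FIELDS = ["schema_version", "name_for_human", "name_for_model",
                   "auth", "api", "logo_url", "contact_email", "legal_info_url"]

def check_manifest(text):
    """Parse ai-plugin.json and report any missing top-level fields."""
    manifest = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in manifest]
    return manifest, missing

# Usage (assumes the file layout from this article):
# with open("./.well-known/ai-plugin.json") as f:
#     manifest, missing = check_manifest(f.read())
#     print(missing)  # should be an empty list
```

A malformed or incomplete manifest is the most common reason the ‘Find manifest file’ step fails later on, so a ten-second check here can save a debugging session.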

Requirements

Also, double-check the file ‘requirements.txt’. It should contain the following:

quart
quart-cors
llama-index
html2text

Use the following command to install the dependencies:

pip install -r requirements.txt

Launch

OK, I think we are ready to run our program in Visual Studio Code now. To do so, execute the following command:

python main.py

If all dependencies are installed and the program is ready to launch, we will see the following output:

Running the application on the Localhost

So now we can start using our Wiki-page Talker Plugin.

Installation

Before we start using the Plugin, we have to install it into the Chat GPT UI. To do that, go to the Plugin store and click the button ‘Develop your own Plugin’. In the pop-up window, type http://localhost:5001 as shown in the picture below and click ‘Find manifest file’.

Find manifest file from http://localhost:5001

If the file is discovered you will see another window as shown below.

Install localhost plugin

Just proceed by clicking ‘Install localhost plugin’ button.

Wiki-page Talker Plugin is enabled

Double check that our Wiki-page Talker Plugin is enabled as shown on the picture above.

Ready to use

To properly test our new Plugin I will use well-known events that happened after September 2021. As you know, Chat GPT’s dataset is limited up to that date.

Phase 1. Asking to talk with Wiki-page

In the first step, Chat GPT asks us to provide the Wiki-page we want to talk with. I provided the URL to the Wiki page: https://en.wikipedia.org/wiki/Kakhovka_Dam

At this step, behind the scenes, our Plugin simply downloads the page, creates the index using Llama_index and stores the result in memory. At the same time Chat GPT informs us that it is ready to talk about the information contained in the Wiki-page.

Phase 2. Asking the question from Wiki-page

In the second step, I ask Chat GPT to answer the question ‘When was the Kakhovka Dam destroyed and why?’.

Chat GPT doesn’t know anything about this event because it happened after 2021, but since we are using a Custom Context limited to the particular Wiki-page, the Model easily extracts this information from the index and returns it to us in the dialogue above. So it works as designed :)

Instead of conclusion

In this article we explored the integration of a Custom Data Context, specifically Wikipedia, into a Chat GPT Plugin to overcome its post-September-2021 knowledge limitation.

Using the Llama_index framework, we demonstrated how to create a custom knowledge base, enabling Chat GPT to talk with up-to-date information from a specific Wikipedia page. The same methodology can be applied to databases, PDFs, or any other private data sources. Let your ideas come true: experiment with them and create amazing apps.

For your experiments

Visit Github to download the code and experiment with it on your end. In case of any questions, reach out to me on Twitter.

Github: https://github.com/under0tech/plugin-talk-to-wiki

Twitter: https://twitter.com/dmytro_sazonov
