Chat with your documentation
Have you ever tried to find information across a large amount of documentation? It can be quite difficult, and we often rely on the search function to pinpoint what we’re searching for.
In our company, we maintain an internal wiki called wellD book, built with Docusaurus. It serves as a central repository where we record the tools, best practices, and guidelines we should always have at hand.
In addition to that, our book also covers administrative processes such as how to request a day off or what the remote work policy is at wellD. However, many people tend to forget about the existence of the book and end up asking the same questions to our HR department.
Last but not least, we use Discord as our communication tool, where employees can chat with each other and exchange information.
We are currently in the ChatGPT era, so there must be a way to connect all of these resources and streamline our processes, reducing repetitive tasks and making our lives easier!
OpenAI + Supabase = 7docs
I came across this amazing tool called 7docs, which utilizes the OpenAI APIs to convert various file formats into text chunks and subsequently into embeddings (vectors) using the `text-embedding-ada-002` model. Once we have the vectors, we store them in a vector database. 7docs offers integration with Supabase, an alternative to Firebase based on PostgreSQL, which includes the pgvector extension.
7docs consists of two sub-projects. The first one is the "cli" used for ingesting documentation, and the second one is used for querying the dataset previously stored. But how does it extract information from the vector table based on a user's question? Well, first, 7docs converts the question into a vector. Then, it measures the distances between this vector and the other vectors, returning the vectors that are more "similar" or closer to the question vector. The default similarity threshold is set to `0.78`.
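To make the similarity idea concrete, here is a minimal sketch of cosine similarity with the 0.78 cutoff. The helper names are my own illustration; in practice 7docs delegates this work to pgvector on the Supabase side rather than computing it in application code.

```typescript
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const SIMILARITY_THRESHOLD = 0.78;

// Keep only the chunks whose similarity to the question vector clears the
// threshold, best matches first.
function matchChunks(
  question: number[],
  chunks: { content: string; embedding: number[] }[],
): string[] {
  return chunks
    .map((c) => ({
      content: c.content,
      score: cosineSimilarity(question, c.embedding),
    }))
    .filter((c) => c.score > SIMILARITY_THRESHOLD)
    .sort((a, b) => b.score - a.score)
    .map((c) => c.content);
}
```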
Once we have the question vector and some possible answers, we use the OpenAI `gpt-3.5-turbo` model to generate a coherent answer for the user.
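Conceptually, the retrieved chunks and the question get stitched into a single chat request. The wording below is my own illustration, not the exact prompt 7docs ships with:

```typescript
// Build the messages array for a chat-completion request. The system message
// constrains the model to the retrieved context.
type ChatMessage = { role: "system" | "user"; content: string };

function buildMessages(question: string, contextChunks: string[]): ChatMessage[] {
  const context = contextChunks.join("\n---\n");
  return [
    {
      role: "system",
      content:
        "Answer the question using ONLY the context below. " +
        "If the context does not contain the answer, apologize and say you don't know.\n\n" +
        `Context:\n${context}`,
    },
    { role: "user", content: question },
  ];
}
```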
The diagram below attempts to describe the functionality of our system upon completion.
🤖 A bot for simplicity
It seems that all the pieces are falling into place. We just need to write a Discord bot that will allow my colleagues to query our documentation. Discord offers excellent documentation (https://discordjs.guide/) that explains how to build a bot. It’s quite easy to do. We already have a Discord bot made with Bun (https://bun.sh/), but that’s another story or perhaps a future blog post. For now, we just need to add a new command that takes in a question as input and returns a response processed by the logic described above.
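One practical detail worth handling in such a command (a sketch of my own, not from the discord.js guide): Discord caps messages at 2,000 characters, so long answers from the model need to be split before replying.

```typescript
// Discord rejects messages longer than 2000 characters, so split long answers
// into chunks, preferring to break at a newline when one is in range.
const DISCORD_MESSAGE_LIMIT = 2000;

function splitForDiscord(answer: string, limit = DISCORD_MESSAGE_LIMIT): string[] {
  const parts: string[] = [];
  let rest = answer;
  while (rest.length > limit) {
    // Break at the last newline within the limit; otherwise hard-cut.
    let cut = rest.lastIndexOf("\n", limit);
    if (cut <= 0) cut = limit;
    parts.push(rest.slice(0, cut));
    rest = rest.slice(cut).replace(/^\n/, "");
  }
  parts.push(rest);
  return parts;
}
```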
The result is quite impressive. I asked the bot if there was a way to sync IDE settings across laptops, and that’s the answer it provided!
AI retrieves information from this section of our internal wiki: https://book.welld.io/docs/tools/ide
On a side note, I initially thought it would be easy to instruct OpenAI to answer the question using only the provided context, and otherwise respond with a "sorry" message. However, I ended up spending more time than planned: I had to write and rewrite my prompt several times to find the best fit for my use case. Additionally, I had to introduce a circuit breaker for cases where the SQL query does not extract any vectors from the database.
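The circuit breaker itself is the easy part: if the similarity search comes back empty, skip the completion call entirely and send the "sorry" message. A sketch with hypothetical function parameters:

```typescript
const FALLBACK_ANSWER =
  "Sorry, I couldn't find anything about that in the wellD book.";

// If the vector search returned no chunks, short-circuit with the fallback
// instead of paying for a completion that could only hallucinate.
async function answerQuestion(
  question: string,
  searchChunks: (q: string) => Promise<string[]>,
  complete: (q: string, context: string[]) => Promise<string>,
): Promise<string> {
  const chunks = await searchChunks(question);
  if (chunks.length === 0) return FALLBACK_ANSWER;
  return complete(question, chunks);
}
```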
Automate, automate everything!
I must confess that I enjoy automating processes as much as possible. To achieve this, I created a Docker image that includes the 7docs CLI, which I utilized in the CI/CD pipeline for our wellD book site.
So, here’s the bit of the `Dockerfile` that I whipped up:
```dockerfile
FROM node:18.18 as node
RUN echo "NODE Version:" && node --version
RUN echo "NPM Version:" && npm --version

FROM python:3.11.4
COPY --from=node . .

# Due to this issue: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/2750
# we need to use the root user to install 7-docs.
USER root
RUN npm install --global 7-docs@0.5.1
```
Whenever a commit is pushed to the master branch and it includes at least one added or modified markdown file, the `7docs:ingest` job is triggered, updating the vectors on Supabase. This ensures that our bot's knowledge remains synchronized with the wiki.
Here is the GitLab job definition:
```yaml
7docs:ingest:
  stage: ai
  retry: 1
  image:
    name: <YOUR_DOCKER_REGISTRY>/7docs:0.5.1-node18-v2
    entrypoint: [""]
  allow_failure: false
  cache: {}
  needs:
    - job: npm:install
      optional: true
  variables:
    NAMESPACE: book-collection
  script:
    - 7d ingest --files 'docs/**/*.md' --namespace $NAMESPACE
  rules:
    - if: $7DOCS_DISABLED
      when: never
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
    - if: $CI_COMMIT_TAG
```
👣 Next steps
I have several new ideas to enhance this project, but the most important ones are:
- Adding like and dislike buttons under the bot’s answer to track its quality. This feature is extremely useful, as is the next one.
- Storing questions and their corresponding answers in our database to prevent duplicate requests to OpenAI and save some money.
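For the second idea, the cache lookup could be as simple as keying on a normalized form of the question. Here's an in-memory sketch (the real version would live in our Postgres database):

```typescript
// Normalize so that trivially different phrasings hit the same cache entry.
function normalize(question: string): string {
  return question.trim().toLowerCase().replace(/\s+/g, " ");
}

const answerCache = new Map<string, string>();

function cachedAnswer(
  question: string,
  askOpenAI: (q: string) => string,
): string {
  const key = normalize(question);
  const hit = answerCache.get(key);
  if (hit !== undefined) return hit; // cache hit: no OpenAI call, no cost
  const answer = askOpenAI(question);
  answerCache.set(key, answer);
  return answer;
}
```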
🎬 Conclusion
This blog post looked at the challenge of finding information scattered across a large body of documentation, and at the tool that came to the rescue: 7docs. Powered by the OpenAI APIs and Supabase, 7docs breaks files down into text chunks and embeddings, then uses vector similarity to pinpoint the most relevant information.
On top of that, we wired a Discord bot into our documentation so colleagues can ask questions right from chat, and we automated the ingestion step with Docker and a GitLab CI/CD pipeline so the bot's knowledge stays in sync with the wiki.
As next steps, we plan to add feedback buttons to gauge the quality of the bot's answers, and to store questions and answers in a database to save some precious API calls.
Thanks for reading, and happy automating!