Vertex AI Conversation for Wikipedia PDF Chatbot

WikiBot CX: Wiki Informational Bot using Dialogflow CX

Aniket Agrawal
Google Cloud - Community
Oct 29, 2023


In this beginner-friendly blog, we build a Wikipedia PDF chatbot powered by Google Vertex AI Search and Conversation (S&C), previously known as Gen App Builder. We walk through how it works and show how it can be used to craft a conversational curator that delivers informative and engaging answers.

Google Cloud Conversational AI Tech (Vertex AI, Dialogflow CX)

In the realm of knowledge, Wikipedia stands as a towering titan, housing a vast repository of information spanning countless topics. However, delving into its depths can be daunting, especially when the specific information you need is buried inside a long PDF document. This is where a chatbot helps: it bridges the gap between people and information. With Vertex AI Conversation, users can ask questions in natural language and receive responses tailored to their queries.

Download a sample Wikipedia PDF to create the unstructured datastore for the Vertex AI S&C chatbot.

Our chatbot can answer questions on a wide range of topics related to Indian tourism, drawing on the information in our example Wikipedia PDF, ‘Tourism in India’. The answer content is extracted from the relevant indexed PDF text, table data, image captions, etc., as shown in the example GIFs below:

Sample chat answer results for ‘Tourism in India’ Wiki PDF (Unstructured Datastore)

Procedure

So far, we have seen the objective and the chatbot GIFs. But how do we achieve this? This section covers the technical and product details.

Vertex AI Conversation is a managed service that enables us to build and deploy chat solutions powered by Google Conversational AI. It grants our chatbot access to the vast trove of information contained within the Wikipedia PDF datastore on Indian tourism.

As a prerequisite for datastore creation, we’ll first create a Cloud Storage (GCS) bucket and upload the PDF file as follows:

Uploading Wiki PDF in a GCS Bucket: Steps 1 (Left) and 2 (Right)
Uploading Wiki PDF in a GCS Bucket: Steps 3 (Left) and 4 (Right)
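
For readers who prefer scripting over the console, the same bucket creation and upload can be sketched in Python with the `google-cloud-storage` client library. This is a minimal sketch rather than the procedure used in this post (which relies on the console screenshots above); the bucket name, file path, and `US` location are hypothetical placeholders, and the code assumes Application Default Credentials are already configured.

```python
from pathlib import Path


def gcs_uri(bucket_name: str, blob_name: str) -> str:
    """Return the gs:// URI of an uploaded object."""
    return f"gs://{bucket_name}/{blob_name}"


def upload_pdf(bucket_name: str, local_path: str, client=None, location: str = "US") -> str:
    """Create the bucket if it is missing, upload the PDF, and return its gs:// URI."""
    if client is None:
        # Imported lazily so gcs_uri() stays usable without the library installed.
        from google.cloud import storage  # pip install google-cloud-storage

        client = storage.Client()  # assumes Application Default Credentials

    bucket = client.lookup_bucket(bucket_name)  # returns None if the bucket does not exist
    if bucket is None:
        bucket = client.create_bucket(bucket_name, location=location)

    blob_name = Path(local_path).name  # store the object under the file's own name
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(local_path, content_type="application/pdf")
    return gcs_uri(bucket_name, blob_name)


# Hypothetical usage (bucket names must be globally unique):
# upload_pdf("my-wikibot-bucket", "Tourism_in_India.pdf")
```

The returned `gs://` URI identifies the uploaded PDF in the bucket, which is the location the unstructured datastore is created from.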

At the heart of Vertex AI S&C lies Dialogflow CX, a natural language understanding (NLU) platform that enables us to design and build conversational interfaces. Once Dialogflow CX understands the user’s query and Vertex AI S&C has retrieved relevant information from the Wikipedia PDF, it is up to Conversational AI to weave together these components into a seamless and engaging conversational experience.
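
Because the chatbot is a Dialogflow CX agent, it can also be queried from code once it is deployed. The snippet below is a minimal sketch using the `google-cloud-dialogflow-cx` Python client, not part of the original console walkthrough; the project ID, location, and agent ID are hypothetical placeholders you would replace with your own values, and credentials are assumed to be configured.

```python
import uuid


def session_path(project_id: str, location: str, agent_id: str, session_id: str) -> str:
    """Build the resource name of a Dialogflow CX session."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/agents/{agent_id}/sessions/{session_id}"
    )


def api_endpoint(location: str) -> str:
    """Regional agents use a location-prefixed endpoint; global agents use the default."""
    if location == "global":
        return "dialogflow.googleapis.com:443"
    return f"{location}-dialogflow.googleapis.com:443"


def ask(project_id: str, location: str, agent_id: str, question: str,
        language_code: str = "en") -> list[str]:
    """Send one text query to the agent and return its text replies."""
    # Imported lazily so the helpers above work without the library installed.
    from google.cloud import dialogflowcx_v3 as cx  # pip install google-cloud-dialogflow-cx

    client = cx.SessionsClient(client_options={"api_endpoint": api_endpoint(location)})
    request = cx.DetectIntentRequest(
        # A fresh random session ID starts a new conversation.
        session=session_path(project_id, location, agent_id, str(uuid.uuid4())),
        query_input=cx.QueryInput(
            text=cx.TextInput(text=question),
            language_code=language_code,
        ),
    )
    response = client.detect_intent(request=request)
    # Keep only response messages that carry text.
    return [
        " ".join(message.text.text)
        for message in response.query_result.response_messages
        if message.text.text
    ]


# Hypothetical usage:
# print(ask("my-project", "global", "my-agent-id", "Which sites in India are popular?"))
```

Reusing the same session ID across calls, instead of generating a new one each time, would let the agent keep conversational context between questions.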

Please follow the self-explanatory GIF below, which walks through the entire procedure as GCP console screenshots:

Entire procedural journey GIF (left) and the chatbot itself (right)

Summary

The Vertex AI S&C chatbot emerges as a valuable ally in the pursuit of knowledge, capable of transforming the arduous task of information retrieval from long PDFs into a seamless and engaging dialogue. Stay curious, adios, and see you soon in another blog featuring a chatbot with much more advanced features!

Note: Should you have any concerns or queries about this post or my implementation, please feel free to connect with me on LinkedIn! Thanks!

Reference Links

Google Cloud Next ’23: Vertex AI S&C is now GA, hurray!

Feature releases and renaming are common for a new product, so keep an eye on this:

For more hands-on practice, you can try this codelab:



AI/ML | Cloud Engineer at Google, GenAI | Cybersecurity | ML | NLP | Image Processing Research Enthusiast https://www.linkedin.com/in/aniket-agrawal-a18990266/