Vertex AI Search for Wikipedia PDF Search Widget
Enterprise Search for the Wiki PDFs
In the previous post, we created a Wiki PDF chatbot powered by Google Vertex AI Conversation. Now we build a search engine over the same PDF using Vertex AI Search, Google’s enterprise search technology.
PDFs are a double-edged sword: they preserve formatting beautifully but make their contents frustratingly difficult to search. The ‘CTRL-F’ shortcut probably comes to mind first. But what if you’re dealing with hundreds or thousands of PDFs? CTRL-F also depends on the literal presence of the query keywords in the text, so what about semantic search?
Traditional search engines often struggle with the layout-oriented structure of PDFs, which makes it hard to uncover the knowledge you need. A dedicated PDF search engine tackles this challenge head-on by understanding the nuances of PDFs and indexing not only the text but also metadata such as titles, authors, and keywords.
It should also search by meaning, not only by keywords, and help your query find its answer, the needle in the haystack of your documents, websites, structured data, and so on.
With Vertex AI Search, users can run meaningful searches that return results tailored to their queries from multiple PDFs. Here, however, we work with a single PDF to keep things simple and beginner-friendly.
Our Vertex AI Search engine can answer questions about Indian tourism in a variety of ways, drawing on the large amount of knowledge in our sample Wikipedia PDF, ‘Tourism in India’. The response content is retrieved from the relevant indexed PDF text, table data, picture captions, and so forth, as seen in the sample GIFs below.
Procedure
So far, we’ve covered the objective and seen the search engine in action in the GIFs. But how do we build it? This section covers the technical and product details.
As a prerequisite for creating the datastore, we first create a Cloud Storage bucket and upload the PDF file like this:
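If you would rather script this step than click through the console, here is a minimal sketch using the `google-cloud-storage` Python client. The project ID, bucket name, file name, and bucket location are all hypothetical placeholders, not values from this walkthrough, and the helper name `upload_pdf` is my own.

```python
from pathlib import Path


def upload_pdf(client, bucket_name, pdf_path, location="US"):
    """Create the bucket if it is missing, upload the PDF, return its gs:// URI."""
    # lookup_bucket returns None when the bucket does not exist.
    bucket = client.lookup_bucket(bucket_name)
    if bucket is None:
        bucket = client.create_bucket(bucket_name, location=location)

    # Store the PDF under its own file name at the top level of the bucket.
    object_name = Path(pdf_path).name
    blob = bucket.blob(object_name)
    blob.upload_from_filename(str(pdf_path), content_type="application/pdf")
    return f"gs://{bucket_name}/{object_name}"


def main():
    # Imported here so the helper above stays usable without the library.
    from google.cloud import storage  # pip install google-cloud-storage

    # Hypothetical names: replace with your own project, bucket, and file.
    client = storage.Client(project="my-gcp-project")
    print(upload_pdf(client, "my-wiki-pdf-bucket", "Tourism_in_India.pdf"))
```

Calling `main()` assumes you have already authenticated (for example with application default credentials). The returned `gs://` URI is what you point the datastore at when importing the PDF.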
Vertex AI Search is a managed service that enables us to create and deploy solutions based on Google Enterprise Search. It gives our search widget access to the large amount of data about Indian tourism in the Wikipedia PDF datastore.
The GIF below shows how Vertex AI Search streamlines the overall process (you may have to click on it to zoom in!).
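The widget is not the only way to query the datastore. As a rough sketch, the same search can be issued from Python with the `google-cloud-discoveryengine` client. Everything named here is an assumption rather than part of this walkthrough: the project ID, the datastore ID, the sample query, the `global` location, and the `default_config` serving config name, so check them against your own setup.

```python
from itertools import islice


def serving_config_path(project_id, data_store_id, location="global"):
    """Build the resource name of a datastore's serving config.

    Assumes the default collection and a serving config named 'default_config'.
    """
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/default_collection/dataStores/{data_store_id}"
        f"/servingConfigs/default_config"
    )


def search_pdf(client, serving_config, query, page_size=5):
    """Run one query and return the IDs of at most page_size matching documents."""
    request = {
        "serving_config": serving_config,
        "query": query,
        "page_size": page_size,
    }
    # client.search returns a pager that keeps fetching pages while iterated,
    # so islice caps the number of results we actually read.
    results = islice(client.search(request=request), page_size)
    return [result.document.id for result in results]


def main():
    # Imported here so the helpers above stay usable without the library.
    from google.cloud import discoveryengine_v1 as discoveryengine

    # Hypothetical IDs and query: replace with your own.
    client = discoveryengine.SearchServiceClient()
    config = serving_config_path("my-gcp-project", "wiki-pdf-datastore")
    for doc_id in search_pdf(client, config, "UNESCO World Heritage Sites in India"):
        print(doc_id)
```

With a single PDF in the datastore, you should expect every hit to point back to the same document; the IDs become more useful once you index multiple PDFs.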
Summary
Building a PDF search engine with Vertex AI Search turns a complex data swamp into a well-organized knowledge base. It saves time, unlocks insights, and empowers everyone who interacts with your PDF collections.
Note: Should you have any concerns or queries about this post or my implementation, please feel free to connect with me on LinkedIn! Thanks!
Demo link: https://drive.google.com/file/d/1s8wKMnbkEfi_5jYyxH03D7uIoF4gZsu0/view?usp=sharing
Are you ready to start your quest? Embark on your Vertex AI Search adventure and conquer your PDF chaos using the following links:
Reference Links
Google Cloud Next ’23: Vertex AI Search and Conversation (S&C) is now GA, hurray!
Features are still being released and renamed for this new product, so keep an eye on this:
For more hands-on practice, try these labs: