Using ChatGPT to extract intelligent insights from multiple documents

Max Fifield
5 min read · Mar 13, 2023


Combining the power of Azure Search with ChatGPT allows intelligent analysis of your own text data. Example Repo link: https://github.com/Drop-Database-Cascade/chatGPT-azurefuncs.git


Acknowledgements

Full credit goes to the following for co-designing and co-developing this solution:

Background

Not wanting to be left behind by the ChatGPT bandwagon, a group of co-workers and I set out to design and build a bespoke solution that took advantage of ChatGPT’s capabilities.

We settled on a “Document Search Chatbot” that could be used as an augmentative tool to field common questions (FAQs) based on the content of several documents.

A financial services team I had worked with in the past faced a recurring problem: they semi-frequently produced informative documents for the public (reports, fact sheets, circulars) and would receive a large volume of inquiries afterwards. There was a widely held view that 70–80% of these inquiries could be answered by information in a specific document. For members of the public, however, it was not always clear which document would answer their query, or where in that document to look. In theory, a chatbot intelligent enough to link a query to its source document could significantly improve the experience for both the financial services team and the member of the public.

In addition to exploring a cool use case, I was also particularly interested in seeing how a language model like ChatGPT could enhance existing cloud platforms.

How does it work?

To summarise, the Document Search Chatbot uses Azure Search to extract and rank key highlights from a set of text documents based on a user query. The user query and the Azure Search results are then passed to OpenAI to be interpreted and formatted into a chat-based response.

Here are the high level processes undertaken to produce a response from the Document Search Chatbot:

  1. Documents are uploaded to Azure Blob Storage (these are used to “inform” the Document Search Chatbot).
  2. Azure Search is used to create an index on the uploaded documents.
  3. A Semantic Configuration is created for the index. This dictates how semantic search ranks the fields in the document index when queried.
  4. A user submits a query through the Web Portal, e.g. “What services are being offered by Vendor A?”
  5. The user question triggers an Azure Function, which orchestrates API calls to Azure Search and OpenAI.
  6. The query is passed to Azure Search as a semantic query, which returns highlights and extracts of matching documents ranked according to the criteria in the Semantic Configuration (i.e. relevance). Refer to the Azure Search SDK documentation for more information here.
  7. A ChatGPT query is constructed from the relevant Search extracts and the initial user query. The query instructs ChatGPT to respond to the user's question using the Search extracts where they are relevant.
  8. The ChatGPT query is sent and a response is received from OpenAI. Refer to the OpenAI API documentation for more information here.
  9. The user query, the Azure Search response and the ChatGPT response are logged using Application Insights.
  10. This response is passed to the web portal and served to the user.
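The prompt-construction step (7) can be sketched in a few lines of Python. Everything here is illustrative rather than taken from the example repo: the function name `build_chat_prompt`, the system-message wording, and the extract formatting are all assumptions, but the shape matches what the OpenAI chat completions API expects as its `messages` argument.

```python
# Hypothetical helper for step 7: combine the ranked Azure Search
# extracts with the user's question into a single ChatGPT prompt.
def build_chat_prompt(user_query: str, extracts: list[str]) -> list[dict]:
    """Build a messages list for the OpenAI chat completions API.

    The system message carries the document extracts and instructs the
    model to answer only from them; the user message carries the query.
    """
    context = "\n\n".join(
        f"Extract {i + 1}:\n{text}" for i, text in enumerate(extracts)
    )
    system = (
        "Answer the user's question using only the document extracts "
        "below. If the extracts do not contain a relevant answer, say "
        "so rather than guessing.\n\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_query},
    ]
```

The returned list would then be sent to OpenAI (step 8), e.g. as the `messages` parameter of a chat completion request, with the reply passed back to the web portal (step 10).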

What problems does this solution solve?

From an Azure Search standpoint — Azure Search can only return text exactly as it appears in your documents. Our solution uses ChatGPT to interpret the response(s) from Azure Search and tailor an answer to the user's question, while also considering whether an appropriate answer to the question exists in your documents at all.

From a ChatGPT standpoint — our solution provides a mechanism to use the power of ChatGPT over a large bespoke set of text documents without being constrained by token limits.
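One simple way to work within the token limits mentioned above is to forward only the top-ranked Search extracts that fit a rough token budget, relying on Azure Search's ranking to keep the most relevant material. This is a hedged sketch, not code from the actual solution: `select_extracts` is a hypothetical helper, and the ~4-characters-per-token figure is only a common rule of thumb for English text.

```python
# Hypothetical sketch of the token-limit workaround: keep only the
# top-ranked extracts that fit within a rough prompt budget.
def select_extracts(ranked_extracts: list[str], token_budget: int = 2000) -> list[str]:
    """Greedily take extracts in rank order until the budget is spent.

    Token count is approximated as len(text) / 4 — a crude estimate;
    a real implementation might use a proper tokenizer instead.
    """
    selected, used = [], 0
    for text in ranked_extracts:
        cost = len(text) // 4 + 1  # crude token estimate
        if used + cost > token_budget:
            break  # extracts are rank-ordered, so stop at the first miss
        selected.append(text)
        used += cost
    return selected
```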

What are the limitations?

  • Accuracy of responses: While the Document Search Chatbot can analyse many disparate documents and provide quick answers to straightforward questions, there is no way to guarantee 100% accuracy.
  • Data privacy and security: The Document Search Chatbot is best used to analyse publicly available documents, but it’s important to ensure that there is a process in place to avoid confidential documents being inadvertently shared.
  • User experience: The Document Search Chatbot should be designed to provide a positive experience and meet the needs of its users. This may require ongoing user testing and updates to ensure that the Document Search Chatbot consistently handles a wide range of queries.
  • Human involvement: While the Document Search Chatbot is capable of providing quick and accurate responses to questions about content in a set of documents, it won’t replace human comprehension for more complicated or critical questions.

How can performance be improved?

The logic we have used for our Document Search Chatbot is quite basic, but performance can be improved using the following:

  • Tooling — Addition of open source Chain of Thought (CoT) tools such as LangChain to allow for further prompting from the user when the user query provided isn’t sufficient to retrieve extracts from the source documents.
  • Pre-processing — Processing user queries before they are sent to Azure Search, e.g. using ChatGPT to reword the user query with some embedded background context to improve Azure Search results.
  • Document Chunking — Adjusting the size of the logical chunks source documents are partitioned by to improve the relevance of the text fragments that are returned by Azure Search. Depending on the source documents, you may want smaller text fragments to increase the number of different text sections sent to ChatGPT, or longer text fragments to ensure the context of each fragment isn’t lost.
  • Summarisation — Using ChatGPT to summarise the Azure Search response fragments prior to answering the user query may allow more extracts of the source documents to be considered.
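The “Document Chunking” idea above can be illustrated with a simple overlapping splitter: fragments overlap slightly so that context spanning a chunk boundary is not lost. This is a sketch under assumptions, not the partitioning logic the solution actually uses; `chunk_text` and its default sizes are hypothetical.

```python
# Hypothetical chunking helper: split a document into fixed-size
# chunks with overlap, so text near a boundary appears in two chunks.
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of chunk_size characters, overlapping
    by `overlap` characters so boundary context is preserved."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping the overlap
    return chunks
```

Tuning `chunk_size` down gives Azure Search more, smaller fragments to rank; tuning it up keeps more surrounding context inside each fragment — the trade-off described in the bullet above.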

Closing Thoughts

Overall this example shows how Large Language Models (LLMs) can be integrated with managed cloud offerings to provide enhanced capabilities.

As organizations continue to mature their cloud platforms, they can take advantage of MLOps processes to incorporate existing pretrained models like ChatGPT alongside managed services like Azure Semantic Search. In my opinion, organizations that do this well will reap the rewards from highly effective and tailored solutions.

Just prior to the release of this article, Microsoft released a point of view with code examples for integrating Azure Search with ChatGPT, which can be read about here. Furthermore, the Azure OpenAI Service now allows you to use ChatGPT (preview) in a way that’s suitable for enterprise applications.

Given Microsoft’s relationship with OpenAI, I wouldn’t be surprised if they released an out-of-the-box solution to further ease the integration of Search and OpenAI in the not-so-distant future.

Documentation and Further Reading Links

Please refer to the following links for further reading and context:
