Automating Document Processing with Amazon Textract and LLMs

At GDS, we are aware of the challenges that government agencies encounter in their day-to-day workflow and have been exploring innovative solutions to address them.

One such challenge is the need to manually process vast volumes of documents for tasks such as eligibility checks for grants.

The large volumes of documents, the need for accuracy, and the time-consuming nature of manual processing can significantly hinder efficiency and productivity.

What if there were a solution that not only automates the process and performs a first round of triaging, but also ensures acceptable accuracy and reliability?

That’s where our document parser comes in.

But first…

Ideation and Planning

The inception of this initiative can be traced back to a water cooler chat amongst a few members of Agile Consulting & Engineering (ACE).

One idea that was brought to the table (or cooler) was leveraging LLMs for document eligibility triaging.

We subsequently found ourselves venturing down the proverbial rabbit hole, where each step revealed new insights and possibilities in the realm of deep learning models.

Warning: Do not stare at this image for an extended period of time.

Trust me when I say that the number of models is staggering.

The Approach

We developed a streamlined approach that encompasses various stages, from document ingestion to data extraction and eligibility triaging.

1. Form Uploader
Users can easily upload their documents through a front-end client or FormSG.

The documents are securely transmitted using encryption protocols and stored on the Government Commercial Cloud (GCC).

2. Serverless Compute Service
AWS Lambda serves as the compute service for our document parser application.

AWS Lambda's key benefits are its scalability, cost-effectiveness, and ease of management.

It scales the number of execution environments automatically based on the incoming workload, which allows the application to handle document processing tasks of any size or complexity.
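As a sketch of how this might look, a minimal Lambda handler for an S3-triggered function (the event shape and names here are illustrative assumptions, not our production code) could pull the uploaded document's location from the triggering event:

```python
import json

def handler(event, context):
    # Entry point for an S3-triggered Lambda: each record describes
    # one uploaded object (bucket and key).
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    # The pre-processing, Textract, Comprehend, and triaging steps
    # described below would run here before returning a result.
    return {"statusCode": 200, "body": json.dumps({"bucket": bucket, "key": key})}
```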

3. Document Pre-processing using OpenCV
One issue we encountered was inaccurate data extraction when dealing with low-quality scanned documents.

To enhance the legibility of these documents, we used the OpenCV library to first convert them to grayscale, then applied the Canny edge detector to locate the document edges.

Edge detection using OpenCV

4. Document Analysis with Amazon Textract
We were able to extract key textual and structural information from various document types using Amazon Textract.

The service could intelligently recognise key elements such as text, tables, and forms within the documents for data extraction.

Aside from extracting data, Amazon Textract also supports user queries to retrieve specific entity information.

For instance, if a user asks “What is the total amount?” of an invoice, Amazon Textract will return the exact answer as part of the API response.

This enables the user to quickly access specific details without manually searching through the entire document.
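A sketch of issuing such a query and flattening the result, assuming boto3 and Textract's AnalyzeDocument API with the QUERIES feature (the bucket, key, and question are placeholders):

```python
def analyze_with_query(bucket, key, question):
    # boto3 is imported lazily so the parsing helper below can be
    # used without AWS dependencies or credentials.
    import boto3
    client = boto3.client("textract")
    return client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["QUERIES"],
        QueriesConfig={"Queries": [{"Text": question}]},
    )

def extract_lines(response):
    # Flatten the LINE blocks of a raw Textract response into
    # (text, confidence) pairs.
    return [
        {"text": b["Text"], "confidence": b["Confidence"]}
        for b in response.get("Blocks", [])
        if b["BlockType"] == "LINE"
    ]
```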

# Sample response from Amazon Textract
{
  "Document_Text": [
    {
      "text": "ABC TECHNOLOGIES PTE LTD",
      "confidence": 99.52768087387085
    },
    {
      "text": "123 ABC Street",
      "confidence": 66.50803685188293
    },
    {
      "text": "ABC TECH PARK II",
      "confidence": 96.37715816497803
    },
    {
      "text": "JOHN DOE",
      "confidence": 99.7162401676178
    },
    {
      "text": "$100.50",
      "confidence": 97.54421123651128
    }
  ],
  "Query_Result": [
    "$100.50"
  ]
}

5. Entity Extraction with Amazon Comprehend
Amazon Comprehend uses natural language processing (NLP) techniques to identify key phrases, entities, sentiments, and other relevant information within the text.

In the case of our document parser, Amazon Comprehend can be used to extract key entities and phrases from the document, such as names, dates, and amounts, which can then be used to populate a database or trigger downstream processes.
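Before populating a database, the detected entities can be filtered by confidence; a minimal sketch over Comprehend's DetectEntities response shape (the 0.9 threshold is an assumption, not a service default):

```python
def high_confidence_entities(response, threshold=0.9):
    # Keep only entities whose detection score clears the threshold;
    # low-confidence hits are dropped before downstream use.
    return [
        {"type": e["Type"], "text": e["Text"], "score": e["Score"]}
        for e in response.get("Entities", [])
        if e["Score"] >= threshold
    ]
```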

# Sample response from Amazon Comprehend
{
  "res": [
    {
      "type": "ORGANIZATION",
      "entities": "ABC TECHNOLOGIES PTE LTD",
      "score": 0.9952768087387085
    },
    {
      "type": "LOCATION",
      "entities": "123 ABC Street",
      "score": 0.6650803685188293
    },
    {
      "type": "ORGANIZATION",
      "entities": "ABC TECH PARK II",
      "score": 0.9637715816497803
    },
    {
      "type": "OTHER",
      "entities": "S(415976)",
      "score": 0.7988075017929077
    },
    {
      "type": "OTHER",
      "entities": "065-61234567",
      "score": 0.9987536668777466
    },
    {
      "type": "QUANTITY",
      "entities": "$100.50",
      "score": 0.9754421123651128
    },
    {
      "type": "PERSON",
      "entities": "JOHN DOE",
      "score": 0.997162401676178
    }
  ]
}

6. Eligibility Triaging Using LLMs
To assess eligibility, we incorporated the concept of "few-shot learning" into our prompts with a temperature setting of 0.

This enables the model to generate deterministic and reliable output based on the extracted information.

{
  "Prompt": """
  Based on the following {text} and {criteria},
  tell me if this individual is eligible for the scheme and why.

  ### EXAMPLES
  text: $9
  criteria: Total Amount < $10
  result: yes

  text: ABC Company
  criteria: Company Name != ABC Company
  result: no

  ### RESPONSE
  text: {text}
  criteria: {criteria}
  result:
  """
}
{
  "Query_Result": "$100.50",
  "Response": "Yes, the user is eligible.\n
  The reason is that their total amount is less than $150,\n
  which meets the eligibility criteria.\n
  Additionally, the user has a specific amount of $100.50,\n
  which is also within the eligible range.\n
  Therefore, the user meets the requirements for eligibility."
}
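The prompt above can be assembled in code; a minimal sketch (build_prompt is a hypothetical helper, and the LLM call itself is provider-specific, invoked with temperature 0 as described):

```python
FEW_SHOT_PROMPT = """Based on the following text and criteria,
tell me if this individual is eligible for the scheme and why.

### EXAMPLES
text: $9
criteria: Total Amount < $10
result: yes

text: ABC Company
criteria: Company Name != ABC Company
result: no

### RESPONSE
text: {text}
criteria: {criteria}
result:"""

def build_prompt(text, criteria):
    # Substitute the extracted value and the scheme's rule into the
    # few-shot template before sending it to the LLM.
    return FEW_SHOT_PROMPT.format(text=text, criteria=criteria)
```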

7. API Response
Finally, the application returns the API response, which includes the eligibility result and any other relevant details requested by the agency.

This response can then be integrated seamlessly into agency front-end systems and workflows for further processing or automated decision-making.

Our very first prototype

So far, so good.

But we still needed to address the elephant in the room.

How do we make sure that the user’s documents are secure?

Data Security

As our entire setup is hosted on Government Commercial Cloud (GCC), we were able to implement robust safeguards and access control measures.

Document Upload and Storage
The documents are transmitted through a secure form uploader using encryption protocols to protect data during transmission.

They are then securely stored in Amazon S3, which offers data encryption options and access controls through IAM (Identity and Access Management) to safeguard them from unauthorised access.
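As an illustrative sketch, the upload call can request server-side encryption explicitly (SSE-KMS is one of S3's encryption options; the bucket and key names here are placeholders):

```python
def encrypted_put_kwargs(bucket, key, body):
    # Arguments for an S3 PutObject call with SSE-KMS server-side
    # encryption; pass to boto3.client("s3").put_object(**kwargs).
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "aws:kms",
    }
```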

AWS Services
By using AWS services such as Amazon Textract and Amazon Comprehend, we combine the benefits of their advanced features with the added layer of security that GCC provides.

Compliance and Auditing
Our application adheres to ICT&SS best practices and compliance regulations.

Logging and auditing mechanisms are in place to monitor access, changes, and activities within the system.

Future Enhancements and Scalability

One of our immediate plans involves collaborating with the Data Science and Artificial Intelligence Division (DSAID).

This collaboration aims to evaluate the performance and robustness of different LLMs under challenging scenarios.

The objective is to ensure that these models can be relied upon when deployed in real-world scenarios.

We are also actively working on making our solution more accessible to government agencies by allowing them to integrate it into their existing systems through an API.

In addition, we are constantly working to improve the accuracy and reliability of our document parser to further enhance its effectiveness.

Exciting Times


Terence Lucas Yap
Government Digital Services, Singapore

Sleeping comes so naturally to me, I could do it with my eyes closed.