Let’s extract information from a damaged image using the AWS Textract and Azure Form Recognizer OCR APIs in JavaScript.

Raveendhar S
3 min read · Mar 30, 2024


Two months ago, I was given a task to extract information from an old, damaged picture that contained forms, tables, and handwritten text. I started exploring Optical Character Recognition (OCR) and found two popular services provided by AWS and Azure: AWS Textract and Azure Form Recognizer.

AWS Textract: automatically extracts printed text, handwriting, layout elements, and data from any document. Learn more

Azure Form Recognizer: a cloud service that uses machine learning to analyse text and structured data from your documents. Learn more

Let’s start with how to implement them using JavaScript and npm packages.

> npm init -y // to initialize npm
> npm install @aws-sdk/client-textract // to install aws textract client
> npm install @azure/ai-form-recognizer // to install azure recognizer client
We’ll use this unclear image to perform our OCR.

AWS Textract implementation:

import { TextractClient, AnalyzeDocumentCommand } from "@aws-sdk/client-textract";
import fs from "fs";

const awsTextractClient = new TextractClient({
  region: process.env.AWSREGION, // your AWS region
  credentials: {
    accessKeyId: process.env.AWSACCESSKEY, // your AWS access key
    secretAccessKey: process.env.AWSSECRETKEY // your AWS secret key
  }
});

const performAwsOcr = async () => {
  try {
    // readFileSync already returns a Buffer, so it can be passed to Textract directly
    const imageBuffer = fs.readFileSync("./sample.png");
    const params = {
      Document: {
        Bytes: imageBuffer,
      },
      FeatureTypes: ["TABLES"], // https://aws.amazon.com/textract/features/
    };
    const ocrData = await awsTextractClient.send(new AnalyzeDocumentCommand(params));
    return ocrData;
  } catch (error) {
    console.log(error.message);
  }
};

// top-level await works here because the file is an ES module
const extractedData = await performAwsOcr();
console.log(extractedData);
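
The raw AnalyzeDocument response is a flat list of Block objects rather than readable text. As a rough sketch, you could flatten it into plain lines with their confidence scores like this (Blocks, BlockType, Text, and Confidence are the documented response fields; extractLines is just a helper name I made up):

// Pull out the detected LINE blocks with their confidence scores (0–100).
const extractLines = (ocrData) => {
  if (!ocrData || !Array.isArray(ocrData.Blocks)) return [];
  return ocrData.Blocks
    .filter((block) => block.BlockType === "LINE")
    .map((block) => ({ text: block.Text, confidence: block.Confidence }));
};

extractLines(extractedData).forEach(({ text, confidence }) =>
  console.log(`${confidence.toFixed(1)}% ${text}`)
);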

Result:

As the raw result isn’t very human-readable, I’ve included the Excel version of the data extracted from the sample image.

Sheet 1 extracted from the image (AWS)
Sheet 2 extracted from the image (AWS)

Pros and cons of AWS Textract:

Pros:

  • The confidence scores are very high, mostly falling between 80% and 97%.
  • Relationships are well established among the data, which makes extraction easier.

Cons:

  • Detection of multiple data formats is less accurate; almost everything is treated as a table (see the FeatureTypes variation below).
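
If tables alone aren’t enough, AnalyzeDocument can be asked for several feature types in one call. A minimal variation on the earlier params object (TABLES and FORMS are both documented Textract feature types) might look like this:

// Request form key-value pairs alongside tables in the same call.
const params = {
  Document: {
    Bytes: imageBuffer, // same buffer read from "./sample.png" above
  },
  FeatureTypes: ["TABLES", "FORMS"],
};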

Azure Form Recognizer implementation:

import { DocumentAnalysisClient, AzureKeyCredential } from "@azure/ai-form-recognizer";
import fs from "fs";

const azureRecognizerClient = new DocumentAnalysisClient(
  process.env.AZUREENDPOINT, // your Azure endpoint
  new AzureKeyCredential(process.env.AZURESECRETKEY) // your Azure secret key
);

const performAzureOcr = async () => {
  try {
    const imageBuffer = fs.readFileSync("./sample.png");
    // "prebuilt-document" is Azure's general-purpose prebuilt model
    const poller = await azureRecognizerClient.beginAnalyzeDocument("prebuilt-document", imageBuffer);
    const ocrData = await poller.pollUntilDone();
    return ocrData;
  } catch (error) {
    console.log(error.message);
  }
};

const extractedData = await performAzureOcr();
console.log(extractedData);
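
Like Textract, Form Recognizer returns a structured object rather than plain text. A small sketch for walking the result (keyValuePairs and tables are part of the documented AnalyzeResult shape for the prebuilt-document model; the printing logic is just illustrative):

// Print detected key-value pairs, then every table cell with its position.
const printAzureResult = (ocrData) => {
  if (!ocrData) return;
  for (const pair of ocrData.keyValuePairs ?? []) {
    console.log(`${pair.key?.content ?? ""}: ${pair.value?.content ?? ""}`);
  }
  for (const table of ocrData.tables ?? []) {
    for (const cell of table.cells) {
      console.log(`[row ${cell.rowIndex}, col ${cell.columnIndex}] ${cell.content}`);
    }
  }
};

printAzureResult(extractedData);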

Result:

Sheet 1 extracted from the image (Azure)
Sheet 2 extracted from the image (Azure)

Pros and cons of Azure Form Recognizer AI:

Pros:

  • Pre-defined models are available (selected by model ID, as sketched after this list), which cover almost all kinds of document formats.
  • Detects multiple data formats rather than returning only tables.

Cons:

  • Relationships are established weakly, which may lead to incorrect arrangements of the data.
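
The pre-defined models mentioned above are selected by the model ID passed to beginAnalyzeDocument. For example, swapping "prebuilt-document" for "prebuilt-layout" (both are documented prebuilt model IDs) focuses the analysis on layout elements such as tables and paragraphs; this sketch reuses the client from the earlier snippet:

// Same client as before; only the prebuilt model ID changes.
const imageBuffer = fs.readFileSync("./sample.png");
const poller = await azureRecognizerClient.beginAnalyzeDocument(
  "prebuilt-layout", // other options include "prebuilt-read" and "prebuilt-invoice"
  imageBuffer
);
const layoutData = await poller.pollUntilDone();
console.log(layoutData.tables);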

Conclusion:

Having tested these modules on numerous images, I found their prediction rates quite close; which one to choose depends entirely on your needs.

Note: We ended up using AWS Textract, as our image mostly contained the kinds of feature-type data it handles well. 🎉🎉🎉…

You can find the entire implementation in my GitHub repo. Link
