Let’s extract information from a damaged image using the AWS Textract and Azure Form Recognizer OCR APIs in JavaScript.
Two months ago, I was given a task where I had to extract information from an old, damaged picture that contained forms, tables, and handwritten text. I started exploring Optical Character Recognition (OCR) and found two popular services provided by AWS and Azure: AWS Textract and Azure Form Recognizer.
AWS Textract: automatically extracts printed text, handwriting, layout elements, and data from any document. Learn more
Azure Form Recognizer: a cloud service that uses machine learning to analyze text and structured data from your documents. Learn more
Let’s start with how to actually implement them using JavaScript and npm packages.
> npm init -y // to initialize npm
> npm install @aws-sdk/client-textract // to install aws textract client
> npm install @azure/ai-form-recognizer // to install azure recognizer client
AWS Textract implementation:
import { TextractClient, AnalyzeDocumentCommand } from "@aws-sdk/client-textract";
import fs from "fs";

const awsTextractClient = new TextractClient({
  region: process.env.AWSREGION, // your AWS region
  credentials: {
    accessKeyId: process.env.AWSACCESSKEY, // your AWS access key
    secretAccessKey: process.env.AWSSECRETKEY // your AWS secret key
  }
});

const performAwsOcr = async () => {
  try {
    const imageBuffer = fs.readFileSync("./sample.png");
    const params = {
      Document: {
        Bytes: imageBuffer, // readFileSync already returns raw bytes; no base64 re-encoding needed
      },
      FeatureTypes: ["TABLES"], // https://aws.amazon.com/textract/features/
    };
    const ocrData = await awsTextractClient.send(new AnalyzeDocumentCommand(params));
    return ocrData;
  } catch (error) {
    console.log(error.message);
  }
};

// performAwsOcr is async, so await it; logging the bare call would print a pending Promise
const extractedData = await performAwsOcr();
console.log(extractedData);
Result:
As the raw result isn’t very human-readable, I have included an Excel version of the data extracted from the sample image.
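If you want to make the raw response readable yourself, the Blocks array can be flattened by following CHILD relationships. A minimal sketch, assuming the standard AnalyzeDocument Block schema (TABLE → CELL → WORD); the function name is my own:

```javascript
// Flatten the first TABLE block in a Textract response into a 2-D array of cell text.
function textractTableToRows(blocks) {
  const byId = new Map(blocks.map((b) => [b.Id, b])); // index blocks by Id for lookups
  const table = blocks.find((b) => b.BlockType === "TABLE");
  if (!table) return [];
  const rows = [];
  for (const rel of table.Relationships ?? []) {
    if (rel.Type !== "CHILD") continue;
    for (const id of rel.Ids) {
      const cell = byId.get(id);
      if (!cell || cell.BlockType !== "CELL") continue;
      // Join the WORD children of the cell into a single string.
      const words = (cell.Relationships ?? [])
        .filter((r) => r.Type === "CHILD")
        .flatMap((r) => r.Ids)
        .map((wid) => byId.get(wid)?.Text ?? "");
      const row = (rows[cell.RowIndex - 1] ??= []); // RowIndex/ColumnIndex are 1-based
      row[cell.ColumnIndex - 1] = words.join(" ");
    }
  }
  return rows;
}
```

You could then feed the returned rows straight into a CSV writer or spreadsheet library.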
Pros and cons of AWS Textract:
Pros:
- Confidence scores are very high, mostly falling between 80% and 97%.
- Relationships among the detected data are well established, which makes extraction easier.
Cons:
- Detection of multiple data formats is less accurate; almost everything is treated as a table.
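Since Textract returns a Confidence score on every block, one easy way to exploit those high scores is to drop low-confidence detections before further processing. A minimal sketch, with a hypothetical cutoff of 80:

```javascript
const MIN_CONFIDENCE = 80; // hypothetical threshold; tune for your documents

// Keep only LINE blocks whose Confidence clears the cutoff, returning their text.
function highConfidenceLines(blocks) {
  return blocks
    .filter((b) => b.BlockType === "LINE" && b.Confidence >= MIN_CONFIDENCE)
    .map((b) => b.Text);
}
```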
Azure Form Recognizer implementation:
import { DocumentAnalysisClient, AzureKeyCredential } from "@azure/ai-form-recognizer";
import fs from "fs";

const azureRecognizerClient = new DocumentAnalysisClient(
  process.env.AZUREENDPOINT, // your Azure endpoint
  new AzureKeyCredential(process.env.AZURESECRETKEY) // your Azure secret key
);

const performAzureOcr = async () => {
  try {
    const imageBuffer = fs.readFileSync("./sample.png");
    const poller = await azureRecognizerClient.beginAnalyzeDocument("prebuilt-document", imageBuffer);
    const ocrData = await poller.pollUntilDone(); // wait for the long-running analysis to finish
    return ocrData;
  } catch (error) {
    console.log(error.message);
  }
};

// Await the async call before logging, and use a distinct variable name
const azureExtractedData = await performAzureOcr();
console.log(azureExtractedData);
Result:
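For reference, the object returned by pollUntilDone() exposes fields such as content, tables, and keyValuePairs. A minimal sketch that summarizes them, assuming the @azure/ai-form-recognizer result shape (the helper name is my own):

```javascript
// Condense an Azure Form Recognizer result into key/value pairs and table dimensions.
function summarizeAzureResult(result) {
  const pairs = (result.keyValuePairs ?? []).map((kv) => ({
    key: kv.key?.content ?? "",
    value: kv.value?.content ?? "", // value can be missing for unanswered fields
  }));
  const tables = (result.tables ?? []).map((t) => ({
    rows: t.rowCount,
    columns: t.columnCount,
  }));
  return { pairs, tables };
}
```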
Pros and cons of Azure Form Recognizer AI:
Pros:
- Pre-defined models are available that cover almost all kinds of document formats.
- Detects multiple data formats rather than returning only tables.
Cons:
- Relationships are established weakly, which may lead to false arrangements of data.
Conclusion:
Having tested these modules on numerous images, I found their prediction rates quite close; which one to choose depends entirely on your needs.
Note: We ended up using AWS Textract, as our image mostly contained the feature types it handles best. 🎉🎉🎉…
You can find the entire API implementation on my GitHub. Link ✨