Day 63 of 100DaysofML

Charan Soneji
Published in 100DaysofMLcode
Aug 28, 2020

Flipkart Grid 2.0 was one of the hackathons that changed my approach to a lot of problems and solutions. About a month ago, a friend and I registered for it casually, but as we dug deeper for a solution, the problem became more interesting and more complex to work with. Since it turned into almost 1.5 months of work, I thought I'd share my approach in a blog post.

The problem statement given to us was Electronic Invoice Processing. Our initial approach was a frontend where the user uploads receipts, which are stored in a database. The uploaded invoice images are then preprocessed with OpenCV and passed through an OCR engine such as Tesseract to extract the text.
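The preprocessing and OCR steps could look roughly like the sketch below. It assumes the `opencv-python` and `pytesseract` packages (plus the Tesseract binary) are installed. The function names, the median blur, and the Otsu thresholding are illustrative choices, not necessarily the exact steps used in the hackathon code.

```python
import re


def preprocess_invoice(image_path):
    """Load an invoice image and binarize it so the text stands out."""
    # Imported lazily so the text helper below works without OpenCV installed.
    import cv2

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(f"could not read image: {image_path}")
    denoised = cv2.medianBlur(gray, 3)  # remove speckle noise from scans
    # Otsu's method picks the black/white threshold automatically.
    _, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def run_ocr(binary_image):
    """Run Tesseract on the preprocessed image and return the raw text."""
    import pytesseract  # also requires the Tesseract binary on the machine

    return pytesseract.image_to_string(binary_image)


def clean_ocr_lines(raw_text):
    """Collapse runs of whitespace and drop empty lines from OCR output."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in raw_text.splitlines())
    return [line for line in lines if line]
```

With both packages installed, `clean_ocr_lines(run_ocr(preprocess_invoice("invoice.png")))` would return the recognized text line by line; `invoice.png` is a placeholder file name.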

There are several ways to approach this problem. The main drawback of our first attempt was the accuracy of the OCR and the quality of the extracted text, so the next step was to change the approach. This is where we started using an RCNN technique: a CNN model to identify the text, coupled with an RNN to identify which contents of the invoice are related to each other.

The model proved quite successful, but it came with a drawback. We needed to focus on a user-driven application, and it was hard to integrate the model with the application while maintaining a database at the same time. That led to the next approach we tried.

The last approach we tried, which turned out to be quite useful, was built on AWS. AWS is something I have always been comfortable with, so it didn't require me to learn much apart from debugging Lambda functions. The next section focuses mainly on the solution, but you can find all of it on my GitHub.

The GitHub link to the solution is given below, and the instructions are all in the README.

Essentially, we created an API solution for Flipkart that extracts all the data from an invoice efficiently. It is backed by a database (DynamoDB) to maintain the integrity of the information while still returning accurate results. The solution is coupled with a simple frontend and a Python notebook that collects the results and converts them into the required Excel file, which can then be provided to the customer. In this way, essentially all the data present in the invoice ends up in an Excel sheet.

The solution is reliable because it is built on AWS. The frontend is a major plus point: the user does not have to go through the hassle of opening the AWS console and can simply run a few CLI-based commands. It is straightforward to invoke this API from the AWS CLI or the Boto3 Python library, passing either a pointer to a document image stored in S3 or the raw image bytes to obtain results.

In this invoice processor, the following approaches are used to provide a more robust end-to-end solution.

  • Lambda functions triggered by document upload to a specific S3 bucket, which submit document analysis and text detection jobs to Textract
  • API Gateway methods to trigger Textract job submission on demand
  • Asynchronous API calls to start document analysis and text detection, with a unique request token to prevent duplicate submissions
  • SNS topics to get notified on completion of Textract jobs
  • Automatically triggered post-processing Lambda functions to extract the actual tables, forms and lines of text, stored in S3 for future querying
  • Job status and metadata tracked in a DynamoDB table, allowing for troubleshooting and easy querying of results
  • API Gateway methods to retrieve results at any time without having to call Textract again
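The flow in the list above can be sketched with a handful of small functions. This is a minimal illustration, not the code from the repo: the function names, the SHA-256 request token, and the DynamoDB attribute names are all assumptions. The `textract` and `table` arguments are expected to be a Boto3 Textract client and a DynamoDB `Table` resource, passed in so that each piece can be tried with stand-in objects.

```python
import hashlib
import json
from urllib.parse import unquote_plus


def request_token(bucket, key):
    """Derive a stable token so the same object is never submitted twice."""
    # A SHA-256 hex digest is 64 characters, the maximum ClientRequestToken length.
    return hashlib.sha256(f"{bucket}/{key}".encode("utf-8")).hexdigest()


def uploaded_objects(s3_event):
    """Yield (bucket, key) for each record in an S3 upload notification."""
    for record in s3_event.get("Records", []):
        s3 = record["s3"]
        # Object keys are URL-encoded in S3 event notifications.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def submit_analysis(textract, bucket, key, topic_arn, role_arn):
    """Start an asynchronous analysis job and return its JobId."""
    response = textract.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES", "FORMS"],
        ClientRequestToken=request_token(bucket, key),
        NotificationChannel={"SNSTopicArn": topic_arn, "RoleArn": role_arn},
    )
    return response["JobId"]


def completed_jobs(sns_event):
    """Yield (job_id, status) from Textract's completion messages on SNS."""
    for record in sns_event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        yield message["JobId"], message["Status"]


def fetch_lines(textract, job_id):
    """Page through a finished job and collect the text of every LINE block."""
    lines, next_token = [], None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        page = textract.get_document_analysis(**kwargs)
        lines += [b["Text"] for b in page.get("Blocks", []) if b["BlockType"] == "LINE"]
        next_token = page.get("NextToken")
        if not next_token:
            return lines


def record_status(table, job_id, status, bucket, key):
    """Store job metadata; the attribute names here are hypothetical."""
    table.put_item(
        Item={"JobId": job_id, "JobStatus": status, "Bucket": bucket, "ObjectKey": key}
    )
```

In a Lambda function triggered by an S3 upload, the handler would loop over `uploaded_objects(event)` and call `submit_analysis` with `boto3.client("textract")`. A second handler subscribed to the SNS topic would use `completed_jobs`, `fetch_lines` and `record_status`. Deriving the token from the bucket and key means a repeated upload notification for the same object maps to the same request rather than a new job.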

The steps to run the project are in my GitHub repo. Thanks for reading. Keep learning.

Cheers.
