Implementing an OCR for Identity Cards — Part 2: Fine-Tuning
In the first part, Implementing an OCR for Identity Cards — Part 1: Image Preprocessing, I covered the various preprocessing steps I took to help the OCR as much as possible. Most OCR systems are very sensitive to their input images; a tilted or noisy image can easily ruin the results.
In this part, we’ll cover the actual OCR-ing, along with how I fine-tuned the KTP OCR model, using the NIK and Name fields as examples. Information on fine-tuning a model, much less an OCR model, is quite sparse; what’s presented here was gleaned from scouring numerous GitHub issues and lots of trial and error.
Now, it is quite difficult to give a step-by-step account of how to actually do this, because it’s a very iterative process. But hopefully you’ll get the general idea and can adapt it to your own use case.
EasyOCR
After evaluating a bunch of open-source OCR solutions, I settled on EasyOCR because it gave the best balance of accuracy and inference time. More importantly, I was able to figure out how to fine-tune it. Most importantly, its 15k+ GitHub stars didn’t hurt either.
EasyOCR on NIK
Now, while EasyOCR did pretty OK on most of the text I threw at it, the NIK had to be read as accurately as possible. Relying on stock EasyOCR along with some sensible text preprocessing, I achieved an accuracy of 78%. Unfortunately, that’s a far cry from our previous (paid) implementation, which achieves around 98% accuracy (on a curated dataset, but still :P).
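For context, here’s roughly what that baseline looks like in code. This is a minimal sketch, not my exact pipeline; the image filename is a placeholder, and restricting readtext with an allowlist (a standard EasyOCR parameter) to digits is one sensible constraint, since the NIK is purely numeric:

```python
import easyocr

# Build a reader for Latin-script text; runs on GPU if available.
reader = easyocr.Reader(['en'], gpu=True)

# 'ktp_nik_crop.png' is a placeholder for the cropped NIK region
# produced by the preprocessing from Part 1. The allowlist restricts
# recognition to digits, since a NIK is a 16-digit number.
results = reader.readtext('ktp_nik_crop.png', allowlist='0123456789')

for bbox, text, confidence in results:
    print(text, confidence)
```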
Overview of Model Fine-Tuning
So here’s the attack plan:
1. Design synthetic data that mimics what a NIK would look like (a sketch of this step follows the list).
2. Create a Train and Validation dataset out of the synthetic data.
3. Create a Test dataset out of annotated data.
4. Train the model with the synthetic data.
5. Use the resulting model weights for inference.
6. Evaluate model accuracy. Go back to step 1 if necessary.
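For steps 1 and 2, here is a minimal sketch of generating NIK-like training images. EasyOCR’s training pipeline is built on clovaai’s deep-text-recognition-benchmark, which consumes images plus a tab-separated filename/label file; the font, sizes, and sample count below are all placeholder assumptions, not my exact settings:

```python
import random
from pathlib import Path
from PIL import Image, ImageDraw, ImageFont

# Placeholder paths: pick a font that resembles the KTP print.
FONT_PATH = 'fonts/ocr_a.ttf'
OUT_DIR = Path('synthetic/train')
OUT_DIR.mkdir(parents=True, exist_ok=True)

font = ImageFont.truetype(FONT_PATH, 32)
labels = []

for i in range(1000):
    # A NIK is a 16-digit number; generate a random one.
    nik = ''.join(random.choices('0123456789', k=16))

    # Render black digits on a white grayscale strip.
    img = Image.new('L', (420, 50), color=255)
    ImageDraw.Draw(img).text((10, 8), nik, font=font, fill=0)

    filename = f'nik_{i:05d}.png'
    img.save(OUT_DIR / filename)
    labels.append(f'{filename}\t{nik}')

# deep-text-recognition-benchmark expects a tab-separated
# "filename<TAB>label" ground-truth file per split.
(OUT_DIR / 'gt.txt').write_text('\n'.join(labels))
```

In practice you’d also randomize the font size and position, and add noise and blur, so the synthetic data covers the same variation the real crops do.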
Also, I trained this on a single NVIDIA GTX 1060, so nothing really fancy. But note that the…
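As for step 5, EasyOCR can load a fine-tuned recognition network at inference time through its custom-model mechanism. A sketch, assuming the weights and network config follow EasyOCR’s documented layout (all file and network names below are placeholders):

```python
import easyocr

# Expected layout for a custom recognition network (names are placeholders):
#   ./models/ktp_nik.pth           -- fine-tuned weights
#   ./user_network/ktp_nik.yaml    -- network configuration
#   ./user_network/ktp_nik.py      -- network definition
reader = easyocr.Reader(
    ['en'],
    recog_network='ktp_nik',
    model_storage_directory='./models',
    user_network_directory='./user_network',
)

# Same call as before, now backed by the fine-tuned weights.
results = reader.readtext('ktp_nik_crop.png', allowlist='0123456789')
```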