Tutorial: OCR with PaddleOCR (PP-OCR)
In this post, I test the PP-OCR optical character recognition (OCR) system.
PP-OCR is a practical, ultra-lightweight OCR system that can be easily deployed on edge devices such as cameras and mobile phones. I wrote reviews of the algorithms and strategies used in the model. You can read them here:
- Part I: Review overall architecture and text detector on paper
- Part II: Review the direction classification model and text recognizer
1. Installation
Some configuration information on my computer:
- OS: Windows 10 Pro 64-bit
- CPU: Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
- RAM: 64 GB
- GPU: NVIDIA GeForce RTX 2060 6GB
- Python Environment: Python 3.8.5
First, clone the official repository from GitHub:
git clone https://github.com/PaddlePaddle/PaddleOCR.git
Next, I install PaddlePaddle. If you have CUDA 9 or CUDA 10 installed on your machine, run:
python3 -m pip install paddlepaddle-gpu
If you only have a CPU, run:
python3 -m pip install paddlepaddle
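To confirm that the install worked before moving on, a quick check along these lines can help. This is a minimal sketch: paddle_status is a helper name I made up for illustration, and it only reports whether the paddle package is importable and whether it is a CUDA build.

```python
import importlib.util


def paddle_status():
    """Describe the local PaddlePaddle install (hypothetical helper)."""
    # find_spec returns None when the package is not installed.
    if importlib.util.find_spec("paddle") is None:
        return "paddle is not installed"
    import paddle

    # is_compiled_with_cuda() says whether this build has GPU support,
    # not whether a GPU is actually usable at runtime.
    build = "GPU" if paddle.is_compiled_with_cuda() else "CPU"
    return f"paddle {paddle.__version__} ({build} build)"


print(paddle_status())
```

For a fuller self-test, PaddlePaddle also provides paddle.utils.run_check(), which runs a small computation on the available device.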
Then, I download the pretrained models. You can find them here:
There are many trained models of different sizes.
If you want to use multiple languages (Korean, Japanese, …), you can download them from here:
2. Inference
For example, I use the English ultra-lightweight PP-OCRv3 model for inference. I download the inference models for detection, direction classification, and recognition, save them to ./inference/det/, ./inference/cls/, and ./inference/reg/ respectively, and extract them there.
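The extraction step can also be scripted. The sketch below assumes the models arrive as plain .tar archives that you have already downloaded by hand; extract_model and the archive name in the usage comment are my own placeholders, not something taken from the PaddleOCR repo.

```python
import tarfile
from pathlib import Path


def extract_model(archive, dest_dir):
    """Extract one model archive into dest_dir and list the result."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        # Only extract archives from a source you trust.
        tar.extractall(dest)
    # Return the entries now present directly under dest_dir.
    return sorted(p.name for p in dest.iterdir())


# Usage (assumed archive name, adjust to the file you downloaded):
# extract_model("en_PP-OCRv3_det_infer.tar", "./inference/det")
```

Calling it once per archive gives the folder layout that the inference command expects, with one subfolder per model.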
After downloading, the folder ./tools/infer contains the scripts for prediction. To run the full OCR pipeline, use predict_system.py:
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/254.jpg" --det_model_dir="./inference/det/en_PP-OCRv3_det_infer/" --cls_model_dir="./inference/cls/ch_ppocr_mobile_v2.0_cls_infer/" --rec_model_dir="./inference/reg/en_PP-OCRv3_rec_infer/" --rec_char_dict_path="./ppocr/utils/en_dict.txt"
The image_dir parameter specifies the image path, det_model_dir specifies the path to the detection inference model, cls_model_dir specifies the path to the angle classification inference model, rec_model_dir specifies the path to the recognition inference model, and rec_char_dict_path specifies the path to the English dictionary. The visualized recognition results are saved to the ./inference_results folder by default. There are many more parameters you can adjust; see the file utility.py for details.
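If you prefer to launch the script from Python (for example, to loop over several images), the command can be assembled as a list and handed to subprocess. This is a minimal sketch, assuming it runs from the PaddleOCR repo root; build_ocr_command and run_ocr are hypothetical helpers, and they only use the flags from the command shown earlier.

```python
import subprocess
import sys


def build_ocr_command(image, det_dir, cls_dir, rec_dir, char_dict):
    """Build the predict_system.py command as an argument list."""
    return [
        sys.executable,  # same interpreter that runs this script
        "tools/infer/predict_system.py",
        f"--image_dir={image}",
        f"--det_model_dir={det_dir}",
        f"--cls_model_dir={cls_dir}",
        f"--rec_model_dir={rec_dir}",
        f"--rec_char_dict_path={char_dict}",
    ]


def run_ocr(cmd, repo_root="."):
    """Run the command from the repo root; raises if the script fails."""
    return subprocess.run(cmd, cwd=repo_root, check=True)


cmd = build_ocr_command(
    "./doc/imgs_en/254.jpg",
    "./inference/det/en_PP-OCRv3_det_infer/",
    "./inference/cls/ch_ppocr_mobile_v2.0_cls_infer/",
    "./inference/reg/en_PP-OCRv3_rec_infer/",
    "./ppocr/utils/en_dict.txt",
)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces, which is easy to run into on Windows.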
You can also run each model separately with predict_det.py, predict_cls.py, and predict_rec.py.
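Running a single stage looks much the same. The sketch below assumes the three scripts live in ./tools/infer and accept the same model flags as predict_system.py (they appear to share the argument definitions in utility.py); the STAGES table and stage_command are names I introduce for illustration.

```python
import sys

# Script and model flag for each stage (assumed to match predict_system.py).
STAGES = {
    "det": ("tools/infer/predict_det.py", "--det_model_dir"),
    "cls": ("tools/infer/predict_cls.py", "--cls_model_dir"),
    "rec": ("tools/infer/predict_rec.py", "--rec_model_dir"),
}


def stage_command(stage, image, model_dir, extra=()):
    """Build the command for one stage; extra holds any further flags."""
    script, model_flag = STAGES[stage]
    return [sys.executable, script, f"--image_dir={image}",
            f"{model_flag}={model_dir}", *extra]


# Detection only, on the sample image from the repo.
print(stage_command("det", "./doc/imgs_en/254.jpg",
                    "./inference/det/en_PP-OCRv3_det_infer/"))
```

For recognition you would likely pass the dictionary through extra, for example extra=["--rec_char_dict_path=./ppocr/utils/en_dict.txt"], and feed the script cropped text-line images rather than a full page.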
3. Training
For training, you can find the details in the authors' source:
- Training text detection:
- Training text direction classification:
- Training text recognition:
Conclusion
In this post, I wrote about how to set up, train, and test the PP-OCR model. You can find the official source code at:
If you have any questions or want me to check out other open-source projects, please comment below or contact me via LinkedIn or GitHub.