Cloud Vision vs Flutter MLKit for OCR on the Concept2 machine display
This is a comparison of two OCR alternatives for reading the contents of the Concept2 rowing machine's LCD display: MLKit in Flutter versus the Cloud Vision API on Google Cloud.
Background
Dreamwod is a CrossFit app where many athletes use Concept2 machines for biking, rowing, and skiing. These machines have a memory from which athletes can retrieve statistics and split times for a workout. We have developed a feature where an athlete can take a picture of the screen (see below) and the data/text is recognized and imported directly into Dreamwod.
Below is an example screen of a 6x2000m interval rowing session with 4 minutes of rest between each interval. /500m is the average time per 500m and s/m means strokes per minute.
The source code with examples and descriptions is available at https://github.com/dreamwod-app/mlkit-vs-cloud-vision.
The two alternatives: MLKit vs Google Cloud Vision
We have looked at two alternatives for performing the OCR: Google MLKit on the client side and the Cloud Vision API on the backend. Both alternatives have their advantages and drawbacks.
Flutter MLKit
The first alternative is to use Flutter MLKit, do the OCR scanning directly in the app, and then send the extracted text to the backend. The obvious advantage is that the load stays on the app side, and the backend does nothing more than interpret the scanned content. The main drawback, which we will see later, is that the quality of the OCR is poor.
Pros
+ Client/app side processing
+ Free (no additional cost)
+ Quick (~1 second to get the result)
Cons
- Poor OCR quality
- Does not work on all devices
Google Cloud Vision API
The second alternative is to use Cloud Vision API, upload the image and do the processing on the backend side, for example with Cloud Run or a Cloud Function.
Pros
+ Good OCR quality
+ Works on older devices
Cons
- Increased backend load
- Additional cost (*)
(*) The first 1,000 calls to the Vision API each month are free; after that, the price is $1.50 per 1,000 requests.
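As a rough example with the pricing above, 10,000 scans in a month would cost about (10,000 − 1,000) / 1,000 × $1.50 ≈ $13.50.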
Implementations
Cloud Vision API
The full implementation of the example program is available at https://github.com/dreamwod-app/mlkit-vs-cloud-vision. Check out the code, compile it, and run it with one of the example images below.
$ git clone git@github.com:dreamwod-app/mlkit-vs-cloud-vision.git
$ go build
$ ./mlkit-vs-cloud-vision vision -image images/example_2.jpg -out output/example_vision_2.jpg
The program reads the example image and calls the Cloud Vision API. It then uses the response to draw red rectangles around the areas where text was found.
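For reference, the core Vision API call behind this command can be sketched in Go roughly as follows. This is a minimal standalone sketch rather than the repository's exact code; the image path is taken from the example command above, and credentials are assumed to come from GOOGLE_APPLICATION_CREDENTIALS.

package main

import (
	"context"
	"fmt"
	"log"
	"os"

	vision "cloud.google.com/go/vision/apiv1"
)

func main() {
	ctx := context.Background()

	// Create the Vision API client (credentials are picked up from the environment).
	client, err := vision.NewImageAnnotatorClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Read the photo of the Concept2 display.
	f, err := os.Open("images/example_2.jpg")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	img, err := vision.NewImageFromReader(f)
	if err != nil {
		log.Fatal(err)
	}

	// Run text detection. Each annotation carries the detected string and a
	// bounding polygon, which is what the drawing step uses for the red rectangles.
	annotations, err := client.DetectTexts(ctx, img, nil, 100)
	if err != nil {
		log.Fatal(err)
	}
	for _, a := range annotations {
		fmt.Printf("%q at %v\n", a.Description, a.BoundingPoly.Vertices)
	}
}

Note that the first annotation returned by DetectTexts contains the full detected text, while the following ones are the individual words with their bounding boxes.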
The GitHub repository contains four example images of varying quality and resolution. It's possible to use any image; just replace the -image and -out arguments.
$ ./mlkit-vs-cloud-vision vision -image any-image.jpg -out out.jpg
MLKit with Flutter
There are several Flutter packages for ML, but the most up-to-date is google_ml_kit, which uses the models from Google MLKit at https://developers.google.com/ml-kit.
The example app outputs the coordinates of the detected areas, and a file with these coordinates can then be provided to the Go program to draw an image similar to the Cloud Vision one.
$ ./mlkit-vs-cloud-vision draw -image images/example_1.jpg -out output/example_mlkit_1.jpg -coords coords/coords1.txt
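For illustration, the drawing step could be sketched in Go like this. Purely for the example, it assumes the coords file lists one rectangle per line as left,top,right,bottom; the actual file format and drawing code in the repository may differ.

package main

import (
	"bufio"
	"fmt"
	"image"
	"image/color"
	"image/draw"
	"image/jpeg"
	"log"
	"os"
	"strings"
)

// drawRect draws a one-pixel red outline of rectangle r on img.
func drawRect(img *image.RGBA, r image.Rectangle) {
	red := color.RGBA{R: 255, A: 255}
	for x := r.Min.X; x <= r.Max.X; x++ {
		img.Set(x, r.Min.Y, red)
		img.Set(x, r.Max.Y, red)
	}
	for y := r.Min.Y; y <= r.Max.Y; y++ {
		img.Set(r.Min.X, y, red)
		img.Set(r.Max.X, y, red)
	}
}

func main() {
	// Load the source image.
	f, err := os.Open("images/example_1.jpg")
	if err != nil {
		log.Fatal(err)
	}
	src, err := jpeg.Decode(f)
	f.Close()
	if err != nil {
		log.Fatal(err)
	}

	// Copy it onto a mutable RGBA canvas.
	canvas := image.NewRGBA(src.Bounds())
	draw.Draw(canvas, canvas.Bounds(), src, src.Bounds().Min, draw.Src)

	// Read one rectangle per line: "left,top,right,bottom" (assumed format).
	coords, err := os.Open("coords/coords1.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer coords.Close()
	scanner := bufio.NewScanner(coords)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		var x1, y1, x2, y2 int
		if _, err := fmt.Sscanf(line, "%d,%d,%d,%d", &x1, &y1, &x2, &y2); err != nil {
			continue
		}
		drawRect(canvas, image.Rect(x1, y1, x2, y2))
	}

	// Write the annotated copy.
	out, err := os.Create("output/example_mlkit_1.jpg")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()
	if err := jpeg.Encode(out, canvas, &jpeg.Options{Quality: 90}); err != nil {
		log.Fatal(err)
	}
}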
Results
At a glance the annotated examples above look pretty similar, but it turns out that there is a huge difference between the Cloud Vision API and MLKit:
- One of the four images couldn’t be used with MLKit at all; no text was detected.
- MLKit failed to detect a lot of the numbers. It especially has trouble telling “3” and “B” apart; 3.41.8 is, for example, read as B.41.8.
- MLKit reads the same number very differently from image to image; 2:03.5 is, for example, returned as 2:03.5, 2:035, 2:03,5, 203.5, etc.
- The only real issue for Cloud Vision was that it couldn’t detect “s” and “/” correctly when the s was superscripted in “s/m”.
The table below summarizes the results: Cloud Vision detects almost everything, while MLKit has quite a few texts that aren’t recognized correctly.
Complete results are available at https://github.com/dreamwod-app/mlkit-vs-cloud-vision#results.
Conclusion
Doing machine learning on the client/app side is cool, but in our use case the quality of the recognized text is not good enough. There may be other scenarios where it works better and with higher quality. Because of that, we are using the Cloud Vision API for the upcoming feature of importing Concept2 results directly into Dreamwod.
Hope it helps when choosing between Cloud Vision and MLKit for text recognition! 🚀