As you might have read in my previous post, I like Flutter. There's a lot to read on why it could be cool for you to try out as well, but this post covers something different.
As a matter of fact, I not only like Flutter but also machine learning. The thought of a machine learning how to perform a task holds a huge fascination for me. So of course the natural next step was to combine these two topics. And what better way to start off than with Google's very own MLKit?
MLKit was introduced at Google I/O in 2018. The basic idea is to open up access to Google's machine learning magic and make it available to mobile developers. There is a lot of documentation for Android and iOS, but I guess they're still perfecting the Flutter versions to meet the high standards of the docs on the Flutter website.
So my goal is to introduce you to the different functionalities offered by MLKit, one blog post at a time. I want to give you a quick overview of the functionality of the API and also provide a little sample app that shows how you can use it in a real-world example. This post contains the most important code snippets; the full app can be found on GitHub. Let's dive into it.
The first topic I want to introduce you to is text recognition. This generally means that you have an image with some text in it, and you want to get the text out of it to extract information. If you're not a computer vision expert and don't feel like spicing things up with a little Convolutional Neural Network (CNN) in your own deep-learning pipeline (hopefully the topic of a future blog post), you might think you're lost now, right?
Well, to your great surprise, this is wrong. MLKit does this for you. The structure of its API is pretty simple: you hand it an image and it gives you the text. All the black magic that Google applies to the image to extract the information is hidden. As a mobile developer you might not want to know about the witchery behind this, and that is perfectly fine. If you are concerned about data privacy (which you should be), it might be important to mention that the system can be configured to run only locally on the device.
This is where our example app (with that short, catchy name) comes in. We want to be able to take a photo of a business card and directly get the email address out of it. Why, you might ask? Because we are lazy, and why should we type a few letters if we can build a full-blown app for that!? So we simply snap a photo and then we want an output that will look like this:
So we create our new app in our favorite IDE (which we have perfectly prepared for Flutter development). Now we have to do three things:
- take a photo
- use MLKit to extract the text from the photo
- search for the email in the text
1. Take a photo
We can use a wonderful little Flutter package called “camera” to get access to the camera (it is important to follow the steps described in the “Installation” part of the package's docs). First we need to add the package to our pubspec.yaml file. Then we can integrate it into our starter app in order to take a photo and save it to disk.
This (simplified) gist does all of that. First we need to import the package in our Dart file with import 'package:camera/camera.dart';. Then we get the available cameras with await availableCameras() (for simplicity we omit the check whether we actually have any cameras, but note that this might throw an error). The CameraController object handles the interaction with the camera itself and can easily be created from a single camera object and information about the resolution. Then we simply define a filePath where we want to save the image (note: in the real code we use the pathProvider package to get your application's documents directory). Finally, we ask the controller to take a picture for us and save it to the specified filePath.
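The steps above can be sketched roughly like this. This is a simplified sketch, assuming the camera package's older API (where takePicture takes a file path) together with path_provider; takePhoto is just an illustrative name, and error handling is omitted:

```dart
import 'package:camera/camera.dart';
import 'package:path_provider/path_provider.dart';

Future<String> takePhoto() async {
  // Get the list of available cameras (we assume there is at least one).
  final cameras = await availableCameras();

  // The controller handles all interaction with the selected camera.
  final controller = CameraController(cameras.first, ResolutionPreset.medium);
  await controller.initialize();

  // Build a file path inside the app's documents directory.
  final dir = await getApplicationDocumentsDirectory();
  final filePath = '${dir.path}/${DateTime.now().millisecondsSinceEpoch}.jpg';

  // Ask the controller to take the picture and save it to filePath.
  await controller.takePicture(filePath);
  await controller.dispose();
  return filePath;
}
```

In a real app you would create and dispose of the controller in a widget's lifecycle methods rather than per photo.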
If you do not know what the await keyword means, you can read more about asynchronous programming in Dart here. In short: commands like takePicture() return a Future object. This sounds fancy but simply means that the execution of the function might take some time and will only return at some point in the future (hence the name). If we want to free up the CPU to do other computations while waiting for an operation to finish, we can use the await keyword (note: you also have to mark the function containing such a call as async).
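To illustrate, here is a tiny standalone Dart example of a function returning a Future and a caller awaiting it (slowGreeting and demo are made-up names, independent of the camera code):

```dart
// A function marked async automatically returns a Future.
Future<String> slowGreeting() async {
  // Simulate an operation that takes some time, like taking a picture.
  await Future.delayed(Duration(milliseconds: 10));
  return 'hello';
}

Future<void> demo() async {
  // await suspends demo() until the Future completes, freeing the
  // event loop to do other work in the meantime.
  final greeting = await slowGreeting();
  print(greeting); // prints "hello"
}
```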
2. Use MLKit to extract text from the image
Another task, another package. There is one from the Flutter team itself that provides access to MLKit, called firebase_ml_vision. Again, you can check out the link on how to configure it (especially so that it runs locally).
In addition to the package, you need to set up a Firebase project and configure each platform for it. You can follow the steps in this codelab; steps 6 and 7 are especially important. After performing these steps you will be able to use MLKit inside your Flutter app on both Android and iOS.
So let’s go through the different steps. First of all we need to get the image and transform it so that Firebase can make use of it.
First we need to create an imageFile based on the filePath we previously used to save the taken image. Then we use our new package to transform this file into a FirebaseVisionImage called visionImage so that it can be processed by the MLKit platform; a handy factory method lets us create it directly from our imageFile. A FirebaseVisionImage is simply a consistent representation of the image for Firebase to work with. It contains the raw image bytes along with some metadata such as rotation and size.
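In code, this transformation is only a couple of lines (a sketch against the firebase_ml_vision package's fromFile factory; loadVisionImage is an illustrative helper name):

```dart
import 'dart:io';
import 'package:firebase_ml_vision/firebase_ml_vision.dart';

// Wrap the saved photo in a FirebaseVisionImage so MLKit can process it.
FirebaseVisionImage loadVisionImage(String filePath) {
  final File imageFile = File(filePath);
  return FirebaseVisionImage.fromFile(imageFile);
}
```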
We now need to create a TextRecognizer from the vision package and simply hand the created visionImage to it. Notice that this operation returns a Future again, so we need to use the await keyword once more. That is basically it. Here the magic happens: the MLKit platform analyzes the image and extracts all the valuable information from it.
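Put together, the recognition step might look like this (again a sketch against the firebase_ml_vision API, using the on-device recognizer; recognizeText is an illustrative name):

```dart
import 'package:firebase_ml_vision/firebase_ml_vision.dart';

// Runs on-device text recognition on a prepared FirebaseVisionImage.
Future<VisionText> recognizeText(FirebaseVisionImage visionImage) async {
  final TextRecognizer recognizer = FirebaseVision.instance.textRecognizer();
  // processImage returns a Future, so we await the result.
  final VisionText visionText = await recognizer.processImage(visionImage);
  recognizer.close();
  return visionText;
}
```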
3. Search for mail address
Lastly, we need to extract the address from all the text we received from MLKit. One cool thing about these packages is that you can access their source code (at least for all the packages I have ever used). So we can look at the code of the VisionText type here. We see that it consists of a list of TextBlock objects, each of which contains a list of TextLine objects. The following image shows the structure of a VisionText object.
Perfect! With that we can finalize our heroic quest of mail address search with the following code snippet:
So we use a regular expression to find the pattern of a mail address. While this pattern seems complex, of course it is very easy to come up with from scratch, and you definitely don't have to google it every time you use it. We now simply iterate through all TextBlocks and all TextLines, and if we find a mail address we assign it to our previously defined variable.
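The search itself can be sketched with plain strings standing in for the TextLine contents, which makes the logic runnable without the Firebase plugin. The regex below is a deliberately simple stand-in, not a complete email validator, and findEmail is an illustrative name:

```dart
// A deliberately simple email pattern; real-world validation is messier.
final RegExp emailRegex = RegExp(r'[\w.+-]+@[\w-]+\.[\w.-]+');

// Returns the first mail address found in the recognized lines, or null.
String? findEmail(Iterable<String> lines) {
  for (final line in lines) {
    final match = emailRegex.firstMatch(line);
    if (match != null) {
      return match.group(0);
    }
  }
  return null;
}
```

With the real VisionText you would feed this the text of every TextLine in every TextBlock.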
We did it!
That’s it! We have done everything we wanted to do. You can check the GitHub repo to see how to do the remaining parts, like the visualization. But this is an example of how you can use MLKit and its text recognition capabilities in your app.
Of course there are a ton of possible improvements you could make from here. A few that directly come to mind:
- With a VisionText and its TextBlocks you also get the exact location of the text in the image, so you could play around with showing the recognized text directly inside the image.
- You could go further than simply detecting mail addresses and directly add them to your contacts, or send their owners a “Nice to meet you” message.
- You could use the live feed of the camera to continuously search for mail addresses without the need to snap a photo.
In order to keep this tutorial as simple as possible I didn’t go down any of these routes, but if any of you experiment with them, please reach out to me and share it. I’d be happy to read about it.
I hope you have learnt how to take your first steps with MLKit in Flutter and have seen how easy it makes it to incorporate quite sophisticated machine learning into your apps. I’m looking forward to the next tutorial, and if you want to read about anything specific, just write it in the comments.