Adding Computer Vision to your iOS App

Recently I’ve been using the Google Cloud Machine Learning APIs with Node.js and Python, but I wondered — wouldn’t it be cool if there was an easy way to add them to a mobile app? That’s where the magic of Firebase comes in. I built an iOS app in Swift that makes use of the Cloud Vision API via the Firebase SDK for Cloud Functions. Here’s how it works:

The iOS client uploads an image to Cloud Storage for Firebase. This triggers a Cloud Function, where I’ve written Node.js code to send the image to the Vision API’s safe search and web detection methods. When I get the Vision API response, I write the response JSON into a document in Cloud Firestore (the latest Firebase DB offering). The iOS client has a listener on the Firestore document where my function will write the image data so it can display results in the UI. Here’s what the app looks like on an iPhone X (!) in the simulator (ignore the janky UI, I’m not an iOS design expert):

Let’s dive into the code! Before I get started I’ll create a new project in the Firebase console and initialize Storage, Functions, and Firestore for the project using the Firebase CLI:
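Assuming the Firebase CLI is installed via npm, the project setup looks roughly like this (the interactive prompts and exact flags may differ slightly by CLI version):

```shell
# Install the Firebase CLI and sign in
npm install -g firebase-tools
firebase login

# From the project directory, initialize the features this app uses
firebase init functions firestore storage
```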

Update: @joaolaq built an Android version of this app after I published this post. Check it out on GitHub!

Step 1: Upload an image from the device to Cloud Storage for Firebase

The first step is to upload an image from the iPhone’s photo library to Cloud Storage. I connected the upload button in my Storyboard to an IBAction:
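A minimal sketch of that IBAction (the method name and outlet wiring are my assumptions; it presumes the view controller adopts UIImagePickerControllerDelegate and UINavigationControllerDelegate):

```swift
// Present the photo library when the upload button is tapped
@IBAction func uploadTapped(_ sender: UIButton) {
    let picker = UIImagePickerController()
    picker.delegate = self
    picker.sourceType = .photoLibrary
    present(picker, animated: true, completion: nil)
}
```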

In my UIImagePickerController delegate method I can upload the image to Cloud Storage with just a few lines of code:
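Here's a sketch of what that delegate method could look like, assuming the FirebaseStorage SDK (circa 2017 API names):

```swift
func imagePickerController(_ picker: UIImagePickerController,
                           didFinishPickingMediaWithInfo info: [String: Any]) {
    picker.dismiss(animated: true, completion: nil)
    guard let image = info[UIImagePickerControllerOriginalImage] as? UIImage,
          let imageData = UIImageJPEGRepresentation(image, 0.8) else { return }

    // Upload under a unique filename; our Cloud Function triggers on this upload
    let filename = "\(UUID().uuidString).jpg"
    let imageRef = Storage.storage().reference().child(filename)
    imageRef.putData(imageData, metadata: nil) { metadata, error in
        if let error = error {
            print("Upload failed: \(error)")
        }
    }
}
```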

Step 2: Send the image to the Vision API with Cloud Functions for Firebase

The Firebase SDK for Cloud Functions lets us respond to different events in our Firebase environment, in this case a file being uploaded to Cloud Storage. Anytime our iOS client uploads a new photo, we’ll send that photo to the Vision API and then process the response.

The Vision API gives us access to a pre-trained image recognition model in a single API call. It has many features (read about them all here), but in this example I’ll use safe search and web detection. Why these two? Many mobile apps use images in one way or another, and rather than having someone manually review whether those images are appropriate, we can automate the check with an API call to keep our app SFW. Web detection is one of my favorite Vision API features because you can do cool things like reverse image search to find similar and matching images.

Cloud Functions are written in Node.js. Here’s what our Vision API call looks like:
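A sketch of the function, assuming the firebase-functions SDK and the @google-cloud/vision Node.js client (method names vary a bit between client versions):

```javascript
const functions = require('firebase-functions');
const vision = require('@google-cloud/vision');

const client = new vision.ImageAnnotatorClient();

// Triggered whenever a file is uploaded to our default Storage bucket
exports.callVision = functions.storage.object().onFinalize((object) => {
  const gcsUri = `gs://${object.bucket}/${object.name}`;
  const request = {
    image: { source: { imageUri: gcsUri } },
    features: [
      { type: 'SAFE_SEARCH_DETECTION' },
      { type: 'WEB_DETECTION' },
    ],
  };
  return client.annotateImage(request).then(([response]) => {
    // TODO: write the Vision API response to Firestore (see step 3)
    console.log(response.safeSearchAnnotation, response.webDetection);
  });
});
```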

It’s worth noting that with one REST API request we’re able to get tons of data on our images, whereas we’d otherwise need to build a custom model from scratch and give it enough training data to be able to flag images as inappropriate and find visually similar images from across the web. In addition to all of that, we’d also need to handle serving our model and making prediction requests.

Here’s what the response JSON looks like for safeSearchAnnotation from the cat picture used in my demo above:
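The exact values vary per image, but for an innocuous cat photo the annotation would look roughly like this (values are illustrative):

```json
{
  "adult": "VERY_UNLIKELY",
  "spoof": "VERY_UNLIKELY",
  "medical": "VERY_UNLIKELY",
  "violence": "VERY_UNLIKELY"
}
```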

If you look at the gif of my demo you’ll notice that when I sent it a selfie I took in Cambridge UK, it returned pictures of other people’s selfies from the same location. Pretty cool! Here’s part of the webDetection JSON response:
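A trimmed-down example of that response shape (the descriptions, scores, and URLs below are illustrative, not the actual values from my demo):

```json
{
  "webEntities": [
    { "description": "Selfie", "score": 0.52 },
    { "description": "Cambridge", "score": 0.41 }
  ],
  "visuallySimilarImages": [
    { "url": "https://example.com/a-similar-selfie.jpg" }
  ]
}
```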

Step 3: Write our Vision API response to Cloud Firestore

Now that we’ve got our Vision API response, we need a way to connect it to our iOS app. Cloud Firestore is a great option for this. Within our function, we’ll use the firebase-admin npm package to store the response JSON as a document in Firestore. The response for each image will be stored as a document in our images collection, keyed by the filename of the image (this code will replace the comment in our Cloud Function above):
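A sketch of that write, assuming firebase-admin has already been initialized in the function (admin.initializeApp()) and that filename and response are in scope:

```javascript
const admin = require('firebase-admin');

// Store the Vision API results under images/<filename>
function saveToFirestore(filename, response) {
  return admin.firestore()
    .collection('images')
    .doc(filename)
    .set({
      safeSearch: response.safeSearchAnnotation,
      webDetection: response.webDetection,
    });
}
```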

And here’s what the data for one image looks like in the Firestore console:

Now we’ve got our image data from the Vision API stored in a database!

Step 4: Listen for Firestore updates and display Vision API data in our iOS app

Going back to the image upload function from step 1, I’ll add a listener to the Firestore location for my image:
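A sketch of that listener, assuming the FirebaseFirestore SDK and the same filename used at upload time (displayResults is a hypothetical helper):

```swift
// Listen for the Cloud Function's write to images/<filename>
let docRef = Firestore.firestore().collection("images").document(filename)
docRef.addSnapshotListener { snapshot, error in
    guard let data = snapshot?.data() else {
        // The function hasn't written results for this image yet
        return
    }
    self.displayResults(data)
}
```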

And then display the web entities and similar images from the Vision API in the app:
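One way to sketch that display step; the field names mirror the Vision API response written by the function, while the outlet names (entitiesLabel, similarImageView) are my assumptions:

```swift
func displayResults(_ data: [String: Any]) {
    guard let webDetection = data["webDetection"] as? [String: Any] else { return }

    // Show the web entity descriptions as a comma-separated label
    if let entities = webDetection["webEntities"] as? [[String: Any]] {
        let labels = entities.compactMap { $0["description"] as? String }
        self.entitiesLabel.text = labels.joined(separator: ", ")
    }

    // Load the first visually similar image and show it in an image view
    if let similar = webDetection["visuallySimilarImages"] as? [[String: Any]],
       let urlString = similar.first?["url"] as? String,
       let url = URL(string: urlString) {
        URLSession.shared.dataTask(with: url) { data, _, _ in
            guard let data = data, let image = UIImage(data: data) else { return }
            DispatchQueue.main.async { self.similarImageView.image = image }
        }.resume()
    }
}
```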

That’s it! With a bit of Swift and Node.js I’ve got an iPhone app that makes use of a pre-trained model for image recognition.

What’s next?

This app focused on the Firebase iOS SDK, but you could easily do the same thing with Android. Stay tuned for future posts exploring ML from a mobile development perspective, and I’d love to hear what you think: add comments below or find me on Twitter @SRobTweets if you’ve got ideas for future posts.