Exploring the Cloud Vision API

What is the Vision API?

The Cloud Vision API gives you contextual data on your images, putting Google’s machine learning expertise behind a single API request. It uses a model pre-trained on a large dataset of images, similar to the models that power Google Photos.

Vision API browser demo: cloud.google.com/vision

What can the Vision API tell us about an image?

Lots of things! The Vision API provides a list of annotations in its JSON response that tell you the entities, landmarks, faces, text, and logos found in your image. To give you a sense of its capabilities, I’ll highlight a few of these features here.
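
Under the hood, all of these features are served by the same images:annotate endpoint: you send an image and list the detection types you want to run on it. Here’s a minimal sketch of a request body (the gs:// path is hypothetical):

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "gs://my-bucket/bridge.jpg"
        }
      },
      "features": [
        { "type": "LANDMARK_DETECTION" },
        { "type": "WEB_DETECTION" },
        { "type": "TEXT_DETECTION" }
      ]
    }
  ]
}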

Identify landmarks

We’ll continue with our picture of the 25 de Abril Bridge above to see how the Vision API identifies landmarks in an image. Here’s the landmarkAnnotations section of the JSON response for that photo:

"landmarkAnnotations": [
{
"mid": "/m/04x4w7",
"description": "25 de Abril Bridge",
"score": 0.87690926,
"boundingPoly": {
"vertices": [
{
"x": 87,
"y": 138
},
...
]
},
"locations": [
{
"latLng": {
"latitude": 38.693791,
"longitude": -9.177360534667969
}
}
]
}
]
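
If all you need is the landmark’s name and coordinates, they’re straightforward to pull out of the parsed response. Here’s a minimal sketch in Node.js, assuming response holds the JSON object above:

// Sketch: reading the landmark annotation from a parsed response.
// Assumes `response` is the parsed JSON shown above.
const landmark = response.landmarkAnnotations[0];
console.log(landmark.description); // "25 de Abril Bridge"
console.log(landmark.score); // detection confidence, between 0 and 1

const { latitude, longitude } = landmark.locations[0].latLng;
console.log(latitude, longitude); // 38.693791 -9.177360534667969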

Search the web for more data on your image

My favorite Vision API feature is web detection: it uses Google Image Search to surface entities found in your photo, along with URLs to matching and similar photos from across the web. Here’s part of the webDetection response for our bridge picture:

"webEntities": [
{
"entityId": "/m/04x4w7",
"description": "25 de Abril Bridge"
},
{
"entityId": "/m/0gnqtl",
"description": "Christ the King"
},
{
"entityId": "/m/02snjn",
"description": "University of Lisbon"
},
...
]
"fullMatchingImages": [
{
"url": "http://travelanddance.be/onewebmedia/55%20lisbon.jpg"
},
...
],
"visuallySimilarImages": [
{
"url": "http://2.bp.blogspot.com/-3QFcsa0kJFE/TjLxF5MgbHI/AAAAAAAADZE/mp6gmJbmZDo/s400/puente+25.jpg"
},
...
],
"pagesWithMatchingImages": [
{
"url": "https://www.youtube.com/watch?v=OJirc431z2Y"
},
...
]
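
This data is handy for building "more like this" features. Here’s a quick sketch of summarizing the results, assuming webDetection holds the parsed object above:

// Sketch: summarizing web detection results.
// Assumes `webDetection` is the parsed object shown above.
const entityNames = webDetection.webEntities
  .map(entity => entity.description)
  .filter(Boolean); // some entities come back without a description
console.log(entityNames); // ["25 de Abril Bridge", "Christ the King", ...]

const similarUrls = webDetection.visuallySimilarImages.map(image => image.url);
console.log(similarUrls); // URLs of visually similar photos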

Identify text in images (OCR)

Another common image analysis task is finding text. Let’s say you have a picture of a street sign from Paris. Here’s the textAnnotations response for it:

"textAnnotations": [
{
"locale": "fr",
"description": "7Arr!\nAVENUE RI\nDE TOURVILLE\n1642 - 1701\nAMIRAL ET MARECHAL DE FRANCE\n",
"boundingPoly": {
"vertices": [
{
"x": 850,
"y": 637
},
...
]
}
},
"textAnnotations": [
{
"locale": "en",
"description": "Sara Robinson\nDeveloper Advocate\n@SRobTweets\nGoogle\nSararob@google.com\n111 8th Avenue\nNew York, NY 10011\n",
"boundingPoly": {
"vertices": [
{
"x": 126,
"y": 235
},
...
]
}
},
...
]
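
A useful detail: the first element of textAnnotations contains the entire block of extracted text, and the elements after it break the text down word by word. Here’s a minimal sketch of reading it, assuming response is one of the parsed responses above:

// Sketch: reading OCR results from a parsed response.
// Assumes `response` is one of the textAnnotations responses above.
const [fullText, ...words] = response.textAnnotations;
console.log(fullText.locale); // "fr" or "en"
console.log(fullText.description); // the full extracted text
console.log(words.length); // individual words, each with its own boundingPoly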
That’s just a sampling. The Vision API can also:

  • Detect inappropriate images (more on that here)
  • Detect popular logos
  • Find the dominant colors in an image and get suggested crop dimensions

Calling the Vision API in Node.js

Next, let’s take a look at how to call the Cloud Vision API from your own code. I built a demo for my talk at I/O that takes an image, sends it to the Vision API, and displays the entity and face detection responses in a UI.

Step 0: set up your local environment

To run this sample you’ll need Node.js and npm, which you can install by following the instructions here. Then install the Firebase CLI:

npm install -g firebase-tools
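
If this is your first time using the CLI, you’ll also need to log in and, if you haven’t already, initialize Cloud Functions in your project (this generates the functions/ directory used below):

firebase login
firebase init functions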

Step 1: add and install dependencies

cd into your functions/ directory, and in the dependencies block of your package.json file add the google-cloud Vision API module for Node: "@google-cloud/vision": "^0.11.2".
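
Your dependencies block will end up looking something like this (the firebase-admin and firebase-functions versions here are illustrative; use whatever firebase init generated for you):

"dependencies": {
  "@google-cloud/vision": "^0.11.2",
  "firebase-admin": "^4.2.1",
  "firebase-functions": "^0.5.7"
}

Then run npm install from the functions/ directory to pull everything in.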

Next, at the top of your index.js file, require and initialize the modules:

const vision = require('@google-cloud/vision')({
  projectId: 'your-project-id',
  keyFilename: 'keyfile.json'
});
const admin = require('firebase-admin');
const functions = require('firebase-functions');

// Initialize the Admin SDK so we can write to the database
admin.initializeApp(functions.config().firebase);

// Create the Firebase reference to store our image data
const db = admin.database();
const imageRef = db.ref('images');

Step 2: write the function

To trigger this function on a Google Cloud Storage event, we’ll use functions.storage.object() — this tells Firebase to listen for object changes on the default storage bucket for our project. Inside the function we’ll call the Vision API, passing it the Cloud Storage URL of our image and the types of feature detection we want to run, storing the JSON response in our Firebase Database:

exports.callVision = functions.storage.object().onChange(event => {
  const obj = event.data;
  // Build the Cloud Storage URL of the uploaded image
  const gcsUrl = "gs://" + obj.bucket + "/" + obj.name;
  return Promise.resolve()
    .then(() => {
      let visionReq = {
        "image": {
          "source": {
            "imageUri": gcsUrl
          }
        },
        "features": [
          {
            "type": "FACE_DETECTION"
          },
          // Other detection types here...
        ]
      };
      return vision.annotate(visionReq);
    })
    .then(([visionData]) => {
      console.log('got vision data: ', visionData[0]);
      // Store the raw annotations in the Realtime Database
      imageRef.push(visionData[0]);
      // detectEntities is a helper defined in the full code on GitHub
      return detectEntities(visionData[0]);
    })
    .then(() => {
      console.log(`Parsed vision annotation and wrote to Firebase`);
    });
});
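
Since the results land in the Realtime Database, any connected client can react to them as they arrive. A hypothetical client-side listener (the 'images' path matches imageRef above) might look like this:

// Hypothetical client-side listener for new Vision API results.
// Assumes the Firebase client SDK has already been initialized.
firebase.database().ref('images').on('child_added', snapshot => {
  const annotations = snapshot.val();
  console.log('New image analyzed:', annotations);
  // Update the UI with the face or entity data here
});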

Step 3: deploy the function

For brevity I haven’t included the full functions code here but you can find it on GitHub. Once you’re done writing your function and you’re ready to deploy it, run firebase deploy from the root directory of your project.
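
If your project also includes hosting or database rules, you can scope the deploy to just your functions:

firebase deploy --only functions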

Who is using the Vision API?

Classifying cat photos and differentiating landmarks is entertaining, but is anyone actually using this in production? Yes! Here are two examples:

  • Realtor.com uses the Vision API’s OCR to extract text from images of For Sale signs and bring users more info on the property
  • Disney used label detection in a scavenger hunt game to promote the recent movie Pete’s Dragon

When should I not use the Vision API?

The Vision API gives you access to a pre-trained image analysis model with a single API call, which makes it easy to add ML functionality to your apps without having to build or train a custom model. However, there are situations where a custom model is the right choice: say you want to classify medical images as exhibiting a specific condition, or label images of art as a particular style or time period. That’s where you’d want to use a framework like TensorFlow.

Get started

If you’ve tried the Vision API and have any feedback, let me know what you think on Twitter @SRobTweets.
