How I See Things With Google Lens

0xMachina
Published in VTECH
5 min read · Oct 28, 2019
AI-enabled vision for identifying objects and items in an image. (Source: Google)

For Android users, Google Assistant provides a feature that uses image recognition technology with the smartphone camera. Called Google Lens, this app helps users identify items, landmarks, products and objects. What it is really good for is visually analyzing something you want to learn more about: information drawn straight from what the camera sees.

The Google Lens user interface from an Android smartphone.

Integrating this feature with Google Assistant shows how far Google has advanced in AI. Google has a strong background in this field, and Google Lens is just one of many examples; its application to computer vision is all the more beneficial. Instead of typing out a search for an item you see at the grocery store, you can simply point your smartphone camera at it and let Google Lens provide more information, with results in real time. The only limitation here is the speed of data access over the network; for most users a 4G connection is sufficient.

I found it quite useful for identifying products I don’t know. In this example, I can just point the camera at a photo and, using the search icon, have Google pull up any information it has about the image.

Identifying products and providing more information about them via Google’s search engine.

Label identification and feature extraction are things Google Lens has been developed to do. These allow it to find objects in an image, identify them and provide information about them. This relies heavily on machine learning to increase accuracy, so it is very much data driven.

Multiple objects can be identified, as seen in this image captured from Google Lens. The labels have been “highlighted” and are used to get more information about each object.

When a label is identified (e.g. ‘Pale Pilsen’ beer), the next step is to extract the feature: in this case, that ‘Pale Pilsen’ is a type of beer. Google Assistant then takes the captured image with its identifications to Google Search to gather more information about the products and items (if any were identified). Google Lens draws on datasets of properly identified images that Google has used to train its models, and these provide users with information.
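The label-to-search flow described above can be sketched in a few lines. This is a minimal illustration, not Google's actual implementation: the `detect_labels` function here returns hard-coded (label, confidence) pairs standing in for what a real vision model would produce, and `build_search_query` shows how confident labels could be turned into a search query.

```python
# Sketch of a Lens-style pipeline: detect labels in an image, keep the
# confident ones, and turn them into a search query. All names and values
# here are illustrative assumptions, not Google's real API.

def detect_labels(image_bytes):
    # Placeholder: a real implementation would run an image-recognition
    # model. We return (label, confidence) pairs for illustration.
    return [("Pale Pilsen", 0.92), ("beer bottle", 0.87), ("glass", 0.55)]

def build_search_query(labels, min_confidence=0.8):
    # Keep only confident labels, ordered best-first, and join them into
    # the query string that would be handed to the search engine.
    confident = [name for name, score in sorted(labels, key=lambda l: -l[1])
                 if score >= min_confidence]
    return " ".join(confident)

query = build_search_query(detect_labels(b"..."))
print(query)  # -> Pale Pilsen beer bottle
```

The confidence threshold is the key design choice here: too low and the query gets polluted with vague labels like "glass", too high and unusual products produce no query at all.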

This is great when traveling to different places. A tourist can just point their smartphone camera at a building or landmark to identify it.

Identifying landmarks like buildings and public places.

When walking into a store, users can get more information about their favorite food items, including other places where they can buy them and reviews from other users.

I got the information about these snacks including stores that sell them and reviews made by other people.

You can also learn more about what you want to eat at a new restaurant. This can give essential details about the food, how others reviewed it and even nutritional information.

It is good to learn more about what to eat before ordering.

It is not just objects and places that Google Lens can identify, but also text using OCR (Optical Character Recognition). This allows users to extract the text from an image and copy it to the clipboard. From there, users can take the text and add it to a message or document.

Copying text from images made easy.
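The copy-text flow described above can be sketched as follows. OCR engines typically return recognized words with positions rather than finished paragraphs, so the text has to be reassembled into lines before it can go to the clipboard. The word list and the `words_to_text` helper below are illustrative assumptions, not Google's actual OCR output format.

```python
# Sketch: reassemble OCR word results into clipboard-ready text.
# words is a list of (text, x, y) tuples, where (x, y) is the word's
# approximate top-left position in the image.

def words_to_text(words, line_tolerance=10):
    lines = {}
    for text, x, y in words:
        # Bucket words whose y-coordinates fall in the same horizontal band,
        # so small vertical jitter still lands on one line.
        key = y // line_tolerance
        lines.setdefault(key, []).append((x, text))
    # Order lines top-to-bottom, and words left-to-right within each line.
    return "\n".join(" ".join(t for _, t in sorted(ws))
                     for _, ws in sorted(lines.items()))

ocr_words = [("Hello", 0, 2), ("world", 60, 3), ("Second", 0, 22), ("line", 80, 21)]
print(words_to_text(ocr_words))
# -> Hello world
#    Second line
```

Once the text is a plain string like this, copying it into a message or document is straightforward.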

What I find useful about this feature is that it can also translate the text from one language to another. For example, if you were in a foreign country and needed to know what a sign means, you could simply use Google Lens to capture the image and let OCR extract the text. You then have the option to translate it into a language you understand. There are two ways to do this: you can do the OCR first and translate the text later, or, more directly, press the Translate button. It also works with non-Latin scripts such as Chinese (see example below).

Text is first identified in the image. The text can then be selected for translation if it is not in the user’s native language. By selecting the ‘Translate’ option, the user gets the translation, in this example from Spanish to English.
Characters beyond the standard European alphabet can also be identified and translated. In this example, using the more direct Translate feature, the characters were successfully translated from Chinese to English.
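The two translation paths just described (extract first, translate later, versus a one-step Translate button) can be sketched like this. Both `ocr_extract` and `translate` are hypothetical stubs; a real app would call an OCR engine and a translation service in their place, and the tiny glossary is purely for illustration.

```python
# Sketch of the two translation paths. The stubs below stand in for a
# real OCR engine and translation service; the glossary is made up.

def ocr_extract(image_path):
    # Stub: a real OCR engine would read the text out of the image.
    return "cerveza fría"

def translate(text, target="en"):
    # Stub: a real translation service would go here. Unknown text is
    # passed through unchanged.
    glossary = {"cerveza fría": "cold beer"}
    return glossary.get(text, text)

# Path 1: OCR first, keep the text, translate it later.
extracted = ocr_extract("sign.jpg")
later = translate(extracted)

# Path 2: the direct Translate button does both steps at once.
direct = translate(ocr_extract("sign.jpg"))

print(later, "|", direct)  # -> cold beer | cold beer
```

The end result is the same either way; path 1 just leaves you with the original extracted text as well, which is handy if you also want to copy it.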

Although Google Lens can correctly identify objects and items in images most of the time, there are also misses. The software does not yet seem good at identifying people’s faces; Google has another product for that, Google Cloud Vision. Tying it into Google Lens would surely help make facial recognition more accurate. Almost all the tests I tried failed: Google Lens was able to identify other things in the image except the person’s face.

Google Lens was not able to identify any of these public figure faces.

There were also times when landmarks like buildings were incorrectly identified. This might be due to the angle or quality of the image. As cameras gain higher resolution and image quality improves, along with larger datasets, identification accuracy should surely increase.

The building was identified as Centennial Park, which was quite close but not the correct name of the building. Perhaps this was based on location, but I wanted Google Lens to tell me the name of the building itself. All attempts with shots taken from different positions failed to identify the building correctly.

Google has already made a product that brings the Lens feature to a pair of glasses. As a wearable, it displays the information right on the glasses lens, with no more need to stare at a smartphone display. It is called the Glass Enterprise Edition 2, and it is a limited edition product.

I would recommend this app as a useful knowledge tool: get information, translations and even suggestions from any image you point the camera at. Do some fact checking, though, because results are not always accurate (as with the building example). For the most part it works, and it is something Android users will benefit from.
