How machine learning helps visually impaired people with daily shopping

Personal Shelf Inspector — a digital shopping assistant for visually impaired people

Tibor Mach
DataSentics
4 min read · Jul 13, 2021


Physical shopping in a store without assistance can be a very stressful and time-consuming task for visually impaired people. At DataSentics, we have been developing an ML-driven mobile app called Personal Shelf Inspector that makes shopping faster and simpler by identifying the items on the display racks from their price tags.

About a year ago, we at DataSentics took part in the AI for Accessibility hackathon run by Microsoft, which aims to use machine learning to help people with disabilities. We made use of our experience with segmentation and object detection from our Shelf Inspector product to create a proof of concept for a mobile app to help visually impaired people. Encouraged by being awarded the first prize in the hackathon, we decided to develop the app further into a usable product and make it available to everyone for free.

In the following, I will describe the concept and design of Personal Shelf Inspector. Finally, I will briefly cover the current state of development of the app and its outlook.

Design and user experience

The app's user interface is in essence very simple. The user only needs to point their smartphone camera at a store rack and take a picture. The app then scans the picture for shelves with price tags, detects every price tag on each shelf, and reads the product name and price with OCR. The other information on a price tag is less relevant to the user, so it is ignored to save time.

Moving from the top of the picture to the bottom, the app then summarizes the contents of each shelf to the user via the phone's built-in text-to-speech. This way, the user gets a quick overview of the contents of the rack. The app then lets the user select a specific shelf and hear the product name and price of each item on it. This can save a significant amount of time otherwise spent searching for products by going through the display rack by hand.
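
To make the shelf-by-shelf summary concrete, here is a minimal sketch of how detected price tags could be grouped into shelves by their vertical position and read out from top to bottom. The box format, the gap threshold and the dummy data are illustrative assumptions; the post does not describe the app's actual data structures.

# Illustrative sketch: group detected price tags into shelves by vertical
# position and summarize them top to bottom. Assumes each detection is a
# dict like {"y": top_edge_px, "name": str, "price": str}; the real app's
# data structures are not described in the post.

def group_into_shelves(tags, gap_threshold=80):
    # Cluster tags whose vertical positions are close into the same shelf.
    shelves = []
    for tag in sorted(tags, key=lambda t: t["y"]):
        if shelves and tag["y"] - shelves[-1][-1]["y"] < gap_threshold:
            shelves[-1].append(tag)  # close to the previous tag: same shelf
        else:
            shelves.append([tag])    # large vertical gap: start a new shelf
    return shelves

def summarize(shelves):
    # One spoken line per shelf, ordered from top to bottom of the picture.
    for i, shelf in enumerate(shelves, start=1):
        names = ", ".join(t["name"] for t in shelf)
        yield f"Shelf {i} ({len(shelf)} price tags): {names}"

# Dummy detections (top edge in pixels); real boxes come from the detector.
tags = [
    {"y": 120, "name": "Milk chocolate", "price": "29.90"},
    {"y": 135, "name": "Dark chocolate", "price": "34.90"},
    {"y": 410, "name": "Hazelnut bar", "price": "19.90"},
]
for line in summarize(group_into_shelves(tags)):
    print(line)  # the app would send these lines to text-to-speech instead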

Object detection of price tags on two shelves of chocolates: every identified price tag is marked with a green rectangle, and the number above each rectangle shows the model's confidence in that particular detection.
An OCR can read a lot of text from a price tag, but most of it is not particularly relevant to the user…
…which is why we use a detection algorithm to filter out everything except product names and prices.

Personal Shelf Inspector vs Google Lookout

One app that will likely come to mind when reading about Personal Shelf Inspector is Google's Lookout. Lookout is essentially a real-time object detector coupled with an OCR, capable of describing many objects and reading most text a mobile phone camera is pointed at.

While Lookout contains a mode designed for shopping, it is based on identifying specific products and packages (similar to our commercial Shelf Inspector product). Unfortunately, this approach does not transfer well to markets where Google has not meticulously trained its algorithm on local products, and the shopping mode of Lookout is only available in a handful of countries. Lookout also cannot recognize new products or packaging introduced to the market until Google retrains the underlying model on these new items. And because it relies on product packaging, Lookout cannot tell the user the price of a product in the specific store at the specific time they are shopping.

Because of this, we decided not to copy Google and instead took an alternative approach, identifying products by their price tags rather than by the products themselves. Our approach can be applied much more readily to new markets, as well as to new products and packaging introduced to an existing market. It also gives users the current prices of all identified products.

How it works

The backend of Personal Shelf Inspector consists mostly of two machine learning components:

  1. An object detection algorithm that identifies individual shelves, price tags, and the most relevant parts of each price tag: the product name and price.
  2. An OCR that reads the output of the object detection network (the name and price on each price tag), paired with an NLP algorithm that formats the text for the built-in text-to-speech, which then reads the content to the user.
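
As a rough illustration of how these two components fit together, the sketch below crops each detected name and price region and runs it through an OCR engine. Both the detect_regions placeholder and the choice of pytesseract are assumptions made for illustration; the post does not name the actual models or OCR engine we use.

# Illustrative sketch of the two-stage pipeline: a detector finds the
# name/price regions on a price tag, then OCR reads each cropped region.
# detect_regions is a placeholder for the real detection model, and
# pytesseract stands in for whichever OCR engine the app actually uses.
import pytesseract
from PIL import Image

def detect_regions(image):
    # Placeholder: should return [(label, (left, top, right, bottom)), ...]
    # for the "name" and "price" regions found by the detection model.
    raise NotImplementedError("stands in for the real detection model")

def read_price_tag(image_path):
    image = Image.open(image_path)
    tag = {}
    for label, box in detect_regions(image):
        crop = image.crop(box)                    # keep only the relevant region
        text = pytesseract.image_to_string(crop)  # OCR just that crop
        tag[label] = " ".join(text.split())       # normalize whitespace for speech
    return tag  # e.g. {"name": "Milk chocolate 100 g", "price": "29.90"}

On top of this, the NLP step would reformat strings like "29.90" into something the text-to-speech can read naturally before the phone speaks them.
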
The Personal Shelf Inspector prototype as a web app, showing the text extracted from a price tag above a picture of the tag. The text on the screen is read to the user by the smartphone's text-to-speech service.

Further development

Throughout its development, Personal Shelf Inspector has gone through several implementation phases. It was prototyped as a web app (in Python) and is currently being ported to run natively on mobile devices (Android as well as iOS). We have already converted the object detection algorithm from its original Python implementation and tested it on mobile devices.
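
The post does not name the framework behind our detector, so purely as an illustration: if the model were built in TensorFlow, converting it to run natively on mobile devices could look like this TensorFlow Lite sketch. The model path and settings are assumptions.

# Illustrative assumption: converting a trained TensorFlow detection model
# to TensorFlow Lite so it can run natively on Android and iOS. The
# framework, paths and settings are not confirmed by the post.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("price_tag_detector/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantize to shrink the model
tflite_model = converter.convert()

with open("price_tag_detector.tflite", "wb") as f:
    f.write(tflite_model)

The same converted model file can then be bundled with both the Android and iOS apps and benchmarked directly on-device.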

Once we fully integrate our algorithms into the native mobile app, we will test it with users to provide the best possible experience. We expect to have an MVP version of the mobile app ready and available for download by early 2022. To keep track of the ongoing development, you can follow DataSentics on LinkedIn or Facebook. We would greatly appreciate any comments, questions or feedback.
