Self-Checkout Web App using TensorFlow Object Detection API

Giorgos Aniftos
Published in Systems AI
7 min read · May 10, 2019

Written by Giorgos Aniftos and Nnamdi Affia

In this article, we walk you through how we built a self-checkout application for detecting and counting fruit. We used TensorFlow to build the model, a combination of ReactJS and the Flask microframework to build the app, and finally containerized it with Docker so it can run on an edge device.

WHY DID WE DO THIS?

You’re probably thinking: “the use of self-service checkout machines is quite straightforward”. That is true, yet despite its ease of use, a gap exists between efficiency and security. As the name implies, it is a quicker alternative at busy supermarkets than standing in a 7-person queue with a basket of only 4 items. A typical user walks up to the self-checkout station, passes each item they wish to purchase through the scanner (which retrieves the item’s information from the barcode/serial number) onto the weight scale to tally them up, and finally pays the total cost of the items. However, scanning is only available for items with barcodes on them (primarily, items within packaging).

In the instance of purchasing fruits and vegetables without packaging, the user would have to manually search for the specific item on the system’s screen and enter the quantity of said item before placing it on the scale.

This is where two opportunities become apparent.

  • EFFICIENCY: users may find manually searching for fruit and vegetables to be tedious, particularly if feeling rushed or commuting.
  • SECURITY: unfortunately, self-service checkout machines rely on ethical usage by customers. Although most supermarkets now have security staff on premises to ensure shoplifting doesn’t occur, what stops a teenager from placing a PlayStation 4 on the scale but registering it as an £8 bag of fruit? Technically, the items have been paid for; however, it was not done ethically and thus counts as theft. How can such issues be accounted for in the future?

We didn’t build this app to solve the issues mentioned above, we just needed an excuse to play around with TensorFlow and build something cool!!

CONCEPT

The application can detect, count, and appropriately price oranges, apples, and bananas, and finally add them to a shopping list. The idea is simple: we train a model using the TensorFlow Object Detection API and build a web application using Flask and ReactJS.

To be more specific, we took a pre-trained model and fine-tuned it on a small labeled image dataset of the 3 fruits that we created. The advantage of using a pre-trained model rather than training one from scratch is that the weights are already tuned, which lets you work with smaller datasets.

Secondly, we developed an inference application using Flask (backend) and ReactJS (frontend). The ReactJS app uses the system’s camera to capture an image and sends it via an API to the Flask backend, where the model is hosted. After the prediction is made, the backend returns the results to the frontend to display on the user interface.

These are the basic steps that were followed:

  1. Data preparation
  2. Train a model using transfer learning
  3. Build the app backend using Flask microframework
  4. Build the app frontend (UI) using ReactJS
  5. Containerize the app using Docker

The image below shows a diagram of the implementation process:

Development workflow

TRANSFER LEARNING PROCESS

Data collection and processing

Data collection involved gathering images of bananas, apples, and oranges (100 each). We labeled them using the PowerAI Vision software. PowerAI Vision is a tool for end-to-end deep learning model development for computer vision projects such as image classification and object detection.

TensorFlow requires the TFRecord file format as its default input for training. Exporting the labeled dataset from PowerAI Vision gives us the images alongside XML files which include the annotation boxes’ coordinates and the corresponding label of each box. Converting the images and XML files to TFRecords is a two-step process: first convert the XML to CSV, then the CSV to TFRecords. A detailed guide containing Python scripts for converting data formats, as well as a TensorFlow workspace guide, is available here.
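The first step of that conversion can be sketched as follows. This is a minimal sketch, assuming Pascal-VOC-style annotation XML (the standard object-detection layout with `filename`, `object`, and `bndbox` elements); the actual files exported by PowerAI Vision may differ slightly, and the function name is ours:

```python
import xml.etree.ElementTree as ET

def voc_xml_to_rows(xml_string):
    """Flatten one Pascal-VOC-style annotation file into CSV-ready rows,
    one row per bounding box."""
    root = ET.fromstring(xml_string)
    filename = root.findtext("filename")
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append({
            "filename": filename,
            "class": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return rows
```

Each row maps directly onto a CSV line, which the second script then packs into TFRecord examples.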

Model training

Transfer learning was used to train our model. As mentioned earlier, with transfer learning, instead of starting the learning process from scratch, you start from a point where your model has already learnt some patterns by training on other datasets to solve different problems. So basically, what happens in transfer learning is that you take an already pre-trained model, adjust the output layer to match the number of labels you have (3 in our case) and feed it your new dataset.
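In the TensorFlow Object Detection API, the set of labels is declared in a label map (`.pbtxt`) file, with IDs starting at 1. For our 3 fruits it would look along these lines (the exact label names are whatever the dataset export uses):

```
item {
  id: 1
  name: 'apple'
}
item {
  id: 2
  name: 'banana'
}
item {
  id: 3
  name: 'orange'
}
```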

For our pre-trained model, we chose Faster R-CNN Inception v2 trained on the COCO dataset, from TensorFlow’s Model Zoo. We trained the model on an IBM POWER8 server containing Tesla K80 GPUs.

The model was trained on a training dataset of 210 images and tested on 90 images (a 70%/30% split). The model was trained for a maximum of 11K iterations and achieved a total loss of 0.05. As you can see from the loss graph, the loss remains almost stable after the 5000th iteration.

Total loss from training the model after 11k iterations (Source: TensorBoard graph visualization)

During training, TensorFlow saves checkpoints (.ckpt) at a specified number of iterations. To use the trained model for inferencing, you need a single file that includes the model graph alongside the weights (a protobuf file). For that reason, TensorFlow provides a Python script to export a protobuf inference model from the checkpoint files. After the model is exported, it is included in the backend to be used for detecting and counting the fruits.
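With a standard checkout of the TensorFlow models repo, the export step looks roughly like this; the config filename, checkpoint number, and output directory here are assumptions for illustration, not the exact paths from our project:

```shell
python research/object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path training/faster_rcnn_inception_v2_coco.config \
    --trained_checkpoint_prefix training/model.ckpt-11000 \
    --output_directory exported_model
```

The resulting `frozen_inference_graph.pb` is the single file the backend loads for inference.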

WEB APP DEVELOPMENT

The application consists of two parts: the frontend, which was built using ReactJS, and the backend, which uses the Flask library in Python.

Flask is a microframework that can be used for web development in Python. Using Flask for building the backend is optional as you can use only ReactJS to build the whole app, but we wanted to make our lives more complicated and difficult.

The backend is essentially the brain of the app, as it hosts the TensorFlow model for predicting the fruits, counting them and assigning the pricing. A tutorial on deploying a TensorFlow model with Python and Flask can be found here.
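As a sketch of what the backend’s counting-and-pricing logic might look like, here is a minimal Flask endpoint. The prices, route name, and request shape are all hypothetical; the real app runs the posted image through the TensorFlow model, whereas this sketch receives detections directly to keep the example self-contained:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical prices in GBP; the real app defines its own price list.
PRICES = {"apple": 0.50, "banana": 0.30, "orange": 0.40}

def tally(detections, threshold=0.5):
    """Count detections above a confidence threshold and price each fruit."""
    counts = {}
    for label, score in detections:
        if score >= threshold and label in PRICES:
            counts[label] = counts.get(label, 0) + 1
    return {
        label: {"quantity": n, "price": round(PRICES[label] * n, 2)}
        for label, n in counts.items()
    }

@app.route("/predict", methods=["POST"])
def predict():
    # The real backend decodes the posted image and runs the TF model;
    # here the detections arrive pre-computed, purely for illustration.
    payload = request.get_json()
    detections = [(d["label"], d["score"]) for d in payload["detections"]]
    return jsonify(tally(detections))
```

The frontend would POST each captured frame to such an endpoint and append the returned quantities and prices to the shopping list.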

For the frontend, ReactJS was used to design a simple User Interface so it looks like a self-checkout display. ReactJS is a JavaScript library for building interactive user interfaces.

How it works

A picture is taken using the PC webcam, which is integrated into the ReactJS UI, and sent through an API to the backend. The picture passes through the model, which outputs the corresponding fruits. Then, using a dictionary data structure in Python, the price of each fruit is looked up. Each time a picture is taken the same logic runs, and if a fruit is detected, its price and quantity are added to the shopping list.

CONTAINERIZING AND DEPLOYING THE APP USING DOCKER

After having the application running, we chose to containerize it using Docker. By building your application to run in a docker container, an image of that container can be easily run on any system that supports Docker. This is extremely useful because you don’t have to worry about various libraries and dependencies your application needs to run successfully.

There are 2 things that should be known about Docker: the Docker image and the Docker container. You can imagine a Docker image as a virtual disk with everything inside (operating system, libraries, application) and a container as a running instance of that image. In other words, you need an image to create and run a container.

The most common approach for building a Docker image is by using a Dockerfile. A Dockerfile is a text file that contains all the commands you would otherwise type in the command line of your operating system to install the dependencies and run your application; Docker builds the image automatically by reading these instructions. There are many tutorials available to learn about Docker. Here is a convenient cheat-sheet. Other useful information can be found on the official Docker website.
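A Dockerfile for an app like this might look roughly as follows; the base image, paths, and entrypoint are assumptions for illustration, not the actual layout of our repository:

```dockerfile
# Hypothetical sketch -- adjust paths and versions to the real project.
FROM python:3.6-slim
WORKDIR /usr/src/app

# Install Python dependencies first so this layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and expose the frontend/backend ports.
COPY . .
EXPOSE 3000 5000
CMD ["python3", "backend_tf/app.py"]
```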

RUN THE APP

Now the fun part!!

You can find all the instructions to run the self-checkout web app in this GitHub repository.

Alternatively, you can use a Docker container. Assuming that Docker is already installed on your PC, run the following commands:

docker pull ganiftos/self-checkout
docker run -d --name selfcheckout -p 3000:3000 -p 5000:5000 ganiftos/self-checkout:1.0
docker exec -it selfcheckout python3 usr/src/app/backend_tf/app.py

Enjoy!!

Developed by Giorgos Aniftos (Client Technical Specialist), Nnamdi Affia (AI Specialist) and Tom Farrand (Machine Learning Engineer) at IBM
