Can machines really see my microwave?

Artificial Intelligence has been a buzzword for a few years. Behind the scenes, however, it has been changing the world for the better. There is almost no area in which it is not applicable, and the number of use cases is countless. WeGroup has already invested in AI to handle damage claims for our clients. Part of the process is automated so that our customers’ needs are met as quickly as possible and our experts can use the time saved to take care of the critical cases.

We are expanding our AI system to improve the customer experience even more. Ideally, we would be able to estimate the value of your belongings as soon as damage occurs. This way, if the claim is accepted, the amount could be paid out instantly. We believe we have found a solution that makes sure your most precious property is protected the way it should be. How? With a device you keep close at all times: your smartphone. We are actively developing an application that lets you register your valuable belongings with your smartphone camera and that is capable of identifying the object you love, its state and its value.

The first step is to identify what’s in the picture. There is a specific branch within AI that focuses on this task: image recognition. To save precious time and effort, we will integrate existing platforms into our application. To see which platform suits us best, we carried out an investigation, the results of which are discussed below.

Our goal was to analyze the performance of the different platforms. The criteria are accuracy, ease of use and cost. We are examining Cloud Vision (Google), Rekognition (Amazon) and TensorFlow (Google’s open-source library, used here with a model trained on ImageNet), which come from the big players. Besides those three, we’ll also have a look at Clarifai and CamFind (Android app).

Accuracy

To test the accuracy we put together a small set of pictures that a user might have taken. Below are the 3 pictures that we will feed to the Image Recognition platforms. The desired recognized objects are: 1) microwave, 2) kitchen and fridge, 3) desk, screen and laptop.

[Test pictures: 1) microwave, 2) kitchen, 3) desk]

The overall accuracy is comparable and quite good. The result of CamFind is very specific in comparison to the others and only returns one object per picture. Although a detailed description might be useful to determine the value of objects, it is not essential in this phase of the application.

Most of the platforms include a probability or measure of confidence in their results. Each platform defines this number in its own way, so the values are difficult to compare across platforms.

Ease of use

Cloud Vision, Rekognition and Clarifai were able to detect the most prominent objects in the pictures. The raw results contained a lot of additional information that we would need to filter out. The platforms don’t only return objects but are capable of understanding other concepts as well, e.g. whether there is a person in the picture, or whether the picture was taken indoors or outdoors. All three platforms offer an online API, which would be easy to integrate with our application.
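The filtering step mentioned above could look roughly like the sketch below. It assumes the platform response has already been converted into a list of (concept, confidence) pairs; the whitelist, the threshold and the sample response are made up for illustration and do not come from any of the platforms.

```python
from typing import Iterable, List, Tuple

# Hypothetical whitelist of household objects we want to register.
OBJECTS_OF_INTEREST = {"microwave", "fridge", "desk", "screen", "laptop"}


def filter_labels(labels: Iterable[Tuple[str, float]],
                  min_confidence: float = 0.5) -> List[Tuple[str, float]]:
    """Keep only whitelisted concepts whose confidence meets the threshold."""
    kept = [
        (name.lower(), score)
        for name, score in labels
        if name.lower() in OBJECTS_OF_INTEREST and score >= min_confidence
    ]
    # Most confident concept first.
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up response, for illustration only.
raw = [("Indoor", 0.98), ("Microwave", 0.91), ("Person", 0.40), ("Laptop", 0.30)]
print(filter_labels(raw))
```

Because each platform defines confidence differently, the threshold would probably have to be tuned per platform rather than shared.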

We used TensorFlow in combination with the Inception-v3 model. Inception-v3 is trained for the ImageNet Large Visual Recognition Challenge using the data from 2012. This is a standard task in computer vision, where models try to classify entire images into 1000 classes, like “Zebra”, “Dalmatian”, and “Dishwasher”. Because the model is trained on “only” 1000 classes, it cannot detect every kind of object. But if an unsupported object turned out to be important to our clients, we could retrain the model ourselves.

TensorFlow also returns its results differently from the others. It always gives back the 5 classes with the highest probability, whereas the other platforms return as many concepts as possible, as long as they are relevant.
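To make the "5 classes with the highest probability" idea concrete, here is a minimal sketch in plain Python. It assumes the model output is available as a list of probabilities aligned with a list of class names; the function name and the sample values are hypothetical and are not part of TensorFlow.

```python
import heapq
from typing import List, Sequence, Tuple


def top_k(probabilities: Sequence[float],
          class_names: Sequence[str],
          k: int = 5) -> List[Tuple[str, float]]:
    """Return the k (class name, probability) pairs with the highest probability."""
    pairs = zip(class_names, probabilities)
    return heapq.nlargest(k, pairs, key=lambda pair: pair[1])


# Made-up scores for a handful of classes, for illustration only.
names = ["zebra", "dalmatian", "dishwasher", "microwave", "refrigerator", "desk"]
probs = [0.01, 0.02, 0.10, 0.70, 0.12, 0.05]
for name, prob in top_k(probs, names):
    print(f"{name}: {prob:.2f}")
```

A fixed top 5 always returns five classes, even when only one of them is plausible, so a confidence threshold on top of it may still be needed.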

CamFind has the advantage that you can take the picture with the app and analyze it immediately. The CamFind app returns only one very specific result. If we want our user to be able to take a picture of multiple objects together, the result of CamFind won’t be sufficient.

Cost

Cloud Vision, Rekognition and Clarifai are online services with usage-based pricing. The first few thousand requests are usually free, and afterwards the typical cost is $1 per 1000 requests. This is not overly expensive, but as our customer base grows, we cannot neglect this cost. TensorFlow, on the other hand, is an open-source software package and thus free to use, although we would have to run it on our own infrastructure.
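A quick back-of-the-envelope calculation shows how this cost scales. The sketch below uses the typical price of $1 per 1000 requests from above; the size of the free tier (1000 requests per month) and the request volumes are assumptions chosen for illustration, not figures from any provider.

```python
def monthly_cost(requests: int,
                 free_requests: int = 1000,      # assumed free tier
                 price_per_1000: float = 1.0) -> float:
    """Estimate the monthly cost in dollars for a number of requests."""
    billable = max(0, requests - free_requests)
    return billable / 1000 * price_per_1000


# Hypothetical monthly volumes.
for volume in (500, 10_000, 500_000):
    print(f"{volume} requests: ${monthly_cost(volume):.2f}")
```

Under these assumptions, 500 000 pictures a month would cost about $499, which is small per picture but recurring.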

Conclusion

We’d like to combine the best of both worlds in our application. We want our users to take pictures with their smartphone and upload them immediately like in CamFind. In our backend services, however, we would like to get a more versatile result. The good thing about TensorFlow is that it focuses on objects and that we can run it ourselves. In case our model fails to detect some type of object, we could use Cloud Vision or Rekognition as a fallback.
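The fallback idea could be sketched as follows. This is only an illustration under stated assumptions: both classifiers are passed in as plain functions that return (concept, confidence) pairs, the confidence threshold is arbitrary, and the two stand-in functions return made-up values instead of calling TensorFlow or a cloud API.

```python
from typing import Callable, List, Tuple

Label = Tuple[str, float]                    # (concept, confidence between 0 and 1)
Classifier = Callable[[bytes], List[Label]]


def classify_with_fallback(image: bytes,
                           local: Classifier,
                           remote: Classifier,
                           min_confidence: float = 0.6) -> List[Label]:
    """Use the local model first; call the remote service only if it is unsure."""
    local_labels = local(image)
    if any(score >= min_confidence for _, score in local_labels):
        return local_labels
    return remote(image)


# Stand-ins for the real classifiers, returning made-up values.
def fake_local(image: bytes) -> List[Label]:
    return [("toaster", 0.35)]


def fake_remote(image: bytes) -> List[Label]:
    return [("microwave", 0.92)]


print(classify_with_fallback(b"image bytes", fake_local, fake_remote))
```

With this design the paid service is only called for pictures the self-hosted model cannot handle confidently, which keeps the usage-based cost down.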