Why our image recognition AI is better than Google’s, Amazon’s, Microsoft’s etc…

Published in imageintelligence · 5 min read · Sep 4, 2017

…for security.

To be specific: our image recognition AI (Image Intelligence) is better than Google Cloud Vision, AWS Rekognition and Microsoft Azure Computer Vision for security use-cases.

While I do apologise for the clickbait title, even a relatively narrow claim like this will surprise many. “How can a small AI startup ever compete with a company like Google, which is willing to buy up all the AI talent?”

Before we get into the reasons, it would be best to substantiate these claims. Thankfully, the aforementioned companies all offer MLaaS/AIaaS (machine learning as a service or AI as a service) via an API. We can easily benchmark the performance of each provider using the same set of images that we have the ground truth for¹. The test set contains 8,000 images — footage from actual security cameras.

Classifying a given frame as Person or No Person is a common security use-case. In this experiment, we use each of the image recognition platforms to detect labels for the 8,000 test images. Some platforms might produce labels that are similar in meaning to Person. These labels are also treated as ‘positives’ should they occur.

Labels that are also considered equivalent to ‘person’.
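The label-matching step described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the experiment: the `PERSON_LABELS` set is a hypothetical stand-in for the list shown in the figure, and the `(label, confidence)` pair format is an assumed, provider-neutral shape rather than any real API's response.

```python
# Hypothetical set of labels treated as equivalent to "person"; the article's
# actual list is shown in a figure and is not reproduced here.
PERSON_LABELS = {"person", "people", "human"}


def person_score(labels):
    """Return the highest confidence among person-equivalent labels, or None.

    `labels` is a list of (label_name, confidence) pairs. This shape is an
    assumption for illustration, not any provider's real response format.
    """
    scores = [conf for name, conf in labels if name.lower() in PERSON_LABELS]
    return max(scores) if scores else None


example = [("Car", 0.91), ("Human", 0.72), ("Person", 0.64)]
print(person_score(example))           # highest person-equivalent score
print(person_score([("Tree", 0.88)]))  # no person-equivalent label returned
```

Taking the maximum means an image counts as a positive if any equivalent label clears the threshold, which matches the "treated as positives should they occur" rule.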

Accuracy Charts

The accuracy is the percentage of images classified correctly out of the total set. The threshold is the ‘cut-off’ point that decides whether a resulting probability score is treated as positive or negative. For example, if the threshold is 0.5, any score less than 0.5 will be treated as negative (no person) and any score greater than or equal to 0.5 will be treated as positive (person).
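To make the definitions concrete, here is a small sketch of accuracy as a function of threshold. The function name `accuracy_at` and the sample scores are made up for illustration; they are not data from the benchmark.

```python
def accuracy_at(threshold, scores, truths):
    """Fraction of images classified correctly at the given threshold.

    scores: predicted probability that each image contains a person.
    truths: ground-truth booleans (True = person present).
    """
    # A score >= threshold is a positive prediction; compare with the truth.
    correct = sum((s >= threshold) == t for s, t in zip(scores, truths))
    return correct / len(truths)


# Made-up scores and labels, purely for illustration.
scores = [0.95, 0.40, 0.70, 0.10, 0.55]
truths = [True, True, False, False, True]

for threshold in (0.3, 0.5, 0.8):
    print(threshold, accuracy_at(threshold, scores, truths))
```

Sweeping the threshold from 0 to 1 and plotting the result gives one accuracy curve per provider, which is what the charts show.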

Accuracy for Image Intelligence peaks at 88.8%

Accuracy for Image Intelligence peaks at 89.5%

Results summary

As you can see, Image Intelligence was considerably more accurate (close to 90% accuracy) than the others at predicting if an image contains a person or not. Some of the others had accuracy levels that were barely over 50%. This pretty much rules out their viability for this use-case as you can get similar accuracy results from tossing a coin (without having to pay a dime).

Broken lines

You might be wondering why the purple line (Image Intelligence) is the only continuous line in the above charts. This is because we always return the probability of an image containing a person, from 0% to 100%, whereas the other services do not. For example, Google and AWS do not return any results with a probability less than 50%. This has other implications, which we will discuss in a future article.
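One plausible reading of the gaps, sketched below under stated assumptions: when a service omits scores below some floor, a missing score only tells us the true value was under that floor, so accuracy at any threshold below the floor cannot be determined. The `floor` parameter, the function name and the sample data are hypothetical, not details from the benchmark.

```python
def accuracy_curve(scores, truths, thresholds, floor=0.0):
    """Accuracy per threshold, or None where it cannot be determined.

    scores: person score per image, or None when the service returned nothing.
    floor:  assumed minimum score the service ever reports; a missing score
            only tells us the true score was below this value.
    """
    curve = {}
    has_missing = any(s is None for s in scores)
    for t in thresholds:
        if t < floor and has_missing:
            curve[t] = None  # missing scores could fall on either side of t
            continue
        correct = sum(
            (s is not None and s >= t) == truth
            for s, truth in zip(scores, truths)
        )
        curve[t] = correct / len(truths)
    return curve


# Made-up data: two images received no person score at all.
scores = [0.9, None, 0.6, None]
truths = [True, False, True, True]
print(accuracy_curve(scores, truths, (0.25, 0.5, 0.75), floor=0.5))
```

With `floor=0.0` (a service that always returns a score) every threshold gets a value, which corresponds to a continuous line.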

Why we performed the best

Subject matter expertise

This benchmarking experiment is definitely biased in our favour, since we specialise in classifying whether an image contains a person or not. The other platforms, by contrast:

Do not focus on any specific labels and their corresponding accuracy
Do not specify the list of possible labels
Do not tend to prescribe a specific use case

The use of these MLaaS APIs is therefore restricted to non-mission-critical use-cases, where mislabelled results can be largely tolerated. That being said, accuracy is still required to make the use-case work. For example, if you use an MLaaS API to catalog your image library and it produces lots of irrelevant labels, you will still have to resort to manually scanning through all images to look for what you’re after.

A “vertical” AI provider, on the other hand, provides products that solve specific industry problems. Image Intelligence has its roots in the security industry and has deep subject matter expertise. We know the pain points that need solving (e.g. reducing false positives) and the constraints that come with the industry (e.g. we’ve spent years dealing with the nuances of IP cameras). With that, we can build better full-stack solutions for the problem space: better APIs, internal tools, and neural network architectures.

Data expertise

Closely related to subject matter expertise is data expertise. Our neural nets are trained on very large datasets derived from security cameras. Generic or non-specialised computer vision providers are not as well equipped to handle production security footage because they have not ‘experienced’ such data.

Our focus on key labels such as person also means that we spend lots of time annotating our dataset with these labels, which is what produces very accurate results. In general, the more labelled data you have, the more accurate your deep learning models become. Judging by our accuracy results against the competition, it is quite possible that we have the largest person-annotated dataset.

Conclusion

AI is showing lots of promise in solving many real-world problems. While there are now a handful of MLaaS platforms from the big tech giants, your mileage will vary depending on your use-case. Often a more tailored solution is required to solve your specific problem. Tailoring a solution, however, requires deep learning expertise plus an extremely large set of annotated data.

Image Intelligence is focused on applying AI to solve problems in the security space. If you’d like to discuss how we could help you create a better experience for your customers, please reach out to us at: hello@imageintelligence.com

We’re also hiring! Email us if you’d like to join our team in Sydney. Our engineers build lots of ML tools and pipelines, and our data scientists play with large swathes of annotated image sets every day.

[1] To maintain the integrity of the experiment, these images have not been used to train our own deep learning models.