Can Vision AI Detect the Game of Thrones Coffee Cup?

In season 8, episode 4 of Game of Thrones, a Starbucks coffee cup made a cameo in the post-war feast. I wondered how anyone spotted it while immersed in the episode itself. The cup is no longer there on streaming services; HBO has quietly removed it. That got me thinking: can pre-built AI models like the Vision APIs detect the coffee cup in the frame? Let’s test it out.

Why am I doing this? It is only for fun, not to evaluate or compare any of these services. I’ll be using the Microsoft Cognitive Services Vision API, the Google Vision AI API and the Clarifai Predict API. I’m not going to train any of these models and will use only their demo features. I couldn’t find a demo of Amazon’s Rekognition API, so it is missing from this post. I took the image below from social media, uploaded it to each of these demo sites and captured their predictions.

Original image used in these services
I’m not expecting the image to be analyzed for the presence of the brand “Starbucks”, but a tag like “Coffee” would be really great, or at least “Cup”.
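For anyone who would rather not eyeball the tag lists, here is a minimal sketch of that "Coffee or at least Cup" check in Python. It assumes you have copied the predictions out of a demo page by hand as (label, confidence) pairs. The function name and the sample tags are made up for illustration and are not the actual output of any of these services.

```python
def find_target_labels(predictions, targets=("coffee", "cup"), min_confidence=0.0):
    """Return the predictions whose label contains a target word.

    predictions: list of (label, confidence) pairs copied from a demo page.
    Matching is case-insensitive and on whole words, so "Cupboard" does not
    count as "cup".
    """
    hits = []
    for label, confidence in predictions:
        words = label.lower().split()
        if confidence >= min_confidence and any(t in words for t in targets):
            hits.append((label, confidence))
    return hits


# Hypothetical tags, invented for this example only.
sample = [("Person", 0.99), ("Table", 0.93), ("Coffee cup", 0.41), ("Glasses", 0.72)]
hits = find_target_labels(sample)
print(hits if hits else "No coffee or cup found")
```

Raising `min_confidence` is one way to ignore low-confidence guesses, though where to set that threshold is a judgment call.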

Let’s start with the Microsoft Vision API demo. The prediction was pretty decent. It detected the human faces and tagged almost all the objects except the coffee cup. It also detected a pizza, and I’m not sure where that came from. Note too that one of the detected objects is “Glasses”.

Microsoft Cognitive Vision API demo

Next, it’s Clarifai’s Predict API demo time. Again I was impressed with the prediction; most of the objects matched the previous result. It also predicted a relationship, “son”, which is interesting but not exactly right in this context.

Clarifai’s Predict API demo

Finally, it’s Google Vision API demo time. It didn’t return much on the objects, but it detected the humans in the picture. The real awesomeness comes from its web entities match, which got most of the keywords right. Being able to combine the object results with web entity matches is a real advantage.

If I had to pick one of these services, I’d prefer Microsoft’s over the others for the level of detail it provided on objects in this context. But as I said earlier, I was only fiddling around with the demo APIs as is, without any prior training of these models. With training, any of these services might outperform the others.