Sufficiently Advanced Technology: Visual Processing

Andrew Tubley
Published in State of Analytics
5 min read · Jan 1, 2017

As 2016 comes to a close, I reflect on the many articles in my news feed exclaiming that Machine Learning, AI, Data Scientists, Big Data, Cloud, and <insert Technology here> are poised to take 2017 by storm. But what do these technologies really do? How are they used, and what can they be used for? To answer these questions, I will be doing a deep dive on the top offerings in different technology areas.

Visual Processing: Getting 1000 words out of a Picture

Visual processing is a key component of many tasks, and specific cases such as alphabetic and numeric recognition are effectively solved. The classic example is the MNIST database, on which some impressive algorithms perform with less than a 1 percent error rate. (https://en.wikipedia.org/wiki/MNIST_database)

But what about a more challenging problem? What about Star-Lord?

Who? So glad you asked. Star-Lord is the charismatic leader of the Guardians of the Galaxy, recently played by Chris Pratt in Marvel’s movie of the same name, directed by James Gunn. The film enjoyed a great critical and fan reception overall and is part of the larger Marvel Cinematic Universe. Not so gratuitous cast photo included here.

Guardians of the Galaxy (2014)

So with Guardians being such a cultural phenomenon, how do the top image analysis tools do at identifying and classifying the image above? I asked Google’s Cloud Vision API, Microsoft’s Computer Vision API, IBM Watson’s Visual Recognition, and Amazon’s Rekognition to take a look at the image and tell me what they saw. (Note: All of these services are continually updating and improving their offerings, so your mileage may vary if you ask them yourself.)

Results Summary:

All of the offerings were able to locate Chris Pratt’s unforgettable Star-Lord, and as much of a scene stealer as he is, it’s disappointing that more characters were not identified. Even accounting for Groot and Rocket having non-standard humanoid faces, the top two APIs only found 3 of the 4 human faces, for a total of 50 percent of the 6 character faces on display.

Top Differentiators:

  • Google and Microsoft identified 3 of 6 character faces, whereas IBM and Amazon found only 1 of 6.
  • Number of facial landmarks identified:
      Google     34
      Microsoft  27
      Amazon     25
      IBM         0
  • Microsoft and Amazon offer detailed facial hair and glasses/sunglasses detection; Google shows only a headgear category.
  • Google’s and Amazon’s labels (subjectively) seem more useful than Microsoft’s, which merely list multiple people in a photograph. Watson seemed to go off the deep end with its labels here (literally), with references to diving gear and aqualungs.
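The tallies above can be restated as detection rates. The following Python sketch only re-computes figures already quoted in this article (six character faces in the photo, plus the per-service face and landmark counts); it does not call any vision service, and the names in it are purely illustrative.

```python
# Figures copied from the summary above; nothing here calls a vision API.
TOTAL_CHARACTER_FACES = 6

# service -> (character faces found, facial landmarks reported)
RESULTS = {
    "Google": (3, 34),
    "Microsoft": (3, 27),
    "Amazon": (1, 25),
    "IBM": (1, 0),
}


def detection_rate(faces_found, total=TOTAL_CHARACTER_FACES):
    """Return the share of character faces found, as a percentage."""
    return 100.0 * faces_found / total


def summary_rows(results=RESULTS):
    """Build (service, faces, rate, landmarks) rows, strongest result first."""
    rows = [
        (name, faces, detection_rate(faces), landmarks)
        for name, (faces, landmarks) in results.items()
    ]
    # Sort by faces found, then by landmark count, both descending.
    return sorted(rows, key=lambda row: (row[1], row[3]), reverse=True)


if __name__ == "__main__":
    for name, faces, rate, landmarks in summary_rows():
        print(f"{name:<10} {faces}/{TOTAL_CHARACTER_FACES} faces "
              f"({rate:.0f}%), {landmarks} landmarks")
```

Sorting on faces first and landmarks second is just one reasonable way to rank the services; a different weighting would be equally defensible.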

Even when combining the best results across all of the services, quite a bit of information in the image still could not be extracted. And while no one answered the question of who Star-Lord is (Hint: there are other APIs for that), they all could tell us where he was in the photo.

Goodbye APIs of 2016, looking (pun intended) forward to what you will become in 2017.

Details for those who like that sort of thing.

Google’s Cloud Vision API

Google identified 3 faces in the photo as shown above.

For each it provided Roll, Tilt, and Pan orientation, with no emotions detected except a hint of anger on Zoe Saldana’s face. All faces included bounding boxes and 34 identified face landmarks. None were flagged as blurred or as wearing headwear.

10 Dominant colors.

Labels:

Musical Theatre 66 percent
Screenshot 60 percent
Comics 59 percent
Mythology 53 percent

Safe search rated the image as Very Unlikely to contain Adult, Spoof, or Medical content and Unlikely to contain Violent content.
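For readers who want to try this themselves, here is a minimal Python sketch of how a request covering the four result groups above (faces, labels, dominant colors, safe search) might be assembled. It follows the v1 REST interface of the Cloud Vision API as I understand it; the helper name and the placeholder image bytes are my own, authentication is left out, and the service may have changed since this was written.

```python
import base64
import json

# Public REST endpoint for image annotation (an API key or OAuth token is also required).
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"

# Feature types corresponding to the result groups discussed above.
FEATURES = [
    "FACE_DETECTION",
    "LABEL_DETECTION",
    "IMAGE_PROPERTIES",
    "SAFE_SEARCH_DETECTION",
]


def build_annotate_request(image_bytes, features=FEATURES, max_results=10):
    """Return a JSON-serialisable body for one image and several features.

    This helper is hypothetical; it only builds the payload and sends nothing.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [
                    {"type": feature, "maxResults": max_results}
                    for feature in features
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for the cast photo, which is not bundled here.
    body = build_annotate_request(b"not-a-real-image")
    print(ANNOTATE_URL)
    print(json.dumps(body, indent=2))
```

The body would be POSTed to the endpoint with credentials attached. In the response, each detected face is expected under faceAnnotations, with fields such as rollAngle, panAngle, tiltAngle, angerLikelihood, and a list of landmarks.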

Microsoft Azure’s Computer Vision API

Microsoft identified two faces in the photo as shown above.

Bounding boxes for faces are provided, but no facial features are included.

Ages and genders are provided: 36 for Chris Pratt and 32 for Zoe Saldana. Pratt’s estimated age aligns closely with his real age of 35 (in 2014). Saldana’s is off by a few years, but that could be due in part to studio photoshop wizardry for the promotional photo.

Dominant Colors: All return Black with an Astral accent color.

Labels:

Person 99 percent
Outdoor 87 percent
People 57 percent

Categories:

Outdoor

Not adult content. Not Racy content.

Face API

Microsoft’s Face API identified 3 faces as shown above.

With 27 facial landmarks. Head roll, yaw, and pitch.

Ages, genders, facial hair (moustache, beard, sideburns), glasses

IBM Watson’s Visual Recognition

Watson identified only one face in the photo as shown above.

Chris Pratt was identified as Male with an age range of 35–44.

Labels:

Person 72 percent
Aqualung 57 percent
Device 57 percent
Dip 56 percent
Swimming 57 percent
Water sport 59 percent
Sport 60 percent
Skin-diver 56 percent
Traveler 56 percent
Motley dress 55 percent
Fabric 55 percent
Maroon color 79 percent
Ultramarine color 74 percent

Amazon’s Rekognition

Object and Scene detection

No faces identified

Labels over 60 percent confidence:

People 99 percent
Person 99 percent
Human 99 percent
Art 78 percent
Gargoyle 78 percent
Statue 78 percent
Dance 76 percent
Dance Pose 76 percent
Figurine 75 percent
Musical 68 percent
Play 68 percent
Stage 68 percent
Female 68 percent
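Since Amazon’s list was cut at 60 percent confidence, it is worth seeing what the same cutoff does to another service’s labels. The sketch below applies it to Watson’s labels as listed earlier in this article. The function name is illustrative, and treating "over 60" as strictly greater than 60 is my assumption.

```python
# Watson's labels as listed earlier in this article (confidence in percent).
WATSON_LABELS = {
    "Person": 72,
    "Aqualung": 57,
    "Device": 57,
    "Dip": 56,
    "Swimming": 57,
    "Water sport": 59,
    "Sport": 60,
    "Skin-diver": 56,
    "Traveler": 56,
    "Motley dress": 55,
    "Fabric": 55,
    "Maroon color": 79,
    "Ultramarine color": 74,
}


def labels_over(labels, threshold=60):
    """Keep labels strictly above the threshold, highest confidence first."""
    kept = [(name, score) for name, score in labels.items() if score > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in labels_over(WATSON_LABELS):
        print(f"{name} {score} percent")
```

With the cutoff at 60, only Maroon color, Ultramarine color, and Person remain from Watson’s list; the diving-related labels all sit at or below the line.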

Facial Analysis

One face detected:

With 25 facial landmarks. Head roll, yaw, and pitch.

No emotions detected with any degree of confidence.

Labels:

Looks like a face 99.9 percent
Appears to be male 99.9 percent
Not smiling 55 percent
Not wearing eyeglasses 99.9 percent
Not wearing sunglasses 99.6 percent
Eyes are open 99.9 percent
Mouth is closed 99.7 percent
Has a mustache 99.7 percent
Has a beard 99.9 percent
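The readable lines above map onto value-and-confidence pairs in the service’s output. As a sketch, assuming each detected face arrives as a dictionary in which attributes such as Smile or Beard hold a Value and a Confidence (which is how I understand Rekognition’s face detail structure), the listing could be rendered as follows. The sample dictionary is hand-built from the numbers above and is not an actual API response; the function name is hypothetical.

```python
# (attribute key, text when Value is True, text when Value is False)
BOOLEAN_ATTRIBUTES = [
    ("Smile", "Smiling", "Not smiling"),
    ("Eyeglasses", "Wearing eyeglasses", "Not wearing eyeglasses"),
    ("Sunglasses", "Wearing sunglasses", "Not wearing sunglasses"),
    ("EyesOpen", "Eyes are open", "Eyes are closed"),
    ("MouthOpen", "Mouth is open", "Mouth is closed"),
    ("Mustache", "Has a mustache", "Does not have a mustache"),
    ("Beard", "Has a beard", "Does not have a beard"),
]

# Hand-built sample using the confidences listed above (not a real response).
SAMPLE_FACE = {
    "Gender": {"Value": "Male", "Confidence": 99.9},
    "Smile": {"Value": False, "Confidence": 55},
    "Eyeglasses": {"Value": False, "Confidence": 99.9},
    "Sunglasses": {"Value": False, "Confidence": 99.6},
    "EyesOpen": {"Value": True, "Confidence": 99.9},
    "MouthOpen": {"Value": False, "Confidence": 99.7},
    "Mustache": {"Value": True, "Confidence": 99.7},
    "Beard": {"Value": True, "Confidence": 99.9},
}


def describe_face(face_detail):
    """Turn one face-detail dictionary into readable attribute lines."""
    gender = face_detail["Gender"]
    lines = [
        f"Appears to be {gender['Value'].lower()} {gender['Confidence']:g} percent"
    ]
    for key, when_true, when_false in BOOLEAN_ATTRIBUTES:
        attribute = face_detail[key]
        text = when_true if attribute["Value"] else when_false
        lines.append(f"{text} {attribute['Confidence']:g} percent")
    return lines


if __name__ == "__main__":
    print("\n".join(describe_face(SAMPLE_FACE)))
```

Note that a low confidence such as the 55 percent on "Not smiling" means the service is close to undecided, so it reads more like a coin flip than a finding.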

Andrew Tubley
State of Analytics

Technology and Advanced Analytics Professional working as a Solution Principal for Slalom Consulting San Francisco.