Rekognition Image & Video Analysis

Detecting faces, objects, and scenes using Amazon Web Services

Ross Rhodes
Dec 16, 2019 · 6 min read

Ranging from handwritten digit recognition to facial identification to speech analysis, machine learning proves invaluable across an abundance of fields. With a growing community joining Amazon Web Services, the demand for classification algorithms in the cloud continues to grow. So what does AWS offer that we can apply to our applications?

Launched in 2016, Amazon Rekognition allows us to detect objects, compare faces, and moderate images and video for any unsafe content. These are just a few of the many features it delivers. Requiring no training data or any knowledge on your part of the underlying machine learning techniques, Rekognition easily plugs into AWS applications to deliver detailed analysis.

(Image source: mysmartcave.net)

Using an AWS Lambda function running Python 3.8, let’s explore Rekognition’s capabilities to demonstrate its impressive out-of-the-box functionality. Along the way, we’ll see a picture of my home county, give Rekognition the opportunity to guess my age (what do you think?), and analyse footage of the UK’s only male Giant Panda!

Detecting Objects and Scenes

Taking an old favourite from my own photo collection, we see a fantastic view of the River Tay in Scotland. Captured from the top of Kinnoull Hill, this is situated close to my home city, Perth. Running Rekognition’s label detection against this, let’s see what it delivers with confidence matching 70% or more.
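In Boto 3, this amounts to a single detect_labels call. A minimal sketch follows: the bucket and key names are placeholders for wherever you store the image, and the sample response is an illustrative shape trimmed for readability, not real output from this photo.

```python
def detect_labels(bucket, key, min_confidence=70.0):
    """Run Rekognition label detection on an image stored in S3.

    Bucket and key are placeholders -- point them at your own image.
    """
    import boto3  # imported here so the helper below runs offline
    client = boto3.client("rekognition")
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,  # drop labels below this confidence
    )
    return response["Labels"]


def summarise(labels):
    """Reduce a Labels list to (name, confidence) pairs.

    Rekognition already orders labels by descending confidence.
    """
    return [(label["Name"], round(label["Confidence"], 1)) for label in labels]


# Illustrative shape of a trimmed response (not real output):
sample = [
    {"Name": "Scenery", "Confidence": 99.1},
    {"Name": "Nature", "Confidence": 98.6},
    {"Name": "Water", "Confidence": 90.4},
]
print(summarise(sample))  # [('Scenery', 99.1), ('Nature', 98.6), ('Water', 90.4)]
```

Raising or lowering MinConfidence is all it takes to widen or narrow the set of labels returned.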

(Image: the River Tay, viewed from Kinnoull Hill)

Extracting a subset of the response for easier viewing, we see the results are ordered automatically to show labels with the greatest confidence first. Rekognition detects scenery, nature, and outdoors with a tremendous amount of certainty. Furthermore, it identifies water, but perhaps more surprisingly, it fails to detect either the road or the turret with enough confidence.

Lowering the threshold to 50%, labels for both the road and the turret emerge amongst other labels with smaller degrees of certainty.

No training was required on our part to generate these labels. Rekognition runs this analysis using its own training data, which includes previous user data. This is permitted under the current data privacy policy — a policy you may opt out of by contacting AWS Support.

Clearly Rekognition’s performance will improve over time as its training data grows. Revisiting this analysis in a year, I wouldn’t be surprised to see the road and turret labels returned with greater confidence. The labels alone are an excellent feature, but we can go one step further — analysing faces also.

Facial Recognition

Let’s utilise two of my profile pictures — one from two years ago (to the left), and another from earlier this year (to the right). Focusing solely on the latter first, let’s see what detect_faces returns.
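A sketch of that call is below. Requesting Attributes=["ALL"] is what brings back the age range and emotion scores; the bucket and key are placeholders, and the sample face entry is an illustrative, trimmed shape rather than the real response.

```python
def detect_faces(bucket, key):
    """Request full facial attributes for an image stored in S3.

    Bucket and key are placeholders for your own S3 object.
    """
    import boto3  # imported here so the helper below runs offline
    client = boto3.client("rekognition")
    response = client.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        Attributes=["ALL"],  # include age range, emotions, and more
    )
    return response["FaceDetails"]


def age_and_top_emotion(face):
    """Pull the estimated age range and the most confident emotion."""
    age = face["AgeRange"]
    top = max(face["Emotions"], key=lambda e: e["Confidence"])
    return age["Low"], age["High"], top["Type"]


# Illustrative shape of one trimmed FaceDetails entry:
sample_face = {
    "AgeRange": {"Low": 24, "High": 38},
    "Emotions": [
        {"Type": "CALM", "Confidence": 93.5},
        {"Type": "SURPRISED", "Confidence": 0.8},
    ],
}
print(age_and_top_emotion(sample_face))  # (24, 38, 'CALM')
```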

Stripping down the response for easier reading, we see Rekognition attempts to guess my age — concluding I’m between 24 and 38 years old. Indeed, I was 24 at the time the picture was taken, but that’s a wide margin of error. It leaves me somewhat offended it thinks I may be in my late thirties!

Rekognition also analyses my face for emotions conveyed. It’s pretty sure I’m calm, but doesn’t detect any other expressions. Disgust, fear, and surprise are all confidently discarded. A reasonable conclusion there and then, but I’d maybe present a little more surprise after Rekognition’s age estimate!

Now to let Rekognition loose on both pictures. As well as analysing individual faces, Rekognition can compare faces between two pictures, and establish with some confidence the similarity between them. Using Boto 3’s compare_faces method, we provide a source and a target image for analysis.
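A sketch of that comparison follows. The bucket and key names are placeholders, SimilarityThreshold filters out weak matches before they’re returned, and the sample match list is an illustrative trimmed shape.

```python
def compare_faces(bucket, source_key, target_key, threshold=80.0):
    """Compare the face in the source image against faces in the target.

    Bucket and keys are placeholders for your own S3 objects.
    """
    import boto3  # imported here so the helper below runs offline
    client = boto3.client("rekognition")
    response = client.compare_faces(
        SourceImage={"S3Object": {"Bucket": bucket, "Name": source_key}},
        TargetImage={"S3Object": {"Bucket": bucket, "Name": target_key}},
        SimilarityThreshold=threshold,  # matches below this are excluded
    )
    return response["FaceMatches"]


def best_similarity(matches):
    """Highest similarity score among the matches, or None if empty."""
    return max((m["Similarity"] for m in matches), default=None)


# Illustrative shape of a trimmed FaceMatches list:
sample_matches = [{"Similarity": 99.85, "Face": {"Confidence": 99.9}}]
print(best_similarity(sample_matches))  # 99.85
```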

Returning a similarity score of 99.85%, Rekognition is positive these two faces are the same. And rightly so! This comes as no surprise: the distinctive chin is always a giveaway. Now I’m tempted to test its confidence against a series of animal pictures. An idea for the next blog post …

Moving away from my own pictures, Rekognition can also identify celebrities, detect unsafe content, and analyse images for text. A fantastic range of features for static content, but this service also sells itself on its video analysis.

Video Analysis

To run object and scene labelling against a video, we execute Boto 3’s start_label_detection request. Unlike images, the video must be hosted in an S3 bucket — which caps its size at S3’s 5TB object limit. We also need to provide a confidence threshold, an SNS topic for the outcome, and an AWS Identity and Access Management (IAM) role granting Rekognition the permissions required to notify that topic.

We can configure the SNS topic to communicate results by email, SMS, or any other supported SNS protocol. Alternatively, we can set up the topic to trigger a Lambda function, allowing us to process the results automatically.
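A sketch of the kickoff and of the notification handling is below. All ARNs and names are placeholders for your own resources. When the job finishes, Rekognition publishes a JSON document to the topic — the Message field of the SNS event carries the JobId and Status we need downstream.

```python
import json


def start_label_detection(bucket, key, role_arn, topic_arn, min_confidence=50.0):
    """Kick off asynchronous label detection on a video stored in S3.

    All ARNs and names here are placeholders for your own resources.
    """
    import boto3  # imported here so the parser below runs offline
    client = boto3.client("rekognition")
    response = client.start_label_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
        NotificationChannel={
            "RoleArn": role_arn,       # IAM role Rekognition assumes
            "SNSTopicArn": topic_arn,  # topic notified on completion
        },
    )
    return response["JobId"]


def parse_notification(event):
    """Extract the job ID and status from the SNS event a Lambda
    receives when Rekognition publishes its completion notification."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["JobId"], message["Status"]


# Illustrative SNS event, trimmed to the fields we read:
sample_event = {
    "Records": [{"Sns": {"Message": json.dumps(
        {"JobId": "job-123", "Status": "SUCCEEDED"}
    )}}]
}
print(parse_notification(sample_event))  # ('job-123', 'SUCCEEDED')
```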

Let’s take this analysis for a spin with an old film of mine, featuring a Giant Panda called Yang Guang (“Sunshine”), who hit the headlines when he arrived here in Edinburgh along with Tian Tian (“Sweetie”) in December 2011. The pair are on loan to Edinburgh Zoo from China until 2021, and I couldn’t resist a visit to see them both shortly after their arrival!

With the same Lambda function configured to process the response, there was a 35-second interval between making the start_label_detection request and the Lambda executing again. During that time, Rekognition analysed the video behind the scenes.

As ever, a vast amount of information is returned to us, including labels at different points in time in the video. Breaking down the response so that we focus only on the frame at the 15th second, we have a handful of labels to process.
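That narrowing-down can be sketched as follows. Results come back from get_label_detection with each label stamped in milliseconds, so filtering on Timestamp == 15000 isolates the 15th second. The sample list is an illustrative trimmed shape, not the real response from my footage.

```python
def get_labels(job_id):
    """Fetch results for a completed label-detection job.

    Large results are paginated via NextToken; omitted here for brevity.
    """
    import boto3  # imported here so the filter below runs offline
    client = boto3.client("rekognition")
    return client.get_label_detection(JobId=job_id)["Labels"]


def labels_at(labels, timestamp_ms):
    """Keep only labels detected at the given timestamp (milliseconds)."""
    return [
        (entry["Label"]["Name"], round(entry["Label"]["Confidence"], 1))
        for entry in labels
        if entry["Timestamp"] == timestamp_ms
    ]


# Illustrative shape of a trimmed Labels list:
sample_labels = [
    {"Timestamp": 15000, "Label": {"Name": "Giant Panda", "Confidence": 98.2}},
    {"Timestamp": 15000, "Label": {"Name": "Bird", "Confidence": 88.7}},
    {"Timestamp": 16000, "Label": {"Name": "Zoo", "Confidence": 75.0}},
]
print(labels_at(sample_labels, 15000))
```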

The top four labels alone make for interesting viewing: Rekognition is sure it sees a Giant Panda, which is great, but it also seems certain there’s a bird in the frame. None that I can see! If we were to study the results in full, we’d be able to pinpoint where in the frame it thinks it sees this bird.

Conclusion

There’s certainly room for improvement: failing to detect with enough confidence some of the key properties in my River Tay picture, as well as detecting phantom birds in my Giant Panda footage! Nonetheless, it’s done a great job classifying the majority of objects and facial attributes correctly.

I’ll remain somewhat bitter on the age estimate, but it’s been a wonderful tool to play with. I’m sure in time Rekognition will continue to improve as more users apply it to their applications. If you fancy giving this service a try, you’re welcome to fork my GitLab repository, where I’ve shared the Lambda function and other infrastructure used to demonstrate these features.

For any questions, contact AVM Consulting: blog@avmconsulting.net.

AVM Consulting Blog

AVM Consulting — Clear strategy for your cloud
