Rekognition Image & Video Analysis
From handwritten digit recognition to facial identification to speech analysis, machine learning proves invaluable across a wide range of fields. With a growing community joining Amazon Web Services, the demand for classification algorithms in the cloud continues to grow. So what does AWS offer that we can apply to our applications?
Launched in 2016, Amazon Rekognition allows us to detect objects, compare faces, and moderate images and video for any unsafe content. These are just a few of the many features it delivers. Requiring no training data, and no knowledge of the underlying machine learning techniques on your part, Rekognition plugs easily into AWS applications to deliver detailed analysis.
Using an AWS Lambda function running Python 3.8, let’s explore Rekognition’s capabilities to demonstrate its impressive functionality out of the box. Along the way, we’ll see a picture of my home county, give Rekognition the opportunity to guess my age (what do you think?), and analyse footage of the UK’s only male Giant Panda!
Detecting Objects and Scenes
Rekognition is able to identify objects and scenes within your images, returning a confidence score between 0 and 100% for each label (or tag) returned. Working with AWS’ Python client Boto 3, we run the detect_labels operation to do this. Providing an image, a cap on the number of labels to return, and a confidence threshold to filter out uncertain results, Rekognition performs this analysis synchronously. Images can either be provided directly as bytes or referenced from an Amazon S3 bucket, as we see below.
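As a minimal sketch, assuming Boto 3 is installed and AWS credentials are configured in the environment (the bucket and object names here are hypothetical placeholders), a label-detection call might look like this:

```python
def detect_labels(bucket, key, max_labels=10, min_confidence=70):
    """Run Rekognition label detection against an image held in S3."""
    import boto3  # AWS SDK for Python; region and credentials come from the environment
    client = boto3.client("rekognition")
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,  # drop labels below this confidence
    )
    return response["Labels"]

def summarise(labels):
    """Trim the verbose response down to (name, confidence) pairs."""
    return [(label["Name"], round(label["Confidence"], 1)) for label in labels]

# Hypothetical bucket and object name -- substitute your own:
# summarise(detect_labels("my-demo-bucket", "kinnoull-hill.jpg"))
```

Raising or lowering min_confidence is all it takes to rerun the threshold experiments described in this section.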
Taking an old favourite from my own photo collection, we see a fantastic view of the River Tay in Scotland. Captured from the top of Kinnoull Hill, this is situated close to my home city, Perth. Running Rekognition’s label detection against this, let’s see what it delivers with confidence matching 70% or more.
Extracting a subset of the response for easier viewing, we see the results are ordered automatically to show labels with the greatest confidence first. Rekognition detects scenery, nature, and outdoors with a tremendous amount of certainty. Furthermore, it identifies water, but perhaps more surprisingly, it fails to detect either the road or the turret with enough confidence.
Lowering the threshold to 50%, labels for both the road and the turret emerge amongst other labels with smaller degrees of certainty.
Clearly Rekognition’s performance will improve over time as its training data grows. Revisiting this analysis in a year, I wouldn’t be surprised to see the road and turret labels returned with greater confidence. The labels alone are an excellent feature, but we can go one step further — analysing faces also.
Similar to labels, Rekognition supports facial analysis using Boto 3’s detect_faces operation. Providing the image either as bytes or from S3 as before, Rekognition can be configured to return either all supported attributes or only a default set chosen by AWS.
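Sketching this in the same style, again with hypothetical bucket and key names, and with a small helper to pull out the age estimate discussed next:

```python
def detect_faces(bucket, key, all_attributes=True):
    """Run Rekognition facial analysis on an image held in S3."""
    import boto3  # AWS SDK for Python
    client = boto3.client("rekognition")
    response = client.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        # "ALL" returns every supported attribute; "DEFAULT" a smaller AWS-chosen set
        Attributes=["ALL"] if all_attributes else ["DEFAULT"],
    )
    return response["FaceDetails"]

def age_range(face):
    """Pull the estimated age bracket out of one FaceDetail entry."""
    bracket = face["AgeRange"]
    return bracket["Low"], bracket["High"]
```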
Let’s utilise two of my profile pictures — one from two years ago (to the left), and another from earlier this year (to the right). Focusing solely on the latter first, let’s see what Rekognition makes of it.
Stripping down the response for easier reading, we see Rekognition attempts to guess my age — concluding I’m between 24 and 38 years old. Indeed, I was 24 at the time this was taken, but that’s a wide margin of error. This leaves me somewhat offended it thinks I may be in my late thirties!
Rekognition also analyses my face for emotions conveyed. It’s pretty sure I’m calm, but doesn’t detect any other expressions. Disgust, fear, and surprise are all confidently discarded. A reasonable conclusion there and then, but I’d maybe present a little more surprise after Rekognition’s age estimate!
Now to let Rekognition loose on both pictures. As well as analysing individual faces, Rekognition can compare faces between two pictures and establish, with some confidence, the similarity between them. Using Boto 3’s compare_faces method, we provide a source and a target image for analysis.
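A minimal sketch of such a comparison, assuming both profile pictures sit in the same hypothetical bucket; compare_faces takes the largest face in the source image and scores it against the faces it finds in the target:

```python
def face_similarities(bucket, source_key, target_key, threshold=80):
    """Compare the largest face in the source image against faces in the target."""
    import boto3  # AWS SDK for Python
    client = boto3.client("rekognition")
    response = client.compare_faces(
        SourceImage={"S3Object": {"Bucket": bucket, "Name": source_key}},
        TargetImage={"S3Object": {"Bucket": bucket, "Name": target_key}},
        SimilarityThreshold=threshold,  # only report matches at or above this score
    )
    return [match["Similarity"] for match in response["FaceMatches"]]

def best_match(similarities):
    """Highest similarity score found, or None if nothing matched."""
    return max(similarities, default=None)
```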
Returning a similarity score of 99.85%, Rekognition is positive these two faces are the same! Indeed, it’s right to be. This comes as no surprise — the distinctive chin is always a giveaway. Now I’m tempted to test its confidence against a series of animal pictures. An idea for the next blog post …
Moving away from my own pictures, Rekognition can also identify celebrities, detect unsafe content, and analyse images for text. A fantastic range of features for static content, but this service also sells itself on its video analysis.
For video, Rekognition takes a different approach, supporting asynchronous analysis. When a successful request is made using the AWS client, a job ID is returned for future reference. When the analysis completes, Rekognition notifies a Simple Notification Service (SNS) topic of the outcome.
We can configure the SNS topic to communicate results by email, SMS, or any other supported SNS protocol. Alternatively, we can set up this topic to trigger a Lambda function, permitting us to process the results automatically.
To run object and scene labelling against a video, we execute Boto 3’s start_label_detection request. Unlike images, we must host the video in an S3 bucket — limiting its size to S3’s 5TB object cap. We also need to provide a confidence threshold, an SNS topic for the outcome, and an Identity and Access Management (IAM) role granting Rekognition the permissions required to notify this topic.
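Sketching this request, with placeholder ARNs standing in for the SNS topic and IAM role you would create yourself:

```python
def notification_channel(topic_arn, role_arn):
    """SNS topic to notify on completion, plus the IAM role Rekognition assumes to publish to it."""
    return {"SNSTopicArn": topic_arn, "RoleArn": role_arn}

def start_label_detection(bucket, key, topic_arn, role_arn, min_confidence=50):
    """Kick off asynchronous label detection for a video held in S3."""
    import boto3  # AWS SDK for Python
    client = boto3.client("rekognition")
    response = client.start_label_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
        NotificationChannel=notification_channel(topic_arn, role_arn),
    )
    # Keep hold of the job ID -- we need it to fetch results once SNS reports completion.
    return response["JobId"]
```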
Let’s take this analysis for a spin with an old film of mine, featuring a Giant Panda called Yang Guang (“Sunshine”), who hit the headlines when he arrived here in Edinburgh along with Tian Tian (“Sweetie”) in December 2011. The pair are on loan to Edinburgh Zoo from China until 2021, and I couldn’t resist a visit to see them both shortly after their arrival!
Configuring the same Lambda function to process the response, there was a 35-second interval between making the start_label_detection request and the Lambda executing again. During that time, Rekognition analysed the video behind the scenes.
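A minimal sketch of that Lambda handler, assuming it is subscribed to the SNS topic above; note that get_label_detection pages its results via a NextToken, which this sketch omits for brevity:

```python
import json

def parse_notification(event):
    """Extract the Rekognition job ID and status from an SNS-triggered Lambda event."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["JobId"], message["Status"]

def lambda_handler(event, context):
    job_id, status = parse_notification(event)
    if status != "SUCCEEDED":
        return {"status": status}
    import boto3  # AWS SDK for Python
    client = boto3.client("rekognition")
    # Fetch the first page of labels for the completed job, ordered by timestamp.
    results = client.get_label_detection(JobId=job_id, SortBy="TIMESTAMP")
    for entry in results["Labels"]:
        print(entry["Timestamp"], entry["Label"]["Name"], entry["Label"]["Confidence"])
    return {"status": status}
```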
As ever, a vast amount of information is returned to us, including labels at different points in time in the video. Breaking down the response so that we focus only on the frame at the 15-second mark, we have a handful of labels to process.
The top four labels alone make for interesting viewing: Rekognition is sure it sees a Giant Panda, which is great, but it also seems certain there’s a bird in the frame. None that I can see! If we were to study the results in full, we’d be able to pinpoint where in the frame it thinks it sees this bird.
A relatively mature service from AWS, Rekognition is easy to integrate into other services like Lambda. Providing huge volumes of detail in its analysis, I’m impressed by how much it returns from the material shared here.
There’s certainly room for improvement: it failed to detect some of the key features in my River Tay picture with enough confidence, and it spotted phantom birds in my Giant Panda footage! Nonetheless, it’s done a great job classifying the majority of objects and facial attributes correctly.
I’ll remain somewhat bitter on the age estimate, but it’s been a wonderful tool to play with. I’m sure in time Rekognition will continue to improve as more users apply it to their applications. If you fancy giving this service a try, you’re welcome to fork my GitLab repository, where I’ve shared the Lambda function and other infrastructure used to demonstrate these features.
For any questions, contact AVM Consulting: email@example.com.