Optimizing Media Asset Management with Facial Recognition and Machine Learning

Andrew Zaikin
firstlineoutsourcing
4 min read · May 29, 2023

Face recognition technology is becoming a practical tool in post-production for the media sector, bringing significant improvements in video content processing.

Imagine you’re editing a sports broadcast. With this technology, you can quickly scan and recognize the players’ faces, then automatically tag them in every frame. This simplifies the editing process, especially when working with a large volume of footage, and makes it easy to build timelines or clips featuring a particular athlete. Facial recognition also produces more accurate metadata for your media library, simplifying future content search and selection. Overall, this technology offers new possibilities for more efficient management and personalization of content.

It can be challenging to come across Usain Bolt on a timeline due to his remarkable speed.

Suppose you want to incorporate face recognition into your routine content library management. How can this be achieved? One of our clients asked me exactly this question. To answer it, we first had to understand their current process and what options we had for implementation.

At that time, this work was done manually by editors and content managers: for every tournament and championship video, they tagged the footage with meta tags in the media asset management (MAM) system. The process had started only recently, which means the biggest part of their media archive was still untagged. Handling 25k video files and 250k photos by hand is a huge amount of work.

How I saw the task:

  • train a neural network on specific faces, and keep training it as new faces appear
  • process all the media content already in the archive
  • process incoming content on the fly
  • let editors and content managers easily find clips with recognized players through the MAM search

All the content is stored in cloud storage. Let’s look at the facial recognition services whose APIs could integrate with their MAM:

  • Amazon Rekognition: This service offers a powerful set of tools for image and video analysis, including facial recognition. The API allows you to identify faces, analyze emotions, and determine age and other attributes.
  • Microsoft Azure Face API: This is a service that offers facial recognition features, including face identification, face detection, emotion analysis, and other capabilities.
  • Google Cloud Vision API: This API provides a set of features for image analysis, including facial recognition. It can detect faces in images and analyze emotions.
  • IBM Watson Visual Recognition: IBM also offers an image analysis service, which includes facial recognition features. This API can detect faces in images and analyze their attributes.
  • Kairos: This is a specialized facial recognition service that offers an API for detecting and identifying faces, emotion analysis, and other functions.

Most of these services work only with images. We would have to split each video into frames and push every frame to the cloud service, and the price would depend on the number of transactions (frames). That is too complex and expensive to integrate, and our client wasn’t ready for this option.
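
A back-of-the-envelope calculation shows why. The average duration and sampling rate below are illustrative assumptions, not the client’s actual figures:

```typescript
// Rough cost driver of the frame-by-frame approach: one API transaction per frame.
// Average duration and sampling rate are illustrative assumptions, not real figures.
const videos = 25_000;               // archive size from the article
const avgDurationSec = 30 * 60;      // assume an average video of 30 minutes
const sampleFps = 1;                 // sample just 1 frame per second
const framesPerVideo = avgDurationSec * sampleFps;  // 1,800 frames per video
const totalTransactions = videos * framesPerVideo;  // 45,000,000 frames

console.log(`~${totalTransactions.toLocaleString()} image transactions to process the archive`);
```

Even at a fraction of a cent per image, tens of millions of transactions add up quickly, before you count the compute needed to extract and upload the frames.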

We chose Amazon Rekognition because it works with both video and images, the pricing model is transparent, and all the media content is already stored in AWS S3, which means no egress fees.
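
As an illustration, here is a minimal sketch of starting an asynchronous face-search job against a video already sitting in S3, using the AWS SDK v3 for TypeScript. The bucket, collection, topic, and role names are hypothetical placeholders, not the client’s setup:

```typescript
import {
  RekognitionClient,
  StartFaceSearchCommand,
} from "@aws-sdk/client-rekognition";

const rekognition = new RekognitionClient({ region: "us-east-1" });

// Start an asynchronous face-search job against a video stored in S3.
// Bucket, key, collection, topic, and role names are hypothetical placeholders.
export async function startFaceSearch(bucket: string, key: string): Promise<string> {
  const { JobId } = await rekognition.send(
    new StartFaceSearchCommand({
      Video: { S3Object: { Bucket: bucket, Name: key } },
      CollectionId: "players-collection", // faces indexed beforehand
      FaceMatchThreshold: 90,             // require high-confidence matches
      NotificationChannel: {
        // Rekognition publishes here when the job finishes
        SNSTopicArn: "arn:aws:sns:us-east-1:123456789012:face-search-done",
        RoleArn: "arn:aws:iam::123456789012:role/rekognition-sns-publish",
      },
    })
  );
  return JobId!;
}
```

Because the video never leaves AWS, the only moving parts are the job request and the SNS notification when it completes.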

We used a Serverless architecture for the AWS solution, with TypeScript and Node.js. One benefit of the event-driven approach is that it saves money: services don’t depend on each other and run smoothly in asynchronous mode, so you pay nothing for idle time. The approach is also not complex to implement, freeing up time to focus on the main goal rather than routine coding.

Simplified Serverless architecture of the solution
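
To make the event-driven flow concrete, here is a hedged sketch of two Lambda handlers: one fired by an S3 upload that starts the face-search job, and one fired by the SNS completion notification that pages through the results. The handler names and the MAM hand-off are assumptions; the production code is more involved.

```typescript
import type { S3Event, SNSEvent } from "aws-lambda";
import {
  RekognitionClient,
  GetFaceSearchCommand,
} from "@aws-sdk/client-rekognition";
import { startFaceSearch } from "./start-face-search"; // the helper from the previous sketch

const rekognition = new RekognitionClient({});

// Triggered by every new object in the media bucket: start a face-search job.
export const onUpload = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    await startFaceSearch(bucket, key);
  }
};

// Triggered by the SNS completion message: page through the matches and
// hand them to the MAM as meta tags (stubbed with a log line here).
export const onJobComplete = async (event: SNSEvent): Promise<void> => {
  for (const record of event.Records) {
    const { JobId, Status } = JSON.parse(record.Sns.Message);
    if (Status !== "SUCCEEDED") continue;

    let nextToken: string | undefined;
    do {
      const page = await rekognition.send(
        new GetFaceSearchCommand({ JobId, NextToken: nextToken })
      );
      for (const person of page.Persons ?? []) {
        for (const match of person.FaceMatches ?? []) {
          // ExternalImageId is whatever was stored when the face was indexed.
          console.log(`${match.Face?.ExternalImageId} at ${person.Timestamp} ms`);
        }
      }
      nextToken = page.NextToken;
    } while (nextToken);
  }
};
```

Since Rekognition jobs run asynchronously and each Lambda is billed only while it executes, nothing in this pipeline costs money while the system is idle.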

After implementation, users can:

  • Intuitively teach the neural network new faces through a clear and simple process directly from the MAM (see the indexing sketch after this list)
  • Start the face recognition process for selected assets or folders of the existing media archive
  • Search assets by meta tags of recognized persons
  • Select a marked sub-clip for each person found in a selected asset and take further actions with that sub-clip
  • See and use face location boxes during playback in the MAM
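
Behind the “teach a new face” action, Rekognition-based systems typically index reference photos into a face collection. Here is a minimal sketch of what such a call could look like; the collection name and the ExternalImageId convention are assumptions for illustration:

```typescript
import {
  RekognitionClient,
  IndexFacesCommand,
} from "@aws-sdk/client-rekognition";

const rekognition = new RekognitionClient({});

// Index a reference photo of a person into the face collection.
// Collection name and the ExternalImageId convention are hypothetical.
export async function teachFace(bucket: string, key: string, personId: string) {
  const { FaceRecords } = await rekognition.send(
    new IndexFacesCommand({
      CollectionId: "players-collection",
      Image: { S3Object: { Bucket: bucket, Name: key } },
      ExternalImageId: personId, // e.g. "usain-bolt"; returned with every future match
      MaxFaces: 1,               // a reference photo should contain a single face
      QualityFilter: "AUTO",     // let Rekognition reject blurry or tiny faces
    })
  );
  return FaceRecords?.[0]?.Face?.FaceId;
}
```

Every subsequent face-search job then returns matches carrying this ExternalImageId, which the MAM can map directly to a meta tag.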

As a result, the solution reduced the need for manual tagging of the existing media archive, saving considerable time and resources. Interactive features and the ease of finding and working with specific clips improved the overall experience for editors and content managers and made their work several times more effective. Using Amazon Rekognition with a Serverless architecture cut both the cost of processing a huge number of video frames manually and the cost of idle time.


Andrew Zaikin
firstlineoutsourcing

Founder & CEO at First Line Outsourcing https://flo.team | Mobile, Web and SaaS Development for Media Production, eGames & Tech | Adobe Technology Partner