Real-Time Tag Detection for Live Videos Using AWS Rekognition in an Android App

Karthikeyan Balusamy
Nggawe Nirman Tech Blog
4 min read · Dec 30, 2019


There is a huge collection of video data on the internet today. To make videos searchable, we need to tag them. Tags act as metadata for a video's discovery and serve multiple other purposes. A few of them are:

Use cases

  • It can help media companies quickly summarize and organize large video catalogs.
  • It can improve video recommendations, as it enables search engines to consider the video content itself, beyond the video metadata.
  • It can help in video surveillance, where many hours of footage must be searched for a specific event.
  • Moreover, internet platforms such as YouTube and Facebook would be able to automatically identify and remove videos with illegal content.

But there is a problem …

Manual tagging is cumbersome, and you can't trust users to enter valid tags for a video; users may simply enter trending tags to get more impressions.

How can we solve this problem?

We can automate video tagging and make the application smart enough to process the user's video and return the most relevant tags.

I did a POC around this at Nirman by building an Android application that does real-time tag detection. I am sharing my learnings below.

Architecture

How does it work?

A simple approach is to treat the video as a sequence of still images, apply an ML algorithm to recognize each frame, and then average the predictions at the video level.
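To make that averaging step concrete, here is a minimal sketch (not from the repository) of aggregating per-frame Rekognition labels into video-level tags. The aggregateTags helper, its labelsPerFrame input, and the 75% threshold are all assumptions for illustration.

import com.amazonaws.services.rekognition.model.Label;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TagAggregator {

    // Hypothetical helper: average each label's confidence across all
    // analyzed frames and keep the labels that clear the threshold.
    static List<String> aggregateTags(List<List<Label>> labelsPerFrame, float minAvgConfidence) {
        Map<String, List<Float>> byLabel = new HashMap<>();
        for (List<Label> frameLabels : labelsPerFrame) {
            for (Label label : frameLabels) {
                byLabel.computeIfAbsent(label.getName(), k -> new ArrayList<>())
                        .add(label.getConfidence());
            }
        }

        List<String> videoTags = new ArrayList<>();
        for (Map.Entry<String, List<Float>> entry : byLabel.entrySet()) {
            double avg = entry.getValue().stream()
                    .mapToDouble(Float::doubleValue)
                    .average()
                    .orElse(0);
            if (avg >= minAvgConfidence) {
                videoTags.add(entry.getKey());
            }
        }
        return videoTags;
    }
}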

Here we are going to see label detection on a live video captured through an Android app, achieved using the AWS Rekognition service. Frames are sampled at regular intervals (say, one per second) from the live video, each frame is analyzed asynchronously by AWS Rekognition, and the resulting tags are displayed immediately in the Android app.

Note: Processing all video frames is computationally inefficient even for short clips, since each video might contain thousands of frames (consider 24 frames per second). Moreover, consecutive frames overlap significantly in content, and not all frames are consistent with the overall story of the video.

Prerequisites

Below are the prerequisites for setting up label detection from a live video.

1. Install Android Studio for your operating system.
2. Sign up for an AWS account with full access to the Rekognition service.

Setting Up the Application

  • Clone the git repository
$ git clone git@gitlab.com:nirman-tech/Video-Recorder-with-Frames-Analysis.git
  • Open the project in android studio.
Project Structure
  • Add the aws-android-sdk-rekognition, opencv and ffmpeg dependencies to the Gradle file at /app/build.gradle.
implementation 'com.amazonaws:aws-android-sdk-rekognition:2.10.0'
implementation group: 'org.bytedeco', name: 'javacv', version: '1.3.2'
implementation group: 'org.bytedeco.javacpp-presets', name: 'opencv', version: '3.2.0-1.3', classifier: 'android-arm'
implementation group: 'org.bytedeco.javacpp-presets', name: 'opencv', version: '3.2.0-1.3', classifier: 'android-x86'
implementation group: 'org.bytedeco.javacpp-presets', name: 'ffmpeg', version: '3.2.1-1.3', classifier: 'android-arm'
implementation group: 'org.bytedeco.javacpp-presets', name: 'ffmpeg', version: '3.2.1-1.3', classifier: 'android-x86'
  • Configure your AWS account by providing the accessKey and secretKey in /app/src/main/java/nirman/io/detector/AwsConfig.java. If you don't know how to generate the accessKey and secretKey, refer to this link.
AwsConfig file
  • Then set the framesAnalysis value in /app/src/main/java/nirman/io/detector/AwsConfig.java to control how often video frames are analyzed. The universally accepted film frame rate is 24 fps (frames per second); some standards support 25 fps, and some high-definition cameras can record at 30, 50 or 60 fps. framesAnalysis = 24 means every 24th frame is captured and sent to AWS for analysis, i.e. roughly one frame per second at 24 fps. A hypothetical sketch of this config file follows.
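For reference, here is a sketch of what AwsConfig.java might look like. The field names follow how the rest of the code reads them, but the actual file contents in the repository may differ.

package nirman.io.detector;

public class AwsConfig {
    // Credentials generated from the AWS IAM console (never commit real keys).
    public static String accessKey = "YOUR_ACCESS_KEY";
    public static String secretKey = "YOUR_SECRET_KEY";

    // Analyze every Nth preview frame; 24 means roughly once per second at 24 fps.
    public static int framesAnalysis = 24;
}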
  • onPreviewFrame() is a callback that is invoked for every frame once you start the camera to capture video, and it works with OpenCV. The AsyncAWSImageDetection class does the job of sending the captured frames to AWS Rekognition to detect the labels/tags.
@Override
public void onPreviewFrame(byte[] raw, Camera cam) {
    Camera.Parameters parameters = cam.getParameters();
    int width = parameters.getPreviewSize().width;
    int height = parameters.getPreviewSize().height;

    // Convert the raw YUV preview frame to a compressed JPEG byte array.
    YuvImage yuv = new YuvImage(raw, parameters.getPreviewFormat(), width, height, null);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    yuv.compressToJpeg(new Rect(0, 0, width, height), 50, out);
    byte[] bytes = out.toByteArray();

    // Hand the JPEG bytes to an AsyncTask that calls AWS Rekognition.
    AsyncAWSImageDetection runner = new AsyncAWSImageDetection();
    runner.execute(bytes);
}
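The snippet above sends every preview frame to Rekognition; to honor framesAnalysis, a simple counter can gate the upload. A minimal sketch, assuming a frameCount field on the activity (not shown in the original code):

private int frameCount = 0;

@Override
public void onPreviewFrame(byte[] raw, Camera cam) {
    frameCount++;
    // Only every framesAnalysis-th frame (e.g. every 24th) is analyzed.
    if (frameCount % AwsConfig.framesAnalysis != 0) {
        return;
    }
    // ... convert the frame to JPEG and execute AsyncAWSImageDetection as shown above ...
}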
  • We call the AWS Rekognition service to process each frame asynchronously in the background.
// AsyncTask specialization inferred from doInBackground's signature;
// the progress type (Void) is an assumption.
class AsyncAWSImageDetection extends AsyncTask<byte[], Void, String> {
    ...
    @Override
    protected String doInBackground(byte[]... params) {
        try {
            // Rekognition client built from the keys in AwsConfig.
            AmazonRekognition rekognitionClient = new AmazonRekognitionClient(
                    new BasicAWSCredentials(AwsConfig.accessKey, AwsConfig.secretKey));

            // Wrap the JPEG frame bytes as a Rekognition Image source.
            ByteBuffer sourceImageBytes = ByteBuffer.wrap(params[0]);
            Image source = new Image().withBytes(sourceImageBytes);

            // Request at most 10 labels with at least 75% confidence.
            DetectLabelsRequest request = new DetectLabelsRequest()
                    .withImage(source)
                    .withMaxLabels(10)
                    .withMinConfidence(75F);
            DetectLabelsResult detectLabelsResult = rekognitionClient.detectLabels(request);
            List<Label> detected = detectLabelsResult.getLabels();
        } catch (Exception e) {
            e.printStackTrace();
        }
        ...
    }
}
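To display the tags in the app as they arrive, the AsyncTask result can be published back on the main thread. A minimal sketch, assuming doInBackground joins the detected label names into its String result and the activity exposes a tagsView TextView (both assumptions):

// Inside AsyncAWSImageDetection: runs on the main thread after doInBackground.
@Override
protected void onPostExecute(String joinedLabels) {
    // Show the comma-separated labels for the latest analyzed frame.
    tagsView.setText(joinedLabels);
}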

Run the Application

You can run the application either on a real device or through an emulator; refer to this link.

Please see the working app demo here.

Thank you Sridhar Babu Kolapalli Sir for helping me to do this PoC.

Thank you for reading and keep following Nirman-Tech to see more such exciting posts.
