Visual Brand Detection with Azure Video Indexer

Aaron (Ari) Bornstein
Published in Microsoft Azure
8 min read, May 15, 2020

TLDR; This post will show how to use the Azure Video Indexer, Computer Vision API and Custom Vision Services to extract key frames and detect custom image tags in indexed videos.

All code for the tutorial can be found in the notebook below. This code can be extended to support almost any image classification or object detection task.

The tutorial requires an Azure subscription; however, everything can be achieved using the free tier. If you are new to Azure, you can get a free subscription here.

What is Azure Video Indexer?

Azure Video Indexer automatically extracts metadata, such as spoken words, written text, faces, speakers, celebrities, emotions, topics, brands, and scenes, from video and audio files. Developers can then access the data within their application or infrastructure, make it more discoverable, and use it to create new over-the-top (OTT) experiences and monetization opportunities.

Often, we wish to extract useful tags from video content. These tags are often the differentiating factor for successful engagement on social media services such as Instagram, Facebook, and YouTube.

This tutorial will show how to use Azure Video Indexer, Computer Vision API, and Custom Vision service to extract key frames and custom tags. We will use these Azure services to detect custom brand logos in indexed videos.

Step #1 Download A Sample Video with the pyTube API

The first step is to download a sample video to be indexed. We will download an episode of Azure Mythbusters on Azure Machine Learning, by my incredible co-worker Amy Boyd, using the open source pyTube API!

Installation:

pyTube can be installed with pip

!pip install pytube3 --upgrade

Code:

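from pytube import YouTube
from pathlib import Path

# Download the first available stream; download() returns the local file path
video2Index = YouTube('https://www.youtube.com/watch?v=ijtKxXiS4hE').streams[0].download()
# Use the file name without its extension as the video name
video_name = Path(video2Index).stem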

Step #2 Create An Azure Video Indexer Instance

Navigate to https://www.videoindexer.ai/ and follow the instructions to create an account.

For the next steps, you will need your Video Indexer:

  • Subscription Key
  • Location
  • Account Id

These can be found on the account settings page of the Video Indexer website pictured above. For more information, see the documentation below. Feel free to comment below if you get stuck.
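One way to keep these three values out of the notebook is to read them from environment variables. The sketch below assumes hypothetical variable names (VI_SUBSCRIPTION_KEY, VI_LOCATION, VI_ACCOUNT_ID); Video Indexer does not define them, so use whatever names fit your setup.

```python
import os

# Hypothetical environment variable names; Video Indexer does not define these.
REQUIRED_VARS = ("VI_SUBSCRIPTION_KEY", "VI_LOCATION", "VI_ACCOUNT_ID")


def load_vi_settings(env=os.environ):
    """Return the three Video Indexer settings, failing early if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return {
        "vi_subscription_key": env["VI_SUBSCRIPTION_KEY"],
        "vi_location": env["VI_LOCATION"],
        "vi_account_id": env["VI_ACCOUNT_ID"],
    }
```

The returned dictionary uses the same keyword names as the client constructor used later in this tutorial, so it could be passed along as VideoIndexer(**load_vi_settings()).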

Step #3 Use the Unofficial Video Indexer Python Client to Process our Video and Extract Key Frames

To interact with the Video Indexer API, we will use the unofficial Python client.

Installation:

pip install video-indexer

Code:

  • Initialize Client:
from video_indexer import VideoIndexer

vi = VideoIndexer(vi_subscription_key='SUBSCRIPTION_KEY',
                  vi_location='LOCATION',
                  vi_account_id='ACCOUNT_ID')
  • Upload Video:
video_id = vi.upload_to_video_indexer(
    input_filename=video2Index,
    video_name=video_name,  # must be unique
    video_language='English')
  • Get Video Info
info = vi.get_video_info(video_id, video_language='English')
  • Extract Key Frame Ids
keyframes = []
for shot in info["videos"][0]["insights"]["shots"]:
    for keyframe in shot["keyFrames"]:
        # Each key frame lists one or more instances; keep the first thumbnail
        keyframes.append(keyframe["instances"][0]['thumbnailId'])
  • Get Keyframe Thumbnails
for keyframe in keyframes:
    # Returns the thumbnail image as a byte string
    img_str = vi.get_thumbnail_from_video_indexer(video_id, keyframe)
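The key frame loop above can also be wrapped in a small function, which makes it easy to try on a hand-built dictionary before calling the service. This is a minimal sketch: extract_keyframe_ids is a hypothetical helper, and the sample dictionary only mimics the fields the loop reads (videos, insights, shots, keyFrames, instances, thumbnailId), not a full Video Indexer response.

```python
def extract_keyframe_ids(info):
    """Collect the first thumbnail id of every key frame, in shot order."""
    ids = []
    for video in info.get("videos", [])[:1]:  # the tutorial only reads the first video
        for shot in video.get("insights", {}).get("shots", []):
            for keyframe in shot.get("keyFrames", []):
                instances = keyframe.get("instances", [])
                if instances:  # skip key frames that carry no thumbnail
                    ids.append(instances[0]["thumbnailId"])
    return ids


# Hand-built stand-in for the parts of the response the loop touches
sample_info = {
    "videos": [{
        "insights": {
            "shots": [
                {"keyFrames": [{"instances": [{"thumbnailId": "thumb-1"}]}]},
                {"keyFrames": [{"instances": [{"thumbnailId": "thumb-2"}]},
                               {"instances": []}]},
            ]
        }
    }]
}

print(extract_keyframe_ids(sample_info))  # ['thumb-1', 'thumb-2']
```

Unlike the inline loop, this version tolerates missing keys and empty instance lists instead of raising, which may or may not be what you want while debugging.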

Step #3 Use the Azure Computer Vision API to Extract Popular Brands from Key Frames

Out of the box, Azure Video Indexer uses optical character recognition and the audio transcript generated by speech-to-text transcription to detect references to popular brands.

Now that we have extracted the key frames, we will use the Computer Vision API to extend this functionality and check whether any known brands appear in the key frames.

  • First, we need to create a Computer Vision API key. There is a free tier that can be used for the demo; a key can be generated by following the instructions in the documentation link below. Once done, you should have a Computer Vision subscription key and endpoint.

After we have our Azure Computer Vision subscription key and endpoint, we can then use the Client SDK to evaluate our video’s keyframes:

Installation:

pip install --upgrade azure-cognitiveservices-vision-computervision

Code:

  • Initialize Computer Vision Client
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# endpoint and subscription_key are the values from your Computer Vision resource
computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(subscription_key))
  • Send Keyframe To Azure Computer Vision Service to Detect Brands
import io
import time

timeout_interval, timeout_time = 5, 10.0
image_features = ["brands"]
for index, keyframe in enumerate(keyframes):
    # Pause every few requests to stay under the request limit
    if index % timeout_interval == 0:
        print("Trying to prevent exceeding request limit waiting {} seconds".format(timeout_time))
        time.sleep(timeout_time)
    # Get KeyFrame Image Byte String From Video Indexer
    img_str = vi.get_thumbnail_from_video_indexer(video_id, keyframe)
    # Convert Byte Stream to Image Stream
    img_stream = io.BytesIO(img_str)
    # Analyze with Azure Computer Vision
    cv_results = computervision_client.analyze_image_in_stream(img_stream, image_features)
    print("Detecting brands in keyframe {}: ".format(keyframe))
    if len(cv_results.brands) == 0:
        print("No brands detected.")
    else:
        for brand in cv_results.brands:
            # Location is printed as left, right, top, bottom
            print("'{}' brand detected with confidence {:.1f}% at location {}, {}, {}, {}".format(
                brand.name, brand.confidence * 100,
                brand.rectangle.x, brand.rectangle.x + brand.rectangle.w,
                brand.rectangle.y, brand.rectangle.y + brand.rectangle.h))
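Printing works for a quick check, but it is often handier to keep detections as data. The sketch below relies only on the attributes used in the loop above (name, confidence, and a rectangle with x, y, w, h); brand_to_record is a hypothetical helper, and SimpleNamespace objects stand in for real SDK results.

```python
from types import SimpleNamespace


def brand_to_record(keyframe_id, brand):
    """Flatten one detected brand into a plain dictionary with box edges."""
    rect = brand.rectangle
    return {
        "keyframe": keyframe_id,
        "brand": brand.name,
        "confidence": brand.confidence,
        "left": rect.x,
        "right": rect.x + rect.w,
        "top": rect.y,
        "bottom": rect.y + rect.h,
    }


# Stand-in for an SDK result, built by hand for illustration
fake_brand = SimpleNamespace(
    name="Microsoft",
    confidence=0.92,
    rectangle=SimpleNamespace(x=10, y=20, w=100, h=50),
)

record = brand_to_record("thumb-1", fake_brand)
print(record["left"], record["right"], record["top"], record["bottom"])  # 10 110 20 70
```

Appending one such record per detection to a list gives you something that can be filtered, saved as JSON, or joined back to the key frame ids later.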

Azure Computer Vision API — General Brand Detection

Step #4 Use the Azure Custom Vision Service to Extract Custom Logos from Keyframes

The Azure Computer Vision API can detect many of the world's most popular brands, but sometimes a brand may be more obscure. In this last section, we will use the Custom Vision Service to train a custom logo detector that finds the Azure Developer Relations mascot, Bit, in the keyframes extracted by Video Indexer.

My training set for Custom Bit Detector

This tutorial assumes you know how to train a Custom Vision Service object detection model for brand detection. If not, check out the documentation below for a tutorial.

Instead of deploying to mobile, however, we will use the Python client API for the Azure Custom Vision Service. All the information you’ll need can be found in the settings menu of your Custom Vision project.

Settings menu for Custom Vision Service

Installation:

pip install azure-cognitiveservices-vision-customvision

Code:

  • Initialize Custom Vision Service Client
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient

prediction_threshold = .8
prediction_key = "Custom Vision Service Key"
custom_endpoint = "Custom Vision Service Endpoint"
project_id = "Custom Vision Service Model ProjectId"
published_name = "Custom Vision Service Model Iteration Name"
# Pass the service endpoint here, not the iteration name.
# This call shape follows the SDK version used when the post was written; newer
# releases may expect the endpoint plus a credentials object, so check your version.
predictor = CustomVisionPredictionClient(prediction_key, endpoint=custom_endpoint)
  • Use Custom Vision Service Model to Predict Key Frames
import io
import time

timeout_interval, timeout_time = 5, 10.0
for index, keyframe in enumerate(keyframes):
    # Pause every few requests to stay under the request limit
    if index % timeout_interval == 0:
        print("Trying to prevent exceeding request limit waiting {} seconds".format(timeout_time))
        time.sleep(timeout_time)
    # Get KeyFrame Image Byte String From Video Indexer
    img_str = vi.get_thumbnail_from_video_indexer(video_id, keyframe)
    # Convert Byte Stream to Image Stream
    img_stream = io.BytesIO(img_str)
    # Analyze with the Azure Custom Vision Service
    cv_results = predictor.detect_image(project_id, published_name, img_stream)
    # Keep only predictions above the confidence threshold
    predictions = [pred for pred in cv_results.predictions if pred.probability > prediction_threshold]
    print("Detecting brands in keyframe {}: ".format(keyframe))
    if len(predictions) == 0:
        print("No custom brands detected.")
    else:
        for brand in predictions:
            # Location is printed as left, top, width, height
            print("'{}' brand detected with confidence {:.1f}% at location {}, {}, {}, {}".format(
                brand.tag_name, brand.probability * 100,
                brand.bounding_box.left, brand.bounding_box.top,
                brand.bounding_box.width, brand.bounding_box.height))
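Custom Vision reports bounding boxes as fractions of the image width and height rather than as pixels. A minimal sketch of filtering by the threshold and converting to pixel coordinates follows; to_pixel_box and confident_predictions are hypothetical helpers, the image size is assumed to be known, and the SimpleNamespace objects stand in for real predictions.

```python
from types import SimpleNamespace


def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a normalized box (0 to 1) into integer pixel edges."""
    left = round(bounding_box.left * image_width)
    top = round(bounding_box.top * image_height)
    right = round((bounding_box.left + bounding_box.width) * image_width)
    bottom = round((bounding_box.top + bounding_box.height) * image_height)
    return left, top, right, bottom


def confident_predictions(predictions, threshold=0.8):
    """Keep only predictions whose probability exceeds the threshold."""
    return [p for p in predictions if p.probability > threshold]


# Hand-built stand-ins for SDK predictions
fake_predictions = [
    SimpleNamespace(tag_name="bit", probability=0.95,
                    bounding_box=SimpleNamespace(left=0.25, top=0.5, width=0.5, height=0.25)),
    SimpleNamespace(tag_name="bit", probability=0.40,
                    bounding_box=SimpleNamespace(left=0.0, top=0.0, width=0.1, height=0.1)),
]

for pred in confident_predictions(fake_predictions):
    print(pred.tag_name, to_pixel_box(pred.bounding_box, 640, 360))  # bit (160, 180, 480, 270)
```

Pixel edges in this form are what most drawing or cropping libraries expect, should you want to overlay the detections on the thumbnails.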

Conclusion

And there we have it! I am able to find all the frames in my video that contain either the Microsoft brand or the Cloud Advocacy Bit logo.

Sample Key Frames with Bit

Next Steps

You now have all you need to extend the Azure Video Indexer service with your own custom computer vision models. Below is a list of additional resources that will help you take your integration with Video Indexer to the next level.

Offline Computer Vision

In a production system, a large volume of requests may lead to request throttling. In this case, the Azure Computer Vision service can be run in an offline container.

The Custom Vision model can also be run locally.
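Before moving to containers, a simple client-side mitigation is to retry throttled calls with a growing delay. The sketch below is generic and makes no assumption about which exception the SDKs raise when throttled: retry_with_backoff is a hypothetical helper, and the caller decides which exception types count as retryable.

```python
import time


def retry_with_backoff(call, retryable=(Exception,), attempts=4,
                       base_delay=1.0, sleep=time.sleep):
    """Run call(); on a retryable error wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * (2 ** attempt))


# Illustration with a fake call that fails twice before succeeding
calls = {"count": 0}
waits = []


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated throttling")
    return "ok"


result = retry_with_backoff(flaky, sleep=waits.append)
print(result, waits)  # ok [1.0, 2.0]
```

The sleep function is injectable so the example runs instantly; in real use the default time.sleep applies, and the call argument would be a small lambda wrapping one of the prediction requests.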

Video Indexer + Zoom Media

Creating an Automated Video Processing Flow in Azure

About the Author

Aaron (Ari) Bornstein is an AI researcher with a passion for history, engaging with new technologies, and computational medicine. As an Open Source Engineer on Microsoft’s Cloud Developer Advocacy team, he collaborates with the Israeli hi-tech community to solve real-world problems with game-changing technologies that are then documented, open sourced, and shared with the rest of the world.
