Step by Step Tutorial on how to use Google’s Video Intelligence API in Python

Harinath Selvaraj
Published in coding&stuff
5 min read · Jan 20, 2019

Detecting objects in video plays a key role in military, security and surveillance applications. It is particularly challenging because of variations in pose, clothing, illumination and appearance, as well as background clutter.

Huge numbers of people are employed to watch security footage and identify whether a particular object is present in a video.

Humans are typically employed to monitor video content

This is tiring, time-consuming work. What if a system could tell us whether a car or a person appeared in a video feed, and at what time? Thanks to advances in deep learning and computer vision research, this is now possible.

In the last few years, deep learning has outperformed many previous state-of-the-art machine learning methods. Computer vision is one of the areas where it has shown the most remarkable improvements.

Google has carried out extensive research in this area and developed a system (a deep learning model) that can name the objects in a video frame. It took billions of images and video feeds for Google to train the algorithm. The interesting part is that all of this is available to us as an API.

Below is the link to access Google’s Cloud Video Intelligence.

The API supports common video formats, including .MOV, .MPEG4, .MP4, and .AVI.

What can you do with Google’s Video Intelligence API?

The following tasks, which until now have been done by humans, can each be performed with a single API call 👏 👏

  • Label Detection: Detect objects, such as a dog, flower or person, in the video.
  • Explicit Content Detection: Detect adult content within a video.
  • Shot Change Detection: Detect scene changes within the video.
  • Regionalization: Specify a region where processing will take place.
  • Speech Transcription: Transcribe speech in videos to text.
  • Object Tracking (Beta): Track objects in a video and report their locations (bounding boxes).
  • Text detection (Beta): Perform Optical Character Recognition (OCR) to detect and extract text in a video.

Now that we know what the API can do, let’s dive into the implementation. Since many deep learning engineers use Python as their primary language, I will show how to call the API from Python, although client libraries are available for other languages as well.

Step 1 — Set Up a Google Cloud Account & Enable the API

Open the Google Cloud website in your browser.

Note: If you already use Google Cloud, for example as a developer working with Google APIs such as Maps, you may already be familiar with these steps.

Happy news: Google provides €300 of free credit to first-time users! 😃

Go to the Console and create a new project. Make sure you’ve set up billing for your account; you will be asked to enter your credit card information. Don’t worry, Google won’t automatically start charging you once the €300 credit runs out. 🙂

The next step is to enable the Cloud Video Intelligence API. Once that is done, you also need to grant access to the enabled API. To do that, create a new service account. Do not select any role from the list; choose ‘Create without role’ after submitting the form. This step creates a private key (used to authenticate to the API) and downloads it to your machine as a JSON file. Keep this file safe, as you’ll need it in Step 2 of this article.
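If you want to double-check the downloaded file before using it, the short Python sketch below loads it and checks a few fields. It assumes the usual service-account key layout (a JSON object with fields such as type, project_id, private_key and client_email); load_service_account_key is a hypothetical helper written for this article, not part of any Google library.

```python
import json
from pathlib import Path

# Fields a service-account key file normally contains (assumed layout;
# compare with the file you actually downloaded).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_service_account_key(path):
    """Hypothetical helper: load a key file and sanity-check its contents."""
    key_path = Path(path).expanduser()
    if not key_path.is_file():
        raise FileNotFoundError(f"Key file not found: {key_path}")
    data = json.loads(key_path.read_text(encoding="utf-8"))
    # Report every expected field that is absent, not just the first one
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"Key file is missing fields: {', '.join(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"Expected a service account key, got type={data['type']!r}")
    return data
```

For example, load_service_account_key("/Users/harry/Downloads/SampleProject-1abc.json")["client_email"] would return the service account’s email address (replace the path with your own).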

Now that you’ve completed the steps above, the final setup step is to download the Google Cloud SDK.

After downloading, go to the directory containing the archive, extract it, and run the following commands from the terminal to install and initialize the SDK. A new browser tab will open and ask you to log in to Google Cloud.

google-cloud-sdk/install.sh
google-cloud-sdk/bin/gcloud init

Step 2 — Write the Python Code

It's time for a little coding exercise! 😃

I’ll give you the Python code needed to call the Video Intelligence API.

Note: If you have not used Python before, please read the below article to install Anaconda (which in turn installs Python).

Before you start coding, install the Python client library by running the following command from the terminal:

pip install google-cloud-videointelligence
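The client library’s interface has changed between major releases: for example, 1.x exposed the feature enum as videointelligence.enums.Feature, while 2.x exposes it as videointelligence.Feature. A quick way to see which release pip installed is the standard library’s importlib.metadata module (Python 3.8+). This is a small sketch; installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="google-cloud-videointelligence"):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


v = installed_version()
if v is None:
    print("google-cloud-videointelligence is not installed; run the pip command above.")
else:
    # The leading number is the major release (e.g. "2" for 2.x)
    print(f"Installed version {v} (major release {v.split('.')[0]})")
```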

We’re finally here! Below is the Python code for invoking the API:

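# Import libraries
import os

from google.cloud import videointelligence

# Point the client library at the JSON key file obtained in Step 1.
# Replace '/Users/harry/Downloads/SampleProject-1abc.json' with your own file path.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/Users/harry/Downloads/SampleProject-1abc.json"

# Call the API: annotate_video starts a long-running operation
client = videointelligence.VideoIntelligenceServiceClient()
job = client.annotate_video(
    input_uri="gs://cloud-ml-sandbox/video/chicago.mp4",
    # videointelligence.Feature is where 2.x releases expose the enum;
    # 1.x releases used videointelligence.enums.Feature instead
    features=[videointelligence.Feature.LABEL_DETECTION],
)

# Wait for the operation to finish, then print the annotations
result = job.result()
print(result)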

The output will look something like this:

annotation_results {
  input_uri: "/cloud-ml-sandbox/video/chicago.mp4"
  segment_label_annotations {
    entity {
      entity_id: "/m/01l7t2"
      description: "downtown"
      language_code: "en-US"
    }
    category_entities {
      entity_id: "/m/01n32"
      description: "city"
      language_code: "en-US"
    }
    segments {
      segment {
        start_time_offset {
        }
        end_time_offset {
          seconds: 38
          nanos: 757872000
        }
      }
      confidence: 0.9062400460243225
    }
  }
  segment_label_annotations {
    entity {
      entity_id: "/m/06gfj"
      description: "road"
      language_code: "en-US"
    }
    segments {
      segment {
        start_time_offset {
        }
        end_time_offset {
          seconds: 38
          nanos: 757872000
        }
      }
      confidence: 0.8779934048652649
    }
  }
  .....
}

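The result contains one segment_label_annotations entry per detected label. Each entry has an entity whose description field names the detected object or concept (here "downtown" and "road"), optional category_entities giving a broader category (here "city"), one or more segments with start and end time offsets within the video, and a confidence score. An empty start_time_offset means the segment starts at 0 seconds. A confidence of 0.9062400460243225 means the model is roughly 91% confident that the label is correct.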
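To work with these fields in code, the time offsets need converting to seconds: an empty offset is 0 s, and the end offset above is 38 + 757872000 / 10^9 = 38.757872 s. The sketch below does that conversion and lists every label segment. It assumes the result has already been turned into a Python dict using protobuf’s JSON field names (camelCase, e.g. segmentLabelAnnotations) and accepts durations either as JSON strings such as "38.757872s" or as seconds/nanos dicts; duration_to_seconds and label_segments are hypothetical helpers, and the sample dict simply mirrors the output shown above.

```python
def duration_to_seconds(value):
    """Convert a protobuf Duration to float seconds.

    Accepts the JSON string form (e.g. "38.757872s") or a dict with optional
    "seconds" and "nanos" keys; missing fields count as zero.
    """
    if isinstance(value, str):
        return float(value.rstrip("s"))
    return int(value.get("seconds", 0)) + int(value.get("nanos", 0)) / 1e9


def label_segments(annotation_result):
    """Yield (description, start_s, end_s, confidence) for each label segment."""
    for label in annotation_result.get("segmentLabelAnnotations", []):
        description = label["entity"]["description"]
        for seg in label.get("segments", []):
            span = seg.get("segment", {})
            start = duration_to_seconds(span.get("startTimeOffset", {}))
            end = duration_to_seconds(span.get("endTimeOffset", {}))
            yield description, start, end, seg.get("confidence", 0.0)


# Mirrors the two labels in the sample output (both duration forms shown)
sample = {
    "segmentLabelAnnotations": [
        {"entity": {"description": "downtown"},
         "segments": [{"segment": {"startTimeOffset": {},
                                   "endTimeOffset": {"seconds": 38, "nanos": 757872000}},
                       "confidence": 0.9062400460243225}]},
        {"entity": {"description": "road"},
         "segments": [{"segment": {"startTimeOffset": "0s",
                                   "endTimeOffset": "38.757872s"},
                       "confidence": 0.8779934048652649}]},
    ]
}

for description, start, end, confidence in label_segments(sample):
    print(f"{description}: {start:.2f}s to {end:.2f}s, confidence {confidence:.0%}")
# downtown: 0.00s to 38.76s, confidence 91%
# road: 0.00s to 38.76s, confidence 88%
```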

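The printed result is in protobuf text format rather than JSON, but it can be converted to JSON (for example with the json_format module of the protobuf library) and loaded into your database. This lets you query the database to check whether a particular object was present during a given timeframe.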
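As a minimal sketch of that idea, the example below uses Python’s built-in sqlite3 module with an in-memory database. The labels table schema and the labels_between helper are hypothetical, and the two rows are taken from the sample output above; in practice you would build the rows from your own API results.

```python
import sqlite3

# (label, start seconds, end seconds, confidence) from the sample output above
rows = [
    ("downtown", 0.0, 38.757872, 0.9062400460243225),
    ("road", 0.0, 38.757872, 0.8779934048652649),
]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute("CREATE TABLE labels (label TEXT, start_s REAL, end_s REAL, confidence REAL)")
conn.executemany("INSERT INTO labels VALUES (?, ?, ?, ?)", rows)


def labels_between(conn, t0, t1, min_confidence=0.5):
    """Return labels whose segments overlap the window [t0, t1] in seconds."""
    # Two intervals overlap when each one starts before the other ends
    query = (
        "SELECT DISTINCT label FROM labels "
        "WHERE start_s <= ? AND end_s >= ? AND confidence >= ? ORDER BY label"
    )
    return [row[0] for row in conn.execute(query, (t1, t0, min_confidence))]


print(labels_between(conn, 10, 20))       # ['downtown', 'road']
print(labels_between(conn, 40, 50))       # []
print(labels_between(conn, 10, 20, 0.9))  # ['downtown']
```

The overlap condition catches segments that only partly fall inside the window; the confidence threshold lets you ignore low-confidence labels.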

You might have noticed that ‘LABEL_DETECTION’ was given as the only feature. As mentioned earlier, the Video Intelligence API can do much more, and you can request several features in a single call. For example:

features=[videointelligence.Feature.LABEL_DETECTION, videointelligence.Feature.SHOT_CHANGE_DETECTION]

Feature values that can be given as inputs include:

  • LABEL_DETECTION: Label detection. Detect objects, such as a dog or flower.
  • SHOT_CHANGE_DETECTION: Shot change detection.
  • EXPLICIT_CONTENT_DETECTION: Explicit content detection.
  • SPEECH_TRANSCRIPTION: Speech transcription.
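Shot change results come back as a list of video segments (shot_annotations, or shotAnnotations in JSON form), one per detected shot, each with a start and end time offset. A minimal sketch for turning them into (start, end) pairs in seconds, assuming a JSON-style dict as input with durations in the "12.5s" string form; shot_boundaries is a hypothetical helper, and the input values below are made up for illustration, not real API output.

```python
def shot_boundaries(annotation_result):
    """Return sorted (start_s, end_s) pairs, one per detected shot."""
    shots = []
    for shot in annotation_result.get("shotAnnotations", []):
        # JSON durations look like "12.5s"; a missing offset means 0 seconds
        start = float(shot.get("startTimeOffset", "0s").rstrip("s"))
        end = float(shot.get("endTimeOffset", "0s").rstrip("s"))
        shots.append((start, end))
    return sorted(shots)


# Hypothetical two-shot result, for illustration only
hypothetical_result = {
    "shotAnnotations": [
        {"startTimeOffset": "12.6s", "endTimeOffset": "38.757872s"},
        {"startTimeOffset": "0s", "endTimeOffset": "12.5s"},
    ]
}

for number, (start, end) in enumerate(shot_boundaries(hypothetical_result), 1):
    print(f"Shot {number}: {start:.1f}s to {end:.1f}s ({end - start:.1f}s long)")
```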

Note: If you only created the Google Cloud project for testing and no longer need it, make sure you delete it afterwards; otherwise Google may charge you for the resources it uses!

That’s it, folks! I hope you’ve learnt how to use the Video Intelligence API from Python. Please comment below if you get stuck on anything, and give me a clap if you liked my article! 🙂
