Automate your VOD transcoding at scale with GCP: Part 1

nazir kabani
Google Cloud - Community
11 min read · Aug 21, 2022

Introduction:

Google Cloud released a set of video-focused APIs that enable developers to easily build and deploy flexible, high-quality video streaming experiences, beginning with transcoding.

All of these APIs combine Google’s media expertise with Google scale and reliability.

On September 30, 2021 we announced the GA release of the Transcoder API and its endpoint (https://transcoder.googleapis.com/v1/), focusing on two problems customers face when building complex VOD pipelines:

  1. Build your own: The Transcoder API is flexible enough for media companies to build the product they want.
  2. Managed Infrastructure: Google offers a scalable API so that media companies don’t have to manage infrastructure.

The Transcoder API is an easy-to-use API for creating consumer streaming formats, including MPEG-4 (MP4), Dynamic Adaptive Streaming over HTTP (DASH, also known as MPEG-DASH), and HTTP Live Streaming (HLS). Many direct-to-consumer streaming platforms use a multi-codec strategy. Google has expertise with direct-to-consumer video, given YouTube and Google Play, and has chosen this specific path/formats for media solutions.

Problem Statement:

Most OTT customers struggle to automate their transcoding pipeline; with complex workflows it becomes even harder to automate securely, without human intervention, and to generate statistics along the way.

In this two-part blog I cover end-to-end integration, automation, security, logging, and reporting.

By the end of this blog series, your architecture will look something like the architecture shared below.

Image: transcoding automation overall architecture

Part 1: Building automated VOD transcoding pipeline

Let’s start building the automated VOD pipeline by creating two GCS (Google Cloud Storage) buckets and automating transcoding with Cloud Functions (the portion of the overall architecture shown below).

Prerequisite:

Before we start with Cloud Functions, we will require two storage buckets (one for source files and one for transcoded output) and a Transcoder API job template.

Storage Buckets

Step 1: Create one GCS bucket for the source, where a producer or operator will upload mezzanine video files (source videos). In my example below I have created a transcoding-automation-source bucket in the asia-south1 (Mumbai) region with the following configuration:

  • Standard default storage class
  • Public access prevention enforced on this bucket
  • Uniform access control
  • Google-managed encryption key for data encryption

Step 2: Similarly, create another GCS bucket for output. In my example below I have created a transcoding-automation-output bucket in the asia-south1 (Mumbai) region with the same configuration.

Cloud Storage bucket names must be globally unique, so you will have to use different bucket names; keep this in mind, because you will have to change them everywhere in the code below.
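If you prefer the command line over the console, the same buckets can be created from Cloud Shell roughly as follows (bucket names and region are from my example; adjust them for your project, and note that Google-managed encryption is the default so it needs no flag):

# Source bucket: Standard class, uniform bucket-level access, public access prevention enforced
gsutil mb -l asia-south1 -c standard -b on gs://transcoding-automation-source
gsutil pap set enforced gs://transcoding-automation-source

# Output bucket with the same configuration
gsutil mb -l asia-south1 -c standard -b on gs://transcoding-automation-output
gsutil pap set enforced gs://transcoding-automation-output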

Transcoder API Job Template

Before automating the transcoding process, it’s important to define the transcoder job configuration, such as the bitrate ladder, H.264 or H.265 codec settings, mux streams, segment settings, manifest settings, and Pub/Sub notifications (which will be useful when we build logging and reporting in part 2 of this blog), and put it in a JSON template.

A transcoder job consists of three main functions, as shown below; the transcoding job template follows the same structure in JSON.

  1. Encoding
  2. Multiplexing
  3. Manifest creation

Sample job templates and the process of creating job templates are available at https://cloud.google.com/transcoder/docs/how-to/job-templates; you can either leverage them or create your own template using the transcoder job config.

The transcoder job config reference is available in the Google Cloud reference documents (https://cloud.google.com/transcoder/docs/reference/rest/v1/JobConfig). It is a subject in itself, and since the focus of this blog is building automation, I am not covering it here (I will probably write another blog covering it).

Also, I built a few job templates and made them available on GitHub; readers of this blog can leverage them to build this automation or use them as a starting point for their own templates.

Template 1: 4K-hevc, HLS and Dash output (https://github.com/nazir-kabani/transcoder-API-templates/blob/main/4k-hevc-hls-dash.json)

Template 2: hd-h264, HLS and Dash output (We will be using this template in this blog) (https://github.com/nazir-kabani/transcoder-API-templates/blob/main/hd-h264-hls-dash.json)

Template 3: hd-h264, mp4 output (https://github.com/nazir-kabani/transcoder-API-templates/blob/main/hd-h264-mp4.json)

Now it’s time to create a job template using the gcloud command and Template 2.

Step 1: Enable the Pub/Sub API and create a Pub/Sub topic with the topic ID transcoding-job-notification (you can choose your own name, but remember to replace this name with yours in the later parts of the blog). For ease of understanding, this is Cloud Pub/Sub 2 in the overall architecture.
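If you want to do this step from Cloud Shell instead of the console, the equivalent gcloud commands look like this:

gcloud services enable pubsub.googleapis.com
gcloud pubsub topics create transcoding-job-notification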

Step 2: Download template 2 from GitHub, update line number 478 with your project ID and Pub/Sub topic ID, and save the file on your local computer.

For example, projects/<projectId>/topics/transcoding-job-notification will change to projects/test-project/topics/transcoding-job-notification.

Step 3: Open Cloud Shell, create a folder, and upload the job template JSON file into that folder using the steps below:

  1. mkdir transcoding-automation
  2. Upload the hd-h264-hls-dash.json file (using the Cloud Shell upload option)
  3. mv hd-h264-hls-dash.json transcoding-automation/

Step 4: Enable the Transcoder API, create a service account, and grant the roles mentioned in the linked guide (https://cloud.google.com/transcoder/docs/transcode-video#before-you-begin).

Don’t create JSON keys; we won’t require them for this exercise.
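For reference, a rough CLI sketch of this step is below. The service account name is illustrative, and the exact roles to grant are the ones listed in the linked guide; the roles shown here are assumptions for illustration only.

# Enable the Transcoder API
gcloud services enable transcoder.googleapis.com

# Create a service account (the name transcoder-sa is illustrative)
gcloud iam service-accounts create transcoder-sa --display-name="Transcoder automation"

# Grant roles to the service account (confirm the exact roles in the linked guide)
gcloud projects add-iam-policy-binding test-project \
  --member="serviceAccount:transcoder-sa@test-project.iam.gserviceaccount.com" \
  --role="roles/transcoder.admin"
gcloud projects add-iam-policy-binding test-project \
  --member="serviceAccount:transcoder-sa@test-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"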

Step 5: Create your first template using Cloud Shell and the commands below.

  1. cd transcoding-automation/
  2. gcloud transcoder templates create hd-h264-hls-dash --file="hd-h264-hls-dash.json" --location=asia-south1

You can choose a location near your geographical location; as of today the Transcoder API is available in 12 geographical regions (https://cloud.google.com/transcoder/docs/locations).

Bingo, you have created your first transcoding template. If you don’t want to read further and are not interested in automating your workflow, you can use the Cloud Shell command below to start video transcoding.

gcloud transcoder jobs create \
  --input-uri="gs://transcoding-automation-source/file-name-with-extension" \
  --location=asia-south1 \
  --output-uri="gs://transcoding-automation-output/filename-without-extension/" \
  --template-id="hd-h264-hls-dash"

For details, please refer to https://cloud.google.com/transcoder/docs/how-to/jobs#create_jobs_templates

Building automation using cloud functions:

Cloud Functions runs application code in response to events from Google Cloud products, Firebase, and Google Assistant, or you can call it directly from any web, mobile, or backend application via HTTP.

In our case we will use object-finalize triggers from GCS to trigger Cloud Functions (https://cloud.google.com/functions/docs/calling/storage).

Creating cloud Pub/Sub 1

Before creating Cloud Functions 1, we need to create a Pub/Sub topic that we will use in Cloud Functions 1’s code.

From the Google Cloud console, go to Pub/Sub -> Schemas and create a schema.

Give your schema any preferred name; for this blog I am using transcoding-start-status as the schema name.

Select Avro as the schema type, paste the JSON provided in the GitHub gist below into the schema definition, and click on Create.

https://gist.github.com/nazir-kabani/3380de756b892cc414ae7fb348ebf240

Now, click on Create Topic from the Schemas page (don’t go to the Topics page and create it there). Enter the Topic ID transcoding-start-status and click on Create Topic.
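If you prefer the CLI for this step too, the equivalent should look roughly like the commands below, assuming you have saved the gist’s schema definition locally as transcoding-start-status.json:

# Create the Avro schema from the gist's definition file
gcloud pubsub schemas create transcoding-start-status \
  --type=avro \
  --definition-file=transcoding-start-status.json

# Create the topic with the schema attached, using JSON message encoding
gcloud pubsub topics create transcoding-start-status \
  --schema=transcoding-start-status \
  --message-encoding=json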

Creating cloud Functions 1

Step 1: Enable the APIs required to run Cloud Functions by following the steps mentioned in https://cloud.google.com/functions/docs/console-quickstart#before-you-begin

Step 2: From the Google Cloud console, open Cloud Functions and click on Create Function.

Step 3: Enter the following details:

  • Environment: 1st Gen
  • Function name: transcoding-start
  • Region: asia-south1 (you can choose region nearest to your geographical location similar to your source / output bucket and transcoder api regions)
  • Trigger type: Cloud Storage
  • Event type: On (finalizing/creating) file in the selected bucket
  • Bucket: (browse and select) transcoding-automation-source

Then click on Save and:

  • Expand Runtime, build, connections and security settings
  • Change memory allocated to 1 GB
  • Click on connections and select Allow internal traffic and traffic from Cloud Load Balancing
  • Click next
  • Next, select the Python 3.10 runtime
  • Download the Transcoder API automation code from the GitHub URL https://github.com/nazir-kabani/transcoder-api-automation
  • Replace the code in requirements.txt and main.py (both files are available on GitHub)

Things to remember when using the GitHub code:

  • Change the bucket names to your bucket names on lines 42 and 43 of main.py
  • Change the template ID if your template ID is different on line 44 of main.py
  • Change the project ID to your project ID on line 48 of main.py
  • Change the location to your preferred location (where you want to run your transcoding jobs; your job template should be in the same location) on line 48 of main.py
  • Change the Pub/Sub topic path to the Pub/Sub topic 1 you created above on line 67 of main.py

Once these changes are done, click on Deploy. Deployment will take a few minutes to complete.
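To give a sense of what the function does, here is a minimal, illustrative sketch (not the actual GitHub code): when a source file is finalized in the bucket, it creates a Transcoder API job from the template and publishes a start-status message to Pub/Sub. Bucket names, template ID, project ID, location, topic name, and message fields below are placeholders taken from the examples above, and requirements.txt would need at least google-cloud-video-transcoder and google-cloud-pubsub.

import json

from google.cloud import pubsub_v1
from google.cloud.video import transcoder_v1


def transcoding_start(event, context):
    """Background function triggered by a GCS object-finalize event (1st gen)."""
    file_name = event["name"]                  # uploaded object name, e.g. my-video.mp4
    base_name = file_name.rsplit(".", 1)[0]    # output folder named after the file

    project_id = "test-project"                # placeholder, use your project ID
    location = "asia-south1"                   # must match the job template location
    template_id = "hd-h264-hls-dash"
    input_uri = f"gs://transcoding-automation-source/{file_name}"
    output_uri = f"gs://transcoding-automation-output/{base_name}/"

    # Create a Transcoder API job from the job template.
    client = transcoder_v1.TranscoderServiceClient()
    job = transcoder_v1.types.Job()
    job.input_uri = input_uri
    job.output_uri = output_uri
    job.template_id = template_id
    response = client.create_job(
        parent=f"projects/{project_id}/locations/{location}", job=job
    )
    print(f"Created transcoding job: {response.name}")  # job ID shows up in Cloud Logging

    # Publish a start-status message to the transcoding-start-status topic.
    # Field names here are placeholders; they must match the Avro schema attached to the topic.
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, "transcoding-start-status")
    message = {"jobName": response.name, "inputUri": input_uri, "outputUri": output_uri}
    publisher.publish(topic_path, data=json.dumps(message).encode("utf-8")).result()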

Step 4: Since 1st gen Cloud Functions use the App Engine default service account, we will have to grant the Transcoder Admin, Storage Admin, and Pub/Sub Publisher roles to the App Engine default service account.

  • Go to IAM & Admin -> Service Account and search for app engine default service account
  • Copy email of app engine default service account
  • Go to IAM
  • Click on add
  • In new principal add email of app engine default service account
  • In role add transcoder admin
  • Click on add another role
  • In role add storage admin
  • Click on add another role
  • In role add Pub/Sub Publisher
  • Click on save
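Alternatively, the same role grants can be applied from Cloud Shell. A sketch, using test-project as a placeholder project ID (the App Engine default service account is normally <project-id>@appspot.gserviceaccount.com):

for role in roles/transcoder.admin roles/storage.admin roles/pubsub.publisher; do
  gcloud projects add-iam-policy-binding test-project \
    --member="serviceAccount:test-project@appspot.gserviceaccount.com" \
    --role="${role}"
done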

Integration with Media CDN:

Next, we will work on the integration between the output bucket and Media CDN (the portion of the overall architecture shown below).

Media CDN is Google Cloud’s media delivery solution. Media CDN complements Cloud CDN, which is Google Cloud’s web acceleration solution. Media CDN is optimized for high-throughput egress workloads, such as streaming video and large file downloads. Media CDN was released to GA on November 9th, 2022; if you are not seeing Media CDN in your project, please reach out to your account team to allowlist your project for Media CDN.

For those who don’t have access to Media CDN, please follow the Cloud CDN documentation available at https://cloud.google.com/cdn/docs/setting-up-cdn-with-bucket

Step 1: Once Media CDN is allowlisted in your project, run the gcloud commands below to enable the Network Services, Certificate Manager, and Edge Cache APIs.

gcloud services enable networkservices.googleapis.com

gcloud services enable certificatemanager.googleapis.com

gcloud services enable edgecache.googleapis.com

Step 2: In the Google Cloud console -> Network Services, open Media CDN, select the Origins tab, and click on Create Origin. Give your origin your preferred name, select the GCS output bucket by browsing through the UI, and click on Create Origin.

In my configuration I chose vod-origin as my origin name.

Step 3: From Media CDN, click on Services -> Create Service.

  • Under service name, give your preferred name and description; in my example I chose vod-service as the service name for my configuration
  • Click next
  • In routing, click on add host rule
  • You can add your host name here, for this exercise I am using wildcard * for my configuration
  • Click on add route rule
  • Give priority 1 and description “default rule”
  • Click on add match condition
  • Select the match type Prefix match and enter / in the path match box
  • Click done
  • Select primary action Fetch from an Origin
  • Select vod-origin from drop-down
  • Expand add-on actions drop down
  • Under route action click on add an item
  • Select CDN policy in type
  • FORCE_CACHE_ALL in cache mode
  • Enter 31536000 in default TTL
  • Under cache key policy select exclude query string and exclude host
  • Click on done and save route rule
  • Click done again while in the host rule

Click next and click on create service

Step 4: From the Services tab, note down the IPv4 address (masked in the screenshot below); we will need it in the next step.

Step 5: Give media CDN service account permission to view storage objects

Go to IAM & Admin -> IAM, click on Add, and add the Media CDN service account in New Principals.

Example service account email : service-{PROJECT_NUM}@gcp-sa-mediaedgefill.iam.gserviceaccount.com

Add the Storage Object Viewer role and click on Save.
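The same grant can also be applied from Cloud Shell; replace test-project and {PROJECT_NUM} with your own project ID and project number:

gcloud projects add-iam-policy-binding test-project \
  --member="serviceAccount:service-{PROJECT_NUM}@gcp-sa-mediaedgefill.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"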

Now it’s time to test End to End:

In this step we will ingest the Big Buck Bunny 4K video (which will be transcoded by the Transcoder API automation; you don’t need to do anything after the ingest) and play it back using Media CDN.

Step 1: Open Cloud Shell, download the Big Buck Bunny 4K video in Cloud Shell, and upload it to the source bucket.
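A sketch of the equivalent Cloud Shell commands is below. The download URL is a placeholder (use whichever Big Buck Bunny 4K source you have), and the bucket name is from my example:

# Download the mezzanine file into Cloud Shell (replace the URL with your source)
wget {big-buck-bunny-4k-download-url} -O bbb_sunflower_2160p_60fps_normal.mp4

# Upload it to the source bucket; this object-finalize event triggers Cloud Functions 1
gsutil cp bbb_sunflower_2160p_60fps_normal.mp4 gs://transcoding-automation-source/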

Step 2: Visit Cloud Logging and filter the Cloud Functions logs; you will find a log line containing the job ID.

Copy job id from cloud logging

Open Cloud Shell and run

gcloud transcoder jobs describe {job-id} --location={location}

to get the job status. Please wait for the job state to change to SUCCEEDED (keep retrying the gcloud command above at two-minute intervals). Once done, check playback using any HLS player, VLC player, or a Chrome/Firefox HLS/DASH extension.
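Instead of rerunning the command by hand, you can poll the job state from Cloud Shell with something like:

watch -n 120 gcloud transcoder jobs describe {job-id} --location={location} --format="value(state)"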

You can check playback of the transcoding output using the CDN URL.

http://{cdn-ipv4-ip-address}/bbb_sunflower_2160p_60fps_normal/manifest.m3u8

http://{cdn-ipv4-ip-address}/bbb_sunflower_2160p_60fps_normal/manifest.mpd

For video files other than Big buck bunny, you can use following url schema for playback

http://{cdn-ipv4-ip-address}/{file-name}/manifest.m3u8

http://{cdn-ipv4-ip-address}/{file-name}/manifest.mpd
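Before loading the stream in a player, a quick way to confirm that Media CDN is serving the manifest is to request it with curl and check for a 200 response:

curl -I http://{cdn-ipv4-ip-address}/bbb_sunflower_2160p_60fps_normal/manifest.m3u8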

It’s a wrap

Voila, we are done with the automation. You can upload as many videos as you want using gsutil or the Cloud console (storage upload button). Keep in mind that the Transcoder API has a quota for concurrent jobs, so if you ingest more videos in parallel than your quota limit, you may experience failures. In my case the quota limit is 20, so I can only ingest 20 videos in parallel. Reach out to the Google support team if you wish to increase your quota limit.

In the next blog I am going to walk you through reporting and dashboard creation for this workflow.

If you have followed along this far and want to build reporting and visualisation, please visit part 2 of my blog to continue this story: https://medium.com/google-cloud/automate-your-vod-transcoding-at-scale-with-gcp-part-2-b1da0e57823d

Disclaimer: This is to inform readers that the views, thoughts, and opinions expressed in the text belong solely to the author, and not necessarily to the author’s employer, organisation, committee or other group or individual.


nazir kabani
Google Cloud - Community

Customer Engineer at Google Cloud Platform focusing on success of Media and Entertainment Customers