Automate your VOD transcoding at scale with GCP: Part 2

nazir kabani
Google Cloud - Community
8 min read · Aug 30, 2022

Welcome back to the second part of my two-part blog series on automating your VOD transcoding at scale with GCP.

I started part 1 with the promise of covering end-to-end integration, automation, security, logging and reporting, illustrated by the architecture diagram below.

Image: transcoding automation overall architecture

In part 1, I covered transcoding pipeline automation, the part of the overall architecture shown below. We also built Cloud Pub/Sub 1 and Cloud Pub/Sub 2, which were not used in part 1 but will be used here in part 2.

If you missed the first part of this blog series, I recommend reading part 1 at https://medium.com/google-cloud/automate-your-vod-transcoding-at-scale-with-gcp-part-1-87503fdd3a9f before starting this one.

Part 2:

In the last part, I covered automation, logging and security (no human intervention, and origin access using IAM permissions). In this part, I am covering the rest of the integration, reporting and visualisation.

Transcoding Start Status:

We created a Pub/Sub topic (transcoding-start-status) in part 1; now it's time to connect Pub/Sub with Google BigQuery (BQ). In this section, we will build the part of the overall architecture shown below.

BigQuery (BQ) Transcoding Start Status:

Before we create a subscription to BQ, we need to create a dataset and a table. Go to BigQuery in the Google Cloud console, click the three dots next to your project ID and click Create dataset.

Create Dataset:

In the following example, I chose transcoding_jobs as the Dataset ID and asia-south1 as the data location (you can choose your own Dataset ID and data location) and kept the default configuration for the rest of the settings.
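If you prefer to script this step instead of using the console, a minimal sketch with the BigQuery Python client might look like the following; the project ID is a placeholder.

from google.cloud import bigquery

# Placeholder project ID; replace with your own.
project_id = "your-project-id"

client = bigquery.Client(project=project_id)

# Create the transcoding_jobs dataset in asia-south1 (the same values used in this blog).
dataset = bigquery.Dataset(f"{project_id}.transcoding_jobs")
dataset.location = "asia-south1"
client.create_dataset(dataset, exists_ok=True)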

Create transcoding-start-status Table:

Similarly, click the three dots next to your dataset (transcoding_jobs) and click Create table.

I used transcoding-start-status as the table name. Under Schema, click Edit as text, paste the JSON provided in the GitHub gist below and click Create table.

https://gist.github.com/nazir-kabani/bea532aa37a0c3ad86205dd83f0f397d
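Alternatively, you can create the table from the gist's schema JSON with the BigQuery Python client. This sketch assumes you saved the gist locally as transcoding-start-status-schema.json (a hypothetical filename).

from google.cloud import bigquery

project_id = "your-project-id"  # placeholder
client = bigquery.Client(project=project_id)

# Load the table schema from the gist JSON saved locally (hypothetical filename).
schema = client.schema_from_json("transcoding-start-status-schema.json")

# Create the table under the transcoding_jobs dataset.
table = bigquery.Table(f"{project_id}.transcoding_jobs.transcoding-start-status", schema=schema)
client.create_table(table, exists_ok=True)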

Pub/Sub 1 to BQ Export:

Before we create the Pub/Sub to BQ export, we have to grant BigQuery table permissions to the Pub/Sub service account.

From the Google Cloud console, go to IAM, click ADD and add the Pub/Sub service account under New principals (it will look similar to service-{Project-Number}@gcp-sa-pubsub.iam.gserviceaccount.com), assign the BigQuery Data Editor role and save.
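If you want to script the IAM grant instead, a rough sketch using the Cloud Resource Manager API (google-api-python-client) could look like this; the project ID and project number are placeholders, and the binding is added at the project level just like the console step above.

from googleapiclient import discovery

project_id = "your-project-id"    # placeholder
project_number = "123456789012"   # placeholder

# Pub/Sub service agent that needs BigQuery Data Editor to write to the table.
pubsub_sa = f"serviceAccount:service-{project_number}@gcp-sa-pubsub.iam.gserviceaccount.com"

crm = discovery.build("cloudresourcemanager", "v1")

# Read-modify-write the project IAM policy.
policy = crm.projects().getIamPolicy(resource=project_id, body={}).execute()
policy.setdefault("bindings", []).append(
    {"role": "roles/bigquery.dataEditor", "members": [pubsub_sa]}
)
crm.projects().setIamPolicy(resource=project_id, body={"policy": policy}).execute()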

Now, from the Google Cloud console, go to Pub/Sub -> Topics and open transcoding-start-status. Click EXPORT TO BIGQUERY.

You will see a pop-up asking you to choose between Use Pub/Sub and Use Dataflow. Select Pub/Sub and click Continue.

On the next page, enter a subscription ID of your choice, select transcoding_jobs as the dataset and transcoding-start-status as the table, select Use topic schema and Drop unknown fields, and click Create.
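The same BigQuery subscription can also be created programmatically. A minimal sketch with the Pub/Sub Python client, assuming the names used in this blog and a placeholder subscription ID:

from google.cloud import pubsub_v1

project_id = "your-project-id"  # placeholder

subscriber = pubsub_v1.SubscriberClient()
topic_path = f"projects/{project_id}/topics/transcoding-start-status"
subscription_path = subscriber.subscription_path(project_id, "transcoding-start-status-bq-sub")  # placeholder ID

# BigQuery subscription that writes messages straight into the BQ table,
# using the topic schema and dropping fields the table does not have.
subscriber.create_subscription(
    request={
        "name": subscription_path,
        "topic": topic_path,
        "bigquery_config": {
            "table": f"{project_id}.transcoding_jobs.transcoding-start-status",
            "use_topic_schema": True,
            "drop_unknown_fields": True,
        },
    }
)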

That's it. We are done with the transcoding start status export to BQ, but we will verify it after we set up the transcoding complete status export to BQ.

Transcoding Complete Status:

Easy so far? Now let's build the following part of the overall architecture to publish the transcoding complete status export to BQ.

In part 1 of the blog, we created a Cloud Pub/Sub 2 topic named transcoding-job-notification and used this topic in the transcoder job template. The Transcoder API uses this topic to publish the transcoding job status after job completion. Now we will use this topic to trigger Cloud Functions 2, which gets job completion metadata from the Transcoder API and publishes it to Cloud Pub/Sub 3.
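The actual function code is linked in the Cloud Functions 2 section below; as a rough, simplified sketch of that flow (not the gist's exact code), a Pub/Sub-triggered function might look like the following. The payload fields, field names and project ID are assumptions, and the published fields must match the Avro schema we attach to Pub/Sub 3 in the next section.

import base64
import json

from google.cloud import pubsub_v1
from google.cloud.video import transcoder_v1

PROJECT_ID = "your-project-id"                  # placeholder
COMPLETE_TOPIC = "transcoding-complete-status"  # Pub/Sub 3 topic created in the next section

def transcoding_job_complete(event, context):
    # The Transcoder API notification arrives as a base64-encoded JSON message;
    # it is assumed here to carry the fully qualified job name.
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    job_name = payload["job"]["name"]

    # Fetch the full job metadata from the Transcoder API.
    client = transcoder_v1.TranscoderServiceClient()
    job = client.get_job(name=job_name)

    # Publish selected metadata to Pub/Sub 3. The field names here are illustrative;
    # they must match the Avro schema attached to the topic.
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(PROJECT_ID, COMPLETE_TOPIC)
    message = {
        "jobId": job.name.split("/")[-1],
        "state": transcoder_v1.Job.ProcessingState(job.state).name,
    }
    # Block until the message is published before the function exits.
    publisher.publish(topic_path, json.dumps(message).encode("utf-8")).result()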

Pub/Sub 3:

Before building the second cloud function, we have to create Pub/Sub 3 and use its topic ID in Cloud Functions 2.

From the Google Cloud console, go to Pub/Sub -> Schemas and click Create schema.

Give your schema any name you prefer; for this blog, I am using transcoding-complete-status as the schema name.

Select Avro as the schema type, paste the JSON from the GitHub gist below into the schema definition and click Create.

https://gist.github.com/nazir-kabani/9b811d1955c0e7803de7550090269092

Now, click Create topic from the schema's page (don't go to the Topics page to create it). Enter transcoding-complete-status as the Topic ID and click Create topic.
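For reference, the same schema and schema-backed topic can be created with the Pub/Sub Python client. This sketch assumes the Avro definition from the gist is saved locally as transcoding-complete-status.avsc (a hypothetical filename) and that messages will be JSON-encoded.

from google.cloud import pubsub_v1
from google.pubsub_v1.types import Encoding, Schema

project_id = "your-project-id"  # placeholder

# Create the Avro schema from the gist definition saved locally (hypothetical filename).
schema_client = pubsub_v1.SchemaServiceClient()
schema_path = schema_client.schema_path(project_id, "transcoding-complete-status")
with open("transcoding-complete-status.avsc") as f:
    avro_definition = f.read()
schema_client.create_schema(
    request={
        "parent": f"projects/{project_id}",
        "schema": Schema(name=schema_path, type_=Schema.Type.AVRO, definition=avro_definition),
        "schema_id": "transcoding-complete-status",
    }
)

# Create the topic bound to that schema, with JSON message encoding.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, "transcoding-complete-status")
publisher.create_topic(
    request={
        "name": topic_path,
        "schema_settings": {"schema": schema_path, "encoding": Encoding.JSON},
    }
)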

Cloud Functions 2:

Now, we will build the second cloud function using the following configuration.

From the Google Cloud console, go to Cloud Functions and click Create function. Give the function your preferred name (I am using transcoding-job-complete in this blog) and select the nearest region where the Transcoder API and your source and output buckets are available.

For the trigger type, select Cloud Pub/Sub, choose transcoding-job-notification as the Cloud Pub/Sub topic and click Save.

Expand Runtime, build, connections and security settings, make the following changes and click Next.

  • Under runtime, change memory allocated to 1 GB
  • Under connections, select Allow internal traffic and traffic from Cloud Load Balancing

On the next screen, select Python 3.10 as the runtime environment and copy/paste the code from the following GitHub gist URLs (before you paste the code, please don't forget to change the following lines in the code).

  • Update the project ID on line 40
  • Update the output bucket name on line 44
  • Update the CDN hostname or IPv4 address on lines 46 and 47
  • Update the project ID and Pub/Sub 3 topic name (if it's different) on line 62

main.py — https://gist.github.com/nazir-kabani/a436f2e9ca3ed1b1276d0ced14ec88b1

requirements.txt — https://gist.github.com/nazir-kabani/d702849967098aba99cf63d5e7f0833f

Next, click Deploy.

BigQuery (BQ) Transcoding Complete Status:

We already created a transcoding_jobs dataset and transcoding-start-status table in previous steps. Now we will create a transcoding-complete-status table under the same dataset.

Create transcoding-complete-status Table:

Click the three dots next to your dataset (transcoding_jobs) and click Create table.

I used transcoding-complete-status as the table name. Under Schema, click Edit as text, paste the JSON provided in the GitHub gist below and click Create table.

https://gist.github.com/nazir-kabani/fa072136cf2f8698dca18313d0e0e49f

Pub/Sub 3 to BQ Export:

From the Google Cloud console, go to Pub/Sub -> Topics and open transcoding-complete-status. Click EXPORT TO BIGQUERY.

You will see a pop-up asking you to choose between Use Pub/Sub and Use Dataflow. Select Pub/Sub and click Continue.

On the next page, enter a subscription ID of your choice, select transcoding_jobs as the dataset and transcoding-complete-status as the table, select Use topic schema and Drop unknown fields, and click Create.

Joining tables and building dashboards:

That's it; we are now done with all the integrations. It's time to join both BQ tables and create a single pane of glass for transcoding jobs (the part of the overall architecture shown below).

Before starting with the BQ join, upload a good number of short videos (at least 4-5) to your source storage bucket, so that once you join the BQ tables, the result shows a good number of rows filled with data.
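If you want to script those test uploads, a small sketch with the Cloud Storage Python client follows; the bucket name and local folder are placeholders.

import pathlib

from google.cloud import storage

# Placeholder bucket name; use your source bucket from part 1.
bucket = storage.Client().bucket("your-source-bucket")

# Upload every .mp4 in a local folder; each upload triggers a transcoding job.
for path in pathlib.Path("./sample-videos").glob("*.mp4"):
    bucket.blob(path.name).upload_from_filename(str(path))
    print(f"uploaded {path.name}")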

BQ table join:

From the Google Cloud console, go to BigQuery, open either table and click Query in new tab.

Run the following query after changing the project name, dataset name and table names (only if they differ from what I am using in the blog).

SELECT *
FROM `project-name.transcoding_jobs.transcoding-complete-status` complete
LEFT JOIN `project-name.transcoding_jobs.transcoding-start-status` start
ON start.jobId = complete.jobId

After you run this query, you will see the query results in a table below the query editor (similar to the one below).
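If you prefer to run the join programmatically, a minimal sketch with the BigQuery Python client might look like this, assuming the project and table names used in this blog:

from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # placeholder

query = """
SELECT *
FROM `your-project-id.transcoding_jobs.transcoding-complete-status` complete
LEFT JOIN `your-project-id.transcoding_jobs.transcoding-start-status` start
ON start.jobId = complete.jobId
"""

# Print each joined row as a dictionary of column values.
for row in client.query(query).result():
    print(dict(row))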

Visualising data using Data Studio:

The last step of this exercise is to visualize the transcoding job data using Data Studio. From Explore data on the query results, click Explore with Data Studio and it will open another window with a Data Studio report.

From here, you can either build your own report using the Data Studio help (https://support.google.com/datastudio/answer/6292570?hl=en#zippy=%2Cin-this-article) or use my template (https://datastudio.google.com/reporting/0a3ec04b-581b-4793-bce4-bd9fa6f14a8d/preview) to build a report similar to the one below.

For help, please refer to the Help Center article on creating a report from a template (https://support.google.com/datastudio/answer/9851950?hl=en#zippy=%2Cin-this-article).

Time to Celebrate 🥳

I hope you stayed with me until the end of this journey. Give yourself a treat; you deserve it for building a video pipeline for your OTT platform.

Do share your feedback with me and suggest what I should cover in my next blog on GCP's media cloud offerings.

Please check out GCP media cloud offerings.

  1. Transcoder API — https://cloud.google.com/transcoder/docs
  2. Live stream API — https://cloud.google.com/livestream/docs
  3. Video Stitcher API — https://cloud.google.com/video-stitcher/docs
  4. Media CDN — https://cloud.google.com/media-cdn

Reference links from this blog.

  1. Part 1 of this blog series — https://medium.com/google-cloud/automate-your-vod-transcoding-at-scale-with-gcp-part-1-87503fdd3a9f
  2. BQ transcoding-start-status table schema — https://gist.github.com/nazir-kabani/bea532aa37a0c3ad86205dd83f0f397d
  3. Pub/Sub 3 transcoding-complete-status schema — https://gist.github.com/nazir-kabani/9b811d1955c0e7803de7550090269092
  4. Cloud Functions 2 main.py — https://gist.github.com/nazir-kabani/a436f2e9ca3ed1b1276d0ced14ec88b1
  5. Cloud Functions 2 requirements.txt — https://gist.github.com/nazir-kabani/d702849967098aba99cf63d5e7f0833f
  6. BQ transcoding-complete-status table schema — https://gist.github.com/nazir-kabani/fa072136cf2f8698dca18313d0e0e49f

