How I ended up with a more controlled solution by eliminating Google Cloud Functions & adding an extra messaging-queue layer for Video Analytics

Recently, I got to work on a project, a kind of video library, where admins should be able to upload videos and end users should be able to search inside the videos' transcriptions.

That said, this article can also be useful for you if you're handling file transfers & file operations in general.

Constraints that I had to keep in mind:

  1. The App Engine standard environment doesn't allow uploading files larger than 32 MB to Cloud Storage directly.
    - Solution: Resumable uploads using signed URLs, uploaded from the browser via XHR requests (chunked uploading).
  2. Video-to-audio conversion and transcription should not happen in Cloud Functions because of the execution TIMEOUT limit.
    - Solution: Compute Engine (the only thing you can always rely on!)
  3. The subscriber must acknowledge a published message within 9 minutes, while video processing can take more time than that.
    - Solution: Acknowledge as soon as the message is received and process it afterwards. (This was problematic initially, as described below.)
Architecture Diagram - v1

Roles:

  1. App Engine:
    1. Generate a signed URL by making a request to Cloud Storage (see the first sketch after this list).
  2. Cloud Function (Storage trigger event - google.storage.object.finalize; second sketch below):
    1. Load the file data into Datastore.
    2. Publish a Pub/Sub message with the file data along with the Datastore key-id & key-kind (key-id and key-kind are the best combination for fetching an entity from Datastore using the Cloud Client Library or ndb, and a direct key lookup is much better than querying).
  3. Compute Engine (third sketch below):
    1. Download the video from GCS.
    2. Convert it to audio (FLAC is recommended) and upload it to a GCS bucket (longer audio can not be given inline as the source to the Cloud Speech API; it has to be read from GCS).
    3. Load the Cloud Speech API response into BigQuery or any suitable database.
    4. Log the processes and statuses.
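
A minimal sketch of the App Engine role, using the google-cloud-storage client. The bucket/object names and expiry are assumptions; signing a POST with the x-goog-resumable header yields a URL the browser can use to open a resumable session and then upload chunks to:

```python
import datetime
from google.cloud import storage

def make_resumable_upload_url(bucket_name: str, blob_name: str) -> str:
    """Return a signed URL the browser POSTs to (with the same
    x-goog-resumable: start header) to open a resumable upload session;
    it then PUTs the file chunks to the session URI returned in the
    Location response header."""
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    return blob.generate_signed_url(
        version="v4",
        expiration=datetime.timedelta(minutes=30),  # assumption: adjust to taste
        method="POST",
        headers={"x-goog-resumable": "start"},
    )
```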
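A minimal sketch of the Cloud Function role; the project id, topic name, and the "Video" kind are assumptions:

```python
import json
from google.cloud import datastore, pubsub_v1

PROJECT_ID = "my-project"    # assumption: replace with your project
TOPIC_ID = "video-uploaded"  # assumption: replace with your topic

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)
ds = datastore.Client()

def on_finalize(event, context):
    """Triggered by google.storage.object.finalize on the upload bucket."""
    # 1. Load the file data into Datastore (partial key; Datastore assigns the id).
    entity = datastore.Entity(key=ds.key("Video"))
    entity.update({"bucket": event["bucket"], "name": event["name"]})
    ds.put(entity)

    # 2. Publish the file data along with key-id & key-kind so the consumer
    #    can fetch the entity with a direct key lookup instead of a query.
    payload = {
        "bucket": event["bucket"],
        "name": event["name"],
        "key_id": entity.key.id,
        "key_kind": entity.key.kind,
    }
    publisher.publish(topic_path, json.dumps(payload).encode("utf-8"))
```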
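A minimal sketch of the Compute Engine role, assuming ffmpeg is installed on the VM; the temp paths, language code, and the BigQuery table are placeholders:

```python
import subprocess
from google.cloud import bigquery, speech, storage

def process_video(bucket_name: str, video_name: str) -> None:
    gcs = storage.Client()
    bucket = gcs.bucket(bucket_name)

    # 1. Download the video from GCS.
    bucket.blob(video_name).download_to_filename("/tmp/input.mp4")

    # 2. Convert to FLAC (mono, 16 kHz) and upload it back to GCS,
    #    since long audio must be read by the Speech API from GCS.
    subprocess.run(
        ["ffmpeg", "-y", "-i", "/tmp/input.mp4",
         "-ac", "1", "-ar", "16000", "/tmp/audio.flac"],
        check=True,
    )
    audio_name = video_name + ".flac"
    bucket.blob(audio_name).upload_from_filename("/tmp/audio.flac")

    # 3. Transcribe via the long-running API (audio exceeds one minute).
    client = speech.SpeechClient()
    operation = client.long_running_recognize(
        config=speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
            sample_rate_hertz=16000,
            language_code="en-US",
        ),
        audio=speech.RecognitionAudio(uri=f"gs://{bucket_name}/{audio_name}"),
    )
    response = operation.result(timeout=3600)

    # 4. Load the transcription into BigQuery (placeholder table).
    rows = [{"video": video_name, "transcript": r.alternatives[0].transcript}
            for r in response.results]
    errors = bigquery.Client().insert_rows_json("my_dataset.transcripts", rows)
    assert not errors, errors
```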

I’ve always loved loosely coupled applications, but I was a little unhappy with this architecture diagram. As long as I was losing control over the data at the Cloud Function, an improvement was required: a storage-triggered Cloud Function receives only the event argument supplied by Cloud Storage, and I was not able to pass any other parameters into the Cloud Function definition.

Requirements are always meant to be added or modified, and that is exactly what hit me: while uploading a video, admins should be able to provide custom information, which has to be stored in Datastore and BigQuery. The loss of control over arguments and parameters at the Cloud Function forced me to make changes.

Necessary changes:

  1. Cloud Functions has to be eliminated by moving its steps to App Engine.
    Unpleasant impacts, when an admin uploads a video:
    Limitation 1:
    App Engine generates a signed URL and publishes a Pub/Sub message for the file. While the upload is still ongoing, the published message is already delivered to the subscriber, which starts processing a file that isn't fully uploaded yet.
    Limitation 2:
    The acknowledgement can not be given at the end of processing, since processing might take more than 9 minutes. But if the message is acknowledged as soon as it is received and then processed, the next published message is delivered immediately (Flow Control becomes pointless). As the VM has a small configuration, it might run out of memory when multiple videos are uploaded at the same time.

Messaging Queue - the Savior

Using a message broker like RabbitMQ on the Compute Engine instance, together with the right flow chart, helped overcome the problems introduced by eliminating Cloud Functions. As I had to favour cost over instant processing, I had to use a VM with a small configuration; hence, sequential processing was a must instead of parallel processing.

Architecture Diagram - v2

The code below subscribes to the published Pub/Sub messages, pushes them into a RabbitMQ queue to be consumed at will, and acknowledges the published message quickly.
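
This is a minimal sketch of that bridge, using pika as the RabbitMQ client; the project, subscription, and queue names are placeholders:

```python
import pika
from google.cloud import pubsub_v1

PROJECT_ID = "my-project"           # placeholder
SUBSCRIPTION_ID = "video-jobs-sub"  # placeholder
RABBIT_QUEUE = "video_jobs"         # placeholder

# Durable local queue so pending jobs survive a broker restart.
rabbit = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = rabbit.channel()
channel.queue_declare(queue=RABBIT_QUEUE, durable=True)

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

def callback(message):
    # Hand the payload over to RabbitMQ; delivery_mode=2 persists it to disk.
    channel.basic_publish(
        exchange="",
        routing_key=RABBIT_QUEUE,
        body=message.data,
        properties=pika.BasicProperties(delivery_mode=2),
    )
    # Ack immediately so the Pub/Sub deadline is never hit; the heavy
    # processing happens later, outside the Pub/Sub context.
    message.ack()

future = subscriber.subscribe(
    subscription_path,
    callback=callback,
    flow_control=pubsub_v1.types.FlowControl(max_messages=1),
)
try:
    future.result()
except KeyboardInterrupt:
    future.cancel()
```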

Following the flow chart below helped to avoid messes like this one:

  1. With Flow Control set, the subscriber receives the next message right after acknowledging the current one, while the best practice is to acknowledge a message only once the operation on it is done. For operations that take more time than the deadline (> 9 mins), Pub/Sub messages should be queued and processed outside of the Pub/Sub context to ensure proper operational execution, as the consumer sketch after the flow chart shows.
Flow Chart - v1 for Architecture Diagram - v2
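
A minimal sketch of the consumer side of this flow chart, assuming the video_jobs queue from the bridge above; run_job is a placeholder for the actual download/convert/transcribe steps:

```python
import pika

def run_job(body: bytes) -> None:
    ...  # placeholder: download, convert, transcribe, load into BigQuery

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="video_jobs", durable=True)
channel.basic_qos(prefetch_count=1)  # sequential: next message only after this one settles

def handle(ch, method, properties, body):
    try:
        run_job(body)
        # Ack only once the operation is done -- the Pub/Sub deadline no
        # longer applies here, so there is no rush.
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        # Requeue for another attempt (see the tips below for a retry cap).
        ch.basic_reject(delivery_tag=method.delivery_tag, requeue=True)

channel.basic_consume(queue="video_jobs", on_message_callback=handle)
channel.start_consuming()
```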

Tips:

  1. Delivery tags of RabbitMQ messages can be used to track the number of retries for failed operations: the tag is a per-channel counter that increases by one for every delivery, including each requeue=true redelivery, so in a sequential flow like this one it can be turned into an attempt number (see the sketch below).
  2. It is advisable to wait for a short delay before rejecting a message, so that failed jobs are not retried instantly.
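
A minimal sketch of both tips, extending the consumer above. Since the delivery tag counts all deliveries on the channel, the sketch remembers the tag of the current message's first delivery and derives the attempt number from the difference; MAX_ATTEMPTS and RETRY_DELAY_SECONDS are illustrative:

```python
import time

MAX_ATTEMPTS = 3          # illustrative retry cap
RETRY_DELAY_SECONDS = 10  # tip 2: don't retry instantly
_first_tag = None         # delivery tag of the current message's first delivery

def handle(ch, method, properties, body):
    global _first_tag
    # With prefetch=1 a rejected message comes straight back, so the tag
    # difference counts the attempts for this message (tip 1).
    if _first_tag is None or not method.redelivered:
        _first_tag = method.delivery_tag
    attempt = method.delivery_tag - _first_tag + 1
    try:
        run_job(body)  # placeholder from the consumer sketch above
        ch.basic_ack(delivery_tag=method.delivery_tag)
        _first_tag = None
    except Exception:
        if attempt >= MAX_ATTEMPTS:
            # Give up: drop (or dead-letter) the message instead of requeueing.
            ch.basic_reject(delivery_tag=method.delivery_tag, requeue=False)
            _first_tag = None
        else:
            time.sleep(RETRY_DELAY_SECONDS)  # tip 2: delay before rejecting
            ch.basic_reject(delivery_tag=method.delivery_tag, requeue=True)
```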