Automate video transcription with the Koyeb Serverless Engine

Edouard Bonlieu
Koyeb

--

Introduction

This guide shows how to deploy a video transcription service. We will use the Google Cloud Video Intelligence API to transform videos to text (speech-to-text) and the Koyeb Serverless Engine to handle your video files and orchestrate the processing.

Once you have completed this tutorial, you will be able to upload your videos via the Koyeb S3-compatible API. Each upload triggers a function that generates a speech transcription file for the new video.

You can then use the speech transcription file to:

  • Index the results in a database and make your videos searchable
  • Automatically generate subtitles for your videos
  • Moderate videos based on their content

And many other use-cases.

In this guide, we use a Koyeb Managed Store to store our videos and the generated video speech transcription file. You can also connect your own Cloud Storage Provider to integrate with your existing infrastructure and data with minimal effort. Learn how to connect your Cloud Storage Provider here.

Requirements

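To successfully follow and implement this tutorial, you need:

  • A Koyeb account and the Koyeb CLI installed
  • A Google Cloud Platform account with a Service Account allowed to use the Video Intelligence API
  • S3cmd installed on your machine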

Steps

To build a video transcription service using Koyeb and the GCP Video Intelligence API, there are four steps:

  1. Create a Koyeb Store to Upload videos & Retrieve the video speech transcription file
  2. Create a Koyeb Secret to store your Google Cloud Service Account configuration
  3. Create a Stack and deploy the video transcription function
  4. Upload a video and retrieve the video transcription file

Create a Koyeb Store to Upload videos & Retrieve the video speech transcription file

The first step is to create a Koyeb Store to hold our videos and the generated transcription files. Koyeb Stores provide an S3-compatible API, allowing you to manage your data programmatically using any S3-compatible SDK or tool.

To create a new Koyeb Managed Store with the CLI, in your terminal, type:

koyeb create store -f new-store.yaml

Where the content of new-store.yaml is:

name: my-store-01
type: koyeb

You now have a Koyeb Store up and running. You can interact with your Store using any S3-compatible SDK or tool.

Configure S3cmd

The next step is to configure S3cmd to interact with the Store, so we can upload videos and retrieve the transcription files.

In the Koyeb Control Panel, click API in the left side menu and click New in the S3 credentials section. A modal appears to create a new S3 credential. Enter the name and a description (optional) to identify and remember what this credential is used for.

Click the Submit button. Save the generated access_key and secret_key in a secure place. Once the modal is closed, you will not be able to see them again.

Create an S3cmd configuration file named .s3cfg-gcp in your home directory with the content below, and replace the REPLACE_ME values with the credentials you previously generated.

[default]
access_key = REPLACE_ME
secret_key = REPLACE_ME
bucket_location = US
check_ssl_certificate = True
check_ssl_hostname = True
default_mime_type = binary/octet-stream
delay_updates = False
delete_after = False
delete_after_fetch = False
delete_removed = False
dry_run = False
enable_multipart = True
encoding = UTF-8
encrypt = False
follow_symlinks = False
force = False
get_continue = False
guess_mime_type = True
host_base = s3.eu-west-1.prod.koyeb.com
host_bucket = %(bucket)s.s3.eu-west-1.prod.koyeb.com
human_readable_sizes = False
invalidate_default_index_on_cf = False
invalidate_default_index_root_on_cf = True
invalidate_on_cf = False
limit = -1
limitrate = 0
list_md5 = False
long_listing = False
max_delete = -1
multipart_chunk_size_mb = 15
multipart_max_chunks = 10000
preserve_attrs = True
progress_meter = True
put_continue = False
recursive = False
recv_chunk = 65536
reduced_redundancy = False
requester_pays = False
restore_days = 1
restore_priority = Standard
send_chunk = 65536
server_side_encryption = False
signature_v2 = False
signurl_use_https = False
skip_existing = False
socket_timeout = 300
stats = False
stop_on_error = False
throttle_max = 100
urlencoding_mode = normal
use_https = True
use_mime_magic = True
verbosity = WARNING
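
Most entries in this listing look like S3cmd defaults. As a minimal sketch, the Python snippet below writes only the credentials and the Koyeb endpoint settings taken from the listing. It assumes that S3cmd falls back to its built-in defaults for keys missing from the file. The function name write_s3cfg and its parameters are hypothetical and not part of Koyeb or S3cmd.

```python
import configparser
import os
from pathlib import Path

# host_base value copied from the configuration listing in this guide.
KOYEB_S3_HOST = "s3.eu-west-1.prod.koyeb.com"


def write_s3cfg(path, access_key, secret_key, host=KOYEB_S3_HOST):
    """Write a minimal S3cmd configuration file and return its path.

    Assumption: S3cmd uses its own defaults for every key not listed here.
    """
    # interpolation=None keeps the literal "%(bucket)s" placeholder intact.
    config = configparser.ConfigParser(interpolation=None)
    config["default"] = {
        "access_key": access_key,
        "secret_key": secret_key,
        "host_base": host,
        "host_bucket": f"%(bucket)s.{host}",
        "use_https": "True",
    }
    path = Path(path).expanduser()
    with open(path, "w", encoding="utf-8") as handle:
        config.write(handle)
    # The file holds credentials, so restrict it to the current user.
    os.chmod(path, 0o600)
    return path
```

Calling write_s3cfg("~/.s3cfg-gcp", "<access_key>", "<secret_key>") would produce a file usable with the s3cmd -c option, provided the assumption about defaults holds.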

To check the configuration is working fine, in the terminal type:

s3cmd -c ~/.s3cfg-gcp ls
2020-10-28 09:15 s3://my-store-01

You should see the Store you previously created.

Create a Koyeb Secret to store your Google Cloud Service Account configuration

Create a Koyeb Secret to securely store your GCP Service Account configuration. Koyeb Secrets allow you to access API credentials, tokens, etc. securely in your configuration and functions without having to expose them.

Create a secret.yaml file and replace the value with your GCP Service Account configuration.

name: gcp-sa-vi
value: |
  {...}

Then create the secret:

koyeb create secrets -f secret.yaml
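
Pasting a multi-line JSON key under value: | by hand makes indentation mistakes easy. The sketch below generates the secret.yaml text from the content of a Service Account key file. It assumes the name/value layout shown in this guide and relies on GCP Service Account keys carrying "type": "service_account". The helper name build_secret_yaml is hypothetical.

```python
import json


def build_secret_yaml(name, service_account_json):
    """Return secret.yaml text with the key embedded as a YAML block scalar."""
    # json.loads raises ValueError (JSONDecodeError) on malformed input.
    key = json.loads(service_account_json)
    if key.get("type") != "service_account":
        raise ValueError("expected a GCP service account key file")
    # Re-serialize so every line can be indented under "value: |".
    pretty = json.dumps(key, indent=2)
    indented = "\n".join("  " + line for line in pretty.splitlines())
    return f"name: {name}\nvalue: |\n{indented}\n"
```

You could write the returned text to secret.yaml and then run the koyeb create secrets command as before.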

Create a Stack and deploy the video transcription function

Our Store is configured and ready to use. The next step is to deploy the processing function that performs the video speech transcription. We will use the Koyeb Catalog App for this, as it lets you run the processing without writing a single line of code.

In the terminal, start by creating a new Stack. Stacks are processing environments containing code and containers.

koyeb create stack -n video-transcription

With our Stack created, we can configure and deploy the video speech transcription app. Create a file named video-transcription.yaml containing the function configuration:

functions:
  - name: gcp-video-intelligence
    use: gcp-video-intelligence@1.0.1
    with:
      STORE: my-store-01 # The store to watch to trigger the function and save the GCP Video Intelligence result. This parameter is required.
      GCP_KEY: gcp-sa-vi # The name of the secret in which the GCP service account is stored. This parameter is required.
      VIDEO_INTELLIGENCE_FEATURE: SPEECH_TRANSCRIPTION

Deploy the function by running:

koyeb create revision video-transcription -f video-transcription.yaml

This deploys the function into our Stack. Now, each time a video is uploaded to the Store my-store-01, the function will be triggered and a video speech transcription file will be generated.

Upload a video and retrieve the video transcription file

With our processing stack ready, we can now check everything is running fine and that for each video uploaded, a video speech transcription file is generated.

To upload a video using S3cmd, in the terminal type:

s3cmd -c ~/.s3cfg-gcp put /path/to/video.mp4 s3://my-store-01

Now, if you run koyeb logs stack-events video-transcription, you will see the event that triggered your function. The function uses this event to retrieve the video file and perform the speech transcription. You can follow the function execution by running: koyeb logs functions video-transcription gcp-video-intelligence.

Once the execution is done, you can retrieve the speech transcription file by running:

s3cmd -c ~/.s3cfg-gcp get s3://my-store-01/gcp-video-intelligence-SPEECH_TRANSCRIPTION-[...].json
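
Because part of the result file name is not known in advance, it can help to look results up by their prefix. The sketch below does this with the third-party boto3 SDK. It assumes the Koyeb S3-compatible API accepts the ListObjectsV2 call, and that the endpoint is the host_base value from the S3cmd configuration served over HTTPS. The helper names are hypothetical.

```python
import os

# Prefix taken from the result file name used in this guide.
RESULT_PREFIX = "gcp-video-intelligence-SPEECH_TRANSCRIPTION-"


def make_client(access_key, secret_key,
                endpoint="https://s3.eu-west-1.prod.koyeb.com"):
    """Build a boto3 S3 client for the Koyeb endpoint (assumed URL)."""
    import boto3  # third-party dependency: pip install boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def list_transcriptions(client, bucket, prefix=RESULT_PREFIX):
    """Return the sorted keys of the transcription files in the bucket."""
    # A single call returns at most 1000 keys, which is enough for a sketch.
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return sorted(obj["Key"] for obj in response.get("Contents", []))


def download_transcriptions(client, bucket, target_dir="."):
    """Download every transcription file and return the local paths."""
    paths = []
    for key in list_transcriptions(client, bucket):
        path = os.path.join(target_dir, os.path.basename(key))
        client.download_file(bucket, key, path)
        paths.append(path)
    return paths
```

With a client from make_client, download_transcriptions(client, "my-store-01") would fetch every transcription file currently in the Store.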

This file contains the result of the processing function with the detected text in the video:

"results": [
  {
    "alternatives": [
      {
        "transcript": "Hey, I'm John...",
        "confidence": 0.7477226853370667,
        "words": [
          {
            "startTime": {
              "nanos": 500000000
            },
            "endTime": {
              "nanos": 700000000
            },
            "word": "Hey,"
          },
          {
            "startTime": {
              "nanos": 700000000
            },
            "endTime": {
              "nanos": 900000000
            },
            "word": "I'm"
          },
          ...
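
The word-level timings make it possible to generate subtitles, one of the use cases listed in the introduction. The sketch below turns a words list like the one above into SubRip (SRT) text. It assumes that a "seconds" field accompanies "nanos" for words past the first second (the excerpt does not show one) and treats a missing field as zero. The function names and the words_per_cue grouping are illustrative choices.

```python
def to_millis(timestamp):
    """Convert a {"seconds": ..., "nanos": ...} object to integer milliseconds."""
    # Assumption: a missing "seconds" or "nanos" field means zero.
    seconds = int(timestamp.get("seconds", 0))
    nanos = int(timestamp.get("nanos", 0))
    return seconds * 1000 + nanos // 1_000_000


def srt_time(millis):
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def words_to_srt(words, words_per_cue=7):
    """Group words into fixed-size cues and return the SRT document text."""
    cues = []
    for index in range(0, len(words), words_per_cue):
        chunk = words[index:index + words_per_cue]
        start = srt_time(to_millis(chunk[0]["startTime"]))
        end = srt_time(to_millis(chunk[-1]["endTime"]))
        text = " ".join(word["word"] for word in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Grouping by a fixed word count is the simplest option. Splitting on pauses or punctuation would likely read better, but needs more logic than a sketch warrants.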

Conclusion

In this guide, we saw how to deploy a video transcription service using the Google Cloud Video Intelligence API and the Koyeb Serverless Engine. We used S3cmd to upload videos and retrieve the results, but you can also use any S3-compatible SDK or tool.

The catalog integrations code used in this guide is available on GitHub.

If you have any questions about this tutorial, feel free to reach out to us on the Koyeb Slack Community.
