Scalable Video Transcoding With App Engine Flexible

Bill Prin
Google Cloud - Community
Aug 8, 2016
App Engine Flexible Environment

It’s no secret I’m a big fan of Container Engine (Google’s managed Kubernetes) as a way to seamlessly run your containers across a cluster of VMs. However, it’s far from the only way to run containers on Google Cloud. One fairly obvious option is to start Google Compute Engine instances and just run Docker yourself. But one of the easiest ways to run arbitrary containers in a microservice environment is actually the new environment for App Engine, the App Engine Flexible Environment.

In this post, I’ll discuss how the Flexible Environment can be used to run a popular library called ffmpeg (or rather, a fork called libav) to do common video operations like transcoding and resizing of videos. Furthermore, I’ll demonstrate how with zero configuration, this setup will autoscale to fulfill the demands the load puts on it.

Please note that App Engine Flexible Environment is still a beta product.

While App Engine has been known for years for powering products like Snapchat to millions of users with minimal backend engineering, it also forced a somewhat restrictive sandbox and development environment on its users. It was great if you could develop within that environment, but you could get stuck if you needed something that didn’t ship with the sandbox, like ffmpeg.

While on the surface, App Engine Flexible Environment looks a lot like App Engine Standard, unlike Standard, it actually spins up Google Compute Engine instances and load-balancers to autoscale your app on traditional Infrastructure-as-a-Service, and runs Docker on those instances. That means you can pretty much run whatever you want!

App Engine Flexible Environment ships with several default runtimes, like Node.js (popularly used to run the Parse server) and Ruby. The Docker-centric nature of the Flexible Environment made it easy to create these runtimes and publish them on GitHub. However, one of the less-talked-about features of the Flexible Environment is the ability to create your own custom Docker runtimes, which you can use to install things like ffmpeg.

Compared to Compute Engine and Container Engine

Before reviewing the demo code, let’s briefly compare options for video transcoding on Google Cloud:

  • Google Compute Engine — Compute Engine has one big advantage in this space — price. This is especially true if your video needs are offline, rather than real-time, because then you can save up to 70% by using preemptible VMs. These VMs can be killed at any time, so they aren’t appropriate for use cases with strict time demands on the processing. But for long-running background jobs, they are a great choice. The big disadvantage of Compute Engine? To scale it, you’ll have to create the Instance Templates, Managed Instance Groups, Load Balancers, etc. yourself. Setting this up correctly can be a bit of work, and it will take you a long time to reproduce all the configuration that App Engine Flexible comes with out of the box.
  • Google Container Engine — Like Compute Engine, Container Engine can require a lot of setup work, though unlike Compute Engine, getting Docker-based microservices talking to each other is a breeze. Still, it requires a lot of configuration and babysitting compared to Flexible Environment. Price-wise, it will be the same or slightly more than Compute Engine without preemptible VMs, depending on how many instances you are running. While it offers even more flexibility, it can be a bit heavyweight for simpler use cases. The best argument for using Container Engine is if you are already heavily using it.

The sweet spot for Flexible Environment is when you want to run a few Docker containers, and you are fitting into the very common use cases of having a few services that respond to web or RPC traffic and need load balancers.

Like Container Engine, Flexible Environment exposes the concept of services, except in this case, the services must always follow certain conventions. There must be a single default service that listens to web traffic on port 8080, and there can be multiple backend services. They must have health-checks or have health-checks explicitly disabled. And each service can only run a single image, which you define in a Dockerfile and which gets built by Cloud Container Builder.
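To make those conventions concrete, here’s a minimal sketch of what a default service might look like, assuming Flask (the handler in the demo repo may differ):

from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return 'OK', 200

if __name__ == '__main__':
    # The Flexible Environment routes web traffic to the container on port 8080.
    app.run(host='0.0.0.0', port=8080)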

Demo Code Review

You can find the sample code in my repo waprin/appengine-transcoder. Please note this is a demo, not a productionized app. A few notes:

  • While I said I used ffmpeg, I actually used a fork of ffmpeg called libav. While ffmpeg is still in active development and considered by many to be superior to libav, libav is included with the Debian package manager and thus simpler to install. If you prefer ffmpeg, you can find many examples of containers that build it from source.
  • The code is written in Python, but any language would have been fine as the actual transcoding is done by shelling out to libav. There are actually several Python libraries that would be more appropriate for video operations like moviepy that provide nice abstractions on top of ffmpeg, but I wanted to keep the example repo as simple as possible.
  • Our application needs two services: a default service to respond to web traffic to kick off the transcoding, and a background worker to actually do the work. The reason we can’t just do it all in one container is because the transcoding might take long enough that the web request would timeout, which would also kill the job.
  • There are a few ways to communicate between two services, but in this case, I simply publish a message to Cloud Pub/Sub (a minimal sketch of the publish call follows this list). Pub/Sub is designed as a highly scalable and reliable message bus. Out of the box, it lacks task-queue semantics, such as tracking whether a task succeeded or failed and how long it took. It wouldn’t be too much work to layer a task queue on top of Pub/Sub, as my colleague demonstrates with his psq repo, inspired by the rq project. If I were to productionize this app, I would switch to psq.
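Here’s that publish sketch, using the current google-cloud-pubsub client (the client API has evolved since this post was written, and the topic name is an assumption for this demo):

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# 'transcode-jobs' is a hypothetical topic name for this demo.
topic_path = publisher.topic_path('your-project-id', 'transcode-jobs')

# Any payload will do; the worker treats every message as
# "transcode the sample video".
future = publisher.publish(topic_path, b'transcode')
future.result()  # block until Pub/Sub acknowledges the publish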

With that, let’s look at a quick architecture diagram for the app:

Service Architecture

As you can tell, the workflow goes through several steps:

  • We start with a web request. In the demo code’s case, it’s a slightly contrived example where a web request kicks off the transcoding of the same video over and over again from mp4 to webm.
  • Next, the service publishes a message to the Pub/Sub topic. This is one way to let the backend service know it needs to start the task, and will be quick enough that the default service can still respond with an HTTP OK before the timeout.
  • The worker service is listening on the Pub/Sub topic, and once it receives a message, it knows to start the process. In a more realistic case, the message might signify which job to do, but in this case any message just means transcode the same video.
  • The worker downloads the video to transcode from Google Cloud Storage and writes it to the temporary file system. You shouldn’t rely too heavily on the file system in the Flexible Environment since your instances are transient, but using a temp file is fine. The worker then shells out to libav and uploads the transcoded video to Google Cloud Storage.

Here is what the piece of worker code that does the download, transcoding, and uploading looks like:

bucket = client.bucket('appengine-transcoder')
blob = bucket.blob('sample.mp4')

# Download the source video to the transient local file system.
with open('/tmp/sample2.mp4', 'wb') as f:
    blob.download_to_file(f)

# Remove any output left over from a previous run, then transcode.
os.system('rm -f /tmp/output.webm')
ret = os.system('/usr/bin/avconv -i /tmp/sample2.mp4 -c:v libvpx '
                '-crf 10 -b:v 1M -c:a libvorbis /tmp/output.webm')
if ret:
    sys.stderr.write("FAILED")
    return "Failed"

# Upload the transcoded video back to Cloud Storage.
blob = bucket.blob('output.webm')
with open('/tmp/output.webm', 'rb') as f:
    blob.upload_from_file(f)

As you can see, we use the Python Cloud client libraries to talk to Google Cloud Storage and Pub/Sub, and just shell out to libav (avconv) to transcode our temporary file before uploading the result back to Cloud Storage. Writing to stderr makes sure that we can see any failures.
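For completeness, the receive side of the worker might look something like this sketch, again using the current google-cloud-pubsub client; the subscription name and the transcode() helper wrapping the snippet above are assumptions:

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
# 'transcode-worker' is a hypothetical subscription on the topic.
subscription_path = subscriber.subscription_path('your-project-id', 'transcode-worker')

def callback(message):
    # In this demo, any message means "transcode the sample video".
    transcode()  # hypothetical helper wrapping the snippet above
    message.ack()

# Block the worker process on the subscription.
streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull.result()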

How did we make sure libav was on the system? Well, in my Dockerfile, I just made sure to apt-get install it on top of my favorite Python image:

FROM gcr.io/google_appengine/python
RUN apt-get -y update && apt-get install -y libav-tools

Then in my worker.yaml, I declared that I was using a custom Dockerfile for this service:

service: worker              # name of the service (necessary for non-default)
runtime: custom              # will use the Dockerfile in the same directory
vm: true                     # signifies Flexible Environment
entrypoint: python /app/worker.py
env_variables:
  PYTHONUNBUFFERED: 1        # make sure we see logs
vm_health_check:
  enable_health_check: False # since there is no real health check

You’ll notice that in this case I just disabled health-checks, but in a real app, you’ll probably want to make a health check that verifies your transcoding is succeeding. Learn more about how health checks manage your instance lifecycle in the docs.
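For example, a real worker might expose a handler like this sketch, assuming Flask (legacy health checks in the Flexible Environment poll /_ah/health):

from flask import Flask

app = Flask(__name__)

@app.route('/_ah/health')
def health():
    # A real check might verify that /usr/bin/avconv is present and
    # that recent transcode jobs have succeeded.
    return 'ok', 200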

To deploy the worker service, I use the standard Google Cloud SDK:

gcloud app deploy worker.yaml

Then I can go back to the top-level directory and deploy my main service that will simply accept web requests and kick off the task on Pub/Sub:

gcloud app deploy app.yaml

Once it succeeds, I can run `gcloud compute instances list` and see that I am currently running 2 instances for each service.

Now I can go to https://my-project-id.appspot.com/transcode to hit the endpoint that kicks off the transcode. I can use Stackdriver Logging to inspect any errors, and once it’s done I should see the webm version of my video in my Cloud Storage bucket.
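For example, the endpoint can be hit from a shell:

curl https://my-project-id.appspot.com/transcode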

Watching it Scale

Now, what about scaling? It’s already done! As I increase load on my services, more instances will automatically be created. If you’re concerned about cost, make sure to cap the maximum number of instances in your app.yaml (see the sketch below) and verify your budgets in the billing section of the Cloud Console.
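As a sketch, that scaling stanza might look like this (the key names assume the Flexible Environment’s automatic_scaling settings):

automatic_scaling:
  min_num_instances: 2
  max_num_instances: 5  # cap the bill if load spikes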

To demonstrate, I wrote a very minimal load test that hits my endpoint over and over again and drives up CPU usage on the worker service:

import time
import requests

PROJECT_ID = 'your-project-id'
URL = 'https://{}.appspot.com/transcode'.format(PROJECT_ID)
NUM_REQUESTS = 100
DELAY = 0.1

for i in range(NUM_REQUESTS):
    requests.get(URL)
    time.sleep(DELAY)

Right away, the CPU demand on my worker service caused App Engine to create more instances.

If you run this load test, make sure the app downscales afterwards. If you want to be on the safe side, delete the worker service altogether.
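For example (assuming the current gcloud CLI):

gcloud app services delete worker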

As always, if you’re interested in more technical content with a focus on Python and Google Cloud Platform products like Kubernetes, App Engine, and BigQuery, follow me on Twitter @waprin_io, and if you have any questions or issues, feel free to comment here on Medium, message me on Twitter, or mention me @waprin on GitHub.
