Transcoding video at scale with Piper

Photo by Noom Peerapong on Unsplash

In my previous installment I introduced Piper, a general-purpose, Java-based workflow engine that I open-sourced last year.

Today I want to demonstrate a use case for which Piper is particularly well suited: video transcoding.

If you have ever had to do any kind of video transcoding, you know that it is a very CPU-intensive operation. Moreover, higher video quality requires longer processing time. Finally, if you run any sort of streaming service (a.k.a. OTT), you would generally transcode your source file using multiple bitrate profiles to support what is known as “adaptive streaming”: the ability of your service to adapt to the bandwidth limitations of your client.

Depending on the size of your machine, the length of your input file and the makeup of your profiles, this will most likely take a while, often many hours. Additionally, if something goes wrong along the way, you won’t want to restart the entire process from scratch; ideally you would resume from where things went awry.

This is where Piper shines. Not only does it support all of the above requirements, it can also split your videos into multiple chunks so that the transcoding process can be parallelized across multiple machines.

Let’s take a quick look under the covers to understand how Piper works.

Architecture

Piper is composed of the following components:

Worker: Workers are the workhorses of Piper. These are the Piper nodes that actually execute the tasks requested by the Coordinator. Unlike the Coordinator, Workers are stateless: they do not interact with a database or keep any state in memory about the job or anything else. This makes it very easy to scale the number of Workers up and down without fear of losing application state.

Coordinator: The Coordinator is the central nervous system of Piper. It keeps track of jobs, dishes out work to be done by Worker machines, and keeps track of failures, retries and other job-level details. Unlike Worker nodes, it does not execute actual work; it delegates all task activities to Worker instances.

Message Broker: All communication between the Coordinator and the Worker nodes is done through a message broker.

This has many advantages:

  1. If all workers are busy, the message broker simply queues the message until one of them can handle it.
  2. When workers boot up, they subscribe to the appropriate queues for the type of work they are intended to handle.
  3. If a worker crashes, the task automatically gets re-queued to be handled by another worker.
  4. Lastly, workers can be written in any language, since they are completely decoupled from the Coordinator through message passing.
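
To make the broker's role concrete, here is a minimal, self-contained Java sketch that simulates the pattern in a single process. It is not Piper code: an in-memory LinkedBlockingQueue stands in for RabbitMQ, and the names BrokerSketch and TaskMessage are hypothetical. Tasks published while no worker is free simply wait in the queue, each worker holds nothing beyond the message it is currently handling, and one worker is made to "crash" before acknowledging its first message, which is put back on the queue the way a broker would redeliver an unacknowledged message.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BrokerSketch {

    // Hypothetical message format: it carries everything a worker needs,
    // so workers never have to look up job state themselves.
    record TaskMessage(String jobId, int taskIndex) {}

    // Publishes `taskCount` tasks to an in-memory queue (standing in for RabbitMQ)
    // and lets `workerCount` stateless workers consume them. Worker 0 "crashes"
    // on its first message; the unacknowledged message is re-queued for the others.
    static Set<Integer> run(int taskCount, int workerCount) {
        BlockingQueue<TaskMessage> queue = new LinkedBlockingQueue<>();
        Set<Integer> completed = ConcurrentHashMap.newKeySet();

        // Coordinator side: publish every task up front. Tasks that no worker
        // can take yet simply wait in the queue.
        for (int i = 0; i < taskCount; i++) {
            queue.add(new TaskMessage("job-1", i));
        }

        ExecutorService pool = Executors.newFixedThreadPool(workerCount);
        for (int w = 0; w < workerCount; w++) {
            final boolean crashesOnFirstMessage = (w == 0);
            pool.execute(() -> {
                try {
                    TaskMessage msg;
                    // Keep consuming until the queue has been empty for 500 ms.
                    while ((msg = queue.poll(500, TimeUnit.MILLISECONDS)) != null) {
                        if (crashesOnFirstMessage) {
                            queue.add(msg); // never acknowledged, so the "broker" re-queues it
                            return;         // this worker dies
                        }
                        completed.add(msg.taskIndex()); // "do the work" and report completion
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println(run(8, 3).size() + " of 8 tasks completed");
    }
}
```

With at least two workers, every task should still complete even though one worker dies mid-task, which is the property the list above describes.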

Job Repository: This holds the state of all jobs in the system: which tasks completed, which failed, etc.

Pipeline Repository: This is where pipelines (workflows) are created, edited, etc.
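
The Job Repository is what makes resuming a failed job possible. The following Java sketch illustrates the idea under stated assumptions: JobRepositorySketch is a hypothetical class (not part of Piper), it keeps everything in a HashMap where a real repository would use a database, and it assumes the tasks of a pipeline run in order, so the first task that has not completed is the natural resume point.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JobRepositorySketch {

    enum TaskStatus { CREATED, STARTED, COMPLETED, FAILED }

    // jobId -> ordered task statuses. Single-threaded, in-memory stand-in
    // for a persistent job store (hypothetical).
    private final Map<String, List<TaskStatus>> jobs = new HashMap<>();

    void createJob(String jobId, int taskCount) {
        List<TaskStatus> statuses = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            statuses.add(TaskStatus.CREATED);
        }
        jobs.put(jobId, statuses);
    }

    void update(String jobId, int taskIndex, TaskStatus status) {
        jobs.get(jobId).set(taskIndex, status);
    }

    // Index of the first task that has not completed, i.e. where a retry
    // should pick up. Returns -1 when every task has completed.
    int resumeIndex(String jobId) {
        List<TaskStatus> statuses = jobs.get(jobId);
        for (int i = 0; i < statuses.size(); i++) {
            if (statuses.get(i) != TaskStatus.COMPLETED) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        JobRepositorySketch repo = new JobRepositorySketch();
        repo.createJob("job-1", 4);
        repo.update("job-1", 0, TaskStatus.COMPLETED);
        repo.update("job-1", 1, TaskStatus.COMPLETED);
        repo.update("job-1", 2, TaskStatus.FAILED);
        System.out.println("resume at task " + repo.resumeIndex("job-1"));
    }
}
```

Because the completed work is recorded outside the workers, a failure in task 2 here means only tasks 2 and 3 need to run again.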

Setting up Piper (Docker)

1. Start the Message Broker (RabbitMQ)

docker run --name=rabbit -d -p 15672:15672 -p 5672:5672 creactiviti/rabbitmq:3.6.9-management

Verify that RabbitMQ started up properly:

Navigate to: http://localhost:15672/#/queues

You should see the RabbitMQ management console.

2. Start the Coordinator

docker run --name=coordinator \
-d \
--link rabbit:rabbit \
-e piper.coordinator.enabled=true \
-e piper.pipeline-repository.git.enabled=true \
-e piper.pipeline-repository.git.url=https://github.com/creactiviti/piper-pipelines.git \
-e piper.pipeline-repository.git.search-paths=demo/,video/ \
-e piper.messenger.provider=amqp \
-e spring.rabbitmq.host=rabbit \
-p 8080:8080 \
creactiviti/piper

Verify that the Coordinator is working properly:

curl http://localhost:8080/health

Response:

{"status":"UP"}

3. Create a directory for the videos

mkdir videos
cd videos

4. Obtain a source input

wget "http://ftp.nluug.nl/pub/graphics/blender/demo/movies/ToS/tears_of_steel_720p.mov"

5. Start a Worker

docker run --name=worker-1 \
-d \
--link rabbit:rabbit \
-e piper.worker.enabled=true \
-e piper.worker.subscriptions.tasks=1 \
-e spring.rabbitmq.host=rabbit \
-e piper.messenger.provider=amqp \
-v $PWD:/videos \
-p 8181:8080 \
creactiviti/piper

Verify that the worker started properly:

curl http://localhost:8181/health

Response:

{"status":"UP"}

This completes the setup necessary for the demo.
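
If you would rather script the health checks than run curl by hand, the sketch below does the same thing with the JDK's built-in java.net.http.HttpClient (Java 11+). The class name HealthCheck is hypothetical, the ports are the ones mapped in the docker run commands above (8080 for the Coordinator, 8181 for worker-1), and it treats a node as up only when it answers HTTP 200 with the {"status":"UP"} body shown in the responses.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Pattern;

public class HealthCheck {

    private static final Pattern STATUS_UP = Pattern.compile("\"status\"\\s*:\\s*\"UP\"");

    // True if a /health response body reports "status": "UP".
    static boolean isUp(String body) {
        return STATUS_UP.matcher(body).find();
    }

    // GETs <baseUrl>/health; connection errors count as "not up".
    static boolean check(HttpClient client, String baseUrl) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/health"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200 && isUp(response.body());
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        for (String url : new String[] {"http://localhost:8080", "http://localhost:8181"}) {
            System.out.println(url + " -> " + (check(client, url) ? "UP" : "NOT UP"));
        }
    }
}
```

Note that worker-2, started later in this article, is not published on a host port, so it cannot be checked this way from the host.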

Let’s transcode some video

To kick things off, let’s perform a simple transcode:

curl -s \
-X POST \
-H Content-Type:application/json \
-d '{"pipelineId":"video/transcode","inputs":{"input":"/videos/tears_of_steel_720p.mov","output":"/videos/output.mp4","profile":"sd"}}' \
http://localhost:8080/jobs

Response:

{
  "createTime": "2018-05-06T20:55:43.511+0000",
  "webhooks": [],
  "inputs": {
    "output": "/videos/output.mp4",
    "input": "/videos/tears_of_steel_720p.mov",
    "profile": "sd"
  },
  "id": "473475bd36744702b9242ec4eb69a5ea", // the job ID
  "label": "Transcode",
  "priority": 0,
  "pipelineId": "video/transcode:5da0f3c",
  "status": "CREATED",
  "tags": []
}
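
The same request can be issued from Java. The sketch below (hypothetical class name SubmitJob, JDK 11+ HttpClient, no JSON library) builds exactly the body used in the curl command and pulls the job ID out of the response with a regular expression. That shortcut relies on the create response shown above, where the only "id" field is the job's; a real client should parse the JSON with a proper library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubmitJob {

    private static final Pattern ID = Pattern.compile("\"id\"\\s*:\\s*\"([^\"]+)\"");

    // Builds the same request body as the curl example. Assumes the values
    // contain no characters that need JSON escaping (quotes, backslashes).
    static String jobRequest(String pipelineId, String input, String output, String profile) {
        return String.format(
                "{\"pipelineId\":\"%s\",\"inputs\":{\"input\":\"%s\",\"output\":\"%s\",\"profile\":\"%s\"}}",
                pipelineId, input, output, profile);
    }

    // Returns the first "id" value in the response, i.e. the job ID in the create response.
    static String extractId(String responseBody) {
        Matcher m = ID.matcher(responseBody);
        if (!m.find()) {
            throw new IllegalArgumentException("no id field in response");
        }
        return m.group(1);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String body = jobRequest("video/transcode", "/videos/tears_of_steel_720p.mov",
                "/videos/output.mp4", "sd");
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/jobs"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("job id: " + extractId(response.body()));
    }
}
```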

Using the Job ID from the response you got above, let’s check the status of the job:

curl http://localhost:8080/jobs/473475bd36744702b9242ec4eb69a5ea

Response:

{
  "outputs": {},
  "execution": [...],
  "inputs": { ... },
  "currentTask": 2,
  "label": "Transcode",
  "priority": 0,
  "pipelineId": "video/transcode:5da0f3c",
  "tags": [],
  "createTime": "2018-05-06T20:55:43.511+0000",
  "webhooks": [],
  "startTime": "2018-05-06T20:55:43.516+0000",
  "id": "473475bd36744702b9242ec4eb69a5ea",
  "status": "STARTED"
}

If everything goes well, the job status (currently STARTED) will eventually switch to COMPLETED, at which point you should be able to play output.mp4. The worker writes it to /videos, which is the videos directory you created on the host.
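
Rather than re-running curl until the status changes, you can poll. Below is a hedged Java sketch (hypothetical class name WaitForJob; JDK 11+). Two things in it are assumptions rather than facts from this article: that FAILED and STOPPED are the other terminal job states besides COMPLETED, and the 5-second polling interval. Because the "execution" array may contain per-task "status" fields, the sketch reads only the top-level "status" with a small hand-written scanner instead of a JSON library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Set;

public class WaitForJob {

    // COMPLETED comes from this article; FAILED and STOPPED are assumptions.
    static final Set<String> TERMINAL = Set.of("COMPLETED", "FAILED", "STOPPED");

    static boolean isTerminal(String status) {
        return status != null && TERMINAL.contains(status);
    }

    // Returns the string value of a top-level field, ignoring fields nested in
    // objects/arrays. Minimal: assumes well-formed JSON and a string value.
    static String topLevelString(String json, String key) {
        String needle = "\"" + key + "\"";
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') i++;                 // skip the escaped character
                else if (c == '"') inString = false;
                continue;
            }
            if (c == '{' || c == '[') depth++;
            else if (c == '}' || c == ']') depth--;
            else if (c == '"') {
                if (depth == 1 && json.startsWith(needle, i)) {
                    int j = i + needle.length();
                    while (j < json.length() && Character.isWhitespace(json.charAt(j))) j++;
                    if (j < json.length() && json.charAt(j) == ':') { // it is a key, not a value
                        int open = json.indexOf('"', j + 1);
                        int close = json.indexOf('"', open + 1);
                        return json.substring(open + 1, close);
                    }
                }
                inString = true;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: java WaitForJob <jobId>");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/jobs/" + args[0])).GET().build();
        String status;
        do {
            String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
            status = topLevelString(body, "status");
            System.out.println("status: " + status);
            if (!isTerminal(status)) Thread.sleep(5_000);
        } while (!isTerminal(status));
    }
}
```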

Split n’ Stitch

Now that we have “simple” transcoding working, let’s parallelize the process using the Split n’ Stitch pipeline.
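
The idea behind Split n’ Stitch is to cut the source into chunks that different workers can transcode at the same time, and then join the results. The planning step might look like the hypothetical Java sketch below (ChunkPlanner and its chunk length are illustrative). It demonstrates the technique only and is not the actual logic of the video/split_n_stitch pipeline. Each (start, length) pair could then be handed to a tool such as ffmpeg through its -ss (seek) and -t (duration) options.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkPlanner {

    // One chunk of the source: its position, start offset and length in seconds.
    record Chunk(int index, double start, double length) {}

    // Splits a video of `durationSec` seconds into chunks of at most `chunkSec`
    // seconds; the last chunk takes whatever is left over.
    static List<Chunk> plan(double durationSec, double chunkSec) {
        if (durationSec <= 0 || chunkSec <= 0) {
            throw new IllegalArgumentException("durations must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        int count = (int) Math.ceil(durationSec / chunkSec);
        for (int i = 0; i < count; i++) {
            double start = i * chunkSec;
            chunks.add(new Chunk(i, start, Math.min(chunkSec, durationSec - start)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: a 734-second source cut into 180-second chunks.
        for (Chunk c : plan(734, 180)) {
            System.out.printf("chunk %d: -ss %.1f -t %.1f%n", c.index(), c.start(), c.length());
        }
    }
}
```

Each chunk becomes an independent task on the queue, which is why adding workers shortens the transcoding phase.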

Start a second worker:

docker run --name=worker-2 \
-d \
--link rabbit:rabbit \
-e piper.worker.enabled=true \
-e piper.worker.subscriptions.tasks=1 \
-e spring.rabbitmq.host=rabbit \
-e piper.messenger.provider=amqp \
-v $PWD:/videos \
creactiviti/piper

Kick off the job:

curl -s \
-X POST \
-H Content-Type:application/json \
-d '{"pipelineId":"video/split_n_stitch","inputs":{"input":"/videos/tears_of_steel_720p.mov","output":"/videos/output.mp4","profile":"sd"}}' \
http://localhost:8080/jobs | jq .

In the RabbitMQ console (http://localhost:15672/#/queues) you should see the various tasks being consumed by the workers.

From here, speeding up transcoding is simply a matter of adding more workers, as we did above.
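
How much each extra worker buys you can be reasoned about with a simple, idealised model: splitting and stitching happen once, while the chunks are transcoded in "waves" of as many chunks as there are workers. This assumes each worker takes one task at a time, which is what piper.worker.subscriptions.tasks=1 appears to configure. The Java sketch below encodes that model; the class name and every input number are made up for illustration, and real runs will also pay for queueing, disk I/O and uneven chunks.

```java
public class ScalingEstimate {

    // Idealised wall-clock model (hypothetical): split and stitch run once,
    // serially; chunks are transcoded in ceil(chunks / workers) waves.
    static double wallClockMinutes(double splitMin, int chunks, double perChunkMin,
                                   double stitchMin, int workers) {
        if (chunks <= 0 || workers <= 0) {
            throw new IllegalArgumentException("chunks and workers must be positive");
        }
        int waves = (chunks + workers - 1) / workers; // integer ceil(chunks / workers)
        return splitMin + waves * perChunkMin + stitchMin;
    }

    public static void main(String[] args) {
        // Made-up inputs: 0.5 min split, 10 chunks at 4 min each, 1 min stitch.
        for (int workers : new int[] {1, 2, 5, 10, 20}) {
            System.out.printf("%2d workers: %.1f min%n", workers,
                    wallClockMinutes(0.5, 10, 4.0, 1.0, workers));
        }
    }
}
```

In this model, adding workers stops helping once there are as many workers as chunks, and the serial split and stitch steps set a floor on the total time.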

Summary

There’s a lot more to Piper that we didn’t cover. For more details and getting-started tutorials, feel free to check out the project page.