Concept of Online Distributed Non-linear Editing System

This minimum viable product (MVP) describes the basic concept of online distributed rendering for a non-linear editing (NLE) system. It is designed to be lightweight, collaborative and extensible, and it aims to render faster than existing video editors.

Non-linear editing has been developed for over 40 years and is still the most efficient way to edit film. However, the rendering speed of an NLE depends on the computing power of a single machine. Many filmmakers buy expensive hardware and build a render cluster in their workshop just to test whether an idea works. The faster an idea turns into pictures, the more approaches you can try. That is why fast rendering should be built in.

Inspired by Apple’s Final Cut Pro X, we developed an incremental rendering algorithm. The renderer re-renders only the minimal part affected by a modification instead of the entire movie. We divide the project file into cakes, which are sent to workers via the Bake system. Bake also filters out sections that have already been rendered.

The online video editor has these parts:

  1. A web interface that takes user input and constructs the project file.
  2. The Bake render system, which includes the cake slicer, scheduler and workers and renders the project into different levels of output.
  3. A DASH (Dynamic Adaptive Streaming over HTTP) stream server that provides playback streaming.
  4. A thumbnail generator.
  5. A smart video uploader that transcodes media while it uploads and generates clips at different quality levels.

Project File Format

We use the Final Cut Pro X XML format as the project format. It is open and already used by a lot of software such as Adobe Premiere Pro and DaVinci Resolve. However, we changed a few things to fit online storage. Final Cut keeps everything inside a single library file: events, projects and clips are all resources in the library. That works well for local editing but not in a cluster. Uploaded media is stored in cluster storage such as GlusterFS, and the project file only stores references. Managing resources in cloud file storage is simpler and more scalable.

We kept the parent-child model, which is a brilliant part of FCP.

Bake Render System

Bake is a set of microservices for rendering video. It follows the master-slave pattern (similar to the Map-Reduce pattern) used for parallel processing.

Bake services share the same file storage in the cloud via GlusterFS. Workers and streamers can scale quickly and easily in the cluster. To learn more, please see the Gluster project home page.

Cake

A cake is a group of one or more clips, the effects and filters applied to those clips, and the structure that describes how the clips combine. The clips inside a cake all have the same duration. The cake is the atomic unit of rendering.

A clip's offset is the shift of the original media to the start of the first clip.

The MD5 hash of a cake is calculated without its start and duration properties, so the hash stays the same unless the relative time shift into a clip’s original video changes or a clip’s effects change. The cake’s render result is saved in the key-value store with start, duration and hash together as the key. It is important to keep start and duration outside the hash, so that the same content placed elsewhere on the timeline can still be matched to its existing render.
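The hashing rule can be sketched in a few lines. This is only an illustration, not Bake's actual code: the function names, the JSON canonicalisation and the key layout are assumptions made for the example.

```python
import hashlib
import json


def cake_hash(clips):
    """MD5 over the clips only; start and duration are deliberately left out."""
    # Sorting keys gives a stable serialisation, so equal clips hash equally.
    canonical = json.dumps(clips, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def cache_key(start, duration, clips):
    """Hypothetical key layout for the render-result store."""
    return f"{start}:{duration}:{cake_hash(clips)}"


# Hypothetical cake content: two clips, one with a filter.
clips = [
    {"offset": 0, "video": {"src": "a.mp4"}},
    {"offset": 48, "video": {"src": "b.mp4"}, "filter-video": {"name": "blur"}},
]

# Same content at two timeline positions: different keys, same hash.
print(cache_key(0, 120, clips))
print(cache_key(240, 120, clips))
```

Changing a clip's offset or effects changes the hash, while moving the whole cake along the timeline changes only the start part of the key.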

A cake sample looks like:

{
  [md5hash]: {
    ranges: [ { start, end } ],
    clips: [
      { offset, video: [Object] },
      {
        offset,
        'adjust-crop': [Object],
        video: [Object],
        'filter-video': [Object]
      }
    ]
  }
}

Cake Slicer

Like React’s virtual DOM, the Bake slicer observes changes to the project file and transforms the project into a sequence of cakes. It diffs that sequence against the previously rendered media and recalculates the ranges that still need rendering. If a cake has the same hash but different ranges, the slicer subtracts the already rendered ranges from the cake and puts the remainder in a new cake.
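The range subtraction step might look like the sketch below. It assumes ranges are half-open (start, end) pairs measured in frames; the function name and that representation are choices made for the example, not taken from Bake.

```python
def subtract_ranges(wanted, rendered):
    """Return the parts of `wanted` not covered by any range in `rendered`.

    Ranges are half-open (start, end) tuples, e.g. in frames.
    """
    remaining = [wanted]
    for r_start, r_end in rendered:
        next_remaining = []
        for start, end in remaining:
            # No overlap: keep the piece untouched.
            if r_end <= start or r_start >= end:
                next_remaining.append((start, end))
                continue
            # Keep whatever sticks out on the left and on the right.
            if start < r_start:
                next_remaining.append((start, r_start))
            if r_end < end:
                next_remaining.append((r_end, end))
        remaining = next_remaining
    return remaining


# A cake now covers frames 0..100, and frames 20..40 are already rendered.
print(subtract_ranges((0, 100), [(20, 40)]))
```

Only the returned pieces need to go to the workers. An empty list means the cake is fully cached.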

Scheduler

For each unscheduled cake, the scheduler tries to find a worker across the cluster according to a set of rules. There are two steps before a destination worker of a cake is chosen. The first step is filtering all the workers and the second is ranking the remaining workers to find a best fit for the cake.

The scheduler is not just an admission controller; for each cake that is created, it finds the “best” worker for that cake, and if no machine is suitable, the cake remains unscheduled until a worker becomes suitable.
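The filter-then-rank idea can be illustrated with a small sketch. In the MVP this work is delegated to Kubernetes, so the worker fields used here (free_slots, codecs, load) and the ranking rule are purely hypothetical.

```python
def schedule(cake, workers):
    """Pick a worker name for `cake`, or None if no worker is suitable yet."""
    # Step 1: filter out workers that cannot take the cake at all.
    feasible = [
        w for w in workers
        if w["free_slots"] > 0 and cake["codec"] in w["codecs"]
    ]
    if not feasible:
        return None  # the cake stays unscheduled until a worker becomes suitable
    # Step 2: rank the remaining workers; here the least loaded one wins.
    return min(feasible, key=lambda w: w["load"])["name"]


workers = [
    {"name": "w1", "free_slots": 0, "codecs": {"h264"}, "load": 0.1},
    {"name": "w2", "free_slots": 2, "codecs": {"h264"}, "load": 0.7},
    {"name": "w3", "free_slots": 1, "codecs": {"h264", "prores"}, "load": 0.3},
]
print(schedule({"codec": "h264"}, workers))
```

Here w1 is filtered out because it has no free slot, and w3 is chosen over w2 because it is less loaded.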

We use Kubernetes in this MVP to implement the scheduler.

Worker

A worker takes a cake as input and renders it to video with the designated codec. It then sends the result to the cache server, which combines videos that share the same hash. If the newly rendered video overlaps ranges already in the cache, a trim-and-concat job removes the duplicate.

FFmpeg is used to bake the cake. It demuxes and decodes the related media into frames, adjusts the pictures and applies effects as most image-processing apps do, and then encodes the frames into output videos. We plan to rewrite this part because of performance issues: FFmpeg is not optimized for GPU computation and has many features that we do not need. However, writing an AVFoundation-like framework costs money and time.
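As a rough illustration of what a worker could hand to FFmpeg, the sketch below only builds an argument list and does not run anything. The helper name, the file names and the choice to express the range in seconds are assumptions. The flags themselves (-ss, -i, -t, -vf crop, -c:v libx264, -c:a aac) are standard FFmpeg options.

```python
def ffmpeg_args(src, start_s, duration_s, out, crop=None):
    """Build an ffmpeg command that renders one range of one source file."""
    # -ss before -i seeks in the input; -t limits the output duration.
    args = ["ffmpeg", "-ss", str(start_s), "-i", src, "-t", str(duration_s)]
    if crop is not None:
        w, h, x, y = crop
        # The crop filter takes width:height:x:y.
        args += ["-vf", f"crop={w}:{h}:{x}:{y}"]
    args += ["-c:v", "libx264", "-c:a", "aac", out]
    return args


print(" ".join(ffmpeg_args("a.mp4", 2, 5, "cake.mp4", crop=(1280, 720, 0, 0))))
```

A real worker would also have to combine several clips and filters per cake, which this single-input sketch leaves out.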

TBD: Frontend

The web app is written with React, Redux and GraphQL. We are considering integrating Google Drive and Dropbox so that we do not need to provide paid storage to customers who have already paid for it elsewhere.

We are building a web-first app because Electron and React Native can help us deploy cross-platform apps quickly without duplicating work on each platform.

Another reason is that putting everything online enables collaboration on the same project, which is a killer feature for an online editing system. Once videos, images and audio are uploaded, we can edit anywhere with friends or coworkers.

Dynamic Adaptive Streaming over HTTP

DASH aka Dynamic Adaptive Streaming over HTTP aka MPEG-DASH is an adaptive bitrate streaming technique that enables high quality streaming of media content over the Internet delivered from conventional HTTP web servers.

The content exists on the server in two parts: Media Presentation Description (MPD), which describes a manifest of the available content, its various alternatives, their URL addresses, and other characteristics; and segments, which contain the actual multimedia bitstreams in the form of chunks, in single or multiple files.

To play the content, the DASH client first obtains the MPD. By parsing the MPD, the DASH client learns about the program timing, media-content availability, media types, resolutions, minimum and maximum bandwidths, and the existence of various encoded alternatives of multimedia components, accessibility features and required digital rights management (DRM), media-component locations on the network, and other content characteristics. Using this information, the DASH client selects the appropriate encoded alternative and starts streaming the content by fetching the segments using HTTP GET requests.
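The selection step can be sketched as follows. It assumes each encoded alternative has already been parsed from the MPD into a dictionary with a bandwidth value in bits per second, mirroring the bandwidth attribute of a Representation. The function name and the simple rule are illustrative only, and real DASH players use more elaborate heuristics such as buffer level.

```python
def pick_representation(representations, measured_bps):
    """Choose the best alternative that fits the measured throughput."""
    affordable = [r for r in representations if r["bandwidth"] <= measured_bps]
    if affordable:
        # Highest quality that the connection can sustain.
        return max(affordable, key=lambda r: r["bandwidth"])
    # Nothing fits: fall back to the lowest bitrate available.
    return min(representations, key=lambda r: r["bandwidth"])


# Hypothetical alternatives as they might be listed in an MPD.
reps = [
    {"id": "360p", "bandwidth": 800_000},
    {"id": "720p", "bandwidth": 2_500_000},
    {"id": "1080p", "bandwidth": 5_000_000},
]
print(pick_representation(reps, 3_000_000)["id"])
```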

Each project has an MPD file that stores the final sequence for playback. The MPD file updates automatically when the project file changes.

Smart Uploader

The smart uploader handles uploading and transcoding at the same time. It splits the media and transcodes it into different quality levels while uploading, which makes it almost a real-time converter.

The original media is not kept in the cloud. Each media file is transcoded into H.264 video and AAC audio and segmented into clips with a fixed number of frames.

The uploader has two parts:

  1. The upload client analyzes the metadata of the original media and handles the network transfer.
  2. The upload server receives the streamed data and performs the segmenting and transcoding jobs. The server follows a master/slave pattern and has many workers that transcode in parallel to achieve real-time conversion.

The segment job generates an MPD file for each original media file that lists its clips.
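Splitting a media file into fixed-frame clips is simple arithmetic, sketched below. The function name and the half-open frame ranges are assumptions for the example. Each resulting range would correspond to one clip listed in the per-media MPD file.

```python
def segment_frames(total_frames, frames_per_clip):
    """Split a media file into half-open (first, last + 1) frame ranges."""
    return [
        (start, min(start + frames_per_clip, total_frames))
        for start in range(0, total_frames, frames_per_clip)
    ]


# 250 frames cut into clips of 100 frames; the last clip is shorter.
print(segment_frames(250, 100))
```

Because every clip boundary is known up front, the transcode workers can process the clips independently and in parallel.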

Thumbnail Generator

Because the MPD file contains a list of the transcoded media, the thumbnail generator generates a sprite image for each clip. The clips have a fixed number of frames, so each picture in the sprite image represents the key frame of those frames.

Youtube thumbnails example
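To show a thumbnail, a player has to locate the right tile inside the sprite image. The sketch below assumes the tiles are laid out row by row in a grid with a fixed number of columns. The layout, the function name and the tile size are assumptions, not details taken from the generator.

```python
def sprite_tile(index, columns, tile_w, tile_h):
    """Return (x, y, width, height) of tile `index` in a row-major sprite."""
    row, col = divmod(index, columns)
    return (col * tile_w, row * tile_h, tile_w, tile_h)


# Tile 7 in a sprite with 5 columns of 160x90 thumbnails.
print(sprite_tile(7, 5, 160, 90))
```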

Extension

Extensions are not involved in the MVP.

An extension system is a great idea for modern applications. We are building a lightweight editor that, like Google Chrome, focuses on speed and the render system and lets the community do the rest. Examples include the Chrome Web Store, the Atom and VS Code extension marketplaces, and Slack and GitHub integrations. Nowadays most SaaS platforms have an extension system to meet the more complex requirements of different users. By benefiting from the community’s wisdom, we can focus on maintenance and low-level development.

Unlike the plugin systems of Final Cut Pro X and Adobe Premiere Pro, the new editor lets us write plugins in JavaScript, a simple and powerful programming language accessible to anyone. We will wrap the low-level graphics API to provide more precise control.

You can create templates, custom transitions, effects and filters via the extension system. Third-party developers can sell or share their extensions without self-hosting. It is an efficient way to get similar effects into your movie, whether for learning or for production.
