The case for client-side encoding

Published in Viewly, Jul 27, 2017

Today everyone has a small supercomputer in their pocket. Some of those devices also ship with 4k cameras and, soon enough, will offer real-time augmented reality (AR) capabilities.

YouTube was conceived in a completely different era, when personal computers were relatively slow and flip phones were the latest hotness. For a long time, encoding videos on a centralized server farm was the most logical solution.

In this post, we will explore the possibilities of moving the encoding to client devices, and the benefits this offers over the cloud-based solutions of today.

Modern browsers and WebAssembly

WebAssembly is an emerging web standard for building high-performance web applications. It is being adopted by all major browser vendors, and it will allow developers to port their high-performance C and C++ applications to the browser.

High-fidelity 3D rendering in the Firefox web browser, powered by WebAssembly and WebGL2, has already been demonstrated.

It is not unreasonable to expect that we could not only perform efficient encoding and transcoding from a client-side browser app (using tools such as the highly optimized FFmpeg), but also develop a responsive video editor.
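To make the transcoding step more concrete, here is a minimal sketch of the FFmpeg invocations a client could run to produce several H.264 renditions of one source file. It only builds the argument lists and does not execute anything. The names `Rendition`, `LADDER` and `h264_command`, the CRF value and the audio bitrate are illustrative assumptions, not settings taken from a real deployment. A browser app would pass similar arguments to a WebAssembly build of FFmpeg; here we simply assume a regular `ffmpeg` binary built with libx264.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Rendition:
    """One output resolution of the encoding ladder (hypothetical helper)."""
    label: str
    height: int  # target frame height in pixels


# Assumed ladder; the resolutions mirror the ones measured later in this post.
LADDER = [
    Rendition("4k", 2160),
    Rendition("1080p", 1080),
    Rendition("720p", 720),
    Rendition("480p", 480),
]


def h264_command(source: str, rendition: Rendition, crf: int = 23) -> list[str]:
    """Build (but do not run) an ffmpeg command for one H.264 rendition."""
    output = f"{Path(source).stem}_{rendition.label}.mp4"
    return [
        "ffmpeg", "-i", source,
        # -2 lets ffmpeg pick an even width that preserves the aspect ratio.
        "-vf", f"scale=-2:{rendition.height}",
        # libx264 in constant-quality mode; a lower CRF means higher quality.
        "-c:v", "libx264", "-preset", "medium", "-crf", str(crf),
        # Assumed audio settings: AAC at 128 kbit/s.
        "-c:a", "aac", "-b:a", "128k",
        output,
    ]


if __name__ == "__main__":
    for rendition in LADDER:
        print(shlex.join(h264_command("clip.mov", rendition)))
```

Building the commands as lists rather than strings keeps file names with spaces safe, and makes it easy to hand the same arguments to whichever FFmpeg build the client ships.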

Having free, easy-to-use video editing software right in the browser would empower amateur creators, and would also be a highly convenient option for shorter clips.

Encoding on mobile

Mobile devices like the iPhone have been capable of video rendering for a while (iMovie for iPhone was introduced in 2010). CPUs and GPUs in mobile devices have been improving at a rapid rate; the improvement in the graphics fidelity of mobile games is one visible example.

It is safe to assume that modern mobile devices are perfectly capable of rendering and transcoding videos. Furthermore, modern frameworks such as Apple’s Metal could enable us to leverage the parallel compute capabilities of mobile GPUs.

Reducing bandwidth costs and upload times

While computing capabilities have been improving exponentially, global internet speeds and available bandwidth have not kept pace. To make matters worse, many monopolistic telco providers impose outrageously small monthly bandwidth caps on their customers.

We can decrease bandwidth costs, as well as the time it takes for videos to upload, by encoding videos on the client device before they are uploaded.

H.264

A 1-minute 4k video in its original format, as shot by the iPhone 7, takes 357 MB of space. With H.264 (MPEG-4 AVC, avc1) encoding, the file size is reduced to 177 MB. When the video is transcoded into multiple resolutions, the aggregate size of all renditions is 245 MB, a bandwidth saving of roughly 31% compared with uploading the original.

Resolution       Size
Original 4k      357 MB
Encoded 4k       177 MB
Encoded 1080p    38 MB
Encoded 720p     20 MB
Encoded 480p     10 MB
Encoded All      245 MB
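The arithmetic behind these figures can be checked with a few lines of Python. The sizes are the ones reported above; the 10 Mbit/s uplink used for the upload-time estimate is purely an assumption for illustration, as are the helper names.

```python
# File sizes in MB, as reported in the measurements above.
SIZES_MB = {
    "Original 4k": 357,
    "Encoded 4k": 177,
    "Encoded 1080p": 38,
    "Encoded 720p": 20,
    "Encoded 480p": 10,
}


def aggregate_encoded_mb(sizes: dict) -> int:
    """Total size of all encoded renditions."""
    return sum(size for name, size in sizes.items() if name.startswith("Encoded"))


def saving_percent(original_mb: float, uploaded_mb: float) -> float:
    """Bandwidth saved by uploading `uploaded_mb` instead of `original_mb`."""
    return 100 * (1 - uploaded_mb / original_mb)


def upload_seconds(size_mb: float, uplink_mbit_per_s: float) -> float:
    """Idealized upload time, taking 1 MB = 8 Mbit and ignoring protocol overhead."""
    return size_mb * 8 / uplink_mbit_per_s


if __name__ == "__main__":
    total = aggregate_encoded_mb(SIZES_MB)
    original = SIZES_MB["Original 4k"]
    uplink = 10  # Mbit/s, an assumed uplink speed
    print(f"All renditions: {total} MB")
    print(f"Saving vs. original: {saving_percent(original, total):.1f}%")
    print(f"Upload time, original: {upload_seconds(original, uplink):.0f} s")
    print(f"Upload time, encoded:  {upload_seconds(total, uplink):.0f} s")
```

With the figures from the table, the encoded renditions total 245 MB, about 31% less than the 357 MB original. On the assumed 10 Mbit/s uplink, that shortens an idealized upload from roughly 286 seconds to 196 seconds.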

VP9

VP9 is a newer, royalty-free codec created by Google as an alternative to the proprietary H.265. Compared with H.264, it offers a further reduction in encoded file size while retaining or improving visual fidelity.
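For comparison, a VP9 encode with FFmpeg could look like the sketch below. It uses libvpx-vp9 in constant-quality mode, where `-b:v 0` together with `-crf` tells the encoder to target a quality level rather than a bitrate. The helper name `vp9_command`, the CRF value of 31 and the choice of Opus audio in a WebM container are assumptions for illustration; we have not measured the resulting file sizes.

```python
from __future__ import annotations

from pathlib import Path


def vp9_command(source: str, height: int, crf: int = 31) -> list[str]:
    """Build (but do not run) an ffmpeg command for one VP9 rendition."""
    output = f"{Path(source).stem}_{height}p_vp9.webm"
    return [
        "ffmpeg", "-i", source,
        # Keep the aspect ratio and force an even width.
        "-vf", f"scale=-2:{height}",
        # Constant-quality mode: -b:v 0 lets -crf alone drive the quality.
        "-c:v", "libvpx-vp9", "-crf", str(crf), "-b:v", "0",
        # Assumed audio codec; Opus is commonly paired with VP9 in WebM.
        "-c:a", "libopus",
        output,
    ]


if __name__ == "__main__":
    print(" ".join(vp9_command("clip.mov", 1080)))
```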

Towards server-less architecture

Moving the video encoding step to the client will allow us to remove the dependence on centralized upload servers. After the encoding step, client applications can publish the videos through a thin JSON-RPC wrapper that talks to one or more full blockchain nodes. Thanks to the recent addition of WebRTC support in js-ipfs, clients could seed their videos to other P2P nodes on the network, right from the browser.
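As a rough illustration of how thin such a wrapper can be, the sketch below builds a JSON-RPC 2.0 request body. Only the envelope fields (`jsonrpc`, `id`, `method`, `params`) come from the JSON-RPC 2.0 specification. The method name `publish_video` and its parameters are hypothetical placeholders, not the API of any actual blockchain node.

```python
import json
from itertools import count

# Monotonically increasing request ids, so responses can be matched to requests.
_request_ids = count(1)


def jsonrpc_request(method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 request envelope."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    })


if __name__ == "__main__":
    # Hypothetical method and parameters, for illustration only.
    body = jsonrpc_request("publish_video", {
        "title": "My first clip",
        "content_hash": "<hash of the encoded video on the P2P network>",
    })
    print(body)
```

The client would send this body to a node over HTTP or WebSocket; the video data itself never passes through the node, only the metadata and the content hash do.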