Video Codec of the Future

Xuhui Shao
Foothill Ventures
Mar 7, 2019

We have a problem with digital videos. They are big and getting bigger.

In my household, even just a couple of smartphones' worth of kids and nature video clips have spilled over my hard drive, taken over my NAS (“network attached storage”), and clogged my network trying to back up to the cloud. Average Americans now spend as many hours (15 per week) watching streaming video as they spent watching TV a decade ago, and they demand ever-higher-resolution streams. Overall, video accounts for more than 80% of internet traffic, and that share is still growing.

The video content industry will be a trillion-dollar business by 2023, with high growth driven by entertainment, online services, and an expanding range of applications such as video communication, personal 4K smartphone video, surveillance, and general robotics.

Large tech companies such as Google and Apple have been trying to stay ahead of this tsunami of digital video with ever more sophisticated compression technology (primarily video codecs). There is a long string of these standards and tools, including H.264, H.265, VP8, VP9, and the latest one, AV1. Each generation of encoding standard has raised complexity exponentially over the one before.

So why haven’t large tech companies solved this problem for everyone? There are several reasons:

  1. Video codecs are becoming exponentially more complex. The next-generation codec standards include hundreds of tools, i.e., building blocks that can be applied to diverse types of encoding tasks. Each tool typically exposes a dozen or so parameters, so in practice one must tune thousands of parameters for every small block of video data to achieve optimal results. It is also impossible to optimize universally for all use cases: each one demands a different trade-off between speed, quality, and size (see the sketch after this list).
  2. The pool of engineers who have mastered the combined complexity of AI, optimization, and video encoding/decoding is very small. While large tech companies employ some of these engineers, they are more likely to work on projects better suited to publication and the advancement of academic careers.
  3. As a result, even large tech companies can only afford to optimize codecs for their own core use cases. And once they have developed such expertise in one area, they tend to guard it as a competitive advantage rather than share it with other companies.
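
To make the trade-offs in point 1 concrete, here is a minimal Python sketch of how the same AV1 encoder can be pushed toward speed, quality, or size by moving just two of its many parameters. It assumes an ffmpeg build with libaom-av1 support; the `PROFILES` values are illustrative placeholders of mine, not tuned recommendations.

```python
# A minimal sketch of the speed/quality/size trade-off in AV1 encoding,
# assuming ffmpeg is installed with libaom-av1 support.
import subprocess

# Each profile trades encoding speed against quality and file size:
# cpu-used runs from 0 (slowest, best) to 8 (fastest); crf runs 0-63,
# where lower means higher quality and a larger file. Values here are
# illustrative, not tuned recommendations.
PROFILES = {
    "archive":   {"cpu_used": 2, "crf": 22},  # slow encode, high quality
    "streaming": {"cpu_used": 5, "crf": 32},  # balanced
    "preview":   {"cpu_used": 8, "crf": 45},  # fast, small, visibly lossy
}

def encode_av1(src: str, dst: str, profile: str) -> None:
    p = PROFILES[profile]
    subprocess.run(
        [
            "ffmpeg", "-i", src,
            "-c:v", "libaom-av1",
            "-crf", str(p["crf"]),
            "-b:v", "0",                      # constant-quality mode
            "-cpu-used", str(p["cpu_used"]),
            dst,
        ],
        check=True,
    )

# encode_av1("clip.mp4", "clip_av1.mkv", "streaming")
```

And these two knobs are only the surface: a production encoder exposes dozens more per tool, per block type.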

So what is AI's role in this? It turns out that human brains are very good at compressing visual information. Our eyes capture massive “visual data sets” (bits) thanks to high color sensitivity, depth perception, and sensory resolution (retinal resolution). Our brains, however, can process only a tiny fraction (about 1%) of those bits. In other words, our brains act as very efficient 100:1 video encoders, filtering out what we consider unimportant information. With a better understanding of how the human visual system works, AI can guide video codecs to preserve or reconstruct the most important bits, quickly and efficiently.
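
As a toy illustration of that 100:1 idea (my own sketch, not Visionular's method), the Python snippet below transforms a frame into the frequency domain and keeps only the 1% of coefficients with the largest magnitude, a crude, non-AI ancestor of “preserving the most important bits.”

```python
# Toy 100:1 compression: keep only the 1% most significant DCT
# coefficients of a frame and reconstruct from those alone.
import numpy as np
from scipy.fft import dctn, idctn

def compress_keep_top(frame: np.ndarray, keep: float = 0.01) -> np.ndarray:
    coeffs = dctn(frame, norm="ortho")
    k = max(1, int(keep * coeffs.size))
    # The magnitude of the k-th largest coefficient becomes the cutoff.
    threshold = np.partition(np.abs(coeffs).ravel(), -k)[-k]
    coeffs[np.abs(coeffs) < threshold] = 0.0
    return idctn(coeffs, norm="ortho")

# A smooth synthetic frame survives 100:1 coefficient pruning almost intact:
x = np.linspace(0, np.pi, 256)
frame = np.outer(np.sin(x), np.cos(2 * x))
print("mean abs error:", np.abs(compress_keep_top(frame) - frame).mean())
```

Real codecs are far more sophisticated, and the AI angle is learning which bits matter perceptually, not merely which are largest.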

Furthermore, as we continue to ratchet up video resolution, the problem becomes even more apparent. Human eyes are not sensitive to noise hiding in edges. At low resolution, everything is an edge. As resolutions climb higher and higher, more areas become detailed surfaces rather than edges or blur. Compressing such delicate detail becomes much harder with conventional methods; only advanced AI models can iteratively match human perception at all of these levels.
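
To show why that matters for bit allocation, here is a small heuristic sketch of my own, using gradient magnitude as a crude stand-in for a learned perceptual model: blocks dominated by strong edges can tolerate coarse quantization because noise hides there, while smooth, detail-rich surfaces deserve a finer step.

```python
# Heuristic per-block quantization: spend fewer bits where edges mask
# noise, more where delicate surface detail would visibly degrade.
# Gradient magnitude stands in for a learned perceptual importance model.
import numpy as np

def block_quant_steps(frame: np.ndarray, block: int = 16,
                      q_fine: float = 4.0, q_coarse: float = 16.0) -> np.ndarray:
    gy, gx = np.gradient(frame.astype(float))
    edge_strength = np.hypot(gx, gy)
    h, w = frame.shape
    steps = np.empty((h // block, w // block))
    mean_edge = edge_strength.mean() + 1e-9
    for i in range(h // block):
        for j in range(w // block):
            tile = edge_strength[i*block:(i+1)*block, j*block:(j+1)*block]
            # Edgier-than-average blocks mask noise -> a coarser step is safe.
            steps[i, j] = q_coarse if tile.mean() > mean_edge else q_fine
    return steps
```

An AI-guided codec replaces this hand-rolled heuristic with a model trained to predict what human viewers actually notice.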

So how do I know all this? I’d like to thank Dr. Zoe Liu — cofounder of Visionular, a Tsingyuan portfolio company — for allowing me to interview her while preparing the content of this post. Dr. Liu has been working on the singular problem of video compression for over 20 years (dating back to her PhD, and through her work at Bell Labs, HP, Apple and Google). Her work is so significant that I will likely publish another blog post just on her and her achievements in this field.

We often significantly underestimate the things we have not yet experienced. As I watch exquisitely rendered YouTube 4K/8K HDR 60fps video on my Retina 5K monitor, I do not think about bandwidth and storage. All I feel is the powerful visual impact… I forget I am looking at a screen, not through a real window in a tropical bungalow.

When Google announced its Chrome browser, it did so in the belief that once users experienced even a small improvement in speed, they would never go back to the old standard.

Video is like that: once a user experiences higher resolution with smooth streaming, it is impossible to go back. The bandwidth and storage demands of video streaming are only going in one direction, and Tsingyuan is making multiple bets on the technologies that will enable this progression.

Xuhui Shao is Managing Partner at Foothill Ventures, investing in early-stage technology startups.