The End of Video Coding?

By Anne Aaron and Jan De Cock

In the November 2006 IEEE Signal Processing Magazine article “Future of Video Coding and Transmission,” Prof. Edward Delp started by asking the panelists: “Is video coding dead? Some feel that, with the higher coding efficiency of the H.264/MPEG-4 . . . perhaps there is not much more to do. I must admit that I have heard this compression is dead argument at least four times since I started working in image and video coding in 1976.”

People were postulating that video coding was dead more than four decades ago. And yet here we are in 2018, organizing the 33rd edition of Picture Coding Symposium (PCS).

Is image and video coding dead? From the standpoint of application and relevance, video compression is very much alive and thriving on the internet. The Cisco white paper “The Zettabyte Era: Trends and Analysis (June 2017)” reported that in 2016, IP video traffic accounted for 73% of total IP traffic. This is estimated to go up to 82% by 2021. Sandvine reported in the “Global Internet Phenomena Report, June 2016” that four VOD services (Netflix, YouTube, Amazon Video and Hulu) accounted for 60% of peak download traffic on fixed access networks in North America. Ericsson’s “Mobility Report November 2017” estimated that video applications occupied 55% of mobile data traffic in 2017. This is expected to increase to 75% by 2023.

As for industry involvement in video coding research, it appears that the area is more active than ever before. The Alliance for Open Media (AOM) was founded in 2015 by leading tech companies to collaborate on an open and royalty-free video codec. The goal of AOM was to develop video coding technology that was efficient, cost-effective, high quality and interoperable, leading to the launch of AV1 this year. In the ITU-T VCEG and ISO/IEC MPEG standardization world, the Joint Video Experts Team (JVET) was formed in October 2017 to develop a new video standard that has capabilities beyond HEVC. The recently concluded Call for Proposals attracted an impressive 32 institutions from industry and academia, with a combined 22 submissions. The new standard, which will be called Versatile Video Coding (VVC), is expected to be finalized by October 2020.

Like many global internet companies, Netflix realizes that advancements in video coding technology are crucial for delivering more engaging video experiences. At one end of the spectrum, many people are constrained by unreliable networks or limited data plans, restricting the video quality that can be delivered with current technology. At the other end, premium video experiences like 4K UHD, 360-degree video and VR are extremely data-heavy. Video compression gains are necessary to fuel the adoption of these immersive video technologies.

So how will we get to deliver HD quality Stranger Things at 100 kbps for the mobile user in rural Philippines? How will we stream a perfectly crisp 4K-HDR-WCG episode of Chef’s Table without requiring a 25 Mbps broadband connection? Radically new ideas. Collaboration. And forums like the Picture Coding Symposium 2018 where the video coding community can share, learn and introspect.

Influenced by our product roles at Netflix, exposure to the standardization community and industry partnerships, and research collaboration with academic institutions, we share some of our questions and thoughts on the current state of video coding research. These ideas have inspired us as we embarked on organizing the special sessions, keynote speeches and invited talks for PCS 2018.

Let’s innovate beyond block-based hybrid encoding.

MPEG-2, VC1, H.263, H.264/AVC, H.265/HEVC, VP9, AV1: all of these standards were built on the block-based hybrid video coding structure. Attempts to veer away from this traditional model have been unsuccessful. In some cases (say, distributed video coding), it was because the technology was impractical for the prevalent use case. In most other cases, however, it is likely that not enough resources were invested in the new technology to allow it to mature. Unfortunately, new techniques are evaluated against the state-of-the-art codec, whose coding tools have been refined over decades of investment. It is then easy to drop the new technology as “not at-par.” Are we missing out on better, more effective techniques by not allowing new tools to mature? How many redundant bits can we squeeze out if we simply stay on the paved path and iterate on the same set of encoding tools?

The community needs better ways to measure video quality.

In academic publications, standardization activities, and industry codec evaluations, PSNR remains the gold standard for evaluating encoding performance. And yet every person in the field will tell you that PSNR does not accurately reflect human perception. Encoding tools like adaptive quantization and psycho-visual optimization claim to improve visual quality but fare worse in terms of PSNR. So researchers and engineers augment the objective measurements with labor-intensive subjective visual tests. Although this evaluation methodology has worked for decades, it is infeasible for large-scale evaluation, especially if the test set spans diverse content and wide quality ranges. For the video codec community to innovate more quickly and more accurately, automated video quality measurements that better reflect human perception should be utilized. These new metrics have to be widely agreed upon and adopted, so it is necessary that they be open and independently verifiable. Can we confidently move video encoding technology forward without solving the problem of automated video quality assessment first?
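To make the limitation concrete, here is a minimal sketch of how PSNR is computed. It assumes 8-bit samples (peak value 255) held in flat Python lists; the function name `psnr` and the toy pixel values are ours, chosen for illustration only. Because PSNR depends solely on the mean squared error, two very different distortion patterns with the same total squared error receive exactly the same score, whether or not a viewer would find them equally objectionable.

```python
import math

def psnr(reference, distorted, peak=255.0):
    """PSNR in dB between two equal-length sequences of pixel values.

    Assumes 8-bit samples by default (peak=255).
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy 16-pixel "image" of flat gray (illustrative values only).
reference = [128] * 16

# Distortion A: a small error spread evenly over every pixel.
spread = [p + 2 for p in reference]

# Distortion B: the same total squared error concentrated in one pixel.
concentrated = list(reference)
concentrated[0] += 8

print(psnr(reference, spread))
print(psnr(reference, concentrated))
```

Both calls print the same value, since each distortion has a mean squared error of 4. The metric has no notion of where the error sits or how visible it is.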

Encouraging new ideas means discussing with new people.

I (Anne) attended my first MPEG meeting three years ago, where I presented an input document on Netflix use cases for future video coding. I claimed that for the Netflix application, an increase in encoding complexity is not a concern if it comes with significant compression improvement. We run compute on the cloud and have no real-time requirements. I was asked by the Chair, “How much complexity increase is acceptable?” I was not prepared for the question, so I did some quick math in my head, estimating an upper bound, and said, “At the worst case 100X.” The room of about a hundred video standardization experts burst out laughing. I looked at the Chair, perplexed, and he said, “Don’t worry, they are happy that they can try out new things. People typically say 3X.” We were all immersed in the video codec space, and yet my views surprised them and vice versa.

The video coding community today is composed of research groups in academia, institutions active in video standardization, companies implementing video codec technologies, and technology and entertainment companies deploying video services. How do we foster more cross-pollination and collaboration across these silos to lift all boats?

Building bridges at Picture Coding Symposium 2018

In the spirit of stimulating more perplexed looks that will then hopefully lead to more “aha!” moments, we have organized a series of “Bridging the Gap” sessions for PCS 2018. The talks and panel discussion aim to connect PCS researchers with related fields and communities.

  • Researchers in computer vision and machine learning are excited to apply these techniques to image compression, as demonstrated by the CVPR Workshop and Challenge on Learned Image Compression. Johannes Ballé will give an introduction on the emerging field of learned image compression and summarize the results of this CVPR Workshop and Challenge.
  • Video experts from ITU-T VCEG and ISO/IEC MPEG are actively working on the next-generation standard VVC. The Co-Chairs of this activity, Gary J. Sullivan and Prof. Jens-Rainer Ohm, will give a summary of the results, to encourage early feedback and participation from academic researchers and potential industry users of the technology.
  • To address the disconnect between researchers in academia and standardization and the industry users of video coding technology, we have invited engineering leaders responsible for large-scale video encoding. Michael Coward from Facebook, Mark Kalman from Twitter and Balu Adsumilli from YouTube will participate in a panel discussion, sharing their thoughts and experiences on the challenges of encoding-at-scale for VOD and live video streaming services.

To address some of the critical questions in video compression today, we have also organized Special Sessions on Machine Learning of Image and Video Compression, Image and Video Quality Assessment with Industry Applications, and Content Preparation and Compression for VR. In addition, we will have excellent keynote talks by Prof. Vivienne Sze from Massachusetts Institute of Technology, Prof. Al Bovik from The University of Texas at Austin, and Prof. Yao Wang from New York University.

We hope that Picture Coding Symposium 2018 will build bridges, spark stimulating discussions and foster groundbreaking innovation in video and image coding. Join us in San Francisco to help shape the future of video coding!