Compression is Again Driving the Evolution of Media
I am thrilled to have joined 8i in October to help with the compression and communication of volumetric media. 8i invented technology to create photorealistic volumetric holograms, which can be viewed from any angle in virtual, augmented, and mixed reality. As 8i cofounder and CTO Eugene d’Eon told me before I joined, “There are unbounded world-class compression research opportunities at 8i.”
Indeed. The birth of volumetric media, known popularly as holograms, is an extraordinary moment in the history of human communication. Holograms represent a new generation of immersive media: the third generation after audio and video. Immersive media are information-rich signals capable of making people feel as if they were brought together with other people in other places. The telephone, invented 140 years ago, introduced the first generation of immersive media. The television, invented 90 years ago, introduced the second generation of immersive media. In fact, television was so named because it was intended to be the visual counterpart to the telephone.
The Three Generations of Immersive Media
It is hard to overstate the impact of the telephone and the television on humankind. Everything imaginable in our world has been affected by these two technologies. Industries worth many hundreds of billions of dollars have grown up in numerous sectors around the telephone and television, including hardware manufacturing, content production, and content distribution. Moreover, audio and video as immersive media have had repeated massive societal impact with each new technical advance in signal communication, for example the inventions of terrestrial radio broadcast, the Internet, and cellular communication, each of which has spawned entirely new hundred-billion-dollar industries revolving around the original inventions of audio and/or video. And now we have holograms.
If history is any guide, the emergence of holograms as the third generation of immersive media will have profound and repeated impact on the future of human society.
Holograms will become a critical part of the Virtual and Augmented Reality experience. Many investors and analysts see VR and AR as the fourth generation of computing platform, after the PC, the web, and mobile. However, this view does not fully appreciate the fact that immersive media (audio, video, images, and soon holograms) are central to all of these computing platforms. Think about it. Social networking, a key driver of all of these platforms, is built on the exchange of immersive media. The web is almost entirely about communication of media. Mobile phones were of course born on communication of audio, and have since become hubs for communication of all media. If people adopt VR/AR as a computing platform, it will be because immersive media, such as human holograms, are made more consumable through the new platform.
Compression is a Core Enabler of Holograms
While there are those who appreciate the primary role that holographic media will play in VR/AR, some may miss the essential role of compression in enabling it. To me, it seems that most technologists who are thinking about holograms today are thinking about either how to display the medium, or how to capture it. This is like being obsessed with microphones and loudspeakers for audio, and cameras and displays for video, whereas in fact the impact of audio and video on human society has been at least as much in communicating the audio and video as in capturing and rendering it. There is essentially no value in immersive media that is not communicated. Communicating the media requires compression, and that is why compression has to play a central role in advancing VR/AR as a computing platform.
I have been around long enough to witness key moments in the history of media compression. In 1979–80, as a senior at Princeton University, I worked on my senior thesis as an intern in the Visual Communication Research Department of Bell Laboratories, in nearby Holmdel, New Jersey. My mentor, Arun Netravali, head of the department, had just invented the first motion-compensated video codec. I modified Arun’s codec to be the first video codec with multiple reference frames, at a time when memory was so expensive that using multiple reference frames was deemed extravagant. Now multiple reference frames, and of course motion compensation, are in every video codec. Down the hall was Bob Lucky, the Lab Director. Bob had on his desk what he claimed was the last working AT&T Picturephone in the world. He would often complain that he never received any calls on it. For everyone in Bob’s lab, that last AT&T Picturephone represented the end of the era of analog video, and the beginning of the era of digital video, which we were all working on.
In the mid-1980s, as a PhD student at Stanford University, I invented a methodology for lossy data compression now called Lagrangian rate-distortion optimization. Because this method provides an optimal way to choose between multiple encoding choices based on both quality and bit rate, Lagrangian rate-distortion optimization has arguably enabled all significant advances in video compression algorithms since the mid-1990s. After Stanford, I returned to Bell Laboratories, this time as a full-time Member of Technical Staff in the Signal Processing Research Department in Murray Hill, New Jersey. While I was there, Jim Johnston in our department invented perceptual audio coding, the basis of the MP3 audio codec. Meanwhile, our sister department, Acoustics Research, was nearing the end of its decades-long effort to bring speech coding bit rates down from 64 kbps to under 8 kbps at the same quality, enabling cell phones to fit in our pockets.
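The Lagrangian rate-distortion optimization mentioned above can be illustrated with a minimal sketch. For each candidate encoding, the encoder measures a distortion D and a bit rate R, then picks the candidate that minimizes the cost J = D + λR, where the multiplier λ sets the trade-off between quality and bits. The mode names and the distortion and rate numbers below are made up for illustration and do not come from any real codec.

```python
from typing import NamedTuple, Sequence


class Choice(NamedTuple):
    name: str
    distortion: float  # e.g. squared error of the reconstruction
    rate: float        # bits needed to signal this choice


def best_choice(choices: Sequence[Choice], lam: float) -> Choice:
    """Return the choice minimizing the Lagrangian cost J = D + lam * R."""
    return min(choices, key=lambda c: c.distortion + lam * c.rate)


# Hypothetical encoding modes for one block (illustrative numbers only).
modes = [
    Choice("skip", distortion=100.0, rate=1.0),
    Choice("inter", distortion=30.0, rate=20.0),
    Choice("intra", distortion=10.0, rate=80.0),
]

for lam in (0.1, 1.0, 10.0):
    print(lam, best_choice(modes, lam).name)
```

With a small λ, bits are cheap and the lowest-distortion mode wins; as λ grows, the selection shifts toward cheaper modes. Sweeping λ traces out the operating points on the lower convex hull of the available (rate, distortion) pairs.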
In the mid-1990s, I returned to California to work at the famed Xerox Palo Alto Research Center. While at PARC, I taught a graduate-level course in data compression at Stanford. Soon thereafter, I joined the Internet startup VXtreme to run the compression team there. The business of VXtreme was streaming video over the Internet. We produced the first commercial web video streaming on demand, beating RealNetworks to market by two months. Fast forward to today and the entire television industry is now moving to video streaming over the Internet. VXtreme was soon acquired by Microsoft, and we joined colleagues there to form the Windows Media team.
At Microsoft, we invented client-driven multi-bitrate (MBR) streaming, which eventually became standardized as Dynamic Adaptive Streaming over HTTP (DASH), now in use by all major players such as Netflix. We also invented a method for fast start, used in essentially all streaming video today, including YouTube, to start playback immediately despite the need to buffer several seconds of data. I also proposed to MPEG that they standardize a streaming file format, known today as MP4.
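As a rough illustration of what "client-driven" means in multi-bitrate streaming: the server offers the same content at several bit rates, and the client decides which one to request next based on the throughput it has been measuring. The sketch below is a deliberately simple, hypothetical selection rule (the bitrate ladder, the `safety` margin, and the function name are all assumptions); it is not the algorithm used by Windows Media, DASH players, or any particular service.

```python
def pick_bitrate(available_kbps, throughput_kbps, safety=0.8):
    """Pick the highest offered bitrate that fits within a safety
    fraction of the measured throughput; otherwise fall back to the
    lowest offered bitrate so playback can continue."""
    budget = throughput_kbps * safety
    fitting = [b for b in available_kbps if b <= budget]
    return max(fitting) if fitting else min(available_kbps)


# Hypothetical bitrate ladder advertised by the server, in kbps.
ladder = [300, 750, 1500, 3000, 6000]

for measured in (200, 2000, 10000):
    print(measured, "->", pick_bitrate(ladder, measured))
```

Because the decision lives in the client, the server can stay a plain file or HTTP server, which is a large part of why this style of streaming scales well.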
So I’ve been witness to some key developments in audio and video compression and delivery over the years, and in my view, compression and delivery of holograms will follow a similar trajectory: likely faster, but going through all the same issues. The way I see it, 2016 is to holographic compression what the mid-1980s were to video compression. In 1980, when I was working on the first motion-compensated video coder, it wasn’t even the block-based motion compensation with side information used today; it was pel-recursive, which was pixel-based and backward adaptive. Only a few years earlier, in 1977, the Chen and Smith paper advocating DCT blocks for image compression appeared in the IEEE Transactions on Communications. By the end of the decade, however, the paradigm for still image compression and for video compression would be set. The 8x8 DCT would become the core of the compression paradigm for both image compression and video compression, as standardized in the late 1980s by JPEG, H.261, and MPEG-1. Block-based transform coding and block-based motion compensation would become the central coding paradigm, and would last for decades. Though there would be challengers over the years, such as wavelets and matching pursuits, none would be able to displace the central paradigm, which would hold sway for over thirty years.
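For readers who have not seen block transform coding, here is a minimal sketch of its core step: an 8x8 block of pixels is transformed with a two-dimensional DCT and the coefficients are quantized. Smooth blocks concentrate their energy in a few low-frequency coefficients, so most quantized values become zero and are cheap to code. The single uniform step size used here is an assumption made for simplicity; real codecs use per-frequency quantization tables followed by entropy coding.

```python
import math

N = 8  # block size


def dct_1d(x):
    """Orthonormal DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / N)
        total = sum(
            x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for n in range(N)
        )
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def quantize(coeffs, step):
    """Uniform quantization with one hypothetical step size."""
    return [[round(c / step) for c in row] for row in coeffs]


# A perfectly flat block: all of its energy lands in the DC coefficient.
flat = [[10.0] * N for _ in range(N)]
levels = quantize(dct_2d(flat), step=4.0)
nonzero = sum(1 for row in levels for v in row if v != 0)
print(levels[0][0], nonzero)
```

For the flat block, only the top-left (DC) level is nonzero, so 64 pixel values collapse to a single number. Natural image blocks are not perfectly flat, but the same energy compaction is what makes the approach effective.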
Entering the Era of Holographic Media
We are now in an analogous period of rapid advances, experimentation, and uncertainty prior to the establishment of a central paradigm for the compression of volumetric media. My colleagues and I have published numerous papers in recent years illustrating various ways to compress volumetric humans [see references 1–19 below]. Each new paper presents an approach yielding what seems to be a 10–30% reduction in bit rate compared to the previous state-of-the-art approaches. That’s an indication of how fluid the area is. In contrast, recent papers in video compression are considered significant if they can show even a 2% decrease in bit rate compared to the previous state-of-the-art.
However, as with video in the mid-1980s, this is about to change. Both JPEG and MPEG will be issuing a Call for Proposals in January for compression technologies related to point clouds, light fields, holograms, and other “volumetric” media. By the middle of 2017, leading candidates for compressing this new medium will likely emerge. Among these candidates may very well be the paradigm that will stick with us for the next thirty years. It is likely, I believe, that the future paradigm is one that has already been proposed… we are just waiting for it to emerge.
This is a thrilling time to be in this space. Compression is only the beginning. Once the community begins to agree on the basics of representation and compression, other aspects of communication will come to the fore: editing, streaming, content protection, content aggregation, advertising, file formats, applications, protocols, interactivity, and so forth.
A lot can happen in twenty years. Almost twenty years ago, we produced the first streaming video on demand over the Internet. Almost twenty years before that, we produced the first motion-compensated video. Twenty years from now, holograms may become as ubiquitous in communication and storytelling as digital video is today. I am expecting to look back on this moment as the birth of volumetric media, and to see that 8i has played a key role in establishing not only how content is created and consumed in this new medium, but also how it is compressed and how it is delivered over the network to end users.
1. H. Q. Nguyen, P. A. Chou, and Y. Chen, “Compression of Human Body Sequences Using Graph Wavelet Filter Banks,” Int’l Conf. on Acoustics, Speech, and Signal Processing, May 2014. 2nd place, Best Student Paper Award.
2. C. Zhang, D. Florencio, and P. A. Chou, “Graph Signal Processing - A Probabilistic Framework,” Microsoft Research Technical Report MSR-TR-2015-31, April 2015.
3. D. Thanou, P. A. Chou, and P. Frossard, “Graph-based motion estimation and compensation for dynamic 3D point cloud compression,” in Int’l Conf. on Image Processing (ICIP), September 2015.
4. A. Anis, P. A. Chou, and A. Ortega, “Compression of dynamic 3D point clouds using subdivisional meshes and graph wavelet transforms,” Int’l Conf. on Acoustics, Speech, and Signal Processing (ICASSP), March 2016. Invited to special session on Signal Processing on Graphs.
5. P. A. Chou and R. L. de Queiroz, “Gaussian Process Transforms,” IEEE Int’l Conf. on Image Processing (ICIP), September 2016. Invited to special session on Graph-Based Multi-Dimensional Image Data Compression.
6. D. Thanou, P. A. Chou, and P. Frossard, “Graph-based compression of dynamic 3D point cloud sequences,” IEEE Transactions on Image Processing, vol. 25, no. 4, April 2016.
7. P. A. Chou and R. L. de Queiroz, “Modeling Signals Embedded in a Euclidean Domain,” Graph Signal Processing (GSP) workshop, May 2016. Abstract only.
8. C. Loop, Q. Cai, S. Orts-Escolano, and P. A. Chou, “Microsoft Voxelized Upper Bodies - A Voxelized Point Cloud Dataset,” ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m38673/M72012, Geneva, May 2016.
9. P. A. Chou and R. L. de Queiroz, “Transform Coder for Point Cloud Attributes,” ISO/IEC JTC1/SC29/WG11 input document m38674, Geneva, May 2016.
10. P. A. Chou and R. L. de Queiroz, “Rate-Distortion Optimized Coder for Dynamic Voxelized Point Clouds,” ISO/IEC JTC1/SC29/WG11 input document m38675, Geneva, May 2016.
11. P. A. Chou, “Coding for Augmented and Virtual Reality,” Packet Video Workshop, Seattle, July 2016. Invited Plenary talk.
12. P. A. Chou, “Telepresence: From Virtual to Reality — A Reprise,” Multimedia Signal Processing Workshop, Montreal, Canada, September 2016. Invited Keynote talk.
13. R. L. de Queiroz and P. A. Chou, “Compression of 3D Point Clouds Using a Region-Adaptive Hierarchical Transform,” IEEE Transactions on Image Processing. To appear.
14. R. L. de Queiroz and P. A. Chou, “Motion-Compensated Compression of Dynamic Voxelized Point Clouds,” IEEE Transactions on Image Processing. Submitted for possible publication.
15. R. L. de Queiroz and P. A. Chou, “Transform Coding for Point Clouds Using a Gaussian Process Model,” IEEE Transactions on Image Processing. Submitted for possible publication.
16. E. Pavez, P. A. Chou, R. L. de Queiroz, and A. Ortega, “Dynamic Polygon Cloud Compression,” Microsoft Research Technical Report MSR-TR-2016-59, available on arxiv.org, October 2016.
17. E. Pavez and P. A. Chou, “Dynamic Polygon Cloud Compression,” Int’l Conf. on Acoustics, Speech, and Signal Processing, March 2017. To appear.
18. C. Loop, Q. Cai, S. Orts-Escolano, and P. A. Chou, “A Closed-form Bayesian Fusion Equation using Occupancy Probabilities,” Int’l Conf. on 3D Vision (3DV), October 2016.
19. Ju. Hou, L.-P. Chau, Y. He, and P. A. Chou, “Sparse Representation for Colors of 3D Point Clouds via Virtual Adaptive Sampling,” Int’l Conf. on Acoustics, Speech, and Signal Processing, March 2017. To appear.