Re-thinking Streaming: Why Virtual Avatars?

Reduce “Zoom Fatigue” by leveraging AvatarWebKit to create a unique and enriching digital avatar streaming experience

Alexander Beattie
QuarkWorks, Inc.
Mar 25, 2022

Zoom Fatigue impacts health and wellbeing

Introduction

The onset of the COVID-19 pandemic in Spring 2020 prompted a rapid and unprecedented adoption of streaming in nearly all aspects of life, from education to business and even leisure. Two years after the pandemic began, streaming services like Zoom, Teams, and many others have become household names. We now rely on these services to connect with friends and family, conduct work meetings, and stay in touch when we cannot be physically present.

The Problem: Trying to Replicate In-Person Meetings

With the rapid adoption of new streaming tools, many have tried to replicate the in-person meeting experience without utilizing the benefits that these new tools can provide. By now many of us have probably experienced “Zoom Fatigue” from being in constant virtual meetings throughout the day. A comparable day of in-person meetings does not cause the same fatigue.

One of the factors contributing to this new digital fatigue is constantly watching yourself during a meeting. In a physical interaction, an individual does not continually check that their hair, makeup, and posture are aesthetically pleasing. In a virtual interaction, with your own camera feed always present, it’s nearly impossible to avoid monitoring and adjusting your appearance to project a favorable view of yourself to your virtual acquaintances. To use streaming technology effectively, we must abandon the idea of re-creating the in-person experience and leverage new digital tools to create a unique and enriching digital experience.

Streaming Resource Requirements

Conventional video streaming solutions all suffer from the same problem: significant bandwidth requirements. Streaming high-quality video to more than a few people requires significant bandwidth, low latency, and powerful servers. Additionally, the resource requirements for 2-way streaming scale quadratically with meeting size: each of the N participants must receive a stream from the other N - 1, so the meeting as a whole carries N × (N - 1) streams. Going from 5 meeting participants to 10 roughly doubles the downlink each participant needs (from 4 incoming streams to 9) and roughly quadruples the total resources and bandwidth for the meeting (from 20 streams to 90). Furthermore, one or two participants with poor connection speed or bandwidth can degrade the quality for everyone, even if all the other participants have sufficient bandwidth.
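To make the scaling concrete, here is a minimal sketch that counts streams, assuming every participant receives one separate stream from every other participant (the same model the bandwidth figures later in this article use). The function names are illustrative only.

```python
def streams_per_participant(participants: int) -> int:
    """Incoming streams for one participant: everyone except themselves."""
    return max(participants - 1, 0)


def total_streams(participants: int) -> int:
    """Sender/receiver pairs across the whole meeting."""
    return participants * streams_per_participant(participants)


for n in (5, 10, 20):
    print(n, streams_per_participant(n), total_streams(n))
# 5 4 20
# 10 9 90
# 20 19 380
```

Under this model, doubling the meeting from 5 to 10 people takes each participant from 4 incoming streams to 9, while the total number of streams in the meeting grows from 20 to 90.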

Bandwidth and Bitrates

For a preliminary comparison, we will start by considering a standard mono audio stream. Usually, this requires 128 kilobits per second (kbps) for each participant. So a five-person meeting would require each participant to send 128 kbps and receive 512 kbps (four streams), since a participant does not need to receive their own stream. This is achievable for most connections (even DSL and 3G mobile data). When video is added to the equation, the numbers climb rapidly.

Using a standard frame rate of 30 frames per second (fps) for each comparison, the requirements for video can be analyzed. A 320p video stream (lower resolution than a 1990s tube TV) requires a bit rate of approximately 1 megabit per second (Mbps). Stepping up to a more respectable 720p, a little over twice the vertical resolution, requires a bit rate of approximately 5 Mbps. This means that for the same five-person meeting at 320p, each participant would be sending at a rate of 1 Mbps and receiving at a rate of 4 Mbps. The previously mentioned DSL and 3G mobile data connections would be unable to handle a meeting at this point. At 720p each participant would be sending 5 Mbps and receiving 20 Mbps. While this is possible for many connections, it quickly becomes untenable as the meeting scales. With 20 people at 720p, each participant would need to receive 95 Mbps (19 streams at 5 Mbps each). This is above the global median internet speed for connected users.
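The arithmetic above can be reproduced with a short sketch. It uses the approximate bit rates quoted in this section and assumes each participant receives one full stream from every other participant. The 25 Mbps connection in the last line is a hypothetical example, not a figure from this article.

```python
# Approximate per-stream bit rates at 30 fps, as quoted in the text.
BITRATE_MBPS = {"320p": 1.0, "720p": 5.0}


def downlink_mbps(participants: int, resolution: str) -> float:
    """Downlink one participant needs to receive everyone else's stream."""
    return BITRATE_MBPS[resolution] * (participants - 1)


def max_participants(downlink_limit_mbps: float, resolution: str) -> int:
    """Largest meeting a given downlink can sustain at this resolution."""
    incoming_streams = int(downlink_limit_mbps // BITRATE_MBPS[resolution])
    return incoming_streams + 1  # plus the participant themselves


print(downlink_mbps(5, "320p"))      # 4.0
print(downlink_mbps(5, "720p"))      # 20.0
print(downlink_mbps(20, "720p"))     # 95.0
print(max_participants(25, "720p"))  # 6
```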

Compromising Performance

Low resolutions are usually acceptable: as the meeting grows, each participant occupies a smaller area of the screen, so the loss of resolution is not noticeable. Serving those lower resolutions, however, creates a significant load on the server, since it must receive each stream at full quality and then downsample it to a lower resolution to send to certain participants. Since some participants will want to view some streams at full quality (for example in speaker view), the server will need significant computing power to manage the downsampling and the quality sent to each participant. The required computational power makes running your own streaming system expensive and complex to maintain.
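As a rough illustration of that server-side work, the sketch below counts how many distinct versions of each sender's stream a server would have to produce when different viewers ask for different resolutions. It is a simplified model under assumed inputs, not a description of how any particular provider works. The names, the request format, and the "180p" thumbnail size are all hypothetical.

```python
from collections import defaultdict


def renditions_per_sender(requests):
    """Map each sender to the set of resolutions the server must produce.

    `requests` is an iterable of (receiver, sender, resolution) tuples.
    """
    needed = defaultdict(set)
    for receiver, sender, resolution in requests:
        if receiver != sender:  # nobody receives their own stream
            needed[sender].add(resolution)
    return dict(needed)


requests = [
    ("bob", "alice", "720p"),    # Bob has Alice in speaker view
    ("carol", "alice", "180p"),  # Carol sees Alice as a thumbnail
    ("alice", "bob", "180p"),
    ("carol", "bob", "180p"),
]
print(renditions_per_sender(requests))
```

In this example Alice's stream is needed at two resolutions and Bob's at one, so the server has to hold or generate three versions in total for a three-person meeting.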

Jitter and Latency

This analysis considers only bandwidth requirements for streaming. Latency and jitter are also critical factors in video streaming performance. When participants’ networks have high latency or jitter, the stream can appear choppy and provide a negative experience. Many major streaming providers have created incredibly sophisticated techniques, including compression, downsampling and upsampling, and many others, to optimize the video streaming experience and reduce the network load. While these solutions act as patches for poor network connectivity, they do not solve the overall problem: streaming video is expensive and computationally intensive, and its bandwidth requirements grow rapidly at scale.
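Jitter can be put into numbers. One common definition is the running interarrival jitter estimate from RFC 3550 (the RTP specification), which smooths the change in transit time between consecutive packets. The sketch below follows that formula. The timestamps are made-up millisecond values for illustration, and nothing here is specific to any streaming provider.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    Both lists hold per-packet timestamps in the same unit (here ms).
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Difference in transit time between this packet and the previous one.
        delta = (recv_times[i] - recv_times[i - 1]) - (
            send_times[i] - send_times[i - 1]
        )
        # Exponential smoothing with gain 1/16, as in the RFC.
        jitter += (abs(delta) - jitter) / 16
    return jitter


sent = [0, 20, 40, 60]          # one packet every 20 ms
steady = [50, 70, 90, 110]      # constant 50 ms latency
uneven = [50, 75, 88, 115]      # same average latency, uneven arrival

print(interarrival_jitter(sent, steady))  # 0.0
print(interarrival_jitter(sent, uneven) > 0)  # True
```

The first case shows that a constant delay, however large, produces no jitter. Only variation in delay does.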

The Solution: Leveraging 3D Avatars

Solving this problem requires a different approach to streaming. What do we wish to achieve from the video streaming experience? We want to capture facial emotion: how people react to and experience what we say. At QuarkWorks, we developed AvatarWebKit to capture facial emotion and enable real-time avatar streaming. With this technology, each person’s facial action units, which describe facial emotion, are detected and streamed.

It’s tough to change the experiences and technologies we’re used to. That’s why we’ve created Hallway Tile to explore using 3D avatars with your existing video meeting platforms. This macOS application creates a virtual camera on your machine, which allows you to become your 3D Avatar in Zoom, Google Meet, or your preferred meeting platform.

The image below showcases our team using this technology in a Zoom meeting.

A Zoom meeting utilizing Hallway Tile 3D Avatar technology

Unfortunately, using Hallway Tile with an existing streaming service doesn’t provide the bandwidth-reducing benefits described below, because the avatar is rendered as video and then streamed like any other camera feed. The next section describes the possibilities when AvatarWebKit is used as an end-to-end 3D Avatar solution.

Bandwidth Comparison:

With AvatarWebKit, only the 52 facial action units that describe facial emotion, plus some metadata, need to be sent to the server and streamed out to participants. This does not require any downsampling or optimization, which significantly reduces the required server-side compute resources. The rendering of the avatars is handled entirely client-side once the facial action units are received from each participant. With this technology, a single avatar stream requires approximately 256 kbps, the same as two of the mono audio streams presented earlier! We are working on optimizing the data streaming to decrease the required bandwidth, which will also decrease jitter and latency for a better user experience!
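To give a feel for how small a frame of facial data can be, here is a hypothetical sketch that packs 52 action unit values into a binary payload. This is not AvatarWebKit's actual wire format, which is not described here. The float32 encoding, the 30 fps update rate, and every name in the code are assumptions made for illustration.

```python
import struct

NUM_ACTION_UNITS = 52  # count quoted in the text
FRAME_FORMAT = f"<{NUM_ACTION_UNITS}f"  # assumed: little-endian float32 values


def pack_frame(values):
    """Serialize one frame of action unit values to bytes (assumed layout)."""
    if len(values) != NUM_ACTION_UNITS:
        raise ValueError(f"expected {NUM_ACTION_UNITS} values, got {len(values)}")
    return struct.pack(FRAME_FORMAT, *values)


def unpack_frame(payload):
    """Inverse of pack_frame."""
    return list(struct.unpack(FRAME_FORMAT, payload))


frame = pack_frame([0.0] * NUM_ACTION_UNITS)
print(len(frame))                  # 208 bytes per frame
print(len(frame) * 8 * 30 / 1000)  # 49.92 kbps at an assumed 30 fps
```

The 256 kbps quoted above also covers metadata and whatever framing AvatarWebKit actually uses, so this raw estimate is not directly comparable to it. The point is only that a frame of coefficients is far smaller than a frame of video.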

The table below shows the downlink bandwidth required by each participant for 320p and 720p 2-way video streams compared to AvatarWebKit. For a meeting with 100 participants, there is a nearly 75% reduction in the required downlink speed compared to 320p video streams for the same number of participants. This significant data savings also translates into a significant reduction in server hosting costs, because server load and processing are reduced and more of the computation and rendering is performed on the client side.
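The comparison can be recomputed from the per-stream figures quoted in this article (about 1 Mbps for 320p, 5 Mbps for 720p, and 256 kbps for an avatar stream). The sketch below is derived from those figures rather than copied from the original table. The meeting sizes other than 100 are arbitrary, and it assumes 1 Mbps = 1000 kbps and one incoming stream per other participant.

```python
# Per-stream bit rates in kbps, taken from the figures quoted in the text.
STREAM_KBPS = {"320p video": 1000, "720p video": 5000, "AvatarWebKit": 256}


def downlink_mbps(participants: int, stream_kbps: float) -> float:
    """Downlink one participant needs to receive every other participant."""
    return (participants - 1) * stream_kbps / 1000


def reduction_vs(baseline_kbps: float, candidate_kbps: float) -> float:
    """Fractional bandwidth saving of the candidate relative to the baseline."""
    return 1 - candidate_kbps / baseline_kbps


print("participants  " + "  ".join(f"{name:>12}" for name in STREAM_KBPS))
for n in (5, 10, 20, 50, 100):
    row = "  ".join(f"{downlink_mbps(n, kbps):12.1f}" for kbps in STREAM_KBPS.values())
    print(f"{n:>12}  {row}")

saving = reduction_vs(STREAM_KBPS["320p video"], STREAM_KBPS["AvatarWebKit"])
print(f"{saving:.1%}")  # 74.4%
```

Under this model a 100-person meeting needs about 99 Mbps of downlink at 320p versus about 25.3 Mbps with avatar streams, which is where the "nearly 75%" figure comes from. Because both sides scale with the same number of incoming streams, the percentage saving is the same at every meeting size.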

Give Avatars a Try:

In the next article, we’ll do a technical deep dive into benchmarking, performance, and cost savings analysis of AvatarWebKit compared to conventional streaming. With AvatarWebKit we are leveraging new digital technologies to create a unique and enriching digital experience. Head over to joinhallway.com to give AvatarWebKit a try!

If you have other questions, you can also get in touch with us on Discord.

Subscribe to our newsletter for more updates on what we’re doing next!

Alexander Beattie
QuarkWorks, Inc.

Over the past four years I have lived in three countries and navigated the challenges of working, living, studying, and traveling during a global pandemic.