Measuring audio-video quality in our online learning tool

A core feature of our online learning tool is the ability to see and hear the person you’re having a lesson with. We provide this functionality using WebRTC, a set of APIs that give clients, such as browsers or mobile apps, real-time communication capabilities. This is the same technology used by Google Hangouts, Snapchat and others for audio-video communication. In our online learning tool, instead of interfacing with the raw WebRTC APIs, we use a library called OpenTok, a wrapper around WebRTC that allows for easier integration into our application.

A few details on our implementation of OpenTok

OpenTok has two modes for deciding how clients send audio-video streams to one another. The first of these is called Relayed. In this mode, clients connect directly to one another, making it a peer-to-peer connection. This is the mode that we initially used in our online learning tool. Its main advantage is that with a small number of participants in a session (in our case, only two), it can in theory offer slightly better audio and video quality, since the streams don’t have to go through a middleman.

The other mode that OpenTok makes available is called Routed. This mode does have a middleman: the OpenTok Media Router. This is a component of the OpenTok platform that handles routing audio-video streams. Instead of streaming directly to one another, clients send their streams through the router, which manages the exchange of information. Routed mode is what we currently use in our application because of the advantages it offers compared to Relayed. The ones we care the most about are:

  1. The ability to temporarily fall back to an audio-only stream if a user’s connection becomes degraded.
  2. Recording audio-video streams in a session and exporting them as MP4 or WebM files.
  3. Allowing a client to receive audio-video quality statistics regarding the stream they are subscribed to.

For the rest of this post I’ll focus on the third item and explain how we get and use the data that helps us measure the audio-video quality our users experience in our application.

Gathering data

Audio-video quality data is useful to us in a variety of ways. It helps us understand the capabilities and limitations of our users’ network connections. It helps us drive decisions regarding audio-video related offerings and features we are considering adding to our application. It helps us monitor the status of the OpenTok Media Router and react to any issues that might be affecting our users… and so on.

Let’s look at how we actually gather this data in the first place. The OpenTok library has two main objects that facilitate the act of publishing or consuming an audio-video stream. They are aptly named Publisher and Subscriber. If person A and person B are having a lesson in our online learning tool, think of Publisher as what manages the act of sending person A’s webcam feed data to person B. Similarly, from person A’s point of view, Subscriber is what manages the act of accepting person B’s webcam feed data and displaying it to person A.

A useful feature of Subscriber is a method called getStats. By invoking it, you get information about the quality of the stream your client is subscribed to, including bytes received, packets received and packets lost. For the entire duration of a lesson, we poll this getStats method at a 10-second interval and use the information it returns to calculate the following near real-time metrics:

  1. Current video bitrate
  2. Current audio bitrate
  3. Current video packet loss
  4. Current audio packet loss

Curious what this looks like? If you press Ctrl + Shift + Alt + D while in our online learning tool, you will see a small panel pop up with this information. Note that you will only see actual data if another person is also in the lesson with you (because only then do you have a stream to subscribe to and get data from).

At the end of a lesson, we use all of the data we have gathered from polling getStats to compile a final set of six audio-video quality metrics for each user:

  1. Average video bitrate
  2. Average audio bitrate
  3. Max video packet loss
  4. Max audio packet loss
  5. Percent of time where video packet loss exceeded 3%
  6. Percent of time where audio packet loss exceeded 5%

What’s up with those last two? OpenTok suggests that for a stream’s quality to remain at an acceptable level, packet loss should stay at or below 3% for video and 5% for audio. In other words, these two metrics indicate what percent of the lesson a user spent with packet loss levels that could have caused their experience to be less than ideal.
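To make the end-of-lesson roll-up concrete, here is a minimal TypeScript sketch that reduces one track's polled samples to its three summary values. It assumes every sample covers an equally long polling interval, so "percent of time" can be approximated by the share of samples above the threshold. The names QualitySample and summarizeTrack are hypothetical and used only for this example.

```typescript
// One per-interval measurement for a single track (audio or video).
interface QualitySample {
  bitrateKbps: number;
  packetLossPercent: number;
}

interface TrackSummary {
  averageBitrateKbps: number;
  maxPacketLossPercent: number;
  percentTimeAboveThreshold: number;
}

// Thresholds from the text: 3% for video, 5% for audio.
const VIDEO_LOSS_THRESHOLD_PERCENT = 3;
const AUDIO_LOSS_THRESHOLD_PERCENT = 5;

function summarizeTrack(
  samples: QualitySample[],
  lossThresholdPercent: number
): TrackSummary {
  if (samples.length === 0) {
    // No remote stream was ever subscribed to, so there is nothing to report.
    return {
      averageBitrateKbps: 0,
      maxPacketLossPercent: 0,
      percentTimeAboveThreshold: 0,
    };
  }

  let bitrateSum = 0;
  let maxLoss = 0;
  let samplesAboveThreshold = 0;

  for (const sample of samples) {
    bitrateSum += sample.bitrateKbps;
    maxLoss = Math.max(maxLoss, sample.packetLossPercent);
    // "Exceeded" is read strictly: a sample exactly at the threshold is fine.
    if (sample.packetLossPercent > lossThresholdPercent) {
      samplesAboveThreshold += 1;
    }
  }

  return {
    averageBitrateKbps: bitrateSum / samples.length,
    maxPacketLossPercent: maxLoss,
    percentTimeAboveThreshold: (samplesAboveThreshold / samples.length) * 100,
  };
}
```

Calling summarizeTrack once with the video samples and VIDEO_LOSS_THRESHOLD_PERCENT, and once with the audio samples and AUDIO_LOSS_THRESHOLD_PERCENT, yields the six values listed above.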

How about some graphs?

We use Datadog for aggregating, displaying and working with metrics we gather from our online learning tool. We pipe metrics from StatsD into Datadog, where we can turn them into graphs, track them over time, use them to trigger alerts and so on. The six audio-video quality metrics we highlighted above are sent to Datadog as histograms, meaning that we can see values for their average, median, minimum, maximum, 95th percentile and count. Let’s look at a graph of the average video bitrate metric reported at lesson end. We have selected a one-day timespan:

The blue line tracks median values. The dotted orange line is 95th percentile values. The bars are the count; in this case, we use it as a relative indicator of the number of individual metrics that were aggregated to produce each set of values. We can see that our current settings for publishing a webcam stream via OpenTok (640x480 resolution, 15 frames per second) result in a max consumed bitrate of just below 400 kbps. We can also see that some users struggle to achieve that, whether due to network load or congestion, machine configuration, firewalls or other issues. Let’s shift gears and look at the percent of time where video packet loss exceeded 3%:

Here, we can see that one user spent almost 50% of their time experiencing packet loss values that most likely resulted in a less than great experience. Knowing this, we can proactively reach out to that user and give them tips on steps they can take on their end to ensure they have a better experience next time they use our application. Finally, let’s look at an actual example of how we can use these metrics to help us identify issues. Once again, we are looking at average video bitrate:

Under normal circumstances, we know that the average video bitrate for our users sits at a little below 400 kbps, with the occasional dip. However, in this particular case we see that it took a nosedive to below 100 kbps starting at around 2:30pm. Having been alerted to this, we contacted OpenTok, who confirmed they were dealing with an issue on their side that affected end users. This issue was eventually resolved at around 6pm, at which point we can see the average bitrate return to normal levels.
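For readers curious about the hand-off that feeds these graphs, here is a minimal TypeScript sketch of how the six lesson-end values could be formatted as histogram lines. It assumes the DogStatsD flavor of the StatsD line protocol, where a histogram is written as name:value|h with optional tags after |#. The metric names, the LessonSummary shape and both function names are hypothetical and not taken from our code. Actually delivering the lines to a StatsD agent (typically over UDP from a server-side process) is left out.

```typescript
// Hypothetical container for the six lesson-end values described earlier.
interface LessonSummary {
  averageVideoBitrateKbps: number;
  averageAudioBitrateKbps: number;
  maxVideoPacketLossPercent: number;
  maxAudioPacketLossPercent: number;
  percentTimeVideoLossAbove3: number;
  percentTimeAudioLossAbove5: number;
}

// Formats one histogram sample as a DogStatsD-style line,
// for example "some.metric:12.5|h|#env:prod".
function histogramLine(
  name: string,
  value: number,
  tags: Record<string, string> = {}
): string {
  const tagList = Object.keys(tags).map((key) => `${key}:${tags[key]}`);
  const suffix = tagList.length > 0 ? `|#${tagList.join(",")}` : "";
  return `${name}:${value}|h${suffix}`;
}

// Builds one line per metric. The metric names are placeholders.
function lessonEndLines(
  summary: LessonSummary,
  tags: Record<string, string> = {}
): string[] {
  return [
    histogramLine("lesson.video.bitrate.avg", summary.averageVideoBitrateKbps, tags),
    histogramLine("lesson.audio.bitrate.avg", summary.averageAudioBitrateKbps, tags),
    histogramLine("lesson.video.packet_loss.max", summary.maxVideoPacketLossPercent, tags),
    histogramLine("lesson.audio.packet_loss.max", summary.maxAudioPacketLossPercent, tags),
    histogramLine("lesson.video.packet_loss.pct_time_above_3", summary.percentTimeVideoLossAbove3, tags),
    histogramLine("lesson.audio.packet_loss.pct_time_above_5", summary.percentTimeAudioLossAbove5, tags),
  ];
}
```

Because each value is submitted as a histogram sample, the aggregates seen in the graphs (median, 95th percentile, count and so on) are computed downstream across all lessons reporting in the same window rather than in the client.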

Conclusion

OpenTok comes with a tool called Session Inspector that allows you to look up information for individual sessions. Unfortunately, it doesn’t yet have a way to present you with a high-level view of aggregated quality metrics. The solution we put in place gathers data from OpenTok in real time and pipes it through an aggregator (StatsD) to a graphing and monitoring back-end (Datadog). We have gotten a lot of use out of this custom approach and we’ll keep refining it as we work towards offering the best experience for users of our online learning tool.