Can You Hear Me Now?

Learnings and challenges when running WebRTC on mobile devices

Richard Speyer
Tap the Mic

--

When we started working on Talko in early 2012, we knew that a new type of real-time communications service would be required to enable the scenarios we’d deliver to customers. Sure it’s VoIP, but with a number of key differentiators, including:

  • Support for LIVE (synchronous) and not-LIVE (asynchronous) voice; both recorded by default, and with the ability to very naturally flow between the two.
  • Dead-simple 1-tap team conferencing from the smartphone app.
  • Ability to work truly anywhere, anytime to support all the mobile network realities — including fully offline — that on-the-go teams experience.

As a small team, “rolling our own” VoIP platform from the ground up was not an option. We embraced the opportunity to utilize the achievements of the open source community. Doing so meant that more of our team could focus on nailing the user experience.

In this blog post I will explain why we chose WebRTC, what our experience with the platform has been, what challenges we have encountered specific to running on a mobile device, and what learnings we can share from our experiences.

Enter WebRTC

If we were starting Talko today, the choice of WebRTC as our client-side VoIP solution would be a no-brainer. WebRTC and its forthcoming successor ORTC already reach over 1 billion web endpoints, and that number is only expected to grow. The standard has been adopted by many companies, including Google, Mozilla, Opera, and most recently Microsoft with their support for ORTC. WebRTC has become the de facto standard for real-time communications.

While its original and most common application remains browser-based real-time communications, WebRTC’s benefits extend beyond the browser and out to many types of real-time communication applications.

Open Source

Multiple companies have open-sourced their WebRTC implementations for any developer to compile and use in their own application. At Talko, we forked Google’s WebRTC implementation, which is the code that runs in Chrome and is licensed under BSD.

Cross-platform

Since WebRTC is written in C and C++, we are able to cross-compile the codebase across both iOS and Android. This means we can manage a single codebase for both platforms. And, since our implementation is based on the same code running in Google Chrome, all client apps — whether mobile-app or browser-based — connecting to our service will behave in similar fashion. Of course, it helps further that Mozilla Firefox and Opera both support the WebRTC standard and interoperate with Chrome.

When we started working with WebRTC we had to roll our own custom support for building on iOS. Since that time, the codebase now fully supports generating libraries for iOS, Android and more. This embodies the true promise of WebRTC — allowing us to support VoIP on many web browsers and mobile devices while implementing a single standard on our Talko media server and maintaining a single client library.

Latest and Greatest

Last, but certainly not least, the technology which WebRTC is built upon represents the latest and greatest in VoIP and real-time media.

The Interactive Connectivity Establishment (ICE) protocol is an excellent NAT traversal strategy. It ensures we can establish a bi-directional media connection from any Talko client, regardless of what network it is on or what network devices may lie between it and our media servers. Since our servers are always publicly addressable, we use ICE Lite, the simplified variant designed for endpoints with public addresses. ICE also allows us to make intelligent networking decisions. For example, we prefer WiFi connections (when available) over 4G/LTE, and UDP over TCP (which we may be forced to use to tunnel through firewalls). Lastly, the candidate selection algorithm in ICE naturally manages failover as network connections come and go, which is common when running on a mobile device.
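To make the selection logic concrete, here is a toy sketch of candidate preference. The types and weights are invented for illustration; ICE's real priority formula (RFC 8445) is considerably more involved:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative only: a simplified candidate preference, not ICE's actual
// priority computation. Higher score = preferred.
struct Candidate {
    std::string interface_type;  // "wifi" or "cellular"
    std::string protocol;        // "udp" or "tcp"
};

int Score(const Candidate& c) {
    int score = 0;
    if (c.interface_type == "wifi") score += 2;  // prefer WiFi over 4G/LTE
    if (c.protocol == "udp") score += 1;         // prefer UDP over TCP
    return score;
}

// Pick the best available candidate. Re-running this selection over the
// surviving candidates as connections come and go is what yields failover.
Candidate SelectBest(const std::vector<Candidate>& candidates) {
    return *std::max_element(
        candidates.begin(), candidates.end(),
        [](const Candidate& a, const Candidate& b) { return Score(a) < Score(b); });
}
```

When the WiFi candidate dies, the next selection pass simply falls through to the best remaining (cellular) candidate, with no special-case failover code.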

The Opus audio codec, the primary audio codec for WebRTC, is unmatched in its capabilities. It supports both CBR and VBR, delivers excellent quality-to-bitrate ratios, is highly robust to packet loss through both concealment (PLC) and in-band forward error correction (FEC), and can dynamically adjust its sample rate, bitrate, bandwidth, and other settings on the fly in response to changing network conditions. In our testing and usage, we have found that Opus delivers crystal-clear, high-definition audio without requiring massive amounts of bandwidth or battery. It is no surprise that Opus has quickly become one of the best audio codecs available.
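To put the bandwidth claim in perspective, here is a back-of-the-envelope sketch of what an Opus stream costs on the wire. The 40-byte figure assumes plain IPv4 + UDP + RTP headers; SRTP authentication tags and any TURN relaying would add more:

```cpp
// Rough bandwidth arithmetic for a 20ms-frame Opus stream.
// The overhead constant is an assumption (IPv4 20 + UDP 8 + RTP 12 bytes).
constexpr int kFrameMs = 20;
constexpr int kPacketsPerSecond = 1000 / kFrameMs;  // 50 packets/s
constexpr int kPerPacketOverheadBytes = 40;

// Average Opus payload size for a given codec bitrate.
constexpr int PayloadBytesPerPacket(int bitrate_bps) {
    return bitrate_bps / 8 / kPacketsPerSecond;
}

// What the stream actually costs once headers are added.
constexpr int WireBitrateBps(int bitrate_bps) {
    return (PayloadBytesPerPacket(bitrate_bps) + kPerPacketOverheadBytes)
           * 8 * kPacketsPerSecond;
}
```

At a 24 kbps codec rate, for example, this works out to 60-byte payloads and roughly 40 kbps on the wire, so per-packet header overhead is a significant factor at the low bitrates mobile networks demand.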

So what about WebRTC on mobile?

Building any sort of VoIP application can be quite challenging. Many of those challenges are amplified when running on a mobile device instead of an ethernet-wired PC. At Talko, this is where we have invested a substantial portion of our time to deliver an excellent, best-in-class voice experience to our customers.

When considering running a real-time communication application on a mobile device, I like to split the challenges into two categories: network and device.

Network

First things first: the wireless internet connection on your mobile device will never be as good as a hard-wired desktop computer's, no matter how nice your router or how new your phone. When building a network-dependent application, you must assume that all of your users will be in questionable and variable network conditions, and design for that accordingly.

Packet loss is the most common and most widely discussed form of network degradation. While packet loss occurs even on an ethernet connection, it is much more common when trying to maintain a VoIP call while walking down Commonwealth Avenue here in Boston, for example. We integrated Opus FEC into our WebRTC codebase ourselves because it was not yet supported — since then, Google has implemented the API and it works wonderfully.

Another behavior we have often observed in mobile networking is jitter. On an ideal (S)RTP VoIP connection running Opus, an endpoint would receive one media packet every 20ms, and that packet would contain 20ms of audio. In this perfect case, your end-to-end latency is measured simply by the time it takes to send the packet from one endpoint to the other, with no need for any buffering. Unfortunately, this is not how networks perform in the real world.

Often we observe connections which, while not dropping packets, deliver packets in bursts. So instead of one packet every 20ms, you may get nothing for 80ms, followed by 4 packets, and then nothing for 60ms followed by 3 packets. Since no packets are actually lost, there is no reason such a pattern should cause audible pops or gaps for the user; it can be handled by buffering incoming audio so that you keep enough data in reserve to mask the gaps. But how much audio should you buffer? How do you manage the trade-off between latency and quality that buffering induces?
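One way to reason about that trade-off: given a sequence of packet arrival times, the minimum playout delay that avoids any underrun is the worst-case lag between when a frame arrives and when it is needed. A small self-contained sketch (illustrative, not NetEQ's algorithm):

```cpp
#include <algorithm>
#include <vector>

// Given arrival times (ms) for packets that each carry frame_ms of audio,
// return how long playout must be delayed so that frame i has always
// arrived by the time it is needed (i * frame_ms after playout starts).
int RequiredBufferMs(const std::vector<int>& arrival_ms, int frame_ms) {
    int required = 0;
    for (size_t i = 0; i < arrival_ms.size(); ++i) {
        int deadline = static_cast<int>(i) * frame_ms;  // when frame i is needed
        required = std::max(required, arrival_ms[i] - deadline);
    }
    return required;
}
```

For the burst pattern above (four packets landing at 80ms, three more at 140ms), this yields an 80ms buffer, while a perfectly paced stream needs none. A static buffer sized for the worst case wastes latency the rest of the time, which is why the buffer must adapt.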

This is where WebRTC’s NetEQ module comes in. NetEQ is an extremely complex set of classes which monitors your incoming audio and maintains a variable-sized buffer of data. Its goal is to ensure that when the system needs more audio to play, there is always something available, while not letting the buffer grow too large as that would lead to unacceptable end-to-end latency. This is not code that you want to write on your own, but it is code you should monitor as its behavior can tell you a lot about the quality of a VoIP connection. At Talko, we monitor the size of NetEQ’s buffer in milliseconds as well as what operations (accelerate/expand) it is applying to the audio.
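The flavor of decision NetEQ makes can be caricatured in a few lines. The thresholds below are invented for illustration; the real module weighs inter-arrival statistics continuously, and its accelerate/expand operations time-stretch the audio rather than simply dropping or repeating it:

```cpp
#include <string>

// Caricature of NetEQ-style buffer management, with made-up thresholds:
// compare the buffered audio level against a target and decide whether to
// play out faster, stretch audio, or proceed normally.
std::string Decide(int buffered_ms, int target_ms) {
    if (buffered_ms > target_ms + 40) return "accelerate";  // drain excess latency
    if (buffered_ms < target_ms - 40) return "expand";      // stretch to avoid underrun
    return "normal";
}
```

Logging how often each branch fires, alongside the buffer level itself, is the kind of signal we watch: frequent expands indicate a connection teetering on underrun, while sustained accelerates reveal accumulated latency being worked off.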

On an ideal VoIP connection (top), an endpoint would receive X ms of data every X ms. On a jittery connection (bottom), which is common on 4G/LTE, you may instead see much more sporadic arrival patterns.

On iOS, you can simulate all of these types of network behaviors and more using the Network Link Conditioner to better understand how your code will perform.

Device

When running on a mobile device, consumption of battery power must be a paramount concern for any developer. In general, the more compute-intensive your application, the more battery you will drain, and running WebRTC is no exception. Encoding and decoding audio every N ms is a lot of work, and it will suck a user's battery dry without some modification.

One of the first big battery discoveries we made at Talko was that WebRTC keeps both the WiFi and WWAN (4G/LTE) radios awake by sending periodic STUN pings on each interface as part of the ICE protocol. In general, this is nice because it allows near instantaneous failover between network interfaces. However, it comes at a major cost.

Many mobile devices prefer to only keep a single radio awake — WiFi if available, WWAN otherwise. WebRTC was breaking that contract. So we implemented an API to allow us to selectively disable the WWAN interface if WiFi was available, and vice versa. This led to a 50% reduction in battery consumption, but at the cost of slower failover between network interfaces. For Talko, we decided this tradeoff was a good one that net-net our customers would appreciate.

We provided this feedback to Google, and WebRTC now offers a standard API to disable network interfaces: BasicNetworkManager::set_network_ignore_list(const std::vector<std::string>& list).
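The semantics are straightforward: interfaces whose names appear in the list are excluded before ICE gathers candidates on them. Here is a self-contained sketch of that filtering step; the interface names are typical iOS ones, and the real logic of course lives inside WebRTC's network manager:

```cpp
#include <string>
#include <vector>

// Illustrative sketch of ignore-list semantics: drop any interface whose
// name appears in the ignore list, leaving only interfaces that ICE may
// gather candidates on.
std::vector<std::string> FilterInterfaces(
        const std::vector<std::string>& interfaces,
        const std::vector<std::string>& ignore_list) {
    std::vector<std::string> usable;
    for (const auto& name : interfaces) {
        bool ignored = false;
        for (const auto& ign : ignore_list) {
            if (name == ign) ignored = true;
        }
        if (!ignored) usable.push_back(name);
    }
    return usable;
}
```

Ignoring the cellular interface (e.g. pdp_ip0 on iOS) whenever WiFi (en0) is up is essentially the policy our original custom API implemented, letting the WWAN radio go to sleep.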

Mobile devices also have limited compute power relative to desktop computers. On a PC, a user can instantiate a thread using the pthread API, request that it run at real-time priority, and be done. On iOS, using the same pthread API with the same configuration, the highest priority that can be set on a new thread is 47, which is not real-time. An attempt to execute a piece of code once every 20ms on such a thread will fail: the thread is preempted by others and can end up sleeping 40, 60, or 80+ ms between callbacks. In comparison, the real-time audio thread, which is already instantiated and running in your application if you are doing audio I/O, runs at priority 98! That makes it the ideal place to run our time-sensitive operations, such as capturing/encoding/sending local audio and receiving/decoding/playing remote incoming audio.
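A minimal sketch of what the pthread API offers here. Note that on an unprivileged process (iOS or desktop alike), the set call commonly fails or is clamped, which is precisely the limitation described above; always check the return value:

```cpp
#include <pthread.h>
#include <sched.h>

// Query the nominal ceiling for SCHED_FIFO (real-time) priorities.
// The actual ceiling available to your process may be far lower.
int MaxFifoPriority() {
    return sched_get_priority_max(SCHED_FIFO);
}

// Attempt to move a thread to real-time scheduling at the given priority.
// Returns 0 on success; commonly fails (e.g. EPERM) without privileges.
int TryRaisePriority(pthread_t thread, int priority) {
    sched_param param{};
    param.sched_priority = priority;
    return pthread_setschedparam(thread, SCHED_FIFO, &param);
}
```

The gap between what MaxFifoPriority reports and what TryRaisePriority actually grants you is exactly why piggybacking on the system-owned real-time audio thread is so valuable.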

To take advantage of this, we refactored a large portion of the WebRTC pipeline to run in a single-threaded manner on this thread when on iOS. This has provided more reliable timing and allowed us to reduce the overall number of threads contending for time slices in our application. The change involved much more than moving method calls: because so much now runs on a single thread, we were able to carefully remove a number of locks and semaphores, which was crucial to ensuring that the full I/O process loop completes within the time slice given to the thread.
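One pattern that makes lock removal safe is restricting each queue to a single producer and a single consumer, at which point atomic indices suffice. This is an illustrative sketch of the technique, not Talko's actual code:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Single-producer/single-consumer ring buffer of audio samples. With exactly
// one writer (e.g. a network thread) and one reader (the audio I/O callback),
// two atomic indices replace a mutex, so the real-time thread never blocks.
class SpscRing {
public:
    bool Push(int16_t sample) {
        size_t head = head_.load(std::memory_order_relaxed);
        size_t next = (head + 1) % kCapacity;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[head] = sample;
        head_.store(next, std::memory_order_release);  // publish the sample
        return true;
    }

    bool Pop(int16_t* sample) {
        size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;  // empty
        *sample = buf_[tail];
        tail_.store((tail + 1) % kCapacity, std::memory_order_release);
        return true;
    }

private:
    static constexpr size_t kCapacity = 1024;
    int16_t buf_[kCapacity] = {};
    std::atomic<size_t> head_{0};
    std::atomic<size_t> tail_{0};
};
```

The audio callback can then Pop decoded samples without ever waiting on a lock, which matters because a real-time thread that blocks even briefly can miss its 20ms deadline.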

Learnings

Embrace native mobile APIs

As with any mobile app, the very best user experience is delivered by using and integrating with native mobile APIs as much as possible. For WebRTC and real-time applications that means:

  1. Monitor network change events to properly handle when a user transitions from WiFi to 4G/LTE and vice versa. For Talko on iOS, that means integrating with the Reachability API. This is the most authoritative way to know what networks are available to you for communication at any given time. It is important to note, however, that just because a network interface is available does not mean that it is viable for a VoIP connection. Consider when your device is connected to a hotel WiFi but not yet authenticated — Reachability will report WiFi is available despite you not being able to reach the internet. So, it is important that you combine this system information with data gathered from WebRTC such as STUN bind request success, round trip times, etc.
  2. Understand native audio behaviors and ensure that your application complies with them. On iOS, this means things like properly handling audio interruptions when a real phone call comes in, deciding whether or not to allow audio from your application to mix with other system audio, and properly supporting routing of input + output to speaker, earpiece, or Bluetooth.
  3. Prompt for the appropriate permissions to access the device’s peripherals. Specifically, you need to gain permission to access the device’s microphone and, if you are doing video, its camera. Beyond just asking for permission, it is important to include a descriptive message so the user understands why your application wants access to the peripheral and when you intend to use it. The value of these permissions cannot be overstated: if a user declines permission, your application cannot function.
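Combining the two signals from point 1 might look like the following sketch, with all names and thresholds hypothetical:

```cpp
// Hypothetical sketch: treat an interface as viable for VoIP only when the
// OS reports it reachable AND recent WebRTC-level evidence agrees.
struct InterfaceHealth {
    bool os_reports_reachable;  // e.g. from the iOS Reachability API
    bool recent_stun_success;   // a STUN bind request succeeded recently
    int last_rtt_ms;            // most recently measured round-trip time
};

bool IsViableForVoip(const InterfaceHealth& h, int max_rtt_ms) {
    // The hotel-WiFi case: reachable according to the OS, but STUN never
    // completed because we are stuck behind a captive portal.
    return h.os_reports_reachable && h.recent_stun_success &&
           h.last_rtt_ms <= max_rtt_ms;
}
```

The key design point is that neither signal is trusted alone: Reachability says what exists, while STUN results say what actually works end to end.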

Instrument Everything

The WebRTC Statistics API has evolved substantially over the last few years. You can see live diagnostics of an audio call by navigating to chrome://webrtc-internals/ while in a live call on the web.

However, when you have a live application in the field, you need a way to ascertain call quality without actually having a user’s device. To help with this, we have implemented methods to upload anonymized VoIP statistics to our server from each endpoint and the media server after each Talko call. This allows us to internally monitor call performance and debug behaviors that groups of users experience while in calls together. Having this visibility is crucial to understanding and diagnosing issues in the field — we are continuously working to improve how we do this.
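A per-call summary of this kind can be computed from a handful of counters at call end. The fields below are illustrative, not Talko's actual schema:

```cpp
#include <vector>

// Hypothetical per-call statistics worth summarizing and uploading.
struct CallStats {
    long packets_sent = 0;
    long packets_lost = 0;
    std::vector<int> jitter_buffer_ms_samples;  // periodic NetEQ buffer readings
};

// Fraction of packets lost, as a percentage.
double LossPercent(const CallStats& s) {
    if (s.packets_sent == 0) return 0.0;
    return 100.0 * s.packets_lost / s.packets_sent;
}

// Average jitter buffer depth over the call, in milliseconds.
double AvgJitterBufferMs(const CallStats& s) {
    if (s.jitter_buffer_ms_samples.empty()) return 0.0;
    double sum = 0;
    for (int v : s.jitter_buffer_ms_samples) sum += v;
    return sum / s.jitter_buffer_ms_samples.size();
}
```

Aggregates like these, joined across all endpoints in the same call, are what let us reconstruct a shared session's quality after the fact without ever touching a user's device.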

Bandwidth and Battery Matter

At the end of it all, the biggest differences between efficiently running a mobile app versus a PC app come down to two major limited resources: bandwidth and battery. Users will not continue to use an app that drains their battery or their mobile data plan in a way that is not commensurate with how they perceive their use of the app. Every challenge and adjustment discussed in this post was motivated by a desire to maximize performance given these inherent constraints.

As changes and improvements are made over time, it is important to have a standard test matrix that helps detect and prevent these types of performance regressions. You should run through this matrix regularly and consistently to identify issues early in the development cycle, as opposed to hearing about them from your customers.

WebRTC has been crucial to the progress and development of Talko. We look forward to sharing more about our media architecture in future posts, but for now I hope you’ve found these learnings helpful.

We’d love to hear about how you are using WebRTC in your projects, or any questions you may have. Please feel free to comment right here on Medium, find me on Talko, or email me at richards@talko.com.
