ICE restarts

Philipp Hancke
Sep 28, 2016


I’ve written about ICE failures before. Back then, I only looked at initial failures and did not talk about the failures that happened after the call was established. Let’s look at those and talk about ICE restarts.

ICE close to Tarfala, Sweden. Credits: Nina Kirchner

When a WebRTC connection fails, the ICE connection state changes. You can add an event listener for this:

pc.addEventListener('iceconnectionstatechange', function(e) {
  console.log('ice state change', pc.iceConnectionState);
});

The iceConnectionState can take a number of values, which are explained in the W3C WebRTC specification. For our purposes, the two most interesting states are disconnected and failed. Roughly, disconnected means that the connection was interrupted but may recover without further action. The failed state is more permanent: you need to do an ICE restart to get out of it. Typically, when the connection is interrupted, the ICE connection state goes to disconnected first and then to failed a while later.
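To make the distinction concrete, here is a small sketch of how an application might map each iceConnectionState value to an action. The helper name iceStateAction and the action labels are made up for illustration; only the state names come from the specification.

```javascript
// Hypothetical helper: maps an RTCIceConnectionState value to what an
// application might do about it, following the rough policy described above.
function iceStateAction(state) {
  switch (state) {
    case 'connected':
    case 'completed':
      return 'ok';        // connectivity established
    case 'disconnected':
      return 'wait';      // interrupted, but may recover on its own
    case 'failed':
      return 'restart';   // needs an ICE restart to recover
    case 'closed':
      return 'none';      // connection was closed deliberately
    default:
      return 'pending';   // 'new' or 'checking': still setting up
  }
}
```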

When the ICE connection state goes to failed (or already when it goes to disconnected; your mileage may vary, and in Firefox this is not such a great idea), you can do an ICE restart. That means you gather new candidates, send them to the peer, and get new candidates from the peer. Hopefully, your connection then gets re-established. In code, this means the following (see here for a more complete example):

pc.createOffer({iceRestart: true}).then(function(offer) {
  return pc.setLocalDescription(offer);
}).then(function() {
  // send pc.localDescription to the peer over your signaling channel
});
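Putting the state listener and the restart together, a minimal sketch could look like the following. It assumes a hypothetical sendToPeer callback that delivers the description over whatever signaling channel you already use, and it only restarts on failed, not on disconnected.

```javascript
// Sketch: create and apply an ICE restart offer, then hand it to signaling.
// sendToPeer is a hypothetical callback supplied by the application.
function restartIce(pc, sendToPeer) {
  return pc.createOffer({iceRestart: true})
    .then(function(offer) {
      return pc.setLocalDescription(offer);
    })
    .then(function() {
      // The new offer carries fresh ice-ufrag/ice-pwd values.
      sendToPeer(pc.localDescription);
    });
}

// Sketch: restart automatically once the connection has failed.
function enableIceRestart(pc, sendToPeer) {
  pc.addEventListener('iceconnectionstatechange', function() {
    if (pc.iceConnectionState === 'failed') {
      restartIce(pc, sendToPeer).catch(function(err) {
        console.error('ICE restart failed', err);
      });
    }
  });
}
```

Restarting on disconnected as well would need a timer, so that short interruptions get a chance to recover on their own before you restart.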

You will notice that the ice-ufrag and ice-pwd attributes in the SDP have changed. Send that offer to the peer, which will generate a new answer with those parameters changed as well.
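If you want to check this yourself, for example in a test, comparing the credentials in the old and new SDP is enough. The helpers below are a sketch; the names iceCredentials and isIceRestart are hypothetical, and they only look at the first a=ice-ufrag and a=ice-pwd lines, so descriptions with different credentials per media section are not handled.

```javascript
// Sketch: extract the first ice-ufrag / ice-pwd values from an SDP string.
function iceCredentials(sdp) {
  const ufrag = sdp.match(/^a=ice-ufrag:(.+)$/m);
  const pwd = sdp.match(/^a=ice-pwd:(.+)$/m);
  return {
    ufrag: ufrag ? ufrag[1].trim() : null,
    pwd: pwd ? pwd[1].trim() : null
  };
}

// True if the new description uses different ICE credentials,
// which is how an ICE restart shows up in the SDP.
function isIceRestart(oldSdp, newSdp) {
  const before = iceCredentials(oldSdp);
  const after = iceCredentials(newSdp);
  return before.ufrag !== after.ufrag || before.pwd !== after.pwd;
}
```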

My coworker Dag-Inge implemented ICE restarts for Chrome back in March, crashing the nightly version of Firefox as soon as it started supporting ICE restarts :-)

But we never evaluated whether ICE restarts work and what impact they have. As you can guess, it’s time for a little more data nerding! First, let’s get the usual 100k dataset:

CREATE OR REPLACE VIEW dataset AS
SELECT * FROM features_permanent
ORDER BY datetime DESC LIMIT 100000;

Second, we need to detect the ICE restart. We do that by looking for a createOffer call with the iceRestart property set to true. We call this feature (think of a column in our database) iceRestart:

SELECT count(*), icerestart
FROM dataset
GROUP BY icerestart;
count | icerestart
-------+------------
91269 | f
8731 | t

Around 8.7% of the connections try an ICE restart. We have seen a ratio of about 7.5% for a couple of months now.

That suggests that running a WebRTC service without ICE restarts is a bad idea… even though users might just choose to reload the page instead. Unfortunately, there is not much documentation on how to do an ICE restart, which this blog post will hopefully remedy a bit.

So we have 8.7% ICE restarts. Now we need to check whether that fixes anything. We do that by looking for another oniceconnectionstatechange event to connected or completed after the restart, and call this feature icerestartsuccess:

SELECT count(*), icerestartsuccess
FROM dataset
WHERE icerestart = 't'
GROUP BY icerestartsuccess;
count | icerestartsuccess
-------+-------------------
2712 | f
6019 | t

This shows that ICE restarts re-establish the connection in about two-thirds (69%) of the cases. More successful calls and better user experience!

But a failure rate of about 31% is worrying. Can we explain this? Quite often, the underlying connection failure is caused by the remote client going away. We can detect this by checking whether there was a corresponding setRemoteDescription call after the createOffer. If there was not, the remote client either lost the signaling connection (the user might have closed the page) or does not implement ICE restarts:

SELECT count(*), icerestartfollowedbysetremotedescription
FROM dataset
WHERE icerestart = 't' AND icerestartsuccess = 'f'
GROUP BY icerestartfollowedbysetremotedescription;
count | icerestartfollowedbysetremotedescription
-------+------------------------------------------
2312 | f
400 | t

So in 85% of the unsuccessful ICE restarts, we don’t get a new answer from the peer. That probably means the peer is no longer connected to the signaling server; the user might have closed the page. Or (less likely) their browser version does not implement ICE restarts; Firefox only recently started supporting them, in Firefox 47. There is not much we can do at this point other than show the user a message.
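For reference, the percentages quoted in this post follow directly from the counts in the three query results: 8.7% of connections attempt a restart, about 69% of restarts succeed, about 31% fail, and about 85% of the failures see no new answer. A quick way to recompute them (the variable names are just labels for the query results):

```javascript
// Counts copied from the query results above.
const total = 91269 + 8731;   // connections in the dataset
const restarts = 8731;        // icerestart = t
const restartSuccess = 6019;  // icerestartsuccess = t
const restartFailed = 2712;   // icerestartsuccess = f
const noAnswer = 2312;        // failed restart, no setRemoteDescription afterwards

// Percentage rounded to one decimal place.
function pct(part, whole) {
  return Math.round((1000 * part) / whole) / 10;
}

const rates = {
  restartRate: pct(restarts, total),
  successRate: pct(restartSuccess, restarts),
  failureRate: pct(restartFailed, restarts),
  noAnswerShare: pct(noAnswer, restartFailed)
};
console.log(rates);
```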

The remaining 400 cases will be subject to a more thorough investigation; there are certainly some bugs left to file. However, there is more opportunity in investigating the larger number of signaling failures. Priorities…
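The features queried above are derived from recorded API calls and events. The following is a rough sketch of how they could be computed; the trace format (an array of {type, value} entries in chronological order) and the function name iceRestartFeatures are hypothetical assumptions, not the actual pipeline behind the numbers in this post.

```javascript
// Sketch: derive the three boolean features from a chronological trace.
function iceRestartFeatures(trace) {
  const features = {
    icerestart: false,
    icerestartsuccess: false,
    icerestartfollowedbysetremotedescription: false
  };
  for (const entry of trace) {
    if (!features.icerestart) {
      // Look for the first createOffer call with iceRestart: true.
      if (entry.type === 'createOffer' && entry.value && entry.value.iceRestart === true) {
        features.icerestart = true;
      }
      continue;
    }
    // Only entries after the restart offer are considered below.
    if (entry.type === 'setRemoteDescription') {
      features.icerestartfollowedbysetremotedescription = true;
    }
    if (entry.type === 'oniceconnectionstatechange' &&
        (entry.value === 'connected' || entry.value === 'completed')) {
      features.icerestartsuccess = true;
    }
  }
  return features;
}
```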

As we have seen, ICE restarts are a great way to improve the quality of a WebRTC service. They are one solution to what is known as the ‘walk out of the door problem’, where the user switches from WiFi to a mobile network. It is a rather heavy solution compared to what is currently possible in the webrtc.org library, as we can see with Google’s Duo, which I suspect does not do ICE restarts but relies on continuous nomination instead.
