Rolling out Unified Plan support
The WebRTC folks at Google are pretty excited about shipping Unified Plan. Support for this has been on the roadmap for a very long time: it was agreed upon as a standard in 2013, and the Mozilla people implemented it in Firefox in 2015.
Implementation and experimentation started in Chrome early in 2018 and there are now plans to change the default sdpSemantics from “Plan B” to “Unified Plan” as an experiment in Chrome Beta and Dev before the end of the year. See the very honest “Applications may break” public service announcement for more details.
To enable Unified Plan support, construct your RTCPeerConnection like this in Chrome:
new RTCPeerConnection({..., sdpSemantics: 'unified-plan'}); // opt-in
Chrome plans to remove support for “Plan B” at some point, which makes it important to test now and update your application if you are affected by the changes. The biggest change is the format of the SDP generated by the createOffer and createAnswer calls, but there are also more subtle changes in behaviour, since Unified Plan fits the transceiver model better.
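To make the SDP format difference a bit more concrete: Plan B describes all video tracks in a single m=video section, while Unified Plan uses one m= section per track. The sketch below counts m= sections per media kind. The countMediaSections helper and the two SDP fragments are made up for illustration and heavily abbreviated; they are not real browser output.

```javascript
// Count the m= sections in an SDP blob, grouped by media kind.
// Hypothetical helper for illustration, not part of any WebRTC API.
function countMediaSections(sdp) {
  const counts = {};
  for (const line of sdp.split(/\r?\n/)) {
    const match = /^m=(\w+) /.exec(line);
    if (match) {
      counts[match[1]] = (counts[match[1]] || 0) + 1;
    }
  }
  return counts;
}

// Hand-written, abbreviated fragments (assumption: real SDP has many more lines).
// Plan B style: two video tracks share one m=video section.
const planBLike = [
  'v=0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'm=video 9 UDP/TLS/RTP/SAVPF 96',
  'a=ssrc:1001 msid:camera v0',
  'a=ssrc:2002 msid:screen v1',
].join('\r\n');

// Unified Plan style: every track gets its own m= section with its own mid.
const unifiedPlanLike = [
  'v=0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=mid:0',
  'm=video 9 UDP/TLS/RTP/SAVPF 96',
  'a=mid:1',
  'm=video 9 UDP/TLS/RTP/SAVPF 96',
  'a=mid:2',
].join('\r\n');

const planBCounts = countMediaSections(planBLike);
const unifiedPlanCounts = countMediaSections(unifiedPlanLike);
```

With a single audio/video stream both formats end up with one audio and one video section, which is why the difference mostly matters once a second video track is involved.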
Are we affected by Unified Plan changes?
We have evaluated and tested quite a bit since the very first version of Unified Plan support landed in Chrome back in January. Giving the Chrome developers early feedback is usually better than noticing problems very late in the process.
There are three different usage models of WebRTC under the hood of appear.in:
- peer-to-peer mesh with a single audio/video stream
- SFU-enabled multiparty, using a single audio/video stream
- peer-to-peer mesh with one audio and up to two video streams
Peer-to-peer video chat (even in full mesh) is one of the cases where rolling out Unified Plan support could be considered relatively safe, as it should behave the same when there is only a single audio/video stream. Relatively… the “legacy” onaddstream event, which we still used (as one of the last remnants of the legacy API), was not working with Unified Plan. For us this was a good excuse to switch over to the more modern ontrack event.
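A minimal sketch of what moving from onaddstream to ontrack can look like. Since ontrack fires once per track, a stream carrying both audio and video shows up twice, so the sketch remembers the stream ids it has already reported. The watchRemoteStreams helper and its onStream callback are hypothetical names, not our production code.

```javascript
// Hypothetical helper: report each remote MediaStream once, using ontrack
// instead of the legacy onaddstream event.
function watchRemoteStreams(pc, onStream) {
  const seen = new Set();
  pc.ontrack = (event) => {
    // event.streams lists the MediaStreams the new track belongs to.
    for (const stream of event.streams) {
      if (!seen.has(stream.id)) {
        seen.add(stream.id);
        onStream(stream);
      }
    }
  };
}

// Usage in a browser (assumption: attachToVideoElement is your own function):
// const pc = new RTCPeerConnection({ sdpSemantics: 'unified-plan' });
// watchRemoteStreams(pc, (stream) => attachToVideoElement(stream));
```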
The SFU-enabled multiparty case was designed to use only one audio/video stream per peer connection, which avoids the problem.
The last case, which uses multiple video streams, is actually where it gets interesting. It currently requires heavy SDP munging to enable interoperability between Firefox (which has supported Unified Plan since 2015) and Edge (we use the ORTC shim which implements Unified Plan) on one side and Chrome/Safari on the other side. We are using the Plan B SDP as a wire format here in order to minimize the number of times that the SDP munging has to happen.
Solving the easy problems first!
The first step of the roll-out consisted of opting out of Google's plans to switch the default sdpSemantics by explicitly passing
new RTCPeerConnection({..., sdpSemantics: 'plan-b'}); // opt-out
to the RTCPeerConnection constructor. Given the fairly large number of issues found before Google considered this ready, I was not willing to take chances with an experiment that is not under my control.
The next step was to add a feature flag so we can roll it out gradually. We use Unleash as a feature toggle server and combined it with a roll-out strategy that ensured that all clients in the same room would get the same sdpSemantics. There should not have been big interoperability problems if that had not been the case but it is extra complexity that is best avoided. Using this strategy we can roll it out to a certain percentage of room names and, in the worst-case scenario, instantly roll it back.
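One way to get the "same room, same sdpSemantics" property is to derive a stable bucket from the room name and compare it with the roll-out percentage. The sketch below does that with an FNV-1a style hash over the UTF-16 code units of the name. The function names and the choice of hash are assumptions for illustration; this is not the Unleash API and not necessarily the strategy we deployed.

```javascript
// Map a room name to a stable bucket in [0, 100).
// Assumption: an FNV-1a style 32-bit hash is good enough for bucketing.
function roomBucket(roomName) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < roomName.length; i++) {
    hash ^= roomName.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return hash % 100;
}

// Every client in the same room computes the same answer for a given percentage.
function sdpSemanticsForRoom(roomName, rolloutPercentage) {
  return roomBucket(roomName) < rolloutPercentage ? 'unified-plan' : 'plan-b';
}

// Usage in a browser (hypothetical room name and percentage):
// new RTCPeerConnection({ sdpSemantics: sdpSemanticsForRoom('/my-room', 30) });
```

Because the bucket only depends on the room name, raising the percentage only ever moves rooms from Plan B to Unified Plan, and setting it to 0 rolls everything back.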
The next step was to set up end-to-end tests. We use a fairly extensive Selenium test suite running on our GoCD CI cluster, which has protected us from both Chrome issues and our own mistakes in the past. Basically it runs the most common scenarios in an end-to-end test with different browsers and versions and ensures a video call can be established.
Changing the sdpSemantics feature flag only required copying the pipeline file (a 2000-line beast written in YAML) and adding the feature flag to the URL the test visits. Running that pipeline a dozen times and not seeing any breakages gave me enough confidence to roll this out to real users.
We started slowly… 1%, 5%, 10%, then ramping up to 30% and then 40%. All while carefully avoiding the time window of a new Chrome rolling out, which would have complicated things. The graph below shows the roll-out happening during November: the blue line is the absolute number of sessions using Unified Plan, while the black line shows the number of sessions still using Plan B.
Errors would have shown up either as an increase in failures of setLocalDescription and setRemoteDescription or as an increase in support volume. Neither happened, so this part went pretty well and the roll-out continued. We did run into a minor issue with some tests using an older Firefox version being affected by an interop issue which only happens when Chrome is doing Unified Plan, but that was easy to avoid with some SDP munging.
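To compare the two groups, the results of setLocalDescription and setRemoteDescription can be tagged with the sdpSemantics in use and aggregated into a failure rate per group. The sketch below assumes a hypothetical event shape ({ sdpSemantics, ok }) and made-up numbers; it is not our actual metrics pipeline.

```javascript
// Aggregate failure rates per sdpSemantics value.
// Assumption: each event records one setLocalDescription/setRemoteDescription
// call as { sdpSemantics: string, ok: boolean }.
function failureRates(events) {
  const totals = {};
  for (const { sdpSemantics, ok } of events) {
    if (!totals[sdpSemantics]) {
      totals[sdpSemantics] = { calls: 0, failures: 0 };
    }
    totals[sdpSemantics].calls += 1;
    if (!ok) {
      totals[sdpSemantics].failures += 1;
    }
  }
  const rates = {};
  for (const [semantics, total] of Object.entries(totals)) {
    rates[semantics] = total.failures / total.calls;
  }
  return rates;
}

// Made-up sample data, for illustration only.
const sampleEvents = [
  { sdpSemantics: 'plan-b', ok: true },
  { sdpSemantics: 'plan-b', ok: true },
  { sdpSemantics: 'plan-b', ok: false },
  { sdpSemantics: 'plan-b', ok: true },
  { sdpSemantics: 'unified-plan', ok: true },
  { sdpSemantics: 'unified-plan', ok: true },
];
const sampleRates = failureRates(sampleEvents);
```

A noticeably higher rate in the unified-plan group than in the plan-b group would be the signal to roll back.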
The roll out of Unified Plan for our SFU connection started at the end of November and is progressing nicely.
Update March 2019: Chrome 74 rolled out “spec compliant simulcast”, which broke simulcast with Unified Plan (despite promises it would not). Since this was not fixed, despite being a clear bug and a regression, appear.in went back to plan-b for SFU connections.
Postponing the hard parts!
Fixing the remaining peer-to-peer use-case that uses more than a single video track is somewhat tricky. It uses Plan B for signaling and changes to wire protocols are hard. Measuring how many connections are using this in
- a Chrome version < M69 which does not support Unified Plan (without bugs at least)
- Safari, which only started supporting Unified Plan in a recent Tech Preview (and apparently is not considering letting the application choose)
will influence the decision on how much effort to spend writing new SDP munging code (I still consider mangling to be more accurate) in addition to retiring the old code. Time is on my side here: the number of users on Chrome < 69 is going down every day.
Some progress at least
Rolling out Unified Plan has required multiple weeks of effort. While much can be said about SDP as a data format, the situation hasn’t been helped by years of inconsistencies between browsers. Ensuring that every endpoint is able to use unified plan may take years still, but at least we’re seeing movement towards a common standard.