Implementing WebRTC Screen Sharing in a web app, late 2016

I recently added screen sharing support to our new video conferencing app, Locus. To figure out the implementation, I found a lot of useful information online; however, it was quite scattered, with a lot of outdated posts. Here’s my attempt to summarize the current status and hopefully ease the process for the next developer. I don’t claim to be an expert, and there are likely better ways to do some of this, but what follows did work for me.

Accessing the screen capture stream

Both Chrome and Firefox currently have built-in support for screen capturing in their desktop versions. Screen sharing is not supported on mobile, as far as I know (mobile can still receive screen shares). In both browsers, screen capturing is achieved through the MediaDevices.getUserMedia() (gUM) interface. gUM can be called once to get a user audio/video stream, and a second time to get a screen stream. The details are sadly not common between browsers and seem poorly documented. Note that we are also using adapter.js.
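
As a point of reference, here is a minimal sketch of the two calls; screenConstraints stands in for the browser-specific constraints described in the sections below.

// First call: mic and camera, as usual.
navigator.mediaDevices.getUserMedia({ audio: true, video: true })
  .then(function (cameraStream) {
    // attach to a <video> element, add to peerConnections, etc.
  });

// Second call: the screen stream, using the browser-specific
// constraints described in the sections below.
navigator.mediaDevices.getUserMedia(screenConstraints)
  .then(function (screenStream) {
    // a MediaStream with a single video track of the captured screen or window
  });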

Chrome

Chrome has built-in support for screen capture; however, to gain permission to use this functionality, an application must use a Chrome Extension. (Unless of course you are Google and kindly white-list yourself, as they have done for Hangouts.) The extension uses chrome.desktopCapture.chooseDesktopMedia() to return a sourceId. The sourceId is then used as part of the gUM constraints.

Chrome Extension architecture from https://developer.chrome.com/extensions/overview

Google provides extensive documentation for extension development, which I won’t repeat all of, nor claim to be an expert on. In short, though, a simple extension for screen sharing will have a content script, which runs in the context of your web page, and a background script, which runs in a separate extension context. The content script can communicate with your web app by sending messages to window or via DOM manipulation, whereas the background script cannot. The background script can access all of the Chrome extension APIs, but the content script cannot. The content script and background script can communicate with each other via chrome.runtime.connect(). So the basic sequence is:

  1. Your web app asks the content script for a screen share sourceId.
  2. The content script passes this request to the background script.
  3. The background script calls chrome.desktopCapture.chooseDesktopMedia() and returns the sourceId back to the content script.
  4. The content script returns this to the web app, which finally calls getUserMedia() with sourceId as one of the constraints.

Muaz Khan has an open source example extension that is a helpful starting point. There are, of course, some complexities to make this suitable for production, like making sure the extension starts running on already open tabs right after installation. I leave these as an exercise for the reader 🤔.
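
To make the sequence above concrete, here is a stripped-down sketch of the two scripts. The message names (getScreenSourceId / screenSourceId) are made up for illustration; the extension’s manifest also needs the desktopCapture permission and a content_scripts entry matching your domain.

// content-script.js: runs alongside your web page
var port = chrome.runtime.connect();

// Relay the chosen sourceId from the background script back to the web app (step 4).
port.onMessage.addListener(function (msg) {
  window.postMessage({ type: 'screenSourceId', sourceId: msg.sourceId }, '*');
});

// Forward requests from the web app to the background script (steps 1 and 2).
window.addEventListener('message', function (event) {
  if (event.source === window && event.data && event.data.type === 'getScreenSourceId') {
    port.postMessage({ type: 'getScreenSourceId' });
  }
});

// background.js: runs in the extension context
chrome.runtime.onConnect.addListener(function (port) {
  port.onMessage.addListener(function (msg) {
    if (msg.type !== 'getScreenSourceId') return;
    // Step 3: pop Chrome's picker; the callback receives the chosen sourceId.
    chrome.desktopCapture.chooseDesktopMedia(
      ['screen', 'window', 'tab'],
      port.sender.tab,
      function (sourceId) {
        port.postMessage({ type: 'screenSourceId', sourceId: sourceId });
      });
  });
});

On the web-app side, I wrap this round trip in a small promise-returning helper (call it getScreenSourceId()) that posts the request to window and resolves with the reply; that helper shows up again in the snippets below.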

For Chrome, the gUM constraints are:

constraints = {
  video: {
    mandatory: {
      chromeMediaSource: 'desktop',
      maxWidth: 1920,
      maxHeight: 1080,
      maxFrameRate: 10,
      minAspectRatio: 1.77,
      chromeMediaSourceId: sourceId
    }
  }
};

*** or ***

constraints = {
  video: {
    width: {max: 1920},
    height: {max: 1080},
    frameRate: {max: 10},
    deviceId: {exact: [sourceId]},
    mediaStreamSource: {exact: ['desktop']}
  }
};

Note: the first set of constraints is what we are using in Locus, and adapter.js converts it to the second set. Oddly, the version of adapter.js we are using rejects the second set as input.

Edit: According to the comments, mediaSource should work where mediaStreamSource failed.
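
With a sourceId in hand, the gUM call itself is short. A minimal sketch (localVideo is an assumed <video> element; with adapter.js, srcObject works in both browsers):

navigator.mediaDevices.getUserMedia(constraints)
  .then(function (screenStream) {
    // Show a local preview and/or hand the stream to a peerConnection.
    localVideo.srcObject = screenStream;
  })
  .catch(function (err) {
    // Rejects if the constraints can't be satisfied, e.g. permission was denied.
    console.error('Screen capture failed:', err);
  });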

With this flow, Chrome pops up a nice picker UI (from the chooseDesktopMedia() call in the extension). With it, the user can choose to share their entire screen, a particular application window, or a particular Chrome tab.

Chrome screen share UI

Firefox

For Firefox, the situation is (maybe) simpler, but a bit less flexible. Instead of using extensions to control screen-sharing permission, Firefox maintains a white-list of trusted sites. Information on getting your domain onto the white-list is here; it basically involves submitting a specific bug report to Mozilla. Note that as of this writing, I have followed those instructions for Locus and not actually received any response, so I can’t vouch for this actually working! (Readers are welcome to upvote our bug.)

Edit: According to Jan-Ivar in the comments, the white-listing mechanism in Firefox is going away! Edit 2: Confirmed in the release notes for Firefox 52.

The white list is user editable. In Firefox:

  • Go to about:config
  • Search for media.getusermedia.screensharing. This should show at least two options.
  • Make sure media.getusermedia.screensharing.enabled has value of true.
  • For the preference media.getusermedia.allowed_domains, the value contains a comma-separated list of domains. Add inthelocus.com and *.inthelocus.com (or your own domain) to the list.

For Firefox, the gUM constraints are:

constraints = {
  video: {
    mediaSource: "screen", // share the whole screen
    // mediaSource: "window", // choose a window to share
    // mediaSource: "application", // choose an application to share
    width: {max: 1920},
    height: {max: 1080},
    frameRate: {max: 10}
  }
};

As you can perhaps guess from these constraints, Firefox does not currently provide a UI for the user to choose between all sharing options. Instead, if screen is requested, the user can only share their entire screen, and if window is requested, the user gets a UI to choose which window to share. In either case, the user sees a permission popup identical to the one for mic and camera access. I didn’t do a lot of poking on this, but window and application seemed to do the same thing.
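
Since the constraints differ per browser, we end up with a small helper that picks the right set. A sketch, using adapter.js’s adapter.browserDetails for detection and the hypothetical getScreenSourceId() helper from the Chrome section:

function getScreenConstraints() {
  if (adapter.browserDetails.browser === 'firefox') {
    return Promise.resolve({
      video: {
        mediaSource: 'screen',
        width: {max: 1920},
        height: {max: 1080},
        frameRate: {max: 10}
      }
    });
  }
  // Chrome: ask the extension for a sourceId first.
  return getScreenSourceId().then(function (sourceId) {
    return {
      video: {
        mandatory: {
          chromeMediaSource: 'desktop',
          chromeMediaSourceId: sourceId,
          maxWidth: 1920,
          maxHeight: 1080,
          maxFrameRate: 10
        }
      }
    };
  });
}

// Usage: getScreenConstraints().then(function (c) { return navigator.mediaDevices.getUserMedia(c); });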

Sharing the stream over peerConnection

The screen-sharing gUM call returns a MediaStream, which can be shared over peerConnections just like any other WebRTC MediaStream. A simple and apparently common implementation only sends one video track over the peerConnection, so when a user starts sharing, their webcam video is stopped.
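
For that simple approach, swapping the streams might look like the sketch below. I’m assuming the stream-based peerConnection API that both browsers exposed at the time (and that adapter.js smooths over), plus a hypothetical sendToSignalingServer() helper; a renegotiation is needed after the swap.

function switchToScreenShare(pc, webcamStream, screenStream) {
  // Stop sending the webcam video; audio keeps flowing on its own track.
  webcamStream.getVideoTracks().forEach(function (track) {
    track.stop();
  });

  // Add the screen stream and renegotiate with the remote peer.
  pc.addStream(screenStream);
  return pc.createOffer()
    .then(function (offer) { return pc.setLocalDescription(offer); })
    .then(function () {
      sendToSignalingServer({ type: 'offer', sdp: pc.localDescription });
    });
}

Where the browser supports it, RTCRtpSender.replaceTrack() can swap the video track without a renegotiation.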

Sharing video and screen simultaneously becomes a little more complicated, especially in a peer-to-peer mesh configuration. WebRTC browser implementations transcode video streams separately for each peerConnection. This is computationally intensive and, together with bandwidth, limits call size even when sharing just a single video stream. Adding a second video stream to transcode at least doubles the CPU load, and often worse, since screen sharing is typically high resolution and the transcoding cost increases with resolution. Reducing the frame rate can help, but in our mesh configuration we still found the CPU load from transcoding webcam video and screen video simultaneously to be too high.

On a positive note, the video codecs do a decent job of delivering screen sharing at a low bitrate. When a large change happens on screen, the bitrate can spike to 3–4 Mbps, but when the screen has little or no change, it falls to ~0.07 Mbps.

Bitrate log for sharing 1050x1680 at 3 FPS as video track

So how can we send webcam video and screen simultaneously? One solution is to forget the mesh network and instead use an SFU- or MCU-type solution. At Locus, we stubbornly wanted to stay away from the server solution. We ended up rolling our own simplified transcoding method using images and sending these over a dataChannel. Recognizing that sections of a screen-sharing frame are either static or change suddenly and substantially allows solutions that might not be viable for typical video.
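
I won’t detail the whole scheme, but the core of the frame-over-dataChannel idea is simple. A rough sketch (frame differencing, compression tuning, and chunking to respect dataChannel message-size limits are all left out; screenVideo is an assumed <video> element playing the screen stream and channel is an open RTCDataChannel):

var canvas = document.createElement('canvas');
var ctx = canvas.getContext('2d');

function sendScreenFrame(screenVideo, channel) {
  canvas.width = screenVideo.videoWidth;
  canvas.height = screenVideo.videoHeight;
  ctx.drawImage(screenVideo, 0, 0);

  // Encode the current frame as a JPEG data URL. In practice this string can be
  // large, so it should be chunked before sending (~16 KB per message is a safe
  // cross-browser limit).
  var frame = canvas.toDataURL('image/jpeg', 0.7);
  if (channel.readyState === 'open') {
    channel.send(frame);
  }
}

// Receiving side: drop the string straight into an <img>.
// channel.onmessage = function (e) { remoteImg.src = e.data; };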

Conclusion

Browser support for screen sharing is reasonably robust and simple at this point. I am a mechanical engineer by training, not an expert developer, and had never written a Chrome extension, but was able to get screen sharing implemented and released in a bit more than a week.

There are some limitations and room for improvement to compete with native applications. Some interesting areas would be:

  • A better picker UI in Firefox, similar to what Chrome has now implemented
  • Ability for remote users to control the mouse of the sharer (behind some strong permission)?
  • Ability to draw screen overlays outside the browser tab to allow sketching over the shared screen or to float meeting participant videos while the conferencing app is off-screen. (Likely there are some significant security concerns which could make this a bad idea)

Share your thoughts in the comments, or send them to chris@inthelocus.com.

If you like what you read, be sure to 💗 below.