SFrame.js: end to end encryption for WebRTC

If you have followed the news lately, you already know about insertable streams and how they enable implementing end to end encryption in WebRTC.

You should also be aware of the existence of SFrame, the end to end encryption mechanism which is used in Google Duo and within Cosmo’s commercial products.

Last week, in an effort to widen the adoption of end to end encryption in WebRTC products and services, a more recent and more formal version of SFrame was uploaded as a standards track IETF draft. A mailing list has also been set up to gather feedback from the community and improve the draft in newer versions.

But there was a missing piece...

Running code

To fill the gap in “rough consensus and running code”, today we are happy to publicly release SFrame.js, a library which implements the SFrame draft in pure JavaScript on top of WebCrypto.


Before getting into the details, I would like to thank both Emad Omara (main author of the SFrame draft and co-author of MLS), for his patience explaining crypto stuff, and Lorenzo Miniero, for the “free beta-testing” and collaboration when including SFrame.js in Janus. Lorenzo has also written a great article about the whole process that you must read now!

Differences from the current SFrame draft

  • keyIds are used as senderIds.
  • The IV contains the keyId and the frame counter to ensure uniqueness when using the same encryption key for all participants.
  • keyIds are limited to 5 bytes long to avoid JavaScript signed/unsigned issues.
  • Option to skip the VP8 payload header and send it in clear.
  • Ed25519 is not used for sign/verify as it is not available in WebCrypto (however, there is an intent to prototype in Blink and some skeleton code is available in Chrome already); ECDSA with P-521 is used instead.
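To make the IV and keyId points above more concrete, here is a minimal sketch of how an IV could be built from a keyId and a frame counter. The 12-byte layout (5 bytes of keyId followed by 7 bytes of counter) and the names buildIv and writeUint are assumptions made for illustration, not necessarily what SFrame.js does internally.

```javascript
// Hypothetical layout, for illustration only: 5 bytes of keyId followed by
// 7 bytes of frame counter, both big-endian, in a 12-byte IV.
const MAX_KEY_ID = 2 ** 40 - 1; // largest value that fits in 5 bytes

// Writes `value` big-endian into `length` bytes of `bytes`, starting at `offset`.
// Division is used instead of >> or >>> because JavaScript bitwise operators
// truncate their operands to 32 bits.
function writeUint(bytes, offset, length, value) {
  for (let i = length - 1; i >= 0; i--) {
    bytes[offset + i] = value % 256;
    value = Math.floor(value / 256);
  }
}

function buildIv(keyId, counter) {
  if (!Number.isSafeInteger(keyId) || keyId < 0 || keyId > MAX_KEY_ID) {
    throw new RangeError("keyId must fit in 5 bytes");
  }
  if (!Number.isSafeInteger(counter) || counter < 0) {
    throw new RangeError("counter must be a non-negative safe integer");
  }
  const iv = new Uint8Array(12);
  writeUint(iv, 0, 5, keyId);
  writeUint(iv, 5, 7, counter);
  return iv;
}

// The kind of signed/unsigned surprise that bitwise operators cause:
console.log(2 ** 31 | 0);   // -2147483648, operands are treated as signed 32-bit
console.log(2 ** 32 >>> 0); // 0, anything above 32 bits is silently dropped
```

Since every participant has a different keyId and the counter increases on each frame, no two frames end up with the same IV even when the encryption key is shared.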

Bring your own KMS (BYOKMS)

If you have read the SFrame draft already, you will realize that there is a (big) missing piece to deploy an SFrame based e2ee solution: key management and exchange.

Why? Mainly for two reasons: there is already an IETF effort under way to provide this feature, Messaging Layer Security (MLS), and it is quite common for organizations providing secure communications to already have some kind of KMS mechanism that could be leveraged by SFrame.

The only requirement is that each participant in the conference must have an associated numeric id (the senderId) and an associated symmetric encryption key.

If you still don’t have a KMS in place, while it is not recommended, you can still choose to go for a simple e2ee scheme with a common shared encryption key across all participants and skip the signature part of the SFrame.

How to use SFrame.js

The library provides a high level client wrapper, which exposes an API to communicate with the encryption worker that runs all the frame encryption and decryption.

Once you import the SFrame module into your project, you can create a Client which will be bound to the specified senderId.

While the future-proof way of supporting e2ee via insertable streams is to implement support for the generic RTP packetization and the generic video descriptor RTP header extension, we have added the skipVp8PayloadHeader option, which allows you to use the VP8 codec without them by sending the VP8 payload header in clear.
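As a rough illustration of what sending the payload header in clear means, the sketch below encrypts everything after the VP8 payload header and leaves the header bytes untouched. AES-GCM is only a stand-in cipher here, authenticating the clear header as additional data is an assumption of this sketch, and encryptSkippingHeader is a hypothetical helper rather than part of the SFrame.js API.

```javascript
// Browsers and recent Node versions expose WebCrypto as globalThis.crypto;
// the require() fallback only matters on older Node versions.
const webcrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

// VP8 payload header sizes in bytes: 10 for key frames, 3 for delta frames.
const VP8_HEADER_BYTES = { key: 10, delta: 3 };

// Encrypts everything after the VP8 payload header; the header stays readable.
async function encryptSkippingHeader(cryptoKey, iv, frameType, data) {
  const skip = VP8_HEADER_BYTES[frameType];
  if (skip === undefined) throw new TypeError("unsupported frame type: " + frameType);
  const header = data.subarray(0, skip);
  const payload = data.subarray(skip);
  // Assumption of this sketch: the clear header is still authenticated.
  const ciphertext = new Uint8Array(
    await webcrypto.subtle.encrypt(
      { name: "AES-GCM", iv, additionalData: header },
      cryptoKey,
      payload
    )
  );
  // Output is the untouched header followed by ciphertext plus the 16-byte tag.
  const out = new Uint8Array(header.length + ciphertext.length);
  out.set(header, 0);
  out.set(ciphertext, header.length);
  return out;
}
```

The SFU can then keep reading the first bytes of each frame, while the rest of the frame is opaque to it.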

You can also use VP9 directly, as the VP9 payload descriptor containing the information required for SFU layer selection is added after the e2ee encryption, so it is sent in clear. Supporting H264 without the generic packetization or the generic video descriptor RTP header extension, while doable, is much more difficult and is not worth the effort.

As said before, the keyIds are numeric as they need to be sent in each frame, so sending string UUIDs would cause too much overhead (especially on audio). Note that they are variable length encoded, so starting from 0 and incrementing the counter on each participant would provide the best performance.
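A quick way to see the overhead difference is to compare the size of a minimally encoded number with the size of a UUID string. This sketch uses a plain big-endian minimal encoding, which is an assumption for illustration and not the exact header encoding defined in the draft; encodeMinimal is a hypothetical helper.

```javascript
// Encodes a non-negative integer in as few big-endian bytes as possible.
function encodeMinimal(value) {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("value must be a non-negative safe integer");
  }
  const bytes = [];
  do {
    bytes.unshift(value % 256);
    value = Math.floor(value / 256);
  } while (value > 0);
  return Uint8Array.from(bytes);
}

console.log(encodeMinimal(0).length);   // 1
console.log(encodeMinimal(255).length); // 1
console.log(encodeMinimal(256).length); // 2

// A UUID sent as a string costs 36 bytes on every single frame.
const uuid = "123e4567-e89b-12d3-a456-426614174000";
console.log(new TextEncoder().encode(uuid).length); // 36
```

With small sequential ids, a conference would need more than 256 participants before the id takes a second byte in this encoding.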

You would also need to set the 32-byte encryption key (and optionally the private key for signing) before encrypting any data:
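Where that key material comes from is up to your KMS. If you just want something to experiment with, the following is a minimal sketch that generates it locally with WebCrypto. generateKeyMaterial is a hypothetical helper, and how the result is handed to the client is left to the library's own API.

```javascript
// Browsers and recent Node versions expose WebCrypto as globalThis.crypto;
// the require() fallback only matters on older Node versions.
const webcrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

async function generateKeyMaterial() {
  // 32 random bytes for the symmetric encryption key.
  const encryptionKey = webcrypto.getRandomValues(new Uint8Array(32));
  // ECDSA key pair on the P-521 curve: the private key signs, and the
  // public key is what remote participants need in order to verify.
  const signingKeyPair = await webcrypto.subtle.generateKey(
    { name: "ECDSA", namedCurve: "P-521" },
    true, // extractable, so the public key can be exported and shared
    ["sign", "verify"]
  );
  return { encryptionKey, signingKeyPair };
}
```

In a real deployment the symmetric key would be distributed over your secure key exchange channel instead of being generated by each client on its own.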

Once this step is done, you can encrypt your peerconnection senders:

Note that you will need a unique id to be passed to the encryption method, which can be either the transceiver.mid or the track.id or any other one that you implement (as long as it is unique).

Internally the encrypt method will create the insertable streams for the sender and transfer them to the web worker so the encryption process is performed there.
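To picture what happens on the worker side, here is a minimal sketch of piping encoded frames through a TransformStream. createFrameTransform, pipeFrames and encryptFrame are hypothetical names and this is not the internal code of SFrame.js. The browser calls shown in the comments reflect the Chrome insertable streams API as I understand it, and the exact method names may differ between browser versions.

```javascript
// Builds a TransformStream that runs `processFrame` on every encoded frame.
function createFrameTransform(processFrame) {
  return new TransformStream({
    async transform(frame, controller) {
      controller.enqueue(await processFrame(frame));
    },
  });
}

// Connects both ends of an insertable stream through the transform.
// Returns a promise that settles when the stream ends or fails.
function pipeFrames(readable, writable, processFrame) {
  return readable.pipeThrough(createFrameTransform(processFrame)).pipeTo(writable);
}

// In the page (browser only, assumed API):
//   const { readable, writable } = sender.createEncodedStreams();
//   worker.postMessage({ readable, writable }, [readable, writable]);
//
// In the worker, once the message arrives (encryptFrame is hypothetical):
//   pipeFrames(readable, writable, encryptFrame);
```

Transferring the streams means the frames never have to be copied back and forth between the main thread and the worker.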

You would do it similarly for decrypting the receivers:

When a frame with a new keyId is correctly decrypted on an RTCRtpReceiver's insertable streams, you will get an event so you can associate the authenticated senderId with that receiver.

However, in order to be able to decrypt the frames received by the worker in the insertable streams, you will need to add a new receiver with its associated keyId and set up its symmetric key for decryption and its public key for verification:

Signature verification

SFrame performs sender authentication by signing the authentication tags of several frames and sending the signature in the last of those frames.
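The idea can be sketched with WebCrypto as follows. The way the tags are concatenated, the SHA-512 hash and the helper names (concatTags, signTags, verifyTags) are assumptions for illustration; the draft defines the exact bytes that get signed.

```javascript
// Browsers and recent Node versions expose WebCrypto as globalThis.crypto;
// the require() fallback only matters on older Node versions.
const webcrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

const ECDSA_PARAMS = { name: "ECDSA", hash: "SHA-512" };

// Joins the authentication tags of several frames into a single buffer.
function concatTags(tags) {
  const total = tags.reduce((sum, tag) => sum + tag.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const tag of tags) {
    out.set(tag, offset);
    offset += tag.length;
  }
  return out;
}

// Sender side: one signature covers the tags of all the frames in the batch.
async function signTags(privateKey, tags) {
  const signature = await webcrypto.subtle.sign(ECDSA_PARAMS, privateKey, concatTags(tags));
  return new Uint8Array(signature);
}

// Receiver side: resolves to true only if every covered tag is unchanged.
async function verifyTags(publicKey, signature, tags) {
  return webcrypto.subtle.verify(ECDSA_PARAMS, publicKey, signature, concatTags(tags));
}
```

Signing a batch of tags instead of every frame keeps the cost of the asymmetric operation and the size of the signature amortized over several frames.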

SFrame.js will send signature information for each stream periodically and verify that the signature received for remote senders is valid according to its public key, but it is not clear which is the most appropriate way of signaling this back to the application.

An event on successfully verifying the signature feels appropriate, but not all frames may be signed and the frames with the signature may be dropped (either by the SFU or the network), so a binary state on the stream (“authentication verified”/“not verified”) doesn’t seem appropriate, and maybe a stats-based approach comparing frames verified vs frames received would be better.
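Just to picture what such a stats-based approach could look like, here is a small sketch. Nothing like this exists in SFrame.js today; the VerificationStats class and its method names are purely hypothetical.

```javascript
// Hypothetical per-stream counters: frames received vs frames covered
// by a signature that was successfully verified.
class VerificationStats {
  constructor() {
    this.received = 0;
    this.verified = 0;
  }

  // Called for every frame that was decrypted on the stream.
  onFrame() {
    this.received++;
  }

  // Called when a signature checks out; `coveredFrames` is the number
  // of frames whose authentication tags were part of that signature.
  onSignatureVerified(coveredFrames) {
    this.verified += coveredFrames;
  }

  // Fraction of received frames that have been authenticated so far.
  get ratio() {
    return this.received === 0 ? 0 : this.verified / this.received;
  }
}

const stats = new VerificationStats();
for (let i = 0; i < 10; i++) stats.onFrame();
stats.onSignatureVerified(8); // two frames are still waiting for a signature
console.log(stats.ratio);     // 0.8
```

The application could then decide by itself which ratio is acceptable, instead of receiving a single verified/not verified flag.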

Key rotation and ratcheting

E2EE keys should be rotated during the call when people join and leave the conference. The new keys are exchanged using the same E2EE secure channel used in the initial key negotiation.

Sending fresh keys is an expensive operation, so the key management component might choose to send new keys only when other clients leave the call and use hash ratcheting for the join case, so there is no need to send a new key to the clients who are already on the call.

SFrame and SFrame.js support both, by either updating the encryption key for the sender or receiver, or by ratcheting the sender key:

Note that you don’t need to ratchet the receiver keys as SFrame.js will automatically try to ratchet them when a frame decryption fails.
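The mechanics of ratcheting on decryption failure can be sketched like this. The derivation used here (next key = SHA-256 of the current key), the AES-GCM cipher and the helper names are stand-ins chosen for brevity; the draft defines its own key derivation and SFrame.js follows its own implementation.

```javascript
// Browsers and recent Node versions expose WebCrypto as globalThis.crypto;
// the require() fallback only matters on older Node versions.
const webcrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

// One ratchet step. Stand-in derivation: next key = SHA-256(current key).
// It is one-way, so a new joiner cannot go back to previous keys.
async function ratchet(keyBytes) {
  return new Uint8Array(await webcrypto.subtle.digest("SHA-256", keyBytes));
}

async function importAesKey(keyBytes) {
  return webcrypto.subtle.importKey("raw", keyBytes, "AES-GCM", false, ["encrypt", "decrypt"]);
}

// Tries the current key first and then up to `maxSteps` ratcheted keys.
// Returns the plaintext plus the key that worked, so the caller can keep it.
async function decryptWithRatchet(keyBytes, iv, ciphertext, maxSteps = 5) {
  let candidate = keyBytes;
  for (let step = 0; step <= maxSteps; step++) {
    try {
      const key = await importAesKey(candidate);
      const plaintext = await webcrypto.subtle.decrypt({ name: "AES-GCM", iv }, key, ciphertext);
      return { plaintext: new Uint8Array(plaintext), keyBytes: candidate, steps: step };
    } catch {
      // Authentication failed: the sender may have ratcheted, move one step forward.
      candidate = await ratchet(candidate);
    }
  }
  throw new Error("unable to decrypt after ratcheting");
}
```

Capping the number of attempts matters, as otherwise a stream of garbage frames would keep the receiver hashing forever.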

Demo time!

Well, it is not as spectacular as the face detection demo, but running code means running code, so you can check the Medooze e2ee echo tests here.

What’s next?

  • Code review.
  • Code review.
  • Improve SFrame draft with enhancements and feedback from the community.
  • Explore integration with different KMS, MLS being the preferred one.

Written by

Doing RTC media servers since 2003.