Exploring the Media Streams API: A Deep Dive into Real-Time Audio and Video Processing on the Web

Vladislav Stepanov
firstlineoutsourcing
4 min read · Oct 9, 2023

Hi everyone. How often do your sites work with users’ cameras or microphones? Or maybe you have used such a site yourself? Today we will talk about how to work with these devices and what problems they can cause.

Well, let’s go!

When we talk about sites using the user’s devices, we need to consider the following factors:

  • It should work in most browsers
  • It’s got to work on phones
  • It has to be user-friendly

To work with users’ devices, the browser provides the Media Capture and Streams API (Media Streams), specifically the MediaDevices interface. We will create a class that wraps this API and add the logic we need to it.

export class MediaDevices {
  // The type is inferred from navigator.mediaDevices: inside this module the name
  // MediaDevices refers to this class, not to the global DOM interface.
  private readonly mediaDevices = navigator.mediaDevices;

  // Requests a list of the currently available media input and output devices, such as microphones, cameras, headsets, and so forth
  public getUserDevices(): Promise<MediaDeviceInfo[]> {
    return this.mediaDevices.enumerateDevices();
  }

  // Prompts the user for permission to use a media input which produces a MediaStream with tracks containing the requested types of media.
  public getUserMedia(constraints?: MediaStreamConstraints): Promise<MediaStream> {
    return this.mediaDevices.getUserMedia(constraints);
  }
}
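As a small illustration of how the result of getUserDevices might be consumed, the sketch below groups the returned devices by kind so that a UI can render separate pickers for cameras, microphones, and speakers. DeviceLike and groupDevicesByKind are hypothetical names that are not part of the class above; DeviceLike mirrors only the fields of MediaDeviceInfo that the sketch needs.

```typescript
// Structural subset of the DOM MediaDeviceInfo type (an assumption made so that
// the sketch also compiles outside a browser).
interface DeviceLike {
  deviceId: string;
  kind: 'audioinput' | 'audiooutput' | 'videoinput';
  label: string;
}

type GroupedDevices = Record<DeviceLike['kind'], DeviceLike[]>;

// Splits a flat device list into one list per device kind.
function groupDevicesByKind(devices: readonly DeviceLike[]): GroupedDevices {
  const grouped: GroupedDevices = { audioinput: [], audiooutput: [], videoinput: [] };
  for (const device of devices) {
    grouped[device.kind].push(device);
  }
  return grouped;
}
```

In a browser this could be fed with the output of getUserDevices. Keep in mind that browsers generally return empty labels until the user has granted permission to a camera or microphone, so a picker built from this data may need a fallback label.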

At this point, we could say that we just combine these methods and enjoy life, but unfortunately, it’s not that simple.

Why do you need so many browsers?

Not every browser version exposes the same method for obtaining a MediaStream. The most common variants are:

  • Chromium-based browsers, Safari, and current Firefox: getUserMedia
  • Old Firefox versions: mozGetUserMedia
  • Old Chrome versions: webkitGetUserMedia
  • *Let’s all forget about IE.

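That is not very convenient to work with, so let’s simplify our lives. Note that the prefixed variants live on navigator rather than on navigator.mediaDevices, and they take success and error callbacks instead of returning a promise. Let’s update our getUserMedia method a bit.

public getUserMedia(constraints?: MediaStreamConstraints): Promise<MediaStream> {
  // Prefer the standard, promise-based method
  if (this.mediaDevices?.getUserMedia) {
    return this.mediaDevices.getUserMedia(constraints);
  }
  // Fall back to the legacy, callback-based methods on navigator
  const legacyGetUserMedia = navigator.webkitGetUserMedia || navigator.mozGetUserMedia;
  if (!legacyGetUserMedia) {
    return Promise.reject(new Error('getUserMedia is not implemented in this browser'));
  }
  return new Promise((resolve, reject) => legacyGetUserMedia.call(navigator, constraints, resolve, reject));
}

If you use TypeScript like I do, you can create a global.d.ts file and add the following to it to avoid problems with missing methods.

type LegacyGetUserMedia = (
  constraints: MediaStreamConstraints | undefined,
  onSuccess: (stream: MediaStream) => void,
  onError: (error: unknown) => void
) => void;

// Extend the global Navigator interface
interface Navigator {
  // Old Chrome browsers
  webkitGetUserMedia?: LegacyGetUserMedia;
  // Old Firefox browsers
  mozGetUserMedia?: LegacyGetUserMedia;
}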

You can also write a getUserMedia polyfill to support the oldest browser versions.

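// Older browsers may not implement mediaDevices at all, so start with an empty object
if (navigator.mediaDevices === undefined) {
  navigator.mediaDevices = {};
}

// Add getUserMedia only if it is missing, so an existing implementation is never overwritten
if (navigator.mediaDevices.getUserMedia === undefined) {
  navigator.mediaDevices.getUserMedia = function (constraints) {
    // Look for a legacy, callback-based implementation
    const getUserMedia =
      navigator.webkitGetUserMedia ||
      navigator.mozGetUserMedia ||
      navigator.getUserMedia;

    if (!getUserMedia) {
      return Promise.reject(
        new Error("getUserMedia is not implemented in this browser")
      );
    }

    // Wrap the legacy callback API in a Promise
    return new Promise(function (resolve, reject) {
      getUserMedia.call(navigator, constraints, resolve, reject);
    });
  };
}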

This polyfill can live in the class file or anywhere else. The main thing is that it runs before the first call to the getUserMedia method of our MediaDevices class.
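To keep things user-friendly, it can help to know up front which flavour of getUserMedia, if any, the browser offers, so the page can show a clear message instead of failing silently. The following is a minimal sketch of such a check; NavigatorLike, GetUserMediaSupport, and detectGetUserMediaSupport are hypothetical names introduced only for illustration.

```typescript
// Minimal structural view of navigator (an assumption), so the check can be
// exercised with plain objects as well as with the real navigator.
interface NavigatorLike {
  mediaDevices?: { getUserMedia?: unknown };
  webkitGetUserMedia?: unknown;
  mozGetUserMedia?: unknown;
}

type GetUserMediaSupport = 'standard' | 'legacy' | 'none';

// Reports which getUserMedia variant is present on the given navigator-like object.
function detectGetUserMediaSupport(nav: NavigatorLike): GetUserMediaSupport {
  if (typeof nav.mediaDevices?.getUserMedia === 'function') {
    return 'standard';
  }
  if (
    typeof nav.webkitGetUserMedia === 'function' ||
    typeof nav.mozGetUserMedia === 'function'
  ) {
    return 'legacy';
  }
  return 'none';
}
```

If a polyfill is installed, a check like this reports 'standard' afterwards, so it is only informative when run before the polyfill. Also note that navigator.mediaDevices is only available in secure contexts (HTTPS or localhost), so a result of 'none' on a modern browser may simply mean the page is served over plain HTTP.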

What about phones?

When working with phones, the principle is the same. However, phones are where I ran into trouble.

On some devices, the promise returned by getUserMedia can be rejected with the error “Could not start video source”.

So here’s my tip to you. Always stop the previous MediaStream before getting a new one.

// Stop every track of the stream
mediaStream.getTracks().forEach((track) => track.stop());

// or only the audio tracks
mediaStream.getAudioTracks().forEach((track) => track.stop());

// or only the video tracks
mediaStream.getVideoTracks().forEach((track) => track.stop());

class MediaDevices {
  //...

  // Undefined until the first stream has been requested
  private currentMediaStream?: MediaStream;

  //...

  public stopCurrentMediaStream(): void {
    this.currentMediaStream?.getTracks().forEach((track) => track.stop());
  }
}
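The "stop first, then request" tip can be sketched as a single helper that makes the order explicit. This is only an illustration under stated assumptions: replaceStream, StreamLike, and TrackLike are hypothetical names, and the stream request is passed in as a function so the helper does not depend on any particular wrapper class.

```typescript
// Structural subsets of MediaStreamTrack and MediaStream (assumptions that keep
// the sketch independent of the DOM typings).
interface TrackLike {
  stop(): void;
}

interface StreamLike {
  getTracks(): TrackLike[];
}

// Stops every track of the previous stream (if any) and only then asks for a new one.
async function replaceStream<S extends StreamLike>(
  previous: S | undefined,
  requestStream: () => Promise<S>,
): Promise<S> {
  previous?.getTracks().forEach((track) => track.stop());
  return requestStream();
}
```

In a browser, a camera switch might then look like `current = await replaceStream(current, () => navigator.mediaDevices.getUserMedia({ video: { deviceId: { exact: cameraId } } }))`, where current and cameraId are placeholders for your own state.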

In my case, I could not switch cameras until the previous stream was stopped. The problem can also occur if the camera is already in use in another tab. Unfortunately, this can only be fixed by releasing the camera in that other tab.
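Since a busy camera cannot be fixed from our own page, the most we can do is tell the user what happened. Below is a minimal sketch that maps the name of a rejected getUserMedia error to a readable message. The error names are the standard DOMException names used for getUserMedia failures; the assumption here is that “Could not start video source” arrives as a NotReadableError, which is what Chromium-based browsers typically report. describeMediaError and the message texts are hypothetical.

```typescript
// Turns a getUserMedia rejection into a message that can be shown to the user.
function describeMediaError(error: unknown): string {
  // Read the error name defensively, because a rejection can carry any value.
  const name =
    typeof error === 'object' && error !== null && 'name' in error
      ? String((error as { name: unknown }).name)
      : '';

  switch (name) {
    case 'NotAllowedError':
      return 'Access to the camera or microphone was denied.';
    case 'NotFoundError':
      return 'No matching camera or microphone was found.';
    case 'NotReadableError':
      return 'The device is busy or could not be started. Close other tabs or apps that use it and try again.';
    case 'OverconstrainedError':
      return 'The selected device does not support the requested settings.';
    default:
      return 'Could not access the camera or microphone.';
  }
}
```

It could be used in the catch branch of a getUserMedia call, for example `getUserMedia(constraints).catch((error) => showMessage(describeMediaError(error)))`, where showMessage stands for whatever notification mechanism your site already has.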

Working with audio output

When we work with a microphone, we may sometimes want to listen to ourselves. Or, say, we have a site for listening to music and want to let the user choose which speakers the sound goes to. For this, the Web API provides the setSinkId method, which is available on HTMLMediaElement (HTMLVideoElement, HTMLAudioElement) and AudioContext. We should be careful here, though: useful as it is, it doesn’t work in all browsers (you can check it here https://caniuse.com/?search=setsinkid).

In my case, I used AudioContext to play audio through the device the user chose. We need to check that this method is available in the current browser, and we also need to check which ID we pass to it. If you don’t filter out the default devices in Chrome, you need to pass an empty string instead, because setSinkId won’t understand the device id default.

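function setAudioOutput(audioContext: AudioContext, sinkId: string = ''): void {
  // Chrome lists the default device with the id 'default'; setSinkId expects '' for it
  const deviceSinkId = sinkId === 'default' ? '' : sinkId;
  // Call setSinkId only where the browser implements it
  if ('setSinkId' in audioContext) {
    audioContext.setSinkId(deviceSinkId);
  }
}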

The condition is true only if the browser implements setSinkId on AudioContext. deviceSinkId is a workaround for Chrome: it lists the currently selected default devices with the id default, and the setSinkId method doesn’t understand such an id.
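Because setSinkId returns a promise that can be rejected (for example, when the device has been unplugged), a caller may also want to know whether the switch succeeded. The sketch below applies the same feature check and the same default-to-empty-string workaround to any object that may expose setSinkId. SinkTarget and routeAudioOutput are hypothetical names, and depending on your TypeScript DOM typings a cast may be needed to pass a real element or AudioContext.

```typescript
// Anything that may expose setSinkId, such as an HTMLMediaElement or an
// AudioContext in browsers that support it (structural type, an assumption).
interface SinkTarget {
  setSinkId?: (sinkId: string) => Promise<void>;
}

// Resolves to true when the output device was switched, false when setSinkId
// is unavailable or the browser rejected the request.
async function routeAudioOutput(target: SinkTarget, sinkId: string): Promise<boolean> {
  if (typeof target.setSinkId !== 'function') {
    return false;
  }
  // Same workaround as above: map the 'default' id to an empty string.
  const normalizedSinkId = sinkId === 'default' ? '' : sinkId;
  try {
    await target.setSinkId(normalizedSinkId);
    return true;
  } catch {
    return false;
  }
}
```

A false result is a reasonable moment to hide the output picker or fall back to the system default device rather than show an error.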

Conclusion

At the moment, working with users’ devices in the browser is not very difficult. There are many pitfalls, though, once we start to expand the range of devices we use. The more we know about them and share information, the easier it will be for all of us!

You are welcome to share your own interesting cases of working with users’ devices in the comments. Thanks for your attention!
