Browser screenshot tool with MediaStream API

Mario Cardoso
Onfido Product and Tech
3 min read · Apr 29, 2022


Problem

Recently we took on a challenge: building a solution for our Analysts, who had to perform a manual step as part of a traceability process. They needed to verify a user-submitted security code against a UK government database to prove a person’s right to work, and then take a screenshot to keep a record of the check. Given these requirements, we needed to control and automate the screenshot functionality inside our application.

Solution

Browser screenshot tool.

The answer to the problem was to build a simple tool to take screenshots within the browser. But how? MediaStream API to the rescue!

Our approach was to stream the content of one tab into a video element in another tab, capture a single frame of that stream, turn it into an image, and send the image to the server.
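The article does not show the final step, sending the image to the server. As a rough sketch only, assuming the captured image is available as a PNG data URL, the upload could look like the following. The function names, the "screenshot" form field, and the endpoint URL are all hypothetical; the optional fetchImpl parameter exists only so the function can be exercised without a network.

```javascript
// Convert a base64 data URL (e.g. "data:image/png;base64,....") into a Blob.
function dataUrlToBlob(dataUrl) {
  const [header, base64] = dataUrl.split(",");
  const match = /^data:(.*?);base64$/.exec(header);
  if (!match || base64 === undefined) {
    throw new Error("Expected a base64-encoded data URL");
  }
  // atob yields a binary string; map each character to its byte value.
  const bytes = Uint8Array.from(atob(base64), (ch) => ch.charCodeAt(0));
  return new Blob([bytes], { type: match[1] });
}

// POST the screenshot as multipart form data.
// `url` and the "screenshot" field name are assumptions for illustration.
async function uploadScreenshot(dataUrl, url, fetchImpl = fetch) {
  const form = new FormData();
  form.append("screenshot", dataUrlToBlob(dataUrl), "screenshot.png");
  const response = await fetchImpl(url, { method: "POST", body: form });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  return response;
}
```

Sending a Blob in a FormData body rather than the raw data URL string avoids the size overhead of base64 in the request, though either approach can work depending on what the server expects.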

Let us first take a look at what the MediaStream API is and which parts we needed to build our solution.

MediaStream API

The MediaStream API (formally the Media Capture and Streams API) provides the interfaces and methods for working with streams and their tracks. It is related to WebRTC, which provides support for streaming audio and video data.

Of all the interfaces that make up the MediaStream API, we used the MediaDevices and MediaStream interfaces.

MediaDevices

From MDN Web Docs:

The MediaDevices interface provides access to connected media input devices like cameras and microphones, as well as screen sharing. In essence, it lets you obtain access to any hardware source of media data.

For the screenshot tool, the media input we are interested in is screen sharing, because it lets us obtain the screen’s media content and extract an image from it. The interface has several methods: getDisplayMedia() prompts the user to select a display or a portion of one (such as a window or tab), and getUserMedia() prompts for permission to use the camera and/or microphone. Both return a promise that resolves to a MediaStream object containing the media data.
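As a minimal sketch of the screen sharing request (not the article's original code), the helper below asks for a video-only display stream and treats a dismissed picker as "no stream". The name requestDisplayStream is hypothetical, and the mediaDevices parameter is an assumption added so the function can be tried with a stand-in object; in the browser it defaults to navigator.mediaDevices.

```javascript
// Prompt the user to pick a screen, window, or tab.
// Resolves with a MediaStream, or null if the user cancels or denies access.
async function requestDisplayStream(mediaDevices = navigator.mediaDevices) {
  try {
    // Video only: a screenshot has no use for an audio track.
    return await mediaDevices.getDisplayMedia({ video: true, audio: false });
  } catch (error) {
    // Browsers reject with a NotAllowedError when the picker is dismissed
    // or permission is denied; treat that as "nothing selected".
    if (error && error.name === "NotAllowedError") {
      return null;
    }
    throw error; // Anything else is unexpected, so surface it.
  }
}
```

Browsers generally expect getDisplayMedia() to be called in response to a user gesture such as a button click, which fits a tool where the preview starts from a button.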

MediaStream

From MDN Web Docs:

The MediaStream interface represents a stream of media content. A stream consists of several tracks, such as video or audio tracks. Each track is specified as an instance of MediaStreamTrack.

The MediaStream holds the content of the stream. We use it to feed the video element, grab a frame, and convert that frame into the final image we want.
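To make the stream and track relationship concrete, here is a small illustrative helper (the name describeVideoTracks is hypothetical) that lists the video tracks of a MediaStream with a few of their standard properties. It assumes only the documented getVideoTracks() and getSettings() methods and the label, readyState, and muted properties of MediaStreamTrack.

```javascript
// Summarise each video track of a stream as a plain object.
function describeVideoTracks(stream) {
  return stream.getVideoTracks().map((track) => {
    // getSettings() reports the track's current settings,
    // including the frame dimensions for video tracks.
    const { width, height } = track.getSettings();
    return {
      label: track.label,           // human-readable source name
      readyState: track.readyState, // "live" or "ended"
      muted: track.muted,           // true when the track is not delivering media
      width,
      height,
    };
  });
}
```

In a browser console, something like console.table(describeVideoTracks(video.srcObject)) can be handy for checking what a shared tab actually delivers before trying to capture from it.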

Tool Example

The simple sample tool we built has three main pieces: the React ScreenshotTool component and two main functions, one to set up the preview of the image and one to capture an image from the preview.

ScreenshotTool

The main component, ScreenshotTool, is fairly small. It has two buttons: one to start the image preview and one to capture the image shown in the preview element. To show the streaming content we use a video HTML element, and to show the captured image we use an img element.

ImagePreview

ImagePreview is one of the two main functions. It is called when the user clicks on the Preview button.

It takes as an argument a reference to the video element. After we call getDisplayMedia() and the user selects the tab or window to stream, we get a MediaStream object and set it as the video element’s source object (srcObject). The video HTML element then starts to show the content of the stream, which in our case is the content of one of the browser’s tabs.
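The original code embed is not reproduced here, so the following is only a sketch of what such a preview function might look like. The name startPreview is a hypothetical stand-in for ImagePreview, videoRef is assumed to be a React ref whose current property is the video element, and the mediaDevices parameter is an assumption added so the function can be run against a stand-in object.

```javascript
// Stream the user-selected tab or window into a <video> element.
async function startPreview(videoRef, mediaDevices = navigator.mediaDevices) {
  // Opens the browser's picker; resolves once the user has chosen a source.
  const stream = await mediaDevices.getDisplayMedia({ video: true });

  const video = videoRef.current;
  video.srcObject = stream; // feed the stream into the video element
  await video.play();       // start rendering frames

  // Return the stream so the caller can stop its tracks later.
  return stream;
}
```

A Preview button could call it as onClick={() => startPreview(videoRef)}. Setting the autoPlay attribute on the video element is an alternative to calling play() explicitly.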

ImageCapture

ImageCapture is the function responsible for taking the screenshot we want. In short, it looks at the stream content in the video HTML element, grabs a frame, and converts it to a PNG image.

We use the getVideoTracks function to get the stream’s video track, and then we draw the current frame onto a canvas element to generate a PNG image. Instead of using a canvas element, we first tried the ImageCapture class and its takePhoto function, but it didn’t work because we can’t take photos of muted tracks. For more details take a look here and here.
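A canvas-based capture along these lines might look like the sketch below. It is an assumption about the shape of the code, not the article's original: captureFrame is a hypothetical stand-in for the capture function, videoRef and imgRef are assumed to be React refs, and the createCanvas parameter exists only so the function can be exercised outside a browser.

```javascript
// Draw the video's current frame onto a canvas and show it as a PNG.
function captureFrame(
  videoRef,
  imgRef,
  createCanvas = () => document.createElement("canvas")
) {
  const video = videoRef.current;

  // Size the canvas from the track settings, falling back to the
  // video element's intrinsic size if the settings are missing.
  const [track] = video.srcObject.getVideoTracks();
  const settings = track.getSettings();
  const width = settings.width || video.videoWidth;
  const height = settings.height || video.videoHeight;

  const canvas = createCanvas();
  canvas.width = width;
  canvas.height = height;

  // drawImage accepts a video element and copies its current frame.
  canvas.getContext("2d").drawImage(video, 0, 0, width, height);

  // Encode the canvas content as a PNG data URL and display it.
  const dataUrl = canvas.toDataURL("image/png");
  imgRef.current.src = dataUrl;
  return dataUrl;
}
```

A Capture button could call it as onClick={() => captureFrame(videoRef, imgRef)} once the preview is playing.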

Demo

Demo App

Conclusion

We can use these amazing interfaces to build solutions to problems that may initially sound complicated, because most of the complicated parts are already well implemented and documented. This article covers a very specific problem and solution, but I think it is a good, simple, and creative example of what you can build with the MediaStream API.
