How to programmatically capture screen on Android: a comprehensive guide

Published in Bolt Labs, Nov 11, 2019

At Bolt, we encourage employees to use the products we’re building and to provide feedback that helps us continuously improve. To make this process as simple and efficient as possible, we’ve added a special button to the version of the Android app used by employees internally. When tapped, it gathers device logs, captures a screen image, prompts for a description and sends everything directly to a special channel with support representatives. Building a robust screen-capturing mechanism has its pitfalls, and we want to share our results and the knowledge we gained with the community.

In this article, we’re going to expand on different ways to programmatically capture a screenshot of your Android app. To explain the pros and cons of every approach, we’ll also briefly touch on various UI framework components such as WindowManager, Window, Surface, etc. and their role in screen rendering.

Even if this isn’t a task you commonly face at work, the article may still be worth reading: the APIs discussed here can come in handy for other features, and knowing them will deepen your understanding of the Android UI framework.

For those who’d like to skip reading and jump right into the code, at Bolt we developed and open-sourced a library that combines all the discussed solutions and provides an easy-to-use API abstracted from the Android framework and the complexities of the underlying mechanisms:

bolteu/screenshotty (github.com)

To find out how it works internally, check out the screenshotty-lib module. The repo also contains documentation and a sample app. We hope that the community will find our work useful, and of course, anyone is welcome to contribute.

Starting from the basics

When we talk about capturing a screenshot, what we mean is creating a Bitmap that contains all the displayed elements.
From this, we can reason that displayed elements are Views and we know how to draw them on a Canvas, so the solution to our problem is as simple as:

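// Draw the activity's root view into an offscreen bitmap.
// Assumes the view has already been laid out, so width and height are > 0.
val content = window.decorView
val w = content.width
val h = content.height
val bitmap = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888)
content.draw(Canvas(bitmap))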

Right? Well, not really. Things are often a bit more complex.

There might be several DecorViews

If there’s a Dialog or a PopupWindow on the screen, it simply won’t be captured. To understand why that’s the case and what we can do about it, let’s take a quick look at the components involved in the View rendering process.

In the ActivityThread, when an Activity is created and attached to the Application context, internally it creates a Window and a WindowManager and registers different window callbacks. You can think of a Window as a container for a View hierarchy and the WindowManager is essentially the entity responsible for managing various properties of this container such as transforms, orientation, lifecycle, focus events, etc. Every Activity and Dialog has its own WindowManager instance, but internally all these instances share a WindowManagerGlobal singleton where all the logic for talking to the system WindowManager resides.

When you call setContentView, a DecorView is created inside the corresponding Window object. This View contains a content container and is also capable of showing things like floating action mode menus on top of it. This DecorView is then added to a local WindowManager, which internally just delegates to the WindowManagerGlobal. A reference to the given View is saved and a special ViewRootImpl object is created. This object implements the ViewParent interface and acts as a bridge between the window’s view hierarchy and the low-level graphics framework components. We’ll get back to this later, because we now have everything we need to solve the problem of dialogs not being captured.

As we mentioned, WindowManagerGlobal is a singleton which stores references to all the added Views and their position on the display. So, if we know how to draw one DecorView, we can compose several of them together to produce the final image. There are no public APIs available to retrieve them, but we can use reflection for this.
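The reflection step might look roughly like the sketch below. It assumes that android.view.WindowManagerGlobal keeps its root views and their layout params in private fields named mViews and mParams. Those names are implementation details taken from platform sources, not a public contract, so they may differ between API levels and vendors or be blocked by hidden-API restrictions. RootViewInfo and findRootViews are hypothetical names introduced for illustration.

```kotlin
import android.view.View
import android.view.WindowManager

// Hypothetical holder pairing a root view with the params it was added with.
data class RootViewInfo(val view: View, val params: WindowManager.LayoutParams)

// Reads the root views through reflection. Call it on the main thread.
// Any failure yields an empty list, because the private fields accessed here
// (assumed to be "mViews" and "mParams") are not guaranteed to exist.
fun findRootViews(): List<RootViewInfo> = try {
    val globalClass = Class.forName("android.view.WindowManagerGlobal")
    val global = globalClass.getMethod("getInstance").invoke(null)

    fun readList(fieldName: String): List<*> {
        val field = globalClass.getDeclaredField(fieldName)
        field.isAccessible = true
        return when (val value = field.get(global)) {
            is List<*> -> value.toList()   // list-based storage
            is Array<*> -> value.toList()  // array-based storage on old platform versions
            else -> emptyList<Any?>()
        }
    }

    val views = readList("mViews")
    val params = readList("mParams")
    views.zip(params).mapNotNull { (view, lp) ->
        if (view is View && lp is WindowManager.LayoutParams) RootViewInfo(view, lp) else null
    }
} catch (e: Exception) {
    emptyList()
}
```

Returning an empty list instead of throwing keeps the caller simple: no root views found just means that only the activity’s own decor view gets drawn.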

The drawing part doesn’t change a lot. We use LayoutParams of the corresponding views to correctly position them on the screenshot and apply the background dim you see when a dialog is displayed.
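As an illustration, drawing one extra root view on top of an existing screenshot could be sketched like this. The function name drawRootView is hypothetical, the target bitmap is assumed to be mutable and to cover the whole screen, and the position is taken from getLocationOnScreen() as one possible way to place the view. The dim is approximated from FLAG_DIM_BEHIND and dimAmount.

```kotlin
import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Color
import android.view.View
import android.view.WindowManager

// Draws one root view (e.g. a dialog's decor view) onto an existing screenshot.
// Assumes `target` is mutable and uses screen coordinates.
fun drawRootView(target: Bitmap, root: View, params: WindowManager.LayoutParams) {
    val canvas = Canvas(target)

    // Dialogs usually request FLAG_DIM_BEHIND; emulate it with a translucent black layer.
    if ((params.flags and WindowManager.LayoutParams.FLAG_DIM_BEHIND) != 0) {
        val alpha = (255 * params.dimAmount).toInt().coerceIn(0, 255)
        canvas.drawColor(Color.argb(alpha, 0, 0, 0))
    }

    // getLocationOnScreen() reports the view's top-left corner in screen coordinates.
    val location = IntArray(2)
    root.getLocationOnScreen(location)

    canvas.save()
    canvas.translate(location[0].toFloat(), location[1].toFloat())
    root.draw(canvas)
    canvas.restore()
}
```

Root views should be drawn in the order they were added, so that a popup shown from a dialog ends up on top of it.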

Accessing implementation internals is a very unreliable approach: different API versions might require special handling and, worst of all, the implementation might be changed by device vendors. To guard against everything breaking completely, we recommend using it only as an add-on: first, the publicly accessible decor view of the visible activity is drawn, then we try to get the displayed dialogs and render them on top, returning the original bitmap in case of failure. This way, the worst-case scenario is a screenshot without dialogs, which is much better than no screenshot at all.
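A minimal sketch of this “add-on” flow is shown below. The function captureWithDialogs and its drawDialogs parameter are hypothetical names; the parameter stands for whatever reflection-based code renders the extra root views.

```kotlin
import android.app.Activity
import android.graphics.Bitmap
import android.graphics.Canvas

// Draws the activity's decor view first, then tries to add dialogs on top.
// `drawDialogs` is a hypothetical hook for the reflection-based rendering.
fun captureWithDialogs(activity: Activity, drawDialogs: (Bitmap) -> Unit): Bitmap {
    val decor = activity.window.decorView
    val base = Bitmap.createBitmap(decor.width, decor.height, Bitmap.Config.ARGB_8888)
    decor.draw(Canvas(base))

    return try {
        // Work on a copy so a half-drawn dialog never corrupts the base screenshot.
        val withDialogs = base.copy(Bitmap.Config.ARGB_8888, true)
        drawDialogs(withDialogs)
        withDialogs
    } catch (e: Exception) {
        // Worst case: a screenshot without dialogs.
        base
    }
}
```

Copying the bitmap costs extra memory, but it guarantees that the fallback result is exactly what the public API produced.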

Another thing to keep in mind here is that WindowManagerGlobal is a process-level singleton, and some of the views we get from it might belong to stopped activities, i.e. those not visible to a user. To render only the dialogs and popups that belong to the current activity, we have to check that their context is our current Activity.
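One way to perform this check is to unwrap the view’s context until an Activity is found, as sketched below. It assumes that the dialog or popup was created with the activity as its (possibly wrapped) context; findActivity and belongsTo are hypothetical helper names.

```kotlin
import android.app.Activity
import android.content.Context
import android.content.ContextWrapper
import android.view.View

// Walks the ContextWrapper chain (dialogs typically wrap the activity in a
// themed wrapper) until an Activity is found; returns null otherwise.
fun Context.findActivity(): Activity? {
    var current: Context? = this
    while (current is ContextWrapper) {
        if (current is Activity) return current
        current = current.baseContext
    }
    return null
}

// True when the root view was created with the given activity as its context.
fun View.belongsTo(activity: Activity): Boolean = context.findActivity() === activity
```

Root views whose context cannot be traced back to the current activity are simply skipped.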

Capturing a Surface

So, we now know how to deal with dialogs and popups, but what if there’s a SurfaceView in our view hierarchy? It might be there for some custom OpenGL drawing, a camera preview or navigation maps. If we just draw the decor view on a canvas, we’ll get blank pixels instead of the SurfaceView content. To understand why this happens, let’s take a brief look at where the Canvas used by Views during the drawing phase comes from.

We can think of a Canvas as a kind of controller that allows us to mutate a buffer of pixel colors that is then sent for rendering. The mechanism for transferring these buffers (or actually their handles, because copying would be too expensive) is called BufferQueue. The idea is simple: the producer takes a buffer from the queue, modifies it and puts it back, while the consumer dequeues buffers and processes them. The class that acts as an interface for producers is called Surface. Every window is backed by one, and WindowManager is viewed as a producer, sending buffers to SurfaceFlinger, which then interacts with HardwareComposer to correctly compose buffers from different producers.

We can even take control of the window’s surface by doing this in our Activity:

getWindow().takeSurface(callback);

If you do this, you’ll see that the view hierarchy is not rendered, though the views are there and events are still captured. Only one client at a time can modify a buffer: a call to lockCanvas() or lockHardwareCanvas() acquires a buffer and creates a Canvas bound to it. When all modifications are done, unlockCanvasAndPost() sends the buffer back to the queue. Usually, all this is handled for us by the ViewRootImpl class, which we’ve already mentioned.
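To make the lock and post cycle concrete, here is a small sketch of an activity that takes over its window surface and fills it with a solid color. OwnSurfaceActivity and drawFrame are hypothetical names, and this is an illustration of the mechanism rather than something you would ship.

```kotlin
import android.app.Activity
import android.graphics.Color
import android.os.Bundle
import android.view.SurfaceHolder

class OwnSurfaceActivity : Activity(), SurfaceHolder.Callback2 {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // From now on the view hierarchy no longer draws into the window surface.
        window.takeSurface(this)
    }

    override fun surfaceCreated(holder: SurfaceHolder) = Unit

    override fun surfaceChanged(holder: SurfaceHolder, format: Int, width: Int, height: Int) {
        drawFrame(holder)
    }

    override fun surfaceRedrawNeeded(holder: SurfaceHolder) = drawFrame(holder)

    override fun surfaceDestroyed(holder: SurfaceHolder) = Unit

    private fun drawFrame(holder: SurfaceHolder) {
        // lockCanvas() acquires a buffer and returns a Canvas bound to it (null if unavailable).
        val canvas = holder.lockCanvas() ?: return
        try {
            canvas.drawColor(Color.DKGRAY)
        } finally {
            // Sends the buffer back to the queue so it can be composed and displayed.
            holder.unlockCanvasAndPost(canvas)
        }
    }
}
```

The try/finally matters: a buffer that is locked but never posted stays unavailable to the queue.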

Now, what happens when there’s a SurfaceView in the view hierarchy? For each such view, a new surface (i.e. a buffer queue) is requested from SurfaceFlinger, and by default the new surface is logically positioned behind the surface of the containing window. This means that the actual rendering of SurfaceView content happens independently of View hierarchy rendering, and the two outputs are composed at a later stage of rendering. All such surfaces are scoped to the parent window.

The problem is that sitting on the producer side of the queue we can’t get access to buffers, nor do we have a generic hook for all SurfaceView implementations (Google Maps, for example, provides a snapshot() method that can be used to get the currently displayed content). Luckily, Android provides some public APIs that might suit our needs.

MediaProjection API (L+)

The MediaProjection API was added in Android L and is based on VirtualDisplays, a mechanism that allows us to become buffer consumers. A VirtualDisplay makes it possible to receive buffers composited by SurfaceFlinger on a Surface we provide during display creation.

To create a Surface, we can use a SurfaceTexture or a class more convenient for this purpose: ImageReader. ImageReader provides a callback to receive buffer contents wrapped in Image objects, each containing a raw ByteBuffer and metadata about its contents.
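The sketch below shows one common way to set up such a reader and turn a received Image into a Bitmap. It assumes the RGBA_8888 pixel format with a single plane, and that the plane’s buffer holds full padded rows; createReader and toBitmap are hypothetical helper names.

```kotlin
import android.graphics.Bitmap
import android.graphics.PixelFormat
import android.media.Image
import android.media.ImageReader

// A reader whose Surface can be handed to a VirtualDisplay.
// Two buffers are enough for grabbing a single frame.
fun createReader(width: Int, height: Int): ImageReader =
    ImageReader.newInstance(width, height, PixelFormat.RGBA_8888, 2)

// Converts a single-plane RGBA image into a Bitmap of the image's size.
fun Image.toBitmap(): Bitmap {
    val plane = planes[0]
    // Rows may be padded: rowStride can be larger than width * pixelStride.
    val paddedWidth = plane.rowStride / plane.pixelStride
    val padded = Bitmap.createBitmap(paddedWidth, height, Bitmap.Config.ARGB_8888)
    padded.copyPixelsFromBuffer(plane.buffer)
    // Crop the padding away when there is any.
    return if (paddedWidth == width) padded else Bitmap.createBitmap(padded, 0, 0, width, height)
}
```

Ignoring the row stride is a classic source of skewed screenshots on devices where rows are padded.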

When we create a VirtualDisplay using MediaProjection, it starts capturing everything the user sees on their display (not only our app), and the onImageAvailable() callback fires every time a VSYNC for the internal display is performed. Since we care about only one image, we should keep a flag indicating that the first frame was consumed, to avoid creating unnecessary bitmaps. The callback might be invoked multiple times even if we’ve closed the projection right after the first image was processed.
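A possible shape for the single-frame capture is sketched below. It assumes an already granted MediaProjection and an ImageReader sized to the display; captureSingleFrame, onFrame and the display name "screenshot" are hypothetical. Later platform versions add further requirements around media projection that this sketch does not try to cover.

```kotlin
import android.hardware.display.DisplayManager
import android.hardware.display.VirtualDisplay
import android.media.Image
import android.media.ImageReader
import android.media.projection.MediaProjection
import android.os.Handler
import java.util.concurrent.atomic.AtomicBoolean

// Mirrors the display into `reader` and forwards only the first frame to `onFrame`.
// `onFrame` must finish using the Image before it returns, because it is closed afterwards.
fun captureSingleFrame(
    projection: MediaProjection,
    reader: ImageReader,
    densityDpi: Int,
    handler: Handler,
    onFrame: (Image) -> Unit
) {
    val consumed = AtomicBoolean(false)
    var display: VirtualDisplay? = null

    // Registering a callback first is harmless, and expected by newer platform versions.
    projection.registerCallback(object : MediaProjection.Callback() {}, handler)

    reader.setOnImageAvailableListener({ source ->
        val image = source.acquireLatestImage()
        if (image != null) {
            try {
                // The flag guards against callbacks that still arrive after the first frame.
                if (consumed.compareAndSet(false, true)) {
                    onFrame(image)
                    display?.release()
                    projection.stop()
                }
            } finally {
                image.close()
            }
        }
    }, handler)

    display = projection.createVirtualDisplay(
        "screenshot",
        reader.width, reader.height, densityDpi,
        DisplayManager.VIRTUAL_DISPLAY_FLAG_AUTO_MIRROR,
        reader.surface, null, handler
    )
}
```

Closing every Image, including the ones we ignore, matters because the reader only owns a small, fixed number of buffers.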

Even though MediaProjection is a solid solution for taking screenshots, it too has major downsides.

As it’s a very powerful mechanism that allows recording everything that’s happening on a display, it’s no surprise that a user’s consent is required to initiate a capture session. When MediaProjection is requested, a system dialog asking for permission is displayed, and we are able to create a projection only after permission is granted. Before Android 10, the dialog had a “Don’t show again” option, but with the newest changes, the consent can’t be remembered. There’s a hack though: we can save the grant result for future use, reducing the number of permission requests to one per process lifetime.
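The consent flow and the grant-caching hack could be sketched as follows. ProjectionActivity, REQUEST_CAPTURE, cachedGrant and onProjectionReady are hypothetical names. Treat the reuse of the grant as an assumption that holds for the platform versions discussed in this article: newer versions may refuse a reused grant, and may also require a foreground service for media projection, neither of which is handled here.

```kotlin
import android.app.Activity
import android.content.Context
import android.content.Intent
import android.media.projection.MediaProjection
import android.media.projection.MediaProjectionManager

class ProjectionActivity : Activity() {

    private val manager by lazy {
        getSystemService(Context.MEDIA_PROJECTION_SERVICE) as MediaProjectionManager
    }

    fun requestProjection() {
        val cached = cachedGrant
        if (cached != null) {
            // Reuse the saved grant instead of showing the system dialog again.
            onProjectionReady(manager.getMediaProjection(RESULT_OK, cached.clone() as Intent))
        } else {
            startActivityForResult(manager.createScreenCaptureIntent(), REQUEST_CAPTURE)
        }
    }

    override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
        super.onActivityResult(requestCode, resultCode, data)
        if (requestCode == REQUEST_CAPTURE && resultCode == RESULT_OK && data != null) {
            cachedGrant = data
            onProjectionReady(manager.getMediaProjection(resultCode, data))
        }
    }

    private fun onProjectionReady(projection: MediaProjection?) {
        // Hypothetical hook: start the capture session here.
    }

    companion object {
        private const val REQUEST_CAPTURE = 1001

        // Lives only as long as the process, hence one request per process lifetime.
        private var cachedGrant: Intent? = null
    }
}
```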

Also, the MediaProjection approach requires a lot of code to set up and has a very complicated flow with a lot of asynchronous methods and many possible points of failure. On the upside, screenshots produced by this method are as accurate as they get.

PixelCopy API (N+/O+)

PixelCopy is a little-known API available on N+ devices. It takes a Surface and a Bitmap as arguments and calls native code, where the last queued buffer is peeked and rendered to a GL texture, which is then copied to the Bitmap.

Starting from Android O, there’s an even more convenient method available: we can pass a window instead of a Surface and get all the surfaces scoped to its ViewRootImpl rendered on a Bitmap.

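At the time of writing, the copy itself is performed synchronously, but the API is designed to be asynchronous and this might change in the future. That’s why the result is delivered through a listener, invoked on a Handler that you also have to provide.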

This approach is very simple:

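PixelCopy.request(
   window,
   bitmapDestination,
   // The listener receives a result code; compare it with PixelCopy.SUCCESS.
   PixelCopy.OnPixelCopyFinishedListener { result -> onPixelCopyTaken(result, bitmapDestination) },
   callbackThreadHandler
)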

It can be performed on any thread, and the only requirement is to ensure that the view hierarchy has been drawn at least once, for which an OnDrawListener can be used.
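Putting these pieces together, a capture that waits for a draw pass might look like the sketch below. captureWindow and captureAfterNextDraw are hypothetical names, the result is delivered on the main thread for simplicity, and posting the capture from onDraw() is a heuristic for “the hierarchy has been drawn”, not a guarantee.

```kotlin
import android.app.Activity
import android.graphics.Bitmap
import android.os.Build
import android.os.Handler
import android.os.Looper
import android.view.PixelCopy
import android.view.ViewTreeObserver

// Copies the window contents into a new Bitmap; passes null to `onResult` on failure.
fun captureWindow(activity: Activity, onResult: (Bitmap?) -> Unit) {
    val decor = activity.window.decorView
    if (Build.VERSION.SDK_INT < Build.VERSION_CODES.O || decor.width == 0 || decor.height == 0) {
        onResult(null)
        return
    }
    val bitmap = Bitmap.createBitmap(decor.width, decor.height, Bitmap.Config.ARGB_8888)
    try {
        PixelCopy.request(
            activity.window,
            bitmap,
            PixelCopy.OnPixelCopyFinishedListener { result ->
                onResult(if (result == PixelCopy.SUCCESS) bitmap else null)
            },
            Handler(Looper.getMainLooper())
        )
    } catch (e: IllegalArgumentException) {
        // Thrown when the window has no backing surface yet.
        onResult(null)
    }
}

// Waits for the next draw pass before capturing.
fun captureAfterNextDraw(activity: Activity, onResult: (Bitmap?) -> Unit) {
    val decor = activity.window.decorView
    decor.viewTreeObserver.addOnDrawListener(object : ViewTreeObserver.OnDrawListener {
        private var fired = false

        override fun onDraw() {
            if (fired) return
            fired = true
            // A draw listener must not be removed from inside onDraw(), so defer both steps.
            decor.post {
                val observer = decor.viewTreeObserver
                if (observer.isAlive) observer.removeOnDrawListener(this)
                captureWindow(activity, onResult)
            }
        }
    })
}
```

If the screen is static, no draw pass may be pending; calling decor.invalidate() after registering the listener is one way to trigger one.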

Even though it doesn’t capture dialogs and popups, it’s easy to combine it with the reflection-based approach we discussed above to produce quite accurate screenshots, without disrupting users as when using MediaProjection.

In conclusion

As you can see, programmatically capturing an accurate screenshot is a more complicated task than it may seem at first. There’s no single right approach: every method discussed involves compromises, and the choice has to be based on functional requirements.

Internally, screenshotty implements all these mechanisms, first trying to make a PixelCopy, falling back to MediaProjection and then to a canvas drawing if something fails. If this doesn’t match your needs, you can use our code as a reference — feel free to copy and modify it to achieve the desired behaviour.
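The fallback idea itself is independent of Android and can be illustrated with a few lines of plain Kotlin. This is a simplified, synchronous sketch and not screenshotty’s actual code (the real capture methods are asynchronous); firstSuccessful is a hypothetical helper.

```kotlin
// Tries each strategy in order and returns the first non-null result.
// A strategy signals failure by returning null or by throwing.
fun <T : Any> firstSuccessful(vararg strategies: () -> T?): T? {
    for (strategy in strategies) {
        val result = try {
            strategy()
        } catch (e: Exception) {
            null
        }
        if (result != null) return result
    }
    return null
}
```

With three hypothetical capture functions, a call such as firstSuccessful({ tryPixelCopy() }, { tryMediaProjection() }, { tryCanvasDraw() }) expresses the order described above.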

We really hope you enjoyed this article and that it helps the developer community to do the things they do more easily!

And if you’re interested in working with us on the Bolt app, check our Careers page.