High performance drawing on iOS — Part 2

Besher Al Maleh
7 min read · Jan 19, 2019


Credit: Sharon Pittaway

In Part 1 I discussed two different ways to perform 2D drawing on iOS, both of which are CPU-based. The first performed acceptably on the iPhone but very poorly on the iPad (averaging 17 fps), while the second delivered great performance at only about 25% average CPU utilization.

I mentioned at the end of Part 1 that I wasn’t very happy with the single-threaded nature of my final result. In this post, I will cover two different ways to draw that leverage the parallel nature of the GPU.

Core Graphics renders on the CPU, while Core Animation renders on the GPU. The first two techniques I covered in Part 1 used Core Graphics extensively (hence the high CPU utilization). To leverage the GPU, we are going to use Core Animation instead, the framework behind all CALayer-based classes.

The two techniques I am going to discuss are similar in terms of performance. I encourage you to check both out, and choose the one that makes more sense to you. And of course, if you want to share a different way to draw, don’t hesitate to post in the comments or on Twitter!

I uploaded an app to GitHub that demonstrates all the techniques discussed in this series. At the bottom of this post, I've posted gifs showing the incredible gains from using GPU-based drawing.

Sublayer GPU-based drawing

This technique is centered around the use of UIBezierPaths and CALayers to draw and store the drawings within a UIImageView. In touchesBegan, we store the new touch point in a property like this:
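A minimal sketch of how this might look (the class and property names here are assumptions; the drawing view subclasses UIImageView so it can store the flattened drawing in its `image` property):

```swift
import UIKit

class SublayerDrawingView: UIImageView {
    // Tracks the most recent touch location between touch events
    private var previousPoint: CGPoint = .zero

    override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
        guard let touch = touches.first else { return }
        previousPoint = touch.location(in: self)
    }
}
```

Note that `isUserInteractionEnabled` must be set to `true` for a UIImageView to receive touches at all.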

Then in touchesMoved, we call a drawing function (drawBezier) to which we pass the previous and new touch points:
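A sketch of that override, continuing with the hypothetical names from above:

```swift
override func touchesMoved(_ touches: Set<UITouch>, with event: UIEvent?) {
    guard let touch = touches.first else { return }
    let newPoint = touch.location(in: self)
    // Draw a segment from the last stored point to the new one
    drawBezier(from: previousPoint, to: newPoint)
    previousPoint = newPoint
}
```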

Now let’s look at the implementation for drawBezier:
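Roughly, it might look like this (stroke color and width are arbitrary choices here; the numbered comments match the points below):

```swift
// Matches the sublayer limit discussed below
private let flattenThreshold = 400

private func drawBezier(from start: CGPoint, to end: CGPoint) {
    // 1. Lazily create the 'parent' drawing layer
    setupDrawingLayerIfNeeded()

    // 2. Each segment becomes its own CAShapeLayer sublayer
    let linePath = UIBezierPath()
    linePath.move(to: start)
    linePath.addLine(to: end)

    let line = CAShapeLayer()
    line.path = linePath.cgPath
    line.strokeColor = UIColor.black.cgColor
    line.lineWidth = 5
    line.lineCap = .round

    // 3. Match the device scale to avoid pixelated output
    line.contentsScale = UIScreen.main.scale
    drawingLayer?.addSublayer(line)

    // 4. Flatten periodically to keep the sublayer count bounded
    if let count = drawingLayer?.sublayers?.count, count > flattenThreshold {
        flattenToImage()
    }
}
```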

  1. Create a new drawing layer if one doesn’t already exist, to be used as the ‘parent’ layer. This will be assigned to the drawingLayer property
  2. This CAShapeLayer will be added as a sublayer to the drawing layer
  3. Set the scale to match the device scale (2x, 3x, etc). This is really important, since the default scale for newly created CALayers is 1, and that will give you pixelated drawing on 2x and 3x devices (virtually all devices right now)
  4. If we exceed 400 sublayers, we flatten to improve performance

Here’s the setupDrawingLayerIfNeeded method used above:
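A sketch of what it might contain:

```swift
// The 'parent' layer that holds every stroke sublayer
private var drawingLayer: CALayer?

private func setupDrawingLayerIfNeeded() {
    // Only create the layer once
    guard drawingLayer == nil else { return }
    let sublayer = CALayer()
    sublayer.contentsScale = UIScreen.main.scale
    layer.addSublayer(sublayer)
    drawingLayer = sublayer
}
```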

And finally, here is the implementation for flattenToImage():
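A sketch of the flattening step, using UIGraphicsImageRenderer (an assumption; any CGContext-based approach works the same way):

```swift
private func flattenToImage() {
    let renderer = UIGraphicsImageRenderer(size: bounds.size)
    image = renderer.image { context in
        // Merge the previously flattened image (if any) with the new layers
        image?.draw(in: bounds)
        drawingLayer?.render(in: context.cgContext)
    }
    // The strokes are now baked into the image, so the sublayers can go
    drawingLayer?.sublayers?.forEach { $0.removeFromSuperlayer() }
}
```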

First, we get the previously flattened image (if one exists), and we merge it with the newly added drawings/layers. Then we generate a new image from the merged result, and assign it to our UIImageView’s image property.

This flattening process is not ideal, as it relies too heavily on the CPU for my taste. Here’s what the final class would look like.

The frame rate here is pretty stable while drawing, but let’s look at the CPU utilization:

Not bad

As you can see, this is a good improvement over the 25% we got in Part 1. There are tiny, recurrent spikes to 15% whenever the flattenToImage method gets called. Ideally, I want to get rid of that CGContext in the flattening phase and rely exclusively on Core Animation. This brings us to my next technique.

draw(layer:ctx:) GPU-based drawing

This approach is somewhat similar to the draw(rect:) approach from Part 1, but instead of draw(rect:), we will be working with draw(layer:ctx:).

We start with a UIView subclass; in touchesMoved, we store the new touch locations in a line array, and then we call layer.setNeedsDisplay(rect:) to perform the drawing (note that we are calling setNeedsDisplay on the layer this time, and not on the view itself.)

We are carrying over the optimizations from Part 1. Namely, we are calling layer.setNeedsDisplay(rect:) to only update the dirty area of the view, and we are flattening the image/emptying the array once we reach a certain number of points.

Here’s what touchesMoved looks like:
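A sketch of the idea (the dirty-rect padding is an assumption and should at least cover the stroke width):

```swift
// Points collected since the last flatten
private var line = [CGPoint]()

override func touchesMoved(_ touches: Set<UITouch>, with event: UIEvent?) {
    guard let touch = touches.first else { return }
    let point = touch.location(in: self)
    line.append(point)

    // Redraw only a small dirty rect around the new point,
    // calling setNeedsDisplay on the layer rather than the view
    let padding: CGFloat = 12
    layer.setNeedsDisplay(CGRect(x: point.x - padding,
                                 y: point.y - padding,
                                 width: padding * 2,
                                 height: padding * 2))
}
```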

Next up, we override draw(layer:ctx:) as discussed earlier to leverage the GPU. Inside that method, we need to create a CAShapeLayer (a CALayer subclass) to perform the actual drawing. We will not use the CGContext passed into the method, because we don’t want to rely on the CPU for rendering.

Our method will look like this:
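A sketch of that override (stroke attributes are arbitrary choices; the numbered comments match the points below):

```swift
// The reusable shape layer holding the current batch of strokes
private var drawingLayer: CAShapeLayer?

override func draw(_ layer: CALayer, in ctx: CGContext) {
    // 1. Reuse the existing shape layer, or create a new one
    let drawingLayer = self.drawingLayer ?? CAShapeLayer()
    drawingLayer.strokeColor = UIColor.black.cgColor
    drawingLayer.fillColor = UIColor.clear.cgColor
    drawingLayer.lineWidth = 5
    drawingLayer.lineCap = .round

    // 2. Match the device scale to avoid pixelation
    drawingLayer.contentsScale = UIScreen.main.scale

    // 3. The path that will guide the drawing
    let path = UIBezierPath()

    // 4. Build the path from the collected points
    for (index, point) in line.enumerated() {
        if index == 0 {
            path.move(to: point)
        } else {
            path.addLine(to: point)
        }
    }
    drawingLayer.path = path.cgPath

    // 5. Store the layer and add it as a sublayer (once per flatten)
    if self.drawingLayer == nil {
        self.drawingLayer = drawingLayer
        self.layer.addSublayer(drawingLayer)
    }
}
```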

Here are the important points as numbered above:

  1. Reuse the existing CAShapeLayer to perform drawing, or create a new one if it doesn’t exist
  2. Match device scale to avoid pixelation
  3. Create a UIBezierPath that will guide the drawing
  4. Loop through the existing points in our line array, and build the path accordingly
  5. Now that the path has been stroked and assigned to the shape layer, we assign the layer to a property, and add it as a sublayer to the main view’s layer. This is only performed once (per flatten)

The last thing left to discuss with this approach is the flattening process, which is necessary to prevent performance deterioration over time as you add more points. We add a property observer as we did before:
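For example, something along these lines (the method name is an assumption):

```swift
// Trigger a flatten check every time the points array changes
private var line = [CGPoint]() {
    didSet {
        checkIfTooManyPoints()
    }
}
```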

That in turn calls this method:
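A sketch of that check (the threshold matches the 25-point figure discussed below):

```swift
private func checkIfTooManyPoints() {
    let maxPoints = 25
    if line.count > maxPoints {
        flattenToLayer() // the flattening method discussed below
        // Emptying the array re-triggers didSet, which then passes the guard
        line.removeAll()
    }
}
```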

Notice how we’re flattening after collecting only 25 points this time. This is because we no longer use a CGContext in the flattening process, and since we are no longer CPU-bound, we can do this more often (every 25 points). Here’s the flattening method:
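A sketch of that method, copying the layer by round-tripping it through a keyed archive (the numbered comments match the summary below):

```swift
private func flattenToLayer() {
    // 1. Access the optional drawing layer
    guard let drawingLayer = drawingLayer else { return }

    // 2. Encode the layer into data, then decode it into a brand-new copy
    guard let data = try? NSKeyedArchiver.archivedData(withRootObject: drawingLayer,
                                                       requiringSecureCoding: false),
          // 3. Unwrap the decoded copy
          let copy = (try? NSKeyedUnarchiver.unarchiveTopLevelObjectWithData(data)) as? CALayer
    else { return }

    // 4. Display the copy and reset for the next batch of points
    layer.addSublayer(copy)
    self.drawingLayer?.removeFromSuperlayer()
    self.drawingLayer = nil
}
```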

This may look a bit cryptic at first, because we’re trying to copy a CALayer, which is not an intuitive process (much like copying a UIView). Copying a layer this way lets us bypass the need for a CGContext: we no longer have to render the existing layers into a context and then generate a bitmap image from it. This saves us many CPU cycles.

To summarize the steps in the code:

  1. Access the optional drawingLayer, which contains everything we drew earlier in draw(layer:ctx:)
  2. Encode the layers from that drawingLayer into a data object, then decode that object into a brand-new layer (a copy)
  3. Access the optional value of that brand-new layer
  4. Add that new layer as a sublayer on the view’s layer to display it

Now that the drawing layers are safely copied to a separate layer, and the points array has been emptied, we can continue drawing new points without losing our existing drawing.

And if you ever want to clear the existing drawing, you can call this function:
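A sketch of that function (the name is an assumption):

```swift
func clearDrawing() {
    // Remove only the CAShapeLayer sublayers, leaving other layers alone
    for case let shapeLayer as CAShapeLayer in layer.sublayers ?? [] {
        shapeLayer.removeFromSuperlayer()
    }
}
```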

That will safely loop through all the sublayers that are of type CAShapeLayer, and remove them from the main view’s layer. (‘for case let x as’ is one of my favourite patterns in Swift 😄)

Here is the link to what the full class looks like if you want to check it out.

The frame rate on my 11" iPad Pro was a steady 120 FPS while drawing with this approach. And here’s the CPU utilization:

Looking good!

I think 10–11% is a pretty good improvement from the 25% we were seeing in Part 1! Notice how we no longer get spikes to 15% as we did in the first technique discussed above.

Summary

Throughout these two articles, I covered the following techniques:

  • Low performance CPU-based drawing
  • High performance CPU-based drawing
  • Sublayer GPU-based drawing (what I ended up using in my game)
  • draw(layer:ctx:) GPU-based drawing

The final image that gets drawn is identical across all four. In terms of performance on an iPad, the first technique is far too slow, averaging below 20 fps. But the other three will all give you a smooth drawing experience (at the maximum frame rate your device supports).

Things will start getting trickier if you have other stuff running in the background, or if your app contains multiple drawing canvases. That was the case in my game, which lets 4 players play on the same device, meaning you can have up to 8 drawing canvases simultaneously:

Draw with Math

In those cases, it can be worth the effort to try and minimize the CPU utilization of your drawing to get better performance. Your users will also get extra battery life as an added bonus.

As mentioned in Part 1, I made this small app to demonstrate the techniques discussed in this series. If you build and run the app, you’ll see two buttons that will draw a spiral for you. The left button (Spiral Link) uses a CADisplayLink to draw, meaning it gets called once every frame, while the second button (Max Speed) uses a Timer with an incredibly small interval (0.00001 seconds), so it will draw essentially as fast as the device allows.
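The two driving mechanisms might be set up something like this (the selector name here is hypothetical; the demo app's actual method names may differ):

```swift
// Fires once per display refresh, in sync with the screen
let displayLink = CADisplayLink(target: self, selector: #selector(drawSpiralStep))
displayLink.add(to: .main, forMode: .common)

// Versus a Timer with a tiny interval, fired as fast as the run loop allows
Timer.scheduledTimer(timeInterval: 0.00001, target: self,
                     selector: #selector(drawSpiralStep),
                     userInfo: nil, repeats: true)
```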

The ‘Max Speed’ button clearly demonstrates the difference in performance between the CPU-based and GPU-based approaches. I am going to leave you with four gifs showing what happens after tapping it with each technique:

Slow CPU-based
Fast CPU-based
Sublayer GPU-based
draw(layer:ctx:) GPU-based

Thanks for reading. If you enjoyed this article, feel free to hit that clap button 👏 to help others find it. If you *really* enjoyed it, you can clap up to 50 times 😃
