Software renderer for macOS: animating image by setting pixel colors in memory by hand

Dima TheProgrammer
10 min read · Aug 10


I recently discovered the Handmade Hero project. It is a series of Twitch stream recordings in which the author, Casey Muratori, creates a game on Windows in C without using any third-party libraries. Well, except for the Windows API, but there is just no way around that.

After watching parts 3 and 4, where Casey renders an animated gradient by setting pixel colors in memory by hand, I got inspired to implement the same thing on macOS. No libraries.

Rendering a static image

I was not intimately familiar with Apple’s rendering APIs before this, so I decided to start simple and draw a static image first: just a window with a green background.

After a quick and dirty search I realized that I could subclass the default NSView and override its drawRect method. The OS calls this method whenever the view needs to be drawn, which for our static image means once when the application starts, and I could do my drawing there.
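
The listings below only show the implementation side, so here is a minimal sketch of what the corresponding header could look like (the exact file layout is an assumption; only the class name CustomNSView comes from the code below):

// CustomNSView.h: a minimal sketch of the interface
#import <Cocoa/Cocoa.h>

// Our custom view; all it does is override drawRect (and, later on, drive the CVDisplayLink)
@interface CustomNSView : NSView
@end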

A quick search on rendering APIs brought me to Apple’s Core Graphics. It turns out you can create a bitmap image via CGImageCreate and then ask the OS to draw it on screen by calling CGContextDrawImage. This is all pure CPU rendering, no GPU involved at all. Let’s see what the code could look like:


static const void * getBytePointerCallback(void *info)
{
    return (const void *)info;
}

static void releaseBytePointerCallback(void *info, const void *pointer)
{
    free(info);
}

// This is our custom view class, subclassed from NSView.
// We do this so that we can override the drawRect method, which the OS will call when the app starts.
@implementation CustomNSView

- (void)drawRect:(NSRect)dirtyRect
{
    // We want to render a bitmap image with fixed dimensions
    size_t width = 1024;
    size_t height = 640;
    // Every pixel color in the bitmap consists of 4 components,
    // 1 byte each: Red, Green, Blue and Alpha.
    // In our case we won't really use Alpha, but we still have 4 bytes per pixel for memory alignment reasons
    uint8_t bytesPerPixel = 4;
    // How many bytes of memory we need to store the bitmap
    size_t bitmapSize = sizeof(uint8_t) * width * bytesPerPixel * height;

    // Allocate memory
    uint8_t *memory = malloc(bitmapSize);

    // Now we need to wrap our raw allocated memory in Apple's specific type
    // so that we can feed it into the Core Graphics APIs.
    // The wrapper type we need is CGDataProviderRef.

    // This callbacks struct is required to create the wrapper type.
    // Zero-initialize it so the optional callbacks we don't use stay NULL.
    struct CGDataProviderDirectCallbacks callbacks = {0};
    // Must always set version to 0
    callbacks.version = 0;
    // The OS will call getBytePointer to get a raw pointer to the memory we are wrapping
    callbacks.getBytePointer = getBytePointerCallback;
    // The OS will call releaseBytePointer when it is time to deallocate the memory we are wrapping.
    // Note: because we have this callback, we are not calling free(memory) ourselves directly.
    // Instead we put that call into the callback.
    callbacks.releaseBytePointer = releaseBytePointerCallback;
    // Wrap our raw memory
    CGDataProviderRef dataProvider = CGDataProviderCreateDirect(memory, bitmapSize, &callbacks);

    // A color space is required for the CGImageCreate call below.
    // Simply put, this is how we tell the OS that we want to use RGB colors.
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    // Getting deeper into the weeds of Core Graphics: simply wrapping raw memory in a CGDataProviderRef was not enough.
    // Now we need to wrap the CGDataProviderRef in a CGImageRef, specifying some metadata along the way.
    CGImageRef image = CGImageCreate(
        width,                 // bitmap width
        height,                // bitmap height
        8,                     // bits per component. R, G, B, A are the components; each is 1 byte = 8 bits
        8 * bytesPerPixel,     // bits per pixel
        width * bytesPerPixel, // bytes per bitmap row
        colorSpace,
        // This is how we explain our pixel color structure to the OS:
        // we have R G B A per pixel. A (alpha) we do not use; that byte is just for memory alignment and should be skipped.
        // Specifying kCGImageAlphaNoneSkipLast is how we tell the OS the above.
        // Check this link for more info on other supported pixel formats:
        // https://developer.apple.com/library/archive/documentation/GraphicsImaging/Conceptual/drawingwithquartz2d/dq_context/dq_context.html#//apple_ref/doc/uid/TP30001066-CH203-BCIBHHBB
        kCGImageAlphaNoneSkipLast,
        // Wrapped memory
        dataProvider,
        // Decode array: whether we want to remap the image's color values.
        // We are doing everything by hand; we don't want any help from the OS.
        NULL,
        // Whether we want the OS to smooth (interpolate) the image. No.
        false,
        // How to handle colors that are not located within the gamut of the destination color space.
        // We don't want anything fancy, so we use the default value.
        kCGRenderingIntentDefault
    );

    // Get the drawing context (wrapped in a CGContextRef) associated with the window
    CGContextRef ctx = NSGraphicsContext.currentContext.CGContext;
    // This is what does the actual drawing. Draw the bitmap to the window.
    // dirtyRect has the coordinates and dimensions of the area to draw to.
    CGContextDrawImage(ctx, dirtyRect, image);

    // Clean up memory
    CGColorSpaceRelease(colorSpace);
    CGDataProviderRelease(dataProvider);
    CGImageRelease(image);
}

@end

If you run the app now, you should see something like this:

Rendering raw memory contents

Did you think it was a bug in the code? No! This is correct behavior. Remember that we allocated memory with malloc, which does not initialize the allocated memory. After that we simply display whatever garbage happened to be in the allocated chunk, and that is what we get: a raw memory visualization. Pretty exciting, eh?

It won’t take long now to amend the code to actually initialize the pixel colors to green and draw the window we initially wanted. I will skip this part and leave it as an optional exercise for the enthusiastic reader.
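
If you want a starting point for that exercise, a minimal sketch of the fill could look like this, placed right after the malloc call and assuming the RGBA-with-skipped-alpha layout we set up above:

    // Fill the freshly allocated bitmap with solid green.
    // The layout matches kCGImageAlphaNoneSkipLast: Red, Green, Blue, then one unused byte.
    uint8_t *pixel = memory;
    for (size_t i = 0; i < width * height; i++)
    {
        *pixel++ = 0;   // Red
        *pixel++ = 255; // Green
        *pixel++ = 0;   // Blue
        pixel++;        // Skip the unused alpha byte
    }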

Rendering an animated image

On macOS, when we want to render pixels at the refresh rate of a display, we must use CVDisplayLink. You provide a callback, and the OS will call it when it is time to draw. For a 60Hz display, that is every 16ms or so.

Let’s take a look at the code:

// Callback to be called every vsync. For a 60Hz screen refresh rate that is every 16ms or so
static CVReturn renderCallback(
    CVDisplayLinkRef displayLink,
    // Current time
    const CVTimeStamp *inNow,
    // Time when the content will be rendered to screen
    const CVTimeStamp *inOutputTime,
    CVOptionFlags flagsIn,
    CVOptionFlags *flagsOut,
    // This is basically a pointer to our view
    void *displayLinkContext)
{
    // Cast the raw pointer back to our view and then call a method on it
    CVReturn error = [(__bridge CustomNSView *) displayLinkContext displayFrame:inOutputTime];
    return error;
}

@implementation CustomNSView

// Constructor
- (id)initWithCoder:(NSCoder *)coder {
    self = [super initWithCoder:coder];

    CGDirectDisplayID displayID = CGMainDisplayID();
    CVReturn error = CVDisplayLinkCreateWithCGDisplay(displayID, &displayLink);
    if (error)
    {
        NSLog(@"DisplayLink created with error: %d", error);
        // Bail out: there is no display link to configure or start
        displayLink = NULL;
        return self;
    }
    CVDisplayLinkSetOutputCallback(displayLink, renderCallback, (__bridge void *)self);
    CVDisplayLinkStart(displayLink);

    return self;
}

- (CVReturn)displayFrame:(const CVTimeStamp *)inOutputTime {
    // all the code to do state updates, output audio, do disk IO etc.
    // ...

    // Setting the setNeedsDisplay flag on a view will force it to re-render.
    // Apple requires that this flag is set on the main thread,
    // but since CVDisplayLink's callback is called on a separate thread,
    // we have to do this dispatch_sync dance to ensure that it is the main thread that sets the flag.
    // Loosely speaking, we put the block onto the main thread's queue to execute.
    dispatch_sync(dispatch_get_main_queue(), ^{
        // If you keep the rendering code in drawRect as shown in the previous section,
        // it will be called automatically when setNeedsDisplay is set to YES
        [self setNeedsDisplay:YES];
    });
    return kCVReturnSuccess;
}

@end

When we call CVDisplayLinkStart, the OS fires up a new thread, and on that thread it calls our renderCallback every vsync, i.e. every 16ms or so for a 60Hz screen.

In that callback, notice that we have two timestamps, inNow and inOutputTime: inNow is the current time, and inOutputTime is the time when the frame will be shown on screen.

This means that the time we have to update the application state, output audio, do disk I/O and update the contents of the framebuffer to be displayed is inOutputTime - inNow. If we don’t make it within this time interval, the OS will drop the frame.
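
Just as a sketch (our callback does not actually do this), the budget could be computed right inside renderCallback, reusing the mach_timebase_info conversion described in the next section:

// Sketch: how much time we have left for this frame, in milliseconds.
// Assumes mti has been filled in by initTimebaseInfo (see the next section).
uint64_t budgetTicks = inOutputTime->hostTime - inNow->hostTime;
uint64_t budgetNs = budgetTicks * mti.numer / mti.denom;
NSLog(@"Frame budget: %f ms", (float)budgetNs / 1000000);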

Another thought: with this CVDisplayLink approach our application is driven by the screen refresh rate, and not vice versa, as you might be used to in graphical applications on other platforms.

Analyzing performance

With CVDisplayLink in place, and the rendering code still living in drawRect, the app works. Now it is time to measure performance.

We can do it by remembering the value of inNow from the previous invocation of the CVDisplayLink rendering callback and comparing it with the current value of inNow. The difference gives us the frame time, and from the frame time we can approximate FPS.

Here is the function to calculate frame time and print it to the console:

static struct mach_timebase_info mti;
static CVDisplayLinkRef displayLink;

static void initTimebaseInfo(void)
{
    kern_return_t result;
    if ((result = mach_timebase_info(&mti)) != KERN_SUCCESS)
        printf("Failed to initialize timebase info. Error code: %d\n", result);
}

// Actual calculation of frame time
static void logFrameTime(const CVTimeStamp *inNow, const CVTimeStamp *inOutputTime)
{
    static uint64_t previousNowNs = 0;
    uint64_t currentNowNs = inNow->hostTime * mti.numer / mti.denom;
    uint64_t inNowDiff = currentNowNs - previousNowNs;
    previousNowNs = currentNowNs;
    // Divide by 1000000 to convert from ns to ms
    NSLog(@"inNow frame time: %f ms", (float)inNowDiff / 1000000);
}

Quite a bit is going on here. It turns out CVTimeStamp measures time in ticks, which are internal to the OS. Ticks are not directly connected to seconds, and we as humans want seconds (or milliseconds) for our frame time.

As you can see in the code, the conversion from ticks to nanoseconds happens with this line:

uint64_t currentNowNs = inNow->hostTime * mti.numer / mti.denom;

The mti thing is just a struct with numerator and denominator values, which we use to convert ticks to nanoseconds. These values differ from machine to machine, which is why they are not hard-coded: they have to be queried when the application starts. That is what the initTimebaseInfo function above is for, and I call it from the CustomNSView constructor.
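
For completeness, the call itself is just one extra line in the constructor from the previous section; a sketch, with the CVDisplayLink setup elided:

- (id)initWithCoder:(NSCoder *)coder {
    self = [super initWithCoder:coder];

    // Query the tick-to-nanosecond conversion factors once at startup
    initTimebaseInfo();

    // ... CVDisplayLink setup from the previous section ...

    return self;
}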

There is barely any documentation on this in Apple’s docs, but here is a nice article that goes into good detail: http://litherum.blogspot.com/2021/05/understanding-cvdisplaylink.html

If we now run the app, we should see the frame times logged to the console. At this point these values are going to be pretty large. Simply setting the same color for 1024 x 640 pixels brings the frame time up to 65 ms (15 FPS or so) on my M1 Mac! Horrible.

Improving performance

According to Apple, every view has an underlying CALayer which is responsible for drawing. When we override a view’s drawRect method and put rendering code there, the view itself is supposedly managing its associated CALayer for us automatically, but based on our performance measurements it is not doing so particularly efficiently.

The solution is to create the layer manually by subclassing CALayer and then associating it with the view.
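
Since the listings below focus on the layer itself, here is a sketch of the view-side wiring, assuming the CustomCALayer class defined below (the layerContentsRedrawPolicy line is an extra assumption, there to make setNeedsDisplay on the view reach the layer):

@implementation CustomNSView

// AppKit asks a layer-backed view for its backing layer here
- (CALayer *)makeBackingLayer
{
    return [CustomCALayer layer];
}

- (id)initWithCoder:(NSCoder *)coder
{
    self = [super initWithCoder:coder];

    // Opt the view into layer backing so makeBackingLayer is actually used
    self.wantsLayer = YES;
    // Ask AppKit to redraw the layer whenever setNeedsDisplay is set on the view
    self.layerContentsRedrawPolicy = NSViewLayerContentsRedrawOnSetNeedsDisplay;

    // ... CVDisplayLink setup from the previous section ...

    return self;
}

@end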

Here is the code for creating our own CALayer :



static const void * getBytePointerCallback(void *info)
{
    return (const void *)info;
}

static void releaseBytePointerCallback(void *info, const void *pointer)
{
    free(info);
}

@implementation CustomCALayer

// Constructor
// We moved the initialization code here from the view's constructor
- (instancetype)init
{
    self = [super init];

    colorSpace = CGColorSpaceCreateDeviceRGB();
    callbacks.version = 0;
    callbacks.getBytePointer = getBytePointerCallback;
    callbacks.releaseBytePointer = releaseBytePointerCallback;

    return self;
}

- (void)dealloc
{
    CGColorSpaceRelease(colorSpace);
}

// Actual rendering. We moved the code here from the view's drawRect method
- (void)drawInContext:(CGContextRef)ctx
{
    @autoreleasepool
    {
        // Back in the drawRect method we had dirtyRect passed to us as an argument.
        // Here we don't have that, so we have to get dirtyRect from the OS ourselves
        CGRect dirtyRect = CGContextGetClipBoundingBox(ctx);
        framebuffer.width = 1024;
        framebuffer.height = 640;
        size_t bitmapSize = sizeof(uint8_t) * framebuffer.width * 4 * framebuffer.height;

        framebuffer.memory = malloc(bitmapSize);

        CGDataProviderRef dataProvider = CGDataProviderCreateDirect(framebuffer.memory, bitmapSize, &callbacks);

        image = CGImageCreate(framebuffer.width, framebuffer.height, 8, 32, framebuffer.width * 4, colorSpace, kCGImageAlphaNoneSkipLast, dataProvider, NULL, false, kCGRenderingIntentDefault);

        CGContextDrawImage(ctx, dirtyRect, image);

        CGDataProviderRelease(dataProvider);
        CGImageRelease(image);
    }
}

@end
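
The listing above also relies on a framebuffer struct, as well as colorSpace, callbacks and image variables, whose declarations are not shown. They could live as file-scope statics in CustomCALayer.m; here is a sketch of those declarations (the exact struct definition is an assumption, the names match the listing):

// File-scope state used by CustomCALayer above (a sketch)
struct Framebuffer
{
    size_t width;
    size_t height;
    uint8_t *memory;
};

static struct Framebuffer framebuffer;
static CGColorSpaceRef colorSpace;
static struct CGDataProviderDirectCallbacks callbacks;
static CGImageRef image;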

Notice that we moved the initialization code from the view’s constructor to the layer’s constructor, and the rendering code from the view’s drawRect method to the layer’s drawInContext method.

A couple of changes here.

One: back in the view’s drawRect method we had dirtyRect passed to us as an argument, but in the layer’s drawInContext we don’t have that. So we have to get the dirtyRect from the OS ourselves by calling the CGContextGetClipBoundingBox function.

Two: back in the view’s drawRect we were getting the current drawing context with this line of code:

CGContextRef ctx = NSGraphicsContext.currentContext.CGContext;

In the layer’s drawInContext, we get this context passed to us as a method argument.

Drawing the animated gradient and measuring performance

Finally, we can now render the animated gradient. Do you still remember this was the goal when we started?

Here is the code to do that:

CustomCALayer.m

// This is what actually renders the gradient
static void renderGradient(int xOffset, int yOffset)
{
    int pitch = framebuffer.width * 4;
    uint8_t *row = framebuffer.memory;
    for (int y = 0; y < framebuffer.height; y++)
    {
        uint8_t *pixel = row;
        for (int x = 0; x < framebuffer.width; x++)
        {
            *pixel++ = 0;           // Red
            *pixel++ = y + yOffset; // Green
            *pixel++ = x + xOffset; // Blue
            pixel++;                // Skip the unused alpha byte
        }
        row += pitch;
    }
}

@implementation CustomCALayer

- (instancetype)init
{
    // same code as above
}

- (void)dealloc
{
    // same code as above
}

- (void)drawInContext:(CGContextRef)ctx
{
    @autoreleasepool
    {
        CGRect dirtyRect = CGContextGetClipBoundingBox(ctx);
        framebuffer.width = 1024;
        framebuffer.height = 640;
        size_t bitmapSize = sizeof(uint8_t) * framebuffer.width * 4 * framebuffer.height;

        framebuffer.memory = malloc(bitmapSize);

        // Only this block of code is new in this method. The rest is the same.
        static uint64_t xOff = 0;
        static uint64_t yOff = 0;
        renderGradient(xOff++, yOff++);

        CGDataProviderRef dataProvider = CGDataProviderCreateDirect(framebuffer.memory, bitmapSize, &callbacks);

        image = CGImageCreate(framebuffer.width, framebuffer.height, 8, 32, framebuffer.width * 4, colorSpace, kCGImageAlphaNoneSkipLast, dataProvider, NULL, false, kCGRenderingIntentDefault);

        CGContextDrawImage(ctx, dirtyRect, image);

        CGDataProviderRelease(dataProvider);
        CGImageRelease(image);
    }
}

@end

drawInContext is called every frame. There we update the offsets (that is what makes the gradient move) and pass them to the renderGradient function, which actually sets the pixel colors in the framebuffer.

And now we get the animated gradient. Performance is decent too: I am getting consistent 16ms frame times (60 FPS) on my machine, no frame drops.

If you prefer similar content in video format, you might want to check out my YouTube channel: https://www.youtube.com/channel/UCj433lgrxrl-1GOXKkbWCBQ
