Optimising design performance

Paul Vella
Aug 23, 2020 · 8 min read
Creative data remains an untapped reservoir for analysis

In the past decade or so, ‘data’ has emerged from its origins in areas considered logical, numerical and rational (economics, finance, statistics, science) and has been rapidly making inroads into subjects normally based on expertise, experience and evaluation.

Algorithms work behind the scenes to influence relationships in the likes of Facebook and Tinder.

Econometric models are being applied in sports: Moneyball introduced sabermetrics to the general population, and here in Australia it’s been applied to the AFL, notably helping the Sydney Swans win the 2005 Grand Final.

Scientific research has been used to determine the best way to make coffee.

Recommendation models help improve film choices on Netflix and music on Spotify, and machine learning models are being applied to fashion and art.

Even parenting, it seems.

But when it comes to design, data…doesn’t seem to help much.

A designer once asked me for some data on an email he had created, and I easily extracted the standard metrics: open rate, CTR, number of clicks on each element and so on, and passed the data to him. His response has stuck with me to this day:

“That’s great, but what can I do with it?”

That comment made me realise the data we have on hand is useless for creative decisions because it only tells us things about the end result of the creative process. Designers need a source of data to inform them of how their art, copy, video and audio choices have performed. As Jergan Callebaut put it in his article on The Drum, “the creative side of marketing has remained relatively untouched by the industry’s fascination with data-driven practices…because all this time we’ve inadvertently been ignoring the fact that the creative content we produce is also data”.

So I made it my mission to find a way to solve the problem.

Design performance

As a designer, you know what looks good. Your education, experience and personal style accumulate over years into a set of design principles — the ‘laws’ of what you believe makes something look good and fit for purpose. But how do you know they work as effectively as you think they do? What data can you use to support your principles when they are challenged by colleagues or clients?

This is where data helps by providing information about design performance. In my view, the purpose of design is to drive action from as many people as possible. Therefore, design needs a way to measure the efficiency of individual design elements: how good they are at converting views into actions (i.e. clicks).

Measuring design performance

Our traditional digital marketing metrics like reach, impressions, pageviews and clicks are heavily influenced by external factors such as media spend and seasonality. A particular ad might work really well because there was a large budget behind it or because it was launched in June, when end-of-financial-year sales occur. These metrics don’t purely reflect the performance of design elements (more spend = more impressions = more clicks).

The simplest way I could find to measure efficiency (in display ads) was to flip the click-through formula: impressions / clicks. This metric tells us how many impressions it takes to earn a click (impressions per click, or IPC); the lower the IPC, the more efficient something is at converting views into action (fewer views are needed to earn a click). The formula changes a bit depending on the channel, but the essence remains the same.

Measuring design efficiency across display, website, email and social media
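As a quick illustration, here is a minimal sketch of the metric in Python (the function name and the numbers are mine, not from any particular tool):

    # Impressions per click (IPC): the inverse of click-through rate.
    # Lower is better: fewer views are needed to earn one click.
    def impressions_per_click(impressions: int, clicks: int) -> float:
        if clicks == 0:
            return float("inf")  # no clicks yet, so efficiency is undefined
        return impressions / clicks

    print(impressions_per_click(1200, 300))  # 4.0 -> one click per 4 views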

Gathering information about design

Now that I had a way to measure design, I needed a way to store information about the design attributes of creative work.

Since the style or design of creative work is the combination and arrangement of several design elements together, I started cataloguing as many design components as I could observe in the work or gather from the creatives that designed it. These, I felt, should be objective elements that everyone would agree upon to avoid bias in the analysis (i.e. we can agree on what colour is visible but not whether it is ‘warm’ or ‘cool’). These pieces of information are the dimensions I use to split the IPC results to see which design components perform better than others. I group these design components into three levels:

Design attributes: the basic building blocks of design that creatives choose between (e.g. should the message inform people of something, entertain them with humour or convince them to trust our brand?).

Design elements: groups of attributes that represent the major decisions creatives need to make (e.g. an image’s background, size, props, the posture and position of people, expression and so on); unlike attributes, elements are not alternatives to one another.

Design categories: the broad top-level groups of elements that represent areas that may or may not exist in the creative work (e.g. audio, animation, video).

Mapping out the creative space across design categories, elements and attributes
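One possible way to sketch the three levels above as data. The category, element and attribute names here are purely illustrative; a real catalogue would be built together with the creative team:

    # Illustrative only: categories -> elements -> candidate attributes.
    creative_space = {
        "image": {  # design category
            "background": ["white", "blue", "beach", "office"],  # element -> attributes
            "posture": ["standing", "seated"],
            "eye_direction": ["at camera", "at product"],
        },
        "copy": {
            "message_intent": ["inform", "entertain", "persuade"],
            "cta_wording": ["Buy now", "Learn more"],
        },
    }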

Measuring design performance — an example

Let’s assume we have 5 ads with different sizes, backgrounds and button colours, and that we get the following results after the ads have run for a few weeks:

Standard performance results

Normally, we’d calculate click-through rate (CTR) and find that the first ad performed best with a 50% CTR. But for a creative, does that mean we now need to make more ads like this one? Does it mean this combination of design elements works best?

Using the information we have about the design elements, we can group the ads by each design element, sum the impressions and clicks for the group, and calculate the IPC for that element:

IPC for blue backgrounds

Then simply repeat the process for all the design elements to produce results for each element that are comparable and bias-free:

IPC results for each design element
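For anyone who wants to reproduce the mechanics, here is a minimal pandas sketch of this grouping step. The five ads and their numbers are made up for illustration and won’t match the figures above:

    # Tag each ad with its design elements, then sum impressions and
    # clicks per attribute and compute the IPC for each one.
    import pandas as pd

    ads = pd.DataFrame({
        "size":        ["landscape", "portrait", "landscape", "portrait", "landscape"],
        "background":  ["white", "blue", "blue", "white", "white"],
        "button":      ["red", "green", "red", "green", "red"],
        "impressions": [100, 240, 150, 180, 130],
        "clicks":      [50, 40, 45, 30, 35],
    })

    for element in ["size", "background", "button"]:
        grouped = ads.groupby(element)[["impressions", "clicks"]].sum()
        grouped["IPC"] = grouped["impressions"] / grouped["clicks"]
        print(grouped.sort_values("IPC"), "\n")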

From these results we can see which combination of elements (the lowest IPCs in the results above) could be optimal for driving action:

  • The landscape size (takes 3 impressions to earn 1 click whereas the portrait size needs 6 impressions)
  • A red button (takes 3 impressions to earn 1 click whereas the green needs almost 5)
  • White background (just slightly better than the blue background)

So even though the first ad had the best CTR, only the landscape sizing is worth replicating in the next set of ads. This ‘new’ combination of design elements from the IPC metrics is still within brand guidelines (as all these elements have been used in other ads) but provides new opportunities to explore the creative space available to do something novel.

Side note: splitting a metric along a number of dimensions isn’t new. Analysts do the same whenever we split sales by consumer segment, website traffic by media channel, or customer orders by location or time of day. The only difference here is that we are now splitting results by design aspects from the creative team.

This doesn’t necessarily mean that the other design elements should be discarded. There might be good reasons for them to remain (e.g. budget). As such, this methodology could be used on a subset of ads (e.g. those with green buttons) to see which design elements perform better within those parameters, or within the results from a particular segment of individuals.
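Reusing the `ads` frame from the sketch above, restricting the analysis to a subset is just a filter before the same group-by (again, purely illustrative):

    # Re-run the element analysis only on the green-button ads.
    green_only = ads[ads["button"] == "green"]
    for element in ["size", "background"]:
        grouped = green_only.groupby(element)[["impressions", "clicks"]].sum()
        grouped["IPC"] = grouped["impressions"] / grouped["clicks"]
        print(grouped.sort_values("IPC"), "\n")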

Is 3.8 really better than 4?

One important thing to note here is that these measures will always differ between design elements, so there will always be one element performing more efficiently than the others.

That doesn’t necessarily mean it is better.

3.8 (IPC for white backgrounds) is really, really close to 4 (blue backgrounds). If we gathered more data, it is possible that the results would flip, so how can we say with certainty that one design element performs better?

Data analysts have long used statistical significance testing to check whether the difference between two numbers reflects a genuine difference between the groups behind them or is likely due to chance, and that is the confidence we seek here. A p-value of 0.05 or less would tell us to reject the null hypothesis (that the difference between 3.8 and 4 is due to chance) and conclude that something, implied to be the design element, is causing the difference.
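Here is one way such a check could look, treating each impression as a click/no-click trial and running a chi-square test on the resulting 2×2 table. The counts are hypothetical, chosen only to reproduce the IPCs of 3.8 and 4.0 from the worked example:

    # Hypothetical counts giving the worked example's IPCs.
    from scipy.stats import chi2_contingency

    white = {"impressions": 19, "clicks": 5}  # IPC = 19 / 5 = 3.8
    blue = {"impressions": 20, "clicks": 5}   # IPC = 20 / 5 = 4.0

    # 2x2 contingency table: [clicks, non-clicks] per background.
    table = [
        [white["clicks"], white["impressions"] - white["clicks"]],
        [blue["clicks"], blue["impressions"] - blue["clicks"]],
    ]

    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"p-value: {p_value:.3f}")
    # With counts this small the p-value is far above 0.05, so we
    # cannot rule out chance: 3.8 vs 4.0 is not a meaningful gap.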

Practical example

Last year I had the opportunity to test the methodology with a client that was seeking data-driven creative recommendations. Over a period of 3 months, I gathered data on 164 ads across 3 segments. I catalogued each ad against 22 design elements, including:

  • Number of people in the image
  • Posture of people
  • Gender
  • Eye direction
  • CTA wording
  • Image background

I pulled together the design information and performance results for each ad, then worked out the IPC for each design element per segment. Here are the results for image backgrounds by the three segments:

IPC results for image backgrounds by segment
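The per-segment calculation is the same group-by with one extra key. This sketch uses made-up numbers and only the two segments named in the text below; the client’s actual data is not shown here:

    # Per-segment IPC: group by segment AND design element.
    import pandas as pd

    ads = pd.DataFrame({
        "segment":     ["seniors", "seniors", "business", "business"],
        "background":  ["beach", "office", "beach", "office"],
        "impressions": [900, 800, 700, 600],
        "clicks":      [150, 100, 70, 100],
    })

    per_segment = ads.groupby(["segment", "background"])[["impressions", "clicks"]].sum()
    per_segment["IPC"] = per_segment["impressions"] / per_segment["clicks"]
    print(per_segment.sort_values(["segment", "IPC"]))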

Each segment had a different preferred image — the one that needed fewer views to earn a click. And since our reactions to aesthetics are immediate, unconscious and not easily observable, the results give us hints about the mindset of viewers. Perhaps seniors are more likely to click on ads with beach backgrounds because a beach represents their ideal state? Perhaps the business segment likes workplace images because they remind viewers of themselves?

We now have opportunities to explore the creative space further:

  • Do seniors just like beach images or are there strong results for other locations that reflect their ideal retirement?
  • What other design elements work best with beach backgrounds? What messaging, persuasion approach (rational or emotive) and tone (a focus on the past, the current situation or the future) work best here?
  • Is this finding merely a result of the 3 months we gathered data (Jan-Mar happens to be summer in Australia) or is it more permanent?

Final thoughts

Designers can sometimes feel that data is a ‘creativity killer’ when it’s used to override their decisions about aesthetics and balance. But this methodology should help validate the design principles they have formed from their expertise and experience, by showing how effective their choices have been at converting views into action.

And for the designer who asked me long ago what he could do with the data I gave him, I hope this article provides a better answer than the one I gave at the time.
