Not All Metrics Are Created Equal

Wilson Wong
Practical AI Coalition
3 min read · Jun 2, 2020


This is especially true of online search products. In a previous article, we discussed why it is important to identify and use metrics that are aligned with customer outcomes. In this article, we explore the traits of metrics that are fit for purpose. I will explain why not all metrics are created equal and describe some of the challenges in coming up with the right metrics to measure and test the performance of online search products.

Photo by Luke Chesser on Unsplash

Pick metrics that are fit for purpose, not those that are easy to come by

There is a concept known as measurement inversion, described by Doug Hubbard, which refers to organisations’ tendency to stick to the things that are immediately measurable. One of the main reasons for this is that the things that really matter are often perceived as harder to quantify, and measuring them requires thought and investment from the business. For instance, almost all companies strive for customer satisfaction and loyalty for their products. However, these macro measures are not the easiest to pin down. As a result, an inverse relationship exists between what most people measure and what matters to them.

Things that really matter are often highly imprecise and thus there is a great opportunity to learn a whole lot more. Whereas, the things that do not really matter can often be easily measured and as a result have been studied to death. [1]

We can appreciate some of the reasons why one might opt for the easy-to-come-by metrics. This, however, does not make it right. Popular metrics are like $60 off-the-rack suits from a local warehouse. They will probably work for most men under normal circumstances, but they fit only a lucky few like a glove. There is nothing wrong with off-the-rack suits. However, if your livelihood depends on looking polished, you would probably want to go to a good tailor. A good fit makes sure that the jacket sleeve does not completely cover the shirt’s cuff and that the gap between the jacket collar and the shirt collar is not too wide. Similarly, a metric that is fit for purpose is within your control to affect and is aligned with customer outcomes or organisational goals.

Metrics for search quality

The choice of metrics for a search product depends on the customer outcomes we want to improve and the stage of the funnel that we have the power to affect. Using the job-seeking process as an example, the metrics can reflect any stage of the funnel, from visits and searches for job ads to clicks through to the job detail pages, job applications, shortlistings and so on.
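To make the funnel idea concrete, here is a minimal sketch of how stage-by-stage conversion could be computed from an event log. The stage names, the `(session_id, stage)` event shape and the sample data are all assumptions made for illustration; they are not taken from any real product.

```python
# Hypothetical funnel stages for a job search product, ordered top to bottom.
FUNNEL = ["visit", "search", "job_detail_click", "application"]


def stage_counts(events):
    """Count the distinct sessions that reached each funnel stage."""
    sessions_by_stage = {stage: set() for stage in FUNNEL}
    for session_id, stage in events:
        if stage in sessions_by_stage:
            sessions_by_stage[stage].add(session_id)
    return {stage: len(ids) for stage, ids in sessions_by_stage.items()}


def step_conversion(counts):
    """Conversion rate from each stage to the stage directly below it."""
    rates = {}
    for upper, lower in zip(FUNNEL, FUNNEL[1:]):
        reached_upper = counts[upper]
        rates[f"{upper}->{lower}"] = (
            counts[lower] / reached_upper if reached_upper else 0.0
        )
    return rates


# Invented sample events: (session_id, stage).
events = [
    ("s1", "visit"), ("s1", "search"), ("s1", "job_detail_click"), ("s1", "application"),
    ("s2", "visit"), ("s2", "search"), ("s2", "job_detail_click"),
    ("s3", "visit"), ("s3", "search"),
    ("s4", "visit"),
]

counts = stage_counts(events)
rates = step_conversion(counts)
for step, rate in rates.items():
    print(f"{step}: {rate:.2f}")
```

Reporting a rate per step, rather than a single top-line number, makes it easier to pick the stage that a given change can realistically influence.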

For many relevance improvement initiatives, the average Web metrics are inadequate. The reason is that they often capture things that are not directly aligned with our customer outcomes. More importantly, these Web metrics are influenced by many other factors beyond the things we want to test. As a result, we can never be certain that the power to affect the metrics is primarily ours. For instance, would the visits metric tell us anything about whether our relevance improvement initiative has generated more job applications? How about the volume of searches? Will we know from it that the candidates or hirers have found the right jobs or profiles without the need to paginate through the SERPs too much?
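As a rough illustration of what search-specific metrics might look like, the sketch below computes two candidates per search: the share of searches that led to an application, and how many SERP pages were viewed on the searches that did. The `SearchRecord` fields, the function names and the control and treatment data are hypothetical, and a real comparison would also need a proper experiment design and significance testing.

```python
from dataclasses import dataclass


@dataclass
class SearchRecord:
    """One search, as it might be reconstructed from logs (hypothetical shape)."""
    search_id: str
    serp_pages_viewed: int  # how far the user paginated through the results
    applied: bool           # whether this search led to a job application


def applications_per_search(records):
    """Share of searches that led to at least one application."""
    if not records:
        return 0.0
    return sum(r.applied for r in records) / len(records)


def mean_pages_before_application(records):
    """Average SERP pages viewed on searches that ended in an application."""
    pages = [r.serp_pages_viewed for r in records if r.applied]
    return sum(pages) / len(pages) if pages else None


# Invented data for two variants of a relevance change.
control = [
    SearchRecord("c1", 3, True),
    SearchRecord("c2", 2, False),
    SearchRecord("c3", 4, False),
    SearchRecord("c4", 5, True),
]
treatment = [
    SearchRecord("t1", 1, True),
    SearchRecord("t2", 1, True),
    SearchRecord("t3", 2, False),
    SearchRecord("t4", 2, True),
]

for name, records in (("control", control), ("treatment", treatment)):
    print(name, applications_per_search(records), mean_pages_before_application(records))
```

Unlike visits or raw search volume, both numbers are tied to a single search, so a relevance change is more likely to be what moves them.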

Conclusion

To improve a search product, we first need to get the search-specific metrics right. Why? Remember the old mantra, “You can’t manage what you can’t measure”? The reality is that if you are unable to measure correctly, you can never be certain about anything. Using the wrong metrics to monitor, manage or improve a product is like a tailor using a thermometer to measure a person for a suit.

Over-reliance on off-the-shelf metrics, or on metrics that are easy to come by, can be dangerous. Instead, we need to look back at the customer outcomes that we intend to affect with our relevance changes and design metrics that are fit for purpose. I am of the opinion that measuring the wrong thing is worse than not measuring at all. This might appear pedantic to some, but I do believe in properly measuring the things that really matter.


Wilson Wong
Practical AI Coalition

I'm a seasoned data x product leader trained in artificial intelligence. I code, write and travel for fun. https://wilsonwong.ai