Matthew Montrone

Intuiting Micro & Macro variants of Machine Learning Metrics

The Data Reader
3 min read · Dec 3, 2022


Introduction

Demonstrating a clear understanding of micro and macro measures will set you apart as someone who can obtain reliable and actionable learnings from data. Becoming comfortable with the intuition behind micro and macro statistical measures is essential to obtaining more precise insights from your experiments and driving impact.

Average of Averages

Why is the distinction between micro and macro variants of metrics necessary?

It all boils down to a subtle distinction in how you calculate the average from your data. The micro perspective ignores any subgroups present in your data: every individual sample counts equally, regardless of whether the subgroups are equally represented. The macro perspective acknowledges that subgroups in your data may differ from one another: every subgroup counts equally, so the metric is not dominated by whichever subgroup has the most samples.
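As a rough sketch, the two perspectives can be written as formulas. The notation is an assumption introduced here for illustration and does not come from the original article: N is the total number of samples, x_i is the value of sample i, K is the number of subgroups, and G_k is the k-th subgroup with n_k samples.

```latex
\text{micro average} = \frac{1}{N} \sum_{i=1}^{N} x_i
\qquad
\text{macro average} = \frac{1}{K} \sum_{k=1}^{K} \left( \frac{1}{n_k} \sum_{i \in G_k} x_i \right)
```

Written this way, the micro average is the subgroup averages weighted by subgroup size (n_k / N), while the macro average gives every subgroup the same weight (1 / K). The two agree whenever all subgroups are the same size.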

Consider the following example:

Suppose you have been assigned the task of calculating the average height of the trees in a field. You dutifully explore the field, measuring each tree with care and noting its species.

After this exercise you are left with the following:

Then you simply sum the heights and divide by the number of trees to obtain your average tree height.
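The calculation described here might look like the following minimal sketch. The measurements are hypothetical: the article's own table is not reproduced, so the heights below were invented for illustration and chosen only so that the overall mean matches the 25.875 meters quoted in the text. The variable names are also assumptions.

```python
# Hypothetical (species, height in meters) measurements.
# These are NOT the article's data; they are illustrative values only.
trees = [
    ("oak", 18.0), ("oak", 19.0), ("oak", 20.0),
    ("oak", 20.0), ("oak", 21.0), ("oak", 22.0),
    ("fir", 42.0), ("fir", 45.0),
]

# Micro average: ignore species and treat every tree equally.
heights = [height for _, height in trees]
micro_average = sum(heights) / len(heights)

print(f"Micro average height: {micro_average} m")
```

Because six of the eight hypothetical trees are oaks, the result sits much closer to the oak heights than to the fir heights.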

You have calculated the average tree height to be 25.875 meters. You look around the field and think to yourself that while this number is the true average of all the individual trees in this field, it isn’t very representative of all the types of trees in the field. It’s much closer to the heights of the oak trees than to those of the fir trees. You start to worry that people would mistakenly judge the fir trees to be extreme outliers, or take them for something else entirely.

To address your concerns, you take an alternative approach to your calculation. You first separate your samples into two subgroups: fir trees and oak trees.

You then calculate the averages of the subgroups individually like so:

Average Height of the Oak Trees
Average Height of the Fir Trees

Having found the subgroup averages, you then calculate the average of the averages:

The average of the subgroup averages

You look at this number and are happy that none of the trees in the field would be judged to be particularly extreme given this average. The number is more representative of the subgroups of trees present in the field.
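The average-of-averages steps above can be sketched as follows. As before, the measurements are hypothetical values invented for illustration (the article's own table and resulting numbers are not reproduced here), and all names are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (species, height in meters) measurements.
# These are NOT the article's data; they are illustrative values only.
trees = [
    ("oak", 18.0), ("oak", 19.0), ("oak", 20.0),
    ("oak", 20.0), ("oak", 21.0), ("oak", 22.0),
    ("fir", 42.0), ("fir", 45.0),
]

# Step 1: separate the samples into subgroups by species.
heights_by_species = defaultdict(list)
for species, height in trees:
    heights_by_species[species].append(height)

# Step 2: average each subgroup on its own.
species_means = {
    species: mean(heights) for species, heights in heights_by_species.items()
}

# Step 3: macro average, i.e. the plain average of the subgroup averages.
macro_average = mean(list(species_means.values()))

print(f"Subgroup averages: {species_means}")
print(f"Macro average height: {macro_average} m")
```

With these illustrative values the macro average falls midway between the two species averages, because each species contributes equally no matter how many trees it has.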


The Data Reader

I am a data scientist + endurance sport addict. This space features topics of interest including: data visualisation, machine learning, current events and sport