Averishing

[very accessible math] [made-up words]
[floating concept without clear immediate applications]

[tldr: This post: defines an “averish” to be a number that’s kinda representative of a pile of numbers; explores exactly what makes a good averish; and notes that, though people default to using the mean, different situations call for different averishes.]

[This is closely related to, but more general than, a measure of central tendency.]


My current work-project involves experimenting with “MBFL”; it doesn’t matter what that is (hint: not a Myers-Briggs type), but there’s a step where the computer has a pile of numbers, and it needs to select a number that’s kinda “representative” of the pile. If the pile is all big numbers, the representative is big; if the pile is all small numbers, the representative is small.

Let’s call this the “averish” of the pile: it’s like an average. Taking the mean is one way to find an averish. So is taking the max, or min, or median, or any other percentile. Or the mode, or any power mean (a p-norm divided by n^(1/p), so it stays within the range of the inputs), or the geometric or harmonic mean, or the root-mean-square.
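To make that list concrete, here’s a rough Python sketch (3.9+, standard library only) that treats each averish as a function from a bag to one number. The names `AVERISHES`, `power_mean`, and `percentile` are my own, not from any library, and the percentile uses the nearest-rank convention, which is only one of several in common use.

```python
import math
import statistics
from typing import Callable, Sequence

def power_mean(xs: Sequence[float], p: float) -> float:
    """Power (generalized) mean: (average of x**p) ** (1/p).

    Assumes positive inputs and p != 0. p=1 is the mean, p=2 the root-mean-square.
    """
    return (sum(x ** p for x in xs) / len(xs)) ** (1 / p)

def percentile(xs: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for 0 < q <= 100 (one of several conventions)."""
    ordered = sorted(xs)
    rank = math.ceil(q / 100 * len(ordered))  # 1-based rank into the sorted bag
    return ordered[max(rank, 1) - 1]

# Each entry maps a bag (any ordering, duplicates allowed) to a single number.
AVERISHES: dict[str, Callable[[Sequence[float]], float]] = {
    "min": min,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
    "90th percentile": lambda xs: percentile(xs, 90),
    "geometric mean": statistics.geometric_mean,
    "harmonic mean": statistics.harmonic_mean,
    "root-mean-square": lambda xs: power_mean(xs, 2),
    "cubic mean": lambda xs: power_mean(xs, 3),
}

bag = [1, 2, 2, 3, 10]
for name, averish in AVERISHES.items():
    print(f"{name:>16}: {averish(bag):.3f}")
```

Every one of these lands somewhere between 1 and 10 for this bag, but they disagree a lot about where.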

What, exactly, does an “averishing function” need in order to make sense at all? How about:

  1. Takes a bunch of numbers (order doesn’t matter, duplicates allowed).
    (A legit mathematical term for this is a “bag.” I love this for being a rare case where the term truly captures the intuition of the data structure. It’s a bag: of course its contents aren’t in any kind of order. Of course you can have multiple identical things in the bag.)
  2. Returns a number between the min and max.
    The averish should never be outside the range of the inputs. That’d be crazy.
  3. Adding a number to the bag will never push the averish away from that number.
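One way to poke at these criteria is a quick randomized test. This is a sketch under my own assumptions: bags are small lists of integers from 0 to 9, and I read criterion 3 as “the averish’s distance to the added number must not grow.” The helper `looks_like_averish` is a hypothetical name, and passing only means no counterexample turned up, not that one can’t exist.

```python
import random
import statistics
from typing import Callable, Sequence

Averish = Callable[[Sequence[float]], float]

def looks_like_averish(f: Averish, trials: int = 2000, seed: int = 0) -> bool:
    """Randomized spot check of criteria 2 and 3 on small bags of integers 0..9.

    Returns False as soon as a counterexample turns up. True only means none
    was found, which is evidence rather than proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        bag = [rng.randint(0, 9) for _ in range(rng.randint(1, 8))]
        before = f(bag)
        # Criterion 2: the averish stays within [min, max] of the bag.
        if not min(bag) <= before <= max(bag):
            return False
        # Criterion 3: adding x must not leave the averish farther from x.
        x = rng.randint(0, 9)
        after = f(bag + [x])
        if abs(after - x) > abs(before - x) + 1e-9:
            return False
    return True

for candidate in (statistics.mean, statistics.median, max, sum):
    print(candidate.__name__, looks_like_averish(candidate))
```

The mean, median, and max should pass every trial, while `sum` fails almost immediately, because the sum of a bag like [3, 4] lands above its max.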

Interestingly, with the above requirements alone, if you combine two bags, the new averish is not necessarily between the averishes of the two bags. (Consider the mode-averishes of {0,0,1,1,1} and {0,0,2,2,2}: they’re 1 and 2, but their union has four 0s, so its mode is 0.) Demanding that the combined averish land between the two bags’ averishes would be a stricter requirement than (3), since adding a number x is just combining with the bag {x}, whose averish must be x. And it would disqualify “mode.” Let’s call averishing functions that have this stronger property “partition-bounded averishing functions,” because if we partition a bag into smaller bags, the big bag’s averish must be between some two small bags’.
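Here’s a small Python check of that counterexample. The helper `bounded_on_partition` is a made-up name: it only tests whether one specific partition respects the bound, so a True result is a spot check, not a proof that the function is partition-bounded.

```python
import statistics
from typing import Callable, Sequence

def bounded_on_partition(f: Callable[[list], float], parts: Sequence[list]) -> bool:
    """True if f(union of parts) lies between the smallest and largest f(part)."""
    union = [x for part in parts for x in part]
    part_values = [f(part) for part in parts]
    return min(part_values) <= f(union) <= max(part_values)

left = [0, 0, 1, 1, 1]
right = [0, 0, 2, 2, 2]

print(statistics.mode(left), statistics.mode(right), statistics.mode(left + right))  # 1 2 0
print(bounded_on_partition(statistics.mode, [left, right]))    # False
print(bounded_on_partition(statistics.mean, [left, right]))    # True: 0.6 <= 0.9 <= 1.2
print(bounded_on_partition(statistics.median, [left, right]))  # True: 1 <= 1.0 <= 2
```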


Life-changing, this is not. But within the next year, I bet you’ll need to give somebody a summary statistic describing the magnitude of a pile of numbers. Instead of defaulting to the mean, take five seconds to think: are you interested in typical behavior? Maybe use the median. Worst case? Maybe the 99th percentile. Do small values matter a lot, so they should be weighted heavily? Consider the harmonic mean. Choose an averish that reflects what you care about.
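To see how far apart these answers can be, here’s a sketch that computes a few averishes for one hypothetical bag of request latencies (the numbers are invented for illustration). `summarize` is my own helper built on the standard-library `statistics` module; it passes `method="inclusive"` to `quantiles` because the default `"exclusive"` method can extrapolate above the largest sample on small bags, which would break criterion 2.

```python
import statistics

def summarize(samples: list[float]) -> dict[str, float]:
    """A few averishes, each answering a different question about the same bag."""
    return {
        "mean": statistics.mean(samples),
        "median (typical)": statistics.median(samples),
        # 'inclusive' keeps every cut point inside [min, max] of the samples.
        "p99 (worst-case)": statistics.quantiles(samples, n=100, method="inclusive")[98],
        "harmonic mean (small values weigh more)": statistics.harmonic_mean(samples),
    }

# Hypothetical latencies in milliseconds: nine fast requests and one slow outlier.
latencies_ms = [12, 13, 13, 14, 15, 15, 16, 18, 22, 480]
for name, value in summarize(latencies_ms).items():
    print(f"{name:>40}: {value:.2f}")
```

With one slow outlier, the mean (61.8 ms) sits far from every typical request, the median stays at 15 ms, and the 99th percentile is dominated by the outlier.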
