# Averishing

*[very accessible math] [made-up words] [floating concept without clear immediate applications]*

*[tldr: This post: defines an “averish” to be a number that’s kinda representative of a pile of numbers; explores exactly what makes a good averish; and notes that, though people default to using the mean, different situations call for different averishes.]*

*[This is closely related to, but more general than, a **measure of central tendency.**]*

My current work-project involves experimenting with “MBFL”; it doesn’t matter what that is (hint: not a Myers-Briggs type), but there’s a step where the computer has a pile of numbers, and it needs to select a number that’s kinda “representative” of the pile. If the pile is all big numbers, the representative is big; if the pile is all small numbers, the representative is small.

Let’s call this the “averish” of the pile: it’s like an average. Taking the mean is one way to find an averish. So is taking the max, or min, or median, or any other percentile. Or the mode, or the geometric or harmonic mean, or the root-mean-square, or any other size-normalized *p*-norm (a power mean).
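To make that list concrete, here is a small sketch that computes several of these averishes for one pile using Python's standard `statistics` module. The pile itself and the `root_mean_square` helper are made up for illustration.

```python
import statistics

# A made-up pile of numbers, purely for illustration.
pile = [1, 2, 2, 4, 8]

def root_mean_square(xs):
    """Square everything, take the mean, then take the square root."""
    return (sum(x * x for x in xs) / len(xs)) ** 0.5

averishes = {
    "min": min(pile),
    "max": max(pile),
    "mean": statistics.mean(pile),
    "median": statistics.median(pile),
    "mode": statistics.mode(pile),
    "geometric mean": statistics.geometric_mean(pile),
    "harmonic mean": statistics.harmonic_mean(pile),
    "root-mean-square": root_mean_square(pile),
}

for name, value in averishes.items():
    print(f"{name:>16}: {value:.3f}")
```

Every one of these lands somewhere between 1 and 8, but they disagree quite a bit about where.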

What, exactly, are the criteria an “averishing function” has to meet, to make sense at all? How about:

1. **Takes a bunch of numbers (order doesn’t matter, duplicates allowed).** (A legit mathematical term for this is a “bag.” I love this for being a rare case where the term truly captures the intuition of the data structure. It’s a bag: of course its contents aren’t in any kind of order. Of course you can have multiple identical things in the bag.)
2. **Returns a number between the min and max.** The averish should never be outside the range of input. That’d be crazy.
3. **Adding a number to the bag will never push the averish away from that number.**
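These criteria can be spot-checked in code. The sketch below is one possible reading, not a proof: it represents a bag as a plain Python list, and it interprets "never push the averish away" as "the new averish is no farther from the added number than the old averish was." The names `satisfies_criteria` and `spot_check` are hypothetical helpers invented for this example.

```python
import random
import statistics

def satisfies_criteria(averish, bag, extra, tol=1e-9):
    """Check the three criteria for one bag and one extra number."""
    before = averish(bag)
    after = averish(bag + [extra])
    # Criterion 1: reordering the bag must not change the answer.
    order_free = abs(averish(bag[::-1]) - before) <= tol
    # Criterion 2: the averish lies between the min and the max.
    in_range = min(bag) - tol <= before <= max(bag) + tol
    # Criterion 3 (assumed reading): adding `extra` never increases
    # the distance between the averish and `extra`.
    not_pushed_away = abs(after - extra) <= abs(before - extra) + tol
    return order_free and in_range and not_pushed_away

def spot_check(averish, trials=1000, seed=0):
    """Try random small bags; return False at the first violation."""
    rng = random.Random(seed)
    for _ in range(trials):
        bag = [rng.randint(0, 20) for _ in range(rng.randint(1, 8))]
        extra = rng.randint(0, 20)
        if not satisfies_criteria(averish, bag, extra):
            return False
    return True

for name, averish in [("mean", statistics.mean),
                      ("median", statistics.median),
                      ("max", max),
                      ("sum", sum)]:
    print(name, spot_check(averish))
```

A random spot check can only find violations, never rule them out. Here `sum` is included as a non-averish: for a bag like `[5, 7]` it returns 12, which is outside the range of the input.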

Interestingly, with the above requirements alone, if you combine two bags, the new averish is *not* necessarily between the averishes of the two bags. (Consider the mode-averishes of *{0,0,1,1,1}*, *{0,0,2,2,2}*, and their union.) Requiring that it always be between them would be a *stricter* requirement than (3), and it would disqualify “mode.” Let’s call averishing functions that have this stronger property “partition-bounded averishing functions,” because if we partition a bag into smaller bags, the big bag’s averish must lie between the averishes of some two of the smaller bags.
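The mode counterexample is quick to verify. This sketch uses the two bags from the text, with lists standing in for bags, and compares the mode against the mean and median on the same split. The `is_between` helper is just a name chosen for this example.

```python
import statistics

def is_between(value, a, b):
    """True if value lies in the closed interval spanned by a and b."""
    return min(a, b) <= value <= max(a, b)

left = [0, 0, 1, 1, 1]
right = [0, 0, 2, 2, 2]
union = left + right  # combining two bags keeps all duplicates

results = {}
for name, averish in [("mode", statistics.mode),
                      ("mean", statistics.mean),
                      ("median", statistics.median)]:
    a, b, u = averish(left), averish(right), averish(union)
    results[name] = (a, b, u, is_between(u, a, b))
    print(f"{name}: left={a}, right={b}, union={u}, "
          f"between={results[name][3]}")
```

The modes of the two bags are 1 and 2, but the union has four 0s against three 1s and three 2s, so its mode is 0, which is outside the interval from 1 to 2. The mean and median of the union stay between the two parts' values on this particular split.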

Life-changing, this is not. But within the next year, I bet you’ll need to give somebody a summary statistic describing the magnitude of a pile of numbers. Instead of defaulting to the mean, take five seconds to think: are you interested in typical behavior? Maybe use the median. Worst-case? Maybe the 99th percentile. Do small values matter most, so they should be weighted heavily? Consider the harmonic mean. Choose an averish that *reflects what you care about*.
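As a rough illustration of how much the choice can matter, here is a sketch on a made-up, heavily skewed pile (think "mostly small, with a couple of huge outliers"). The numbers are hypothetical, and `percentile` is a simple nearest-rank helper written for this example rather than a library function.

```python
import math
import statistics

def percentile(xs, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the data is at or below it."""
    ordered = sorted(xs)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical pile: ninety 10s, eight 20s, and two 1000s.
pile = [10] * 90 + [20] * 8 + [1000] * 2

summary = {
    "mean": statistics.mean(pile),
    "median": statistics.median(pile),
    "99th percentile": percentile(pile, 99),
    "harmonic mean": statistics.harmonic_mean(pile),
}

for name, value in summary.items():
    print(f"{name:>16}: {value:.2f}")
```

On this pile the median is 10 (typical behavior), the 99th percentile is 1000 (worst case), the mean is 30.6 (dragged up by the outliers), and the harmonic mean sits around 10.6 (dominated by the small values). Same pile, four quite different stories.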