Calculating Median Without Sorting

NxtChg
2 min readMay 2, 2017

--

There are two well-known ways to calculate median:

1. naive way (sort, pick the middle)
2. using quickselect (or similar algorithm for weighted median)

Both these ways are not very suitable for our time sync:
- they require a copy of the original nodes array
- they are both pretty slow

It is reasonable to expect that time votes from our nodes will follow a normal distribution. Indeed, computers usually do.

We can exploit this fact and write a linear algorithm, which works in O(n).

Here’s how:

  1. find the mean
  2. set current partition point to it
  3. repeat the following steps until the median is found:
  • go through array and count weights of all the elements above, below or equal to the partition point
  • at the same time find the closest item above and the closest item below the partition point
  • if we found the median, return it, otherwise set the partition point to the item above/below and try again

You get a simple and elegant function, which usually finishes in 3 iterations.

Here’s the code (includes mean and mean median):

/*
These functions calculate weighted mean, median and mean median of an array. Your objects must have mm_value(int no) and mm_weight() functions, which return the same type.
*/
template<class thing, class object>
thing stat_mean(array<object> &data, thing center = 0, thing range = 0, int no = 0)
{
double mean = 0, weight = 0;
const thing lo = center — range;
const thing hi = center + range;
till(data.size())
{
thing w = data[i].mm_weight( ); if(w <= 0) continue;
thing val = data[i].mm_value(no);

if(range > 0) val = clamp(val, lo, hi);
mean += double(val) * w; weight += w;
}
return (thing)(weight ? mean / weight : 0);
}//_________________________________________________________________
/*
This function works best when the median is close to the mean, then it works in O(n). Usually this is the case, so we do it this way to avoid sorting or changing the original array. Worst case performance can be ~O(n²/2).
*/
template<class thing, class object>
thing stat_median(array<object> &data, int no = 0)
{
thing below, above, equal = stat_mean<thing>(data, 0, 0, no);
while(true)
{
double below_w = 0, equal_w = 0, above_w = 0;
till(data.size())
{
thing w = data[i].mm_weight( ); if(w <= 0) continue;
thing val = data[i].mm_value(no);
if(val > equal)
{
if(above_w == 0 || val < above) above = val;

above_w += w;
}
else if(val < equal)
{
if(below_w == 0 || val > below) below = val;

below_w += w;
}
else equal_w += w;
}
double mid = (below_w + equal_w + above_w);

if(mid > 0) mid /= 2; else return 0;
if(mid >= below_w)
{
if(mid <= below_w + equal_w)
return (mid == below_w ? below : equal);
equal = above;
}
else
equal = below;
}
return 0;
}//_________________________________________________________________
template<class thing, class object>
thing stat_mean_median(array<object> &data, thing range, int no = 0)
{
thing median = stat_median<thing>(data, no);
return stat_mean<thing>(data, median, range, no);
}

Everything in just 63 lines of code :)

--

--