The Mean Median

The median is not a very good choice for time sync: when the values converge closely, small random variations decide which node's value ends up in the middle, which makes the result unstable.

The most dramatic case is when the central node drifts a little, but not enough to change its position in the sorted array: it can then drag the whole network along with it, since everybody uses its value as the new network time.

What we want is something like a mean; something that integrates over a large number of votes. But the mean has its own problem: outliers (either honest or malicious) can skew the network time and/or make it fluctuate significantly.

It would be nice to have something that behaves like a median when the values are far apart, but like a mean when they converge close together.

Enter the “Mean Median” :) I haven’t read about it before, so I will assume I just invented it :)

Here’s how it should be calculated:

1. Find the median of the data set.
2. Cap all the values at some distance from it (in the picture above it's ±200).
3. Calculate the mean.
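
Here's a minimal sketch of those three steps in Python. The function name and the 200-unit core range are just illustrative choices, not anything fixed by the technique itself:

```python
import statistics

CORE_RANGE = 200  # allowed distance from the median, in the same units as the values

def mean_median(values, core_range=CORE_RANGE):
    """Cap every value at +/- core_range around the median, then average."""
    med = statistics.median(values)                  # step 1: find the median
    lo, hi = med - core_range, med + core_range
    capped = [min(max(v, lo), hi) for v in values]   # step 2: cap the values
    return statistics.mean(capped)                   # step 3: calculate the mean
```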

You can immediately see that when all the other values are outside the core range, it behaves exactly like the median. But as more and more values converge and fall within the core range, it starts behaving like the mean.
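
A quick illustration with the sketch above (the numbers are made up): widely spread votes give back the median, tightly converged votes give back the plain mean.

```python
spread = [-900, -500, 0, 480, 950]
print(mean_median(spread))      # -> 0, the median: every other value is capped
                                #    to +/-200 around it and the caps cancel out

converged = [-30, -10, 0, 20, 45]
print(mean_median(converged))   # -> 5, the plain mean: nothing gets capped
```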

The core range should be big enough to cover any random variations and measurement errors, but no bigger.

There is a similar technique called "winsorizing". The problem is that it caps the values at arbitrary cut-offs (like the Nth percentile) rather than at a fixed distance from the median.
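
For contrast, here is a rough sketch of percentile-based winsorizing (the 25th/75th percentile limits are arbitrary, which is exactly the point): it always clips a fixed fraction of the values, no matter how close or far they actually are from the rest, whereas the mean median only clips what falls outside the core range around the median.

```python
import statistics

def winsorized_mean(values, lower_pct=25, upper_pct=75):
    """Clip everything below the lower percentile / above the upper one,
    then average. Crude percentile estimate, enough for illustration."""
    s = sorted(values)
    lo = s[int(len(s) * lower_pct / 100)]
    hi = s[min(int(len(s) * upper_pct / 100), len(s) - 1)]
    capped = [min(max(v, lo), hi) for v in values]
    return statistics.mean(capped)
```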

I believe that for our purposes the mean median will be a lot better.

Now I just need to test it.