Evaluating randomized algorithms on randomized problem instances

Sam Ganzfried
Ganzfried Gleans
Mar 15, 2022

Suppose we want to compare the performances of two algorithms — call them A and B — on a problem X. In particular, suppose that we want to evaluate the algorithms on random instances of X drawn from some distribution D. For simplicity assume that Algorithm A is deterministic and that Algorithm B is randomized. Assume that in each run of Algorithm B a random variable Y is selected from a distribution Q.

The goal is to determine whether algorithm A or B performs better. Specifically, we would like to construct a 95% confidence interval that separates the two algorithms' performances.

The obvious approach would be to repeatedly do the following: for i = 1, …, T, sample a value y from Q and a problem instance x from D. Then run both Algorithm A and Algorithm B (using Y = y) on instance x. We can then compute the mean and standard error of the performances over these T samples to estimate the average performance of each algorithm.
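As a rough illustration, here is a minimal Python sketch of this design. The functions sample_instance, sample_y, run_A, and run_B are hypothetical placeholders for drawing x from D, drawing y from Q, and measuring each algorithm's performance; pairing the two algorithms on the same instance and working with the per-instance difference is one natural way to get the confidence interval described above.

```python
import numpy as np

# Minimal sketch of the "fresh y for every instance" design.
# sample_instance, sample_y, run_A, and run_B are hypothetical placeholders:
# sample_instance(rng) draws x ~ D, sample_y(rng) draws y ~ Q, and
# run_A(x) / run_B(x, y) return each algorithm's performance on instance x.

def evaluate_iid(sample_instance, sample_y, run_A, run_B, T=1000, seed=0):
    rng = np.random.default_rng(seed)
    diffs = np.empty(T)
    for i in range(T):
        x = sample_instance(rng)      # x ~ D
        y = sample_y(rng)             # y ~ Q, redrawn for every instance
        diffs[i] = run_A(x) - run_B(x, y)
    mean = diffs.mean()
    stderr = diffs.std(ddof=1) / np.sqrt(T)   # valid: the T differences are iid
    # 95% confidence interval for the mean performance difference (A minus B)
    return mean, (mean - 1.96 * stderr, mean + 1.96 * stderr)
```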

Now, suppose that it is costly to sample y from Q and that it would take too long to run enough experiments to obtain statistical significance. We could obtain a significant reduction in running time by testing multiple problem instances for each y sample. For example, we could run B using Y = y1 on x1, …, x100, then Y = y2 on x101, …, x200, and so on. In this case the samples for running B would not be iid, so we could not compute the standard error from the sample in the usual way. However, the estimates would still be unbiased, and we could potentially obtain a much larger sample size for these experiments than if we needed to sample a new y each time.
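For contrast, here is a sketch of the batched design under the same hypothetical placeholders as before. The point estimate is still unbiased, but, as noted in the comment, the pooled standard error can no longer be computed as if the observations were iid.

```python
import numpy as np

# Sketch of the batched design: each costly draw y ~ Q is reused for
# block_size problem instances (x1..x100 with y1, x101..x200 with y2, ...).
# Same hypothetical placeholders as in the previous sketch.

def evaluate_batched(sample_instance, sample_y, run_A, run_B,
                     num_blocks=20, block_size=100, seed=0):
    rng = np.random.default_rng(seed)
    all_diffs = []
    block_means = []
    for _ in range(num_blocks):
        y = sample_y(rng)                      # one costly draw of y per block
        block = [run_A(x) - run_B(x, y)
                 for x in (sample_instance(rng) for _ in range(block_size))]
        all_diffs.extend(block)
        block_means.append(np.mean(block))     # blocks use independent draws of y
    overall_mean = np.mean(all_diffs)          # still an unbiased estimate
    # NOTE: the naive standard error over all num_blocks * block_size differences
    # is not justified here, because observations that share a y are correlated.
    return overall_mean, np.asarray(block_means)
```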

It doesn’t seem clear to me at all what the appropriate experimental design is in this case. If we sample a new value of Y for each problem instance, the experiments may take too long to obtain a meaningful sample size. If we reuse each value of y for some number of problem instances, we can obtain a much larger sample size, but at the expense of being able to estimate the variance.

Is there any way to estimate the variance of the second approach despite not having iid samples? And if so, is there a way to determine the optimal number of samples of X to test for each sample of Y?

These seem like pretty fundamental questions for evaluating the performance of randomized algorithms on randomized problem instances, and I don’t know the answers to them. (I am assuming that Algorithm A is deterministic in this example, but often A can be randomized too.)
