Don’t waste cycles with modulo bias!

Ossi Herrala
HowNetWorks
Published in
2 min readNov 2, 2016

TL;DR Use proper function to generate uniform random values (like arc4random_uniform()) instead of fighting against modulo bias yourself.

Today, after reading a blog post about generating random numbers in various programming languages, my brain told me I don’t remember how modulo bias happens and how not to be biased.

I skimmed through two Google hits (here and here, go read them to refresh your knowledge about modulo bias) and the former had this example code to produce uniform random numbers between 0 and n-1:

int x;do {
x = rand();
} while (x >= n);
return x;

rand() function computes pseudo-random integers between 0 and MAX_RAND (in my OS X MAX_RAND is 0x7fffffff, or 2147483647 in decimal).

What happens if I want to do a simple dice roll of six sided die (random value from range 0 to 5)? Above example keeps generating and testing new random values until it hits number smaller than six, discarding all values from range 6 to 2147483647. Probability to find one per iteration is about 0.000000279%. It might take a while to find one.

OpenBSD, famous for it’s high quality source code, offers a function called arc4random_uniform() for generating uniformly distributed random values between 0 and n–1. See the source code and manual page. Here’s snippet of that code:

uint32_t r, min;min = -upper_bound % upper_bound;for (;;) {
/* arc4random() is OpenBSD's rand() on steroids */
r = arc4random();
if (r >= min) break;
}

return r % upper_bound;

Now, there’s a clever trick in that small snippet. For our dice roll example (upper_bound = 6) we can calculate (unsigned 32 bit arithmetic here):

min = -6 % 6 = (2^32 - 6) % 6 = 4

and then in the for loop we get random number and discard only values where arc4random() returns number smaller than min (in this case we discard values 0, 1, 2 and 3). Now the probability to get good hit on first iteration of loop is super high. Awesome.

Now go and stop wasting time with modulo bias like I did this morning :) Or continue further with…

Testing it in practice

Naive way

#include <stdio.h>
#include <stdlib.h>
int uniform(int n) {
int x;
do {
x = rand();
} while (x >= n);
return x;
}
int main() {
int r = uniform(6);
printf("%d\n", r);
}

Ten runs with the above code results average run time of 6.7 seconds. Really expensive dice rolls!

OpenBSD way

#include <stdio.h>
#include <stdlib.h>
uint32_t uniform(uint32_t upper_bound) {
uint32_t r, min;
min = -upper_bound % upper_bound; for (;;) {
r = arc4random();
if (r >= min) break;
}
return r % upper_bound;
}
int main() {
uint32_t r = uniform(6);
printf("%d\n", r);
}

Ten runs with above code completes with average time of 0.00 seconds. Much better.

--

--

Ossi Herrala
HowNetWorks

Co-founder and R&D lead of @sensorfu. Interested about free software, network security, and ham radio (callsign OH8HUB).