ML Tidbits: Nondeterminism on the GPU

For the last year I’ve been in the RL world, where we always set our seeds and make sure our results are exactly reproducible. For example, we run np.random.seed(42) and double-check that the algorithm is deterministic end to end.
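Here is a minimal sketch of that kind of check. The `run_experiment` function and its random-walk body are hypothetical stand-ins, not actual RL code; the idea is just to run twice with the same seed and confirm the outputs match exactly.

```python
import numpy as np


def run_experiment(seed, steps=1000):
    """Hypothetical stand-in for an RL algorithm: a seeded random walk."""
    np.random.seed(seed)  # seed NumPy's global RNG
    position = 0.0
    trajectory = []
    for _ in range(steps):
        position += np.random.randn()
        trajectory.append(position)
    return np.array(trajectory)


first = run_experiment(42)
second = run_experiment(42)
different = run_experiment(43)

# Same seed should give bitwise-identical results;
# a different seed should give a different trajectory.
print(np.array_equal(first, second))
print(np.array_equal(first, different))
```

On the CPU, a check like this should pass on every run, which is what makes it a useful regression test.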

In RL, we usually run algorithms on the CPU because we’re data-starved (limited by the speed at which the environment generates data). In image classification, everything runs on the GPU, so I wrote a basic algorithm and then tried to set things up for determinism. No such luck: it turns out there’s no easy way to get determinism on the GPU.

For example, backprop through a simple reduction operation is nondeterministic.

Running the same computation several times on the CPU and then on the GPU outputs the following:

CPU (deterministic)
23.066511
23.066511
23.066511
23.066511
GPU (nondeterministic)
23.066513
23.066511
23.066509
23.066513

This isn’t a bug that will be fixed. There’s a fundamental tradeoff between speed and determinism. From @yaroslavb:

Floating point math is non-associative, and if you want speed, you use intermediate results as soon as they arrive on multiple cores, so you get slightly different answers, which get blown up later

When you’re using GPUs, you can’t get determinism without making your code roughly 10x slower, which defeats the whole point of using a GPU.
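The non-associativity in the quote is easy to see on the CPU. The sketch below first shows that regrouping three additions changes the result, then simulates partial results "arriving" in a different order on each run by shuffling the inputs before summing them. The shuffle is only a stand-in for the scheduling of a parallel reduction, and `sum_in_arrival_order` is a hypothetical helper; note that Python floats are 64-bit, while the output above looks like 32-bit precision.

```python
import random

# Floating-point addition is not associative: grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
print(left, right)

# Build inputs spanning many orders of magnitude so rounding error matters.
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) * 10 ** rng.randint(-8, 8)
          for _ in range(10_000)]


def sum_in_arrival_order(values, arrival_seed):
    """Sum the same values in an order determined by arrival_seed."""
    shuffled = list(values)
    random.Random(arrival_seed).shuffle(shuffled)
    total = 0.0
    for v in shuffled:
        total += v
    return total


# Same inputs, 20 different arrival orders: count the distinct totals.
totals = {sum_in_arrival_order(values, s) for s in range(20)}
print(len(totals))
```

Fixing the arrival order makes the sum reproducible again, but on a GPU that means waiting for results in a set order instead of using them as soon as they are ready.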

So I’m giving up on determinism for image classification for now. The best alternative so far is to run the algorithm with multiple seeds and overlay the results:

[Figure: various algorithms on multiple seeds]
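As a rough sketch of the multi-seed approach, the snippet below runs a hypothetical `train` function (a made-up noisy accuracy curve, not a real model) with several seeds and summarizes the per-epoch mean and spread. In practice you would overlay the individual curves on one plot, for example with matplotlib.

```python
import random
import statistics


def train(seed, epochs=20):
    """Hypothetical training run: returns a noisy, rising accuracy curve."""
    rng = random.Random(seed)
    curve = []
    for epoch in range(1, epochs + 1):
        base = 1.0 - 1.0 / (1.0 + 0.3 * epoch)  # smooth underlying trend
        curve.append(base + rng.gauss(0.0, 0.02))  # seed-dependent noise
    return curve


seeds = [0, 1, 2, 3, 4]
curves = [train(seed) for seed in seeds]

# Per-epoch mean and standard deviation across seeds.
mean_curve = [statistics.mean(vals) for vals in zip(*curves)]
std_curve = [statistics.stdev(vals) for vals in zip(*curves)]

for epoch, (m, s) in enumerate(zip(mean_curve, std_curve), start=1):
    print(f"epoch {epoch:2d}: mean={m:.3f} std={s:.3f}")
```

If two algorithms differ by less than the spread across seeds, the difference is probably noise.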