ML Tidbits: Nondeterminism on the GPU
For the last year I’ve been in RL world. There we always set our seeds for determinism, so that results are absolutely reproducible: we run things like
np.random.seed(42), and double-check that the algorithm is deterministic end to end.
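For context, the kind of seeding ritual I mean looks roughly like this (the helper name and the toy `rollout` function are mine, just for illustration):

```python
import random

import numpy as np

def seed_everything(seed):
    # Seed every source of randomness the program touches.
    random.seed(seed)
    np.random.seed(seed)
    # With a GPU framework you'd also seed it here, e.g.
    # torch.manual_seed(seed) or tf.random.set_seed(seed).

def rollout():
    # Stand-in for one step of the algorithm.
    return [np.random.uniform(), random.random()]

seed_everything(42)
a = rollout()
seed_everything(42)
b = rollout()
assert a == b  # same seed, same results -- at least on the CPU
```

On the CPU this check passes every time, which is exactly the property I was hoping to carry over.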
In RL, we usually run algorithms on the CPU because we’re typically data-starved (limited by the speed at which the environment generates data). In image classification, everything runs on the GPU, so I wrote a basic algorithm and then tried to set things up for determinism. No such luck: it turns out there’s no easy way to get determinism on the GPU.
For example, doing backprop through a simple reduction operation is nondeterministic: run the exact same computation twice on the GPU and the gradients come out slightly different.
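Here’s a sketch of the kind of experiment I mean, written in PyTorch (my reconstruction, not the original snippet; it assumes a CUDA GPU is available). The backward pass of a gather scatter-adds gradients with atomic adds, so the accumulation order, and hence the floating-point rounding, can vary from run to run:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(10_000, device=device, requires_grad=True)
idx = torch.randint(0, 10_000, (1_000_000,), device=device)
w = torch.randn(1_000_000, device=device)

def grad_of_reduction():
    # Reduce over a gathered tensor; the backward pass scatter-adds w
    # into x's gradient, which CUDA implements with atomicAdd, so the
    # order of accumulation is not fixed.
    y = (x[idx] * w).sum()
    (g,) = torch.autograd.grad(y, x)
    return g

g1 = grad_of_reduction()
g2 = grad_of_reduction()
# On a GPU this frequently prints False; on the CPU it's deterministic.
print(torch.equal(g1, g2))
```

(PyTorch does expose `torch.use_deterministic_algorithms(True)`, which forces deterministic kernels where they exist and raises an error where they don’t, but that comes at a speed cost, which is the tradeoff discussed below.)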
This isn’t a bug that will be fixed. There’s a fundamental tradeoff between speed and determinism. From @yaroslavb:
Floating point math is non-associative, and if you want speed, you use intermediate results as soon as they arrive on multiple cores, so you get slightly different answers, which get blown up later
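You can see the non-associativity with plain Python floats, no GPU required:

```python
# Floating-point addition is not associative: the order of operations
# changes which low-order bits get rounded away.
left = (0.1 + 0.2) + 0.3   # 0.1 + 0.2 already rounds to 0.30000000000000004
right = 0.1 + (0.2 + 0.3)  # 0.2 + 0.3 happens to be exactly 0.5

print(left == right)  # False
```

Scale this up to thousands of cores racing to accumulate partial sums, and you get a slightly different answer on every run.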
When you’re using GPUs, you can’t get determinism without making your code 10x slower, defeating the whole point of GPUs.
So I’m giving up on determinism for image classification for now. The best alternative I’ve found so far: run multiple seeds of the algorithm and overlay the resulting curves.
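A minimal sketch of the multi-seed idea (the training function here is a stand-in for launching the real job with a given seed, and the plotting call is commented out, assuming matplotlib):

```python
import random

def train_accuracy_curve(seed, steps=50):
    # Stand-in for one training run: returns an accuracy curve.
    # In reality this would launch the image-classification job
    # with the given seed.
    random.seed(seed)
    acc, curve = 0.1, []
    for _ in range(steps):
        acc = min(1.0, acc + random.uniform(0.0, 0.05))
        curve.append(acc)
    return curve

curves = [train_accuracy_curve(seed) for seed in range(5)]

# Per-step mean across seeds: the overlay collapses run-to-run noise
# into a band you can actually compare between algorithms.
mean_curve = [sum(step) / len(step) for step in zip(*curves)]

# To draw the overlay (assuming matplotlib is installed):
# import matplotlib.pyplot as plt
# for c in curves:
#     plt.plot(c, alpha=0.4)
# plt.plot(mean_curve, linewidth=2)
# plt.show()
```

Each individual run is still nondeterministic, but the spread across seeds tells you how much of a result is signal and how much is GPU noise.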