The Fast Inverse Square Root
Imagine you’re a new engineer in a company, trying to understand the codebase. You keep seeing a call to a function that computes the inverse square root:
You decide to look into how this function works. What you see shocks you…
The above isn’t some random code I made up. It is an exact copy of code to compute the inverse square root used in the video game “Quake 3 Arena” from 2005. Yes, that’s right, those comments were there too.
Now, I know some of y’all are less familiar with computer science, so let me start by explaining what appears to be happening, to help you fully appreciate just how absolutely ridiculous this code is.
Line 2 defines a function to compute the inverse square root of an input number, x, which is stored as a floating point number. If you don’t know, a floating point number is essentially a number stored as if it were written in scientific notation. So, say you want to store the number X as a 16 bit floating point number. Roughly, you would convert X into the form:
You would then store the number into 16 bits as follows:
Now, you don’t need to understand the details here. Just realize that this is VERY DIFFERENT than storing a number as an integer. If you store X as an integer instead of a floating point number, you just store X in binary.
For instance, say you want to store the number 25. Below are the two representations of this number. Note that they have NOTHING to do with each other.
Okay — now that you have a brief primer on number storage, let’s go to the next line in the code. Line 3 just computes half of X and stores it in the variable “xhalf”. Alright, so far so good!
But now, line 5. This is where it starts to get crazy.
What this line of code does it take X (a floating point number), and just interprets its bits as an integer, and stores that in the variable ‘i’. Realize that this makes NO SENSE (remember, the way a floating point is stored is totally different than how an integer is stored). Take the 25 case above. If X = 25, you would get i = 1103626240. Wtf?
Now you see why there is this comment “evil floating point bit level hacking”. But why would someone write this and do this to us?? Well, my friend, it gets worse. Let’s go on to line 6.
0x5f3759df is a hexadecimal number, aka just a constant. In decimal form, it is 1,597,463,007. The notation (i >> 1) is shifting the bits of i by 1. (For instance, if you have the binary number “01001”, this number shifted right by 1 is “00100”).
How the fuck does subtracting a right-shifted version of i (which, remember, comes out of some evil floating point number hacking) from 1,597,463,007 help you compute the inverse square root? Who fucking knows. But, we’re not done yet.
This wonderful function finishes by taking the random number we computed in line 6 (which is stored as an integer), and then re-interpreting that integer as a floating point number in line 8. This makes as little sense as line 5 — why are we suddenly interpreting this as a floating point number again, after we did computations on it as if it were an integer????
Then, what the fuck is happening in line 9? What is this a first iteration of? And why are we computing this? Remember that xhalf is not actually half of x anymore, because x has gone through evil floating point hacks and subtraction from random fucking constants and bit rotations and then more evil floating point hacks.
We see another iteration of line 9 in line 10, but it is commented out. Why is it still there if it is commented out? What is it a second iteration of?
Finally, what did any of the above have to do with computing an inverse square root?
The explanation
Okay, so what is this crazy code which actually went live in a video game doing? Is there a reason for all this? As it turns out, there is.
A really common problem in computer graphics is needing to compute a normalized vector. So, if you have a vector (x,y,z), to normalize it, you compute:
Now, computing the sum of squares is easy and fast, but what is NOT normally very fast is computing the inverse square root. But, this is a problem, because if you want your video game to be fast, you need this computation to be fast. How do you do that? Well, in comes the fast inverse square root :)
The high level overview of how it works:
- Doing the evil floating point hack to consider the floating point representation of X as an integer is actually a way to approximate log_2(x). Specifically, if I_x is the integer interpretation of x (remember, x is floating point), then some fancy math tells you that log_2(x) can be approximated with the below equation. L, B, and the greek letter are all constants.
2. Then, using the identity below, given we now have an approximation for log_2(x), we can compute an approximation of log_2(1/sqrt(x)).
A neat trick in binary is that you can multiply something by 1/2 by simply shifting the digits by 1 to the right. That’s where the digit shift comes in. So, if we simply shift the entire equation from step 1 to the right, and multiply it by a negative number, what we end up with is an equation of the form:
Why do we end up with the above? Because a constant shifted right is still a constant, and in our case that constant is the random hexadecimal number we saw in the code.
3. By re-interpreting this new value as an integer again, we “undo” the logarithm, and since at this point our value is approximately log of 1/sqrt(x), by doing the floating point reverse hack, we approximate 1/sqrt(x).
4. Finally, we do the final computation in the code which has the comment “1st iteration”. This is simply an iteration of Newton’s method, which is a numerical method used to iteratively get closer to the correct value in an approximation. Because we have a good approximation of 1/sqrt(x) at this point, we can do an iteration of Newton’s method to improve our approximation.
You could do more iterations of Newton’s method, but it turns out one iteration was enough accuracy for this video game. So, they commented out the second iteration :)
Want a more detailed explanation?
I tried to give a layman’s explanation above, but if you want to actually explore the math, check out the wikipedia article about this, which actually goes in detail about what is happening here. It’s pretty interesting.