Mastering Quake 3’s enigmatic fast inverse square root algorithm: Part 1

Olivier Fox
5 min read · May 19, 2023


Source: https://forums.unrealengine.com/t/old-school-lighting/139401

About a year ago, while browsing YouTube, I stumbled upon a fascinating video by Nemean presenting the fast inverse square root algorithm. I was amazed by its unique implementation, which showcased the ingenuity of the id Software developers who shipped it in Quake III Arena. While describing the derivation, Nemean mentioned a “magic number” used in the algorithm, but only explained that it gave the smallest error on average, not where it actually came from. Intrigued, I began researching online, but my efforts proved unsuccessful. In this blog series, I aim to provide a comprehensive explanation of the fast inverse square root and to share a possible improvement I discovered during my search for the “magic number”.

Introduction

The fast inverse square root algorithm was developed in the early 1990s and was famously used in Quake III Arena in 1999. Its advantage over other methods of the time was that it ran significantly faster than computing the result with the standard square root routine, while keeping the relative error very low, making it highly reliable in practice.

The code snippet below shows the implementation of the fast inverse square root algorithm from Quake III Arena, which can be found on GitHub.

float Q_rsqrt(float number)
{
    long i;
    float x2, y;
    const float threehalfs = 1.5f;

    x2 = number * 0.5f;
    y = number;
    i = *(long*)&y;                       // evil floating point bit level hacking
    i = 0x5f3759df - (i >> 1);            // what the fuck?
    y = *(float*)&i;
    y = y * (threehalfs - (x2 * y * y));  // 1st iteration
    // y = y * (threehalfs - (x2 * y * y)); // 2nd iteration

    return y;
}
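
As a quick sanity check, the approximation can be compared against the C standard library. The small test driver below is my own illustration, not part of the Quake source:

#include <math.h>
#include <stdio.h>

int main(void)
{
    float x = 2.0f;
    printf("Q_rsqrt(2)  = %f\n", Q_rsqrt(x));       // approximately 0.706930
    printf("1/sqrtf(2)  = %f\n", 1.0f / sqrtf(x));  // 0.707107
    return 0;
}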

The algorithm has three significant parts, marked by the comments on the right side of the code. Each of these parts will be discussed in a dedicated blog post. This is part one of the series, where we delve into the “evil floating point bit level hacking” technique used in the implementation.

Motivation

Calculating the inverse square root may seem unnecessary at first, but it plays a crucial role in various computational tasks, such as normalizing the vectors used in lighting and shading calculations for 3D graphics and in physics simulations. While the speed difference between the built-in sqrt function in C and the approximation may be negligible for a single calculation, the advantage becomes apparent when normalizing hundreds of surface normals.

Below is a picture of a normal vector, which is perpendicular to its surface's tangent. To ensure proper lighting calculations, we normalize it by dividing each component of the vector by its magnitude; since the magnitude is the square root of the sum of the squared components, this is the same as multiplying each component by the inverse square root, as shown in the sketch after the figure.

A normal vector perpendicular to a surface's tangent
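
To make this concrete, here is a minimal sketch of how Q_rsqrt might be used for normalization. The Vec3 type and the normalize function are illustrative names of my own, not taken from the Quake source:

typedef struct { float x, y, z; } Vec3;

// Normalize v in place: multiply each component by 1/sqrt(|v|^2).
// Q_rsqrt replaces one square root and three divisions with one
// fast approximation and three multiplications.
void normalize(Vec3 *v)
{
    float inv = Q_rsqrt(v->x * v->x + v->y * v->y + v->z * v->z);
    v->x *= inv;
    v->y *= inv;
    v->z *= inv;
}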

IEEE 754

Floating-point arithmetic is a vital component of computing, with wide-ranging applications across many fields. It enables the fast and reasonably precise calculations essential for scientific research, financial analysis, and other domains, which would otherwise be tedious or impossible. To address the inconsistencies among earlier floating-point implementations, which made programs difficult to port and use reliably, the IEEE Standard for Floating-Point Arithmetic (IEEE 754) was introduced in 1985 by the Institute of Electrical and Electronics Engineers (IEEE).

However, it is important to note that, by design, bit shifting operations are not defined for floating-point numbers (in C, the shift operators accept only integer operands), which can pose a challenge for some implementations.

The design of the standard was heavily influenced by scientific notation, which will become important in the next part of this series, when we discuss the mathematical representation of a floating-point number. The standard defines how floating-point numbers are represented and operated on, based on their three parts: the sign bit, the exponent, and the mantissa. It also provides rules for rounding, overflow, underflow, and other aspects of arithmetic, including the representation of NaN (Not a Number) and infinities. The subsequent sections provide a brief overview of each of the three parts.

Sign bit (S)

The sign bit distinguishes between positive and negative numbers and is the leftmost bit, also known as the most significant bit (MSB). A value of 0 in the sign bit represents a positive number, while a value of 1 represents a negative one.

Exponent (Exp)

The exponent part of a single-precision floating-point number follows the sign bit and determines the scale, or magnitude, of the number using 8 bits. A bias of 127 is added to the actual exponent so that both positive and negative exponents can be stored as unsigned values. For example, if the exponent bits are 10101111 (decimal 175), subtracting the bias of 127 yields the actual exponent of 48.

Mantissa (Man)

The mantissa, also known as the significand or fraction part, consists of the remaining 23 bits after the sign and exponent bits. It represents the significant digits of the number and determines the precision of the floating-point representation.

For example, if the mantissa bits are 10101111…0, the actual significand would be 1.10101111…0 in binary. However, the leading 1 is implicit and not stored in the floating-point representation. This means that even though only 23 bits appear in memory, the total precision is 24 bits: the first bit of a normalized number is always 1, so storing it would be redundant. This concept closely resembles scientific notation.

The three parts are laid out in memory as follows:

Single-precision floating-point format
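
To see the three fields in action, here is a small, hypothetical decoder. The variable names and the memcpy-based bit copy are my own choices for illustration, not part of the Quake code:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    float f = 20.75f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);           // copy the raw bit pattern

    uint32_t sign     = bits >> 31;           //  1 bit
    uint32_t exponent = (bits >> 23) & 0xFF;  //  8 bits, biased by 127
    uint32_t mantissa = bits & 0x7FFFFF;      // 23 bits, implicit leading 1

    printf("sign=%u exponent=%u (actual %d) mantissa=0x%06X\n",
           sign, exponent, (int)exponent - 127, mantissa);
    // prints: sign=0 exponent=131 (actual 4) mantissa=0x260000
    return 0;
}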

Evil floating point bit level hacking

Those who have experience with low-level programming languages are likely familiar with the concept of type casting. As mentioned earlier, directly shifting the bits of a floating-point variable is not possible. However, there is a clever workaround. A normal cast such as (long)number converts the value, rounding 20.75 to 20 and producing a completely different bit pattern. Instead, we can take the address of the float, cast that pointer to a pointer to long, and dereference it, which makes the compiler reinterpret the same 32 bits as an integer that we can freely shift. Below is a code snippet that showcases both the limitation and the workaround.

float number = 20.75;     // 0-10000011-01001100000000000000000

long i = (long)number;    // 0-00000000-00000000000000000010100 (value conversion: 20)
long j = *(long*)&number; // 0-10000011-01001100000000000000000 (same bits, reinterpreted)
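
As an aside, dereferencing a cast pointer like this runs afoul of C's strict aliasing rule on modern compilers. A union achieves the same bit-for-bit reinterpretation and is widely accepted for type punning in C; the helper below is my own sketch, not part of the Quake source:

#include <stdint.h>

// Reinterpret a float's bits as a 32-bit integer via a union.
uint32_t float_bits(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}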

Conclusion

In conclusion, we have laid the foundation for our blog series on the fast inverse square root algorithm by exploring the intriguing technique of “evil floating point bit level hacking” in this first part.

As we continue our journey in the upcoming parts of this series, we will delve into other aspects of the algorithm, including its mathematical foundation, implementation considerations, and a possible improvement. We hope that this blog has shed some light on the inner workings of this famous algorithm, and has piqued your curiosity to join us on the rest of our exploration.

Thank you for reading our first blog post on the fast inverse square root algorithm. We value your feedback and encourage you to share any suggestions or improvements in the comments section. Stay tuned for more informative content on Medium. Happy computing :)

