Raymarching 1: The Basics

JArmstrong
8 min read · Sep 8, 2018


Everyone has seen those realistically rendered demos or videos and wondered how they are made. One answer is raymarching: a technique for rendering photo-realistic scenes and objects in real time (!).

In this article, I will assume that you have a working knowledge of a C-style language such as Java, C++, GLSL, Objective-C, Swift, or C#, or a similar language like Python or PHP. Shadertoy experience will help as well.

The way raymarching works is this:

  • For each pixel, find the direction of the ray going out of it. A ray is basically a line: it starts at a single position and goes out in a certain direction forever. We do this per pixel because we want perspective: the rays fan out as they leave the camera, so nearby objects cover more of the view than distant ones. Think about it! Your eyes do not look in a single direction; you see things farther to the left on the left of your vision, and things farther to the right on the right of your vision.
  • We then do what is called “raymarching” along the ray.

This is where it gets interesting.

We need a few things, though, to get started:

  • How close can we get to the scene before we stop moving along the ray? This is generally something like within 0.001 units of the scene.
  • The above item assumes that we know how far away we are from the scene. This is the main idea: we need a function that takes in a point and returns the shortest distance from that point to the scene.

The visual I created to understand these ideas is here:

Raymarching in 2D

So, what is happening here?

The ray I was talking about is the straight white line. We can see that the ray starts in the center, and goes out in some direction. The scene is in blue.

But what are these circles doing? The ray goes until it intersects the scene, yes, but why do we need circles that seem to just about touch the scene every now and then?

This is the general procedure:

We grow a circle around the current point until it touches the blue scene; that is the biggest circle we can draw there without hitting anything.

Then we move along the ray by that circle’s radius, to the point where the ray crosses the circle’s edge. So we have moved forward a certain amount, and the ray is guaranteed not to have passed through anything!

Repeat, repeat and repeat until we are within a certain distance of the scene.
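
In GLSL, this whole loop is only a few lines. Here is a minimal 2D sketch, assuming a distToScene function like the one we will build in the Math section below (the function name and the 64-step cap are illustrative):

vec2 march(vec2 pos, vec2 dir){ // dir must be normalized
    for(int i = 0; i < 64; i++){
        float d = distToScene(pos); // radius of the biggest safe circle
        if(d < 0.001) break;        // close enough: we have hit the scene
        pos += d * dir;             // step forward onto the edge of the circle
    }
    return pos;
}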

In the visual above:

  • The first circle expands until it hits the line above and to the left.
  • The second circle expands until it hits the circle above and to the right.
  • The third circle expands until it hits the circle above and to the right again.
  • The fourth circle expands until it hits the ceiling, at which point we stop. (The computer keeps going for accuracy, but we don’t care.)

But how do we know if a circle intersects with the scene? How do we know how big that circle can be made?

Because it is a perfect circle, any point on its edge is exactly r units from the center, where r is the radius of the circle. So if we know the radius of the biggest safe circle at any point, we have everything figured out.

And the radius of the biggest circle we can draw around a point is exactly the distance from that point to the scene.

In the scene we have been using so far, it looks like this:

Distance to scene

Here, dark is farther away and light is closer.

The distance to the scene describes the size of the biggest circle we can make without intersecting any part of the scene.

Math

You can skip this section; it is a bit boring, but it helps when you want to come back.

So, how do we make this function that everything depends on?

Say we have an equation of an object:

f(x) = g(x)

Where x is any point.

We can re-arrange the equation to get:

f(x) - g(x) = 0

But because any point x satisfying this is exactly 0 units away from the object, we can try using the left-hand side as our distance:

distance to object = f(x) - g(x)

(Strictly speaking, this only gives a true distance for well-behaved functions, but it works for all the shapes below.)

So, let us look at a circle. A circle has an equation

length(x - center) = radius

Or:

length(x - center) - radius = 0

So:

distance to circle = length(x - center) - radius
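
In GLSL, that formula is a one-liner (a sketch; the function name is mine):

// Signed distance from point p to a circle with the given center and radius
float distToCircle(vec2 p, vec2 center, float radius){
    return length(p - center) - radius;
}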

If we write x = (x1, x2) as a coordinate, then we can also find the equations of some bounding lines, which give us:

distance to line 1 = x1
distance to line 2 = x2
distance to line 3 = R1 - x1
distance to line 4 = R2 - x2

where lines 1, 2, 3 and 4 are the bounding lines and R = (R1, R2) is the size of the bounds.
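
As a GLSL sketch (again, the name is mine), the distance to the nearest bounding line is just the smallest of those four values:

// Distance to the nearest of the four bounding lines, for bounds of size R
float distToBounds(vec2 p, vec2 R){
    return min(min(p.x, p.y), min(R.x - p.x, R.y - p.y));
}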

OK, great. So we have the equations for distance to circle and borders. But what if we want more than one object in our scene?

It turns out we can just say min(distance to object 1, distance to object 2). This gives us the power to have multiple objects in our scene. I won’t go into it now, but you can also find the distance to a line segment between points a and b. So the final formula might look like this:

distance to scene =
min(x1,
min(x2,
min(R1 - x1,
min(R2 - x2,
min(length(x) - 30,
length(x - (100, 100)) - 50
)))))

So we have all the bounding lines, a circle at the origin with radius 30, and another circle with center at (100, 100) and radius 50.
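
As a GLSL sketch, using the two helper sketches above, the whole scene function might look like this (the bounds size R is an illustrative value):

float distToScene(vec2 p){
    vec2 R = vec2(400.0, 300.0);                        // size of the bounds (illustrative)
    float circle1 = distToCircle(p, vec2(0.0), 30.0);   // circle at the origin, radius 30
    float circle2 = distToCircle(p, vec2(100.0), 50.0); // circle at (100, 100), radius 50
    return min(distToBounds(p, R), min(circle1, circle2));
}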

Normals

For most of the rest of this post, I will be using GLSL notation for math. If you don’t know it very well, http://www.shaderific.com/glsl-functions/ is an awesome resource for learning and searching the documentation.

So: what is a normal?

A normal is the direction perpendicular to a surface.

Like this:

The yellow line is the normal

The ray marches onto the circle, and somehow finds the direction perpendicular to the surface. Magic!

Well, not really. The method uses the idea of gradients, which come from calculus and don’t have to make sense right now. But basically, it asks:

“In which direction are we farther away from the scene? That is the direction the normal points in.”

The code is like this:

const float EPS = 0.001; // small offset for the finite differences

vec2 estimateNormal(vec2 p){
    // Sample the distance field slightly to either side of p along each axis
    float xPl = distToScene(vec2(p.x + EPS, p.y));
    float xMi = distToScene(vec2(p.x - EPS, p.y));
    float yPl = distToScene(vec2(p.x, p.y + EPS));
    float yMi = distToScene(vec2(p.x, p.y - EPS));
    // The differences approximate the gradient, which points away from the scene
    float xDiff = xPl - xMi;
    float yDiff = yPl - yMi;
    return normalize(vec2(xDiff, yDiff));
}

The intuitive explanation: if the scene is farther away in the X+ direction, then xPl > xMi, so xDiff is positive and the resulting normal points more in the X+ direction.

Pretty easy! Here is the shader for 2d raymarching: https://www.shadertoy.com/view/4lKyDD

3D

So this is awesome, and pretty easy as well. 2D scenes are easy. But how can this be used to render 3D scenes?

Because we are now mass-producing rays, it makes sense to have a ray object:

struct ray {
    vec3 pos;
    vec3 dir;
};

First of all, in 3D we send out one ray per pixel instead of one ray overall, so we need a function to create each pixel’s ray:

ray create_camera_ray(vec2 uv, vec3 camPos, vec3 lookAt, float zoom){
    vec3 f = normalize(lookAt - camPos);    // forward
    vec3 r = cross(vec3(0.0, 1.0, 0.0), f); // right
    vec3 u = cross(f, r);                   // up
    vec3 c = camPos + f * zoom;             // center of the virtual screen
    vec3 i = c + uv.x * r + uv.y * u;       // where this pixel sits on the screen
    vec3 dir = i - camPos;
    return ray(camPos, normalize(dir));     // normalized, so marching steps are true distances
}

Where uv is centered on zero; here it will end up in roughly the [-0.5, 0.5] range, stretched by the aspect ratio on x.
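
On Shadertoy, that uv comes from fragCoord like this (the same two lines appear again in the full shader below):

vec2 uv = fragCoord / iResolution.xy - 0.5; // center of screen at the origin
uv.x *= iResolution.x / iResolution.y;      // correct for the aspect ratio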

And our simple distance function, from a sphere and a plane, looks like this:

float distToScene(vec3 p){
    return min(p.y, length(p) - 0.3);
}

That is the plane y = 0 (perpendicular to the y axis) and a sphere of radius 0.3 at the origin.

Our normal estimation function looks like the 2D function, but with the z-component added:

vec3 estimateNormal(vec3 p){
    float xPl = distToScene(vec3(p.x + EPS, p.y, p.z));
    float xMi = distToScene(vec3(p.x - EPS, p.y, p.z));
    float yPl = distToScene(vec3(p.x, p.y + EPS, p.z));
    float yMi = distToScene(vec3(p.x, p.y - EPS, p.z));
    float zPl = distToScene(vec3(p.x, p.y, p.z + EPS));
    float zMi = distToScene(vec3(p.x, p.y, p.z - EPS));
    float xDiff = xPl - xMi;
    float yDiff = yPl - yMi;
    float zDiff = zPl - zMi;
    return normalize(vec3(xDiff, yDiff, zDiff));
}

Yup! That is it.

Putting it together: Coloring and Shading

All together, our Shadertoy looks like this:

struct ray {
    vec3 pos;
    vec3 dir;
};
//Create the camera ray
ray create_camera_ray(vec2 uv, vec3 camPos, vec3 lookAt, float zoom){
    vec3 f = normalize(lookAt - camPos);
    vec3 r = cross(vec3(0.0, 1.0, 0.0), f);
    vec3 u = cross(f, r);
    vec3 c = camPos + f * zoom;
    vec3 i = c + uv.x * r + uv.y * u;
    vec3 dir = i - camPos;
    return ray(camPos, normalize(dir));
}
//Distance to scene at point: three planes and a sphere
float distToScene(vec3 p){
    return min(p.z, min(p.x, min(p.y, length(p - vec3(0.3, 0.0, 0.4)) - 0.3)));
}
//Estimate normal based on distToScene function
const float EPS = 0.001;
vec3 estimateNormal(vec3 p){
    float xPl = distToScene(vec3(p.x + EPS, p.y, p.z));
    float xMi = distToScene(vec3(p.x - EPS, p.y, p.z));
    float yPl = distToScene(vec3(p.x, p.y + EPS, p.z));
    float yMi = distToScene(vec3(p.x, p.y - EPS, p.z));
    float zPl = distToScene(vec3(p.x, p.y, p.z + EPS));
    float zMi = distToScene(vec3(p.x, p.y, p.z - EPS));
    float xDiff = xPl - xMi;
    float yDiff = yPl - yMi;
    float zDiff = zPl - zMi;
    return normalize(vec3(xDiff, yDiff, zDiff));
}
void mainImage(out vec4 fragColor, in vec2 fragCoord){
    vec2 uv = fragCoord / iResolution.xy;
    uv -= vec2(0.5);                       //offset, so center of screen is origin
    uv.x *= iResolution.x / iResolution.y; //scale, so there is no rectangular distortion

    vec3 camPos = vec3(2.0, 1.0, 0.5);
    vec3 lookAt = vec3(0.0);
    float zoom = 1.0;

    ray camRay = create_camera_ray(uv, camPos, lookAt, zoom);

    //March along the ray until we are within 0.01 units of the scene
    float totalDist = 0.0;
    float finalDist = distToScene(camRay.pos);
    int iters = 0;
    int maxIters = 20;
    for(iters = 0; iters < maxIters && finalDist > 0.01; iters++){
        camRay.pos += finalDist * camRay.dir;
        totalDist += finalDist;
        finalDist = distToScene(camRay.pos);
    }
    vec3 normal = estimateNormal(camRay.pos);

    //Diffuse (Lambertian) term of Phong shading
    vec3 lightPos = vec3(2.0, 1.0, 1.0);
    float dotSN = dot(normal, normalize(lightPos - camRay.pos));

    fragColor = vec4(0.5 + 0.5 * normal, 1.0) * dotSN;
}

Final result, easy to see that it is 3D

The only thing of note in this code is the for loop.

float totalDist = 0.0;
float finalDist = distToScene(camRay.pos);
int iters = 0;
int maxIters = 20;
for(iters = 0; iters < maxIters && finalDist > 0.01; iters++){
    camRay.pos += finalDist * camRay.dir;
    totalDist += finalDist;
    finalDist = distToScene(camRay.pos);
}

So we initialize our variables, then march until we are within 0.01 units of the scene (or run out of iterations). The end position is then in camRay.pos.

After doing the raymarching, the code calls estimateNormal(camRay.pos). This is used to create the coloring shown. The problem is that normals are in the range [-1, 1], while colors are in the range [0, 1]. To fix this, instead of saying something like color = normal, we say color = 0.5 + 0.5 * normal: -1 is mapped to 0, 1 is mapped to 1, and everything in between is mapped to everything in between. It works. Then everything is multiplied by dotSN, the diffuse component of Phong shading. I did not use specular highlighting for this example.
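
As a short sketch of that mapping plus shading (the max() clamp is my addition, not in the shader above; it keeps surfaces facing away from the light black instead of negative):

vec3 normal = estimateNormal(camRay.pos);
float dotSN = max(dot(normal, normalize(lightPos - camRay.pos)), 0.0); // clamped diffuse term
vec3 color = (0.5 + 0.5 * normal) * dotSN; // map the normal from [-1, 1] into [0, 1], then shade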

The final result is here.

The 2D shader (with normals, raymarching and a distance function for the background) is available here: https://www.shadertoy.com/view/4lKyDD

In my next article, I will introduce more shapes and techniques such as shadows and multiple lights.
