Building a Rendering Engine in Tensorflow
In this post, I’ll walk through the basic ideas behind a rendering engine I built using Tensorflow, with code samples in Python. Much of the ideas and code were inspired/incepted by Íñigo Quílez, Matt Keeter, and the HyperFun project, so thanks to them!
The Geometric Representation: Implicit Surfaces
An implicit surface is a representation of geometry by means of a function (call it
f) on three dimensional space that returns a number whose sign represents whether the input point is inside or outside the geometry. We’ll use the convention that negative signs mean “outside” and positive signs mean “inside”. They’re called “implicit” surfaces because each input point
f(x,y,z) = 0 lies precisely on the surface, and so the surface is implicitly defined by
f’s solution set. In general, there are no other assumptions on
f, meaning it could be defined procedurally (like a fractal), algebraically (like
f = sqrt(x^2 + y^2 + z^2) — 1 for a circle of radius 1), or even stochastically.
Let’s start by considering implicit surfaces defined by algebraic expressions. Our basic primitives will be spheres, tori, planes, ellipsoids, and cylinders. These all have simple algebraic expressions, for instance
plane(x,y,z) = x*a + y*b + z*c + m. In order to implement this in Python, we’re going to use closures to set the values
def plane(a, b, c, m):
def expression(x, y, z):
return x*a + y*b + z*c + m
We’ll build more complicated geometries by means of boolean operations, and coordinate transformations. Here’s how we’ll define the boolean operations (you can go deep into the rabbit hole here with the theory of R-functions, but for now we’ll stick to the simplest possible implementation):
def union(f, g):
# f and g are functions
def unioned(x, y, z):
return max(f(x, y, z), g(x, y, z))
def intersection(f, g):
def intersected(x, y, z):
return min(f(x, y, z), g(x, y, z))
def negated(x, y, z):
return -f(x, y, z)
The coordinate transformations are a little tricky to wrap your head around, but generally they distort the input coordinates to the function. For instance, the simplest transformation is
return (x-a, y-b, z-c)
If we want to translate a function, we compose
f with our transformation and write
f(*translate(a,b,c)(x, y, z)) and the solution set of this new translated function has been translated by
(a,b,c) . It took me a while to get the hang of this, so this bears another look: if
f(0, 0, 0) = 0, then
(0,0,0) is on the surface of
f, and so
f(*translate(1,0,0)(1,0,0)) => f(1-1, 0-0, 0-0) => f(0,0,0) = 0, so the solution set is now higher by
(1,0,0) . This concept can be taken quite far, into the realm of cylindrical coordinate transformations and domain repetitions via use of saw waves (see Johann Korndörfer’s presentation about the crazy things you can draw with implicit surfaces).
Here we see the first evidence of the appeal of implicit surfaces: you can succinctly represent incredibly complex geometries with simple algebraic equations. This lends itself particularly well to modeling geometries with complicated topology, as boundary representations (the industry-standard representation in computer aided design) fall apart in these situations, because you have to keep track of where the different patches meet, and that can get prohibitive for geometries that have microstructures.
I think of implicit surfaces as trading time complexity for space complexity: you don’t have to hold all the topological stitching information in memory anymore, but to pay for it you have to solve a (potentially) huge algebraic equation. In order to define anything interesting with our primitives we’re going to need a large tree of numerical operations, and in order to render anything to the screen we’re going to need to sample our function
f on hundreds of thousands of points to get an idea of approximately where the function is zero.
So here lies the architecture challenge: we have to build our algebraic expressions in such a way that we reuse computation as much as possible, compute with efficient numerical implementations on large arrays of input data, and eventually parallelize the computation.
And that’s where Tensorflow comes in.
Tensorflow: Declarative Graph Computation
Tensorflow is a framework for defining computation graphs in Python and then running them in C++ or CUDA. It’s main application is for defining neural networks, but it was designed to be far more general. It has features we’re going to find invaluable like common subexpression elimination, automatic differentiation, and CUDA compilation.
Tensorflow’s main data structure is the
Tensor, which represents the result of a calculation. In many ways they are designed to resemble
numpy.ndarray, so you can apply most of the intuition about
ndarrays here. If we have two tensors
b, we can create more calculations by applying arithmetic operations or more complicated functions to them. For instance
a + b returns a new
Tensor representing the result of adding the two tensors.
So here’s the plan: we’re going to evaluate our function with
Tensors as input, which will generate our computation graph. Then we’ll automatically reuse as much of our computation as possible through Tensorflow’s common subexpression elimination, and we’ll be able to take exact derivatives of our function via automatic differentiation (this will come in handy for rendering).
There are two common methods of rendering implicit surfaces: polygonization and raytracing. Let’s start by approaching the simpler of the two, polygonization.
To keep things simple, we’re going to use the classic Marching Cubes algorithm to generate triangles given a volumetric grid of function values. In order to generate our volumetric grid, we’re going to define three different tensors, called
z, which represent our coordinates that we can then feed to our function, which will combine these tensors and return a single tensor representing the function values at each point. We’ll use Tensorflow’s
Variable class to initialize our coordinate tensors, though we could also use the runtime
feed_dict option to feed our data in.
This rendering method is fast, simple, and great for debugging geometries. It does have issues when the geometry has features that can’t be captured by a coarse, evenly spaced grid. Raytracing, while generally slower than polygonization, provides essentially infinite resolution and the possibility to render truly beautiful scenes.
With raytracing, we’re going to need an additional assumption on our function
f, and that’s that its magnitude is always less than the Euclidean distance from its input point to the closest point on the surface. In short, we’ll confine ourselves to Signed Distance Functions.
Fundamentally, raytracing consists of turning the expression
f(x,y,z) into an expression in only one variable
t, the ray length. We then iteratively solve
f(*(ray*t)) = 0 for
t, and from that we know where the surface lies. We do this for each pixel on our screen, and we output an image. We need the signed distance property in order to know, at each iteration, how much to increase
We could also use bisection, Newton’s method, or even Tensorflow’s built-in optimization methods to solve the raytracing equation, but none of these methods were both correct and performant when I tried them.
Once we have the ray lengths for each pixel, we can use Tensorflow’s automatic differentiation feature to compute exact numerical derivatives of the function
f to get the surface normal. Note that
df gives the surface normal because the scalar field generated by
f always increases as you move towards the surface, so
df always points towards the interior of the geometry.
Finally, we run the iteration step enough times to get good-enough-ness and render! You could also imagine adding convergence tests to terminate the loop in the case that every pixel is already either converged or part of the background.
The final product, with many different frames put together into a video:
When running on an Amazon GPU instance, it took about a minute to render a frame in 1080p, which is not at all competitive with raytracing via GPU shaders, but now we have the image tensor in Tensorflow if we wanted to backpropagate error.
Bonus: Tensorboard Debugging
Many times, it’s hard to debug a huge computation tree, especially when you’re trying to optimize performance. Tensorflow has a graph visualization tool for precisely this purpose! By decorating each function with a call to
tf.name_scope, we can easily visualize each different conceptual piece of the compute tree.
You can even hijack Tensorflow’s protobuf serialization to make your geometrical models portable. For instance, the full raytracing graph of a model similar to the one shown in the video is here.
Next Time: Symbolic Computation Trees
We’ve seen how we can build an implicit surface renderer in Tensorflow, which helps make rendering difficult models tractable, but there are some drawbacks to this approach. For instance, Tensorflow graphs are hard to modify, and introspect interactively (getting better every day though!). Since our use case will always have neat algebraic expressions, we should leverage that!
In the next post, we’re going to see how the computer algebra system Sympy will help us with these problems and allow us to model more complicated structures that require doing symbolic integration. Long story short: we’ll build our tree in Sympy and then compile that tree to a Tensorflow graph at render time!