Adelta: Automatic Differentiation for Discontinuous Programs — Part 2: Introduction to the DSL

Yuting Yang
8 min readMay 6, 2022

--

[Adelta: Automatic Differentiation for Discontinuous Programs — Part 1: Math Basics]

[Adelta Tutorial — Part 1: Differentiating a Simple Shader Program]

[Adelta Tutorial — Part 2: Raymarching Primitive]

[Adelta Tutorial — Part 3: Animating the SIGGRAPH logo]

[Adelta Tutorial — Part 4: Animating the Celtic Knot]

In the previous post, we introduced a set of new gradient rules presented in Aδ, that allows automatic differentiation for programs with discontinuities. The Aδ automatic differentiation framework is built based on a domain specific language (DSL). One of the key advantages of Aδ is the broad range of programs the DSL can express. In this post, we will showcase types of programs expressible by Aδ and compare with two state-of-the-art differentiable frameworks: Teg² and Diffvg³.

DSL syntax for Aδ

The DSL includes basic operators such as Heaviside step function, addition, multiplication, and atomic function composition. Formally, the minimum DSL can be expressed using the Backus-Naur form as below:

DSL syntax for Aδ

Here, C represents constant scalar value, x represents any variables associated with the sampling axis (a concept introduced in the previous post), θ represents any parameters we want to differentiate wrt, and f represents continuous atomic functions supported in the DSL (sin, cos, exp, log, and pow with constant exponent).

Note this is the minimum set of operations for the DSL, and other basic operations can be easily created through function composition. For example, g - h = g + (-1) ⋅ h and g / h = g ⋅ (h)⁻¹.

The DSL is expressible enough to create complex programs that might be seen in real life, such as the procedural shader programs that are capable of rendering the imagery shown below. The expressiveness is particularly important for the DSL because in general, programmers may construct equivalent discontinuous functions into different programs, like H(x) and H(x) ⋅ H(x). The Aδ framework benefits from the design choice of a 1D box pre-filtering kernel, so that we do not have to analytically solve for the exact discontinuous locations (see the previous post for details). Therefore our DSL allows arbitrary compositions between the Heaviside step function and atomic functions.

Images generated by programs written in Aδ’s DSL

Introduction to Other State-of-the-art Differentiable Frameworks

Teg² is a framework that systematically differentiates parametric discontinuities. Their system is provably correct for the discontinuous functions expressible in their DSL. Teg analytically computes the exact discontinuity location and derives the analytic gradient based on it. The process of locating the discontinuity involves inverting the parametric function that drives the discontinuity (i.e. the input argument to the Heaviside step function), which is generally infeasible. Therefore their DSL only handles a limited set of functions that are differentiable and invertible. This excludes compositions of discontinuities and non-invertible functions. The figure below shows two example programs expressible by Teg. The first example is image triangulation, with discontinuities always modeled by affine transformations. The discontinuity in the second example is modeled by bilinear interpolation, and requires an additional Euclidean to hyperbola coordinate transformation before differentiation.

Images generated by programs written in Teg’s DSL. While visually interesting, the discontinuities in these examples are modeled by relatively simple functions.

Diffvg³ is a framework for differentiable rendering to 2D vector graphics. Technically this is an application-specific framework, and most of its expressiveness comes from the massive number of parameters rather than the complex program structure. But because its implementation includes a variety of 2D primitives and gives users some freedom of constructing their own scene, we will also briefly discuss the class of functions that Diffvg can be applied to.

Differentiable path tracing is also an active line of research. State-of-the-art frameworks³⁴⁵ have derived correct analytic derivatives and efficiently carry out the back-propagation. Because these pipelines focus on a specific application (path tracer with triangle mesh) and have little adaptability to other programs, we will not discuss them here.

Examples of Expressible Functions

We will use a few examples to compare Aδ’s expressiveness with Teg and Diffvg. Note for clear demonstration, the code snippet shown in this post are just pseudocode and cannot be directly called in any of the frameworks. The original code is linked under the description of each pseudocode example. In the next post, we will discuss authoring real code using Aδ’s framework. Also, note in the pseudocode below, we use if/else branches instead of Heaviside step functions for code readability.

Circle: expressible by Aδ, Teg (with manual effort), and Diffvg

We can start with the simple shader that draws a circle:

Rendering Example for the Circle Shader

In Aδ, we can straightforwardly express the program in its Euclidean coordinates:

Pseudocode: circle shader expressed by Aδ, adapted from https://github.com/yyuting/Adelta/blob/main/apps/render_test_finite_diff_circle.py

While Teg allows expressing the circle shader similarly to Aδ in a forward pass, it is unable to directly differentiate the program because the circle expression is not invertible. Instead, the Teg paper suggests taking an additional Euclidean to polar coordinate remapping before the program can be differentiated by the framework. This has an obvious disadvantage: additional manual effort is needed to choose the appropriate transformation, which may not even exist.

Pseudocode: circle shader expressed by Teg, note the extra coordinate remapping needed, adapted from https://github.com/ChezJrk/Teg/blob/master/tests/render_test.py

Unlike Aδ and Teg, Diffvg draws the circle primitive through a given API, where the backend renderer takes care of the rendering and differentiation. While easy to use, Diffvg has limited flexibility as it relies on an existing API to provide the primitives and their derivatives, so each new shape has to be implemented by the developer of the Diffvg framework.

Pseudocode: circle shader expressed by Diffvg through API call, adapted from https://github.com/BachiLi/diffvg/blob/master/apps/single_circle.py

Non-axis-aligned ellipse: expressible by Aδ

Our second example draws a non-axis-aligned ellipse. In addition to parameters that control the position and radius of the ellipse, there are parameters to control the direction of the major axis, and the ratio between major and minor axes.

Rendering example for non-axis-aligned ellipse

Aδ can easily express this program using its quadratic equation:

Pseudocode: ellipse shader expressed by Aδ, adapted from https://github.com/yyuting/Adelta/blob/main/apps/render_test_finite_diff_ellipse.py

Similar to the circle example, while Teg can express this in the forward pass, it is not able to directly differentiate the program. There are several possible workarounds to map this ellipse program to a space that can be handled by the Teg differentiation system, but they all require manual effort and cannot be conveniently expressed in their framework. For example, one may first apply a rotation and scaling to the Euclidean coordinate to map the ellipse to a circle, followed by a Euclidean to Polar coordinate transformation. It is also possible to manually break the ellipse expression into multiple piecewise invertible functions and inform Teg about the inversion. But these all require extra expertise and are somehow against the idea of being “automatic”.

Diffvg cannot be used to conveniently express this ellipse example either. It has an API for drawing an axis-aligned ellipse, but lacks the flexibility to extend to a non-axis-aligned case. It is still possible to use Diffvg to approximate arbitrary geometry using polygonal or spline shapes, with the cost of a massive number of parameters.

Circles with Parameterized Z-order: expressible by Aδ

Parameterizing the Z-order is important for reconstructing some complex interlocking patterns, such as the rings in the Olympics logo. When a pixel is inside multiple objects, its color will correspond to the one with the highest Z value: similar to an argmax operation. Our shader implementation keeps track of the current highest Z value for all the shapes drawn, and compare this value with the Z from any newly added shape to update the color. Because the accumulated highest Z value will be discontinuous at the boundary of each object, comparing it with a new Z introduces compositions of Heaviside step functions. In this example, we demonstrate a simplified problem with two circles whose Z values are parameterized as constant scalars.

Rendering example for circle with parameterized Z-order

Aδ can flexibly use discontinuous variables as input arguments to other discontinuous operations, making the shader authoring straightforward:

Pseudocode: circles with parameterized Z value by Aδ, adapted from https://github.com/yyuting/Adelta/blob/main/apps/render_test_finite_diff_circles_Z.py

Because Teg requires the discontinuities to be parameterized by differentiable and invertible functions, it is clear that the above program cannot be implemented in Teg as it involves compositions of discontinuities. However, one can recursively decompose compositions of discontinuities into multiplications of discontinuities (analogous to rewriting nested if/else branches into a single giant switch statement). If the rewritten program further is such that each continuous program that controls each discontinuity is differentiable and invertible, it will be expressible in Teg with two additional shortcomings. First, the additional rewriting needs manual effort. While it is possible to implement a compiler that automatically applies the rewriting, it is not readily available, and it is not clear whether the compiler-generated rewriting can be as optimized as that generated by human expertise. Second, since this is a combinatorial approach, the number of discontinuities after the rewriting will scale exponentially. Because the time complexity of the gradient program scales with the number of discontinuities introduced, such a rewriting will also exponentially scale the time complexity of the gradient program as well.

Diffvg is unable to express the above example because its framework only allows user-specified constant Z ordering. Therefore, in the example above, the user needs to define either that the blue circle is always on top of the red one, or the opposite. But this layering relationship cannot be parameterized and further explored by an optimizer.

Limitations of Aδ’s DSL

Because the DSL in Aδ’s framework is not Turing complete, it is unable to express arbitrary programs that occur in real world programming. For example, Aδ cannot handle loops with an indefinite number of iterations, and must instead unroll the loops to a maximum number of possible iterations. Additionally, Aδ does not have a mechanism handling scatter and gather operations yet. In some simple cases, the gather operation can be hard-coded as compositions of select functions based on the lookup index, but in general, such operations are not supported yet.

Summary

We introduced the DSL for the Aδ framework. Thanks to our differentiation framework that samples discontinuities based on pairs of samples rather than computing the discontinuity location exactly, Aδ can flexibly express a broad range of programs with discontinuity patterns that represent similar complexity as those authored by programmers.

Unlike Aδ, Teg focuses more on generating provably correct gradient programs, and sacrifices the expressiveness in their DSL. As a result, their framework can either only express discontinuities using simple invertible and differentiable functions, such as affine transformations, or requires additional manual effort to rewrite the program, or find a suitable coordinate transformation that allows their differentiation to be applied.

Other application-specific differentiable frameworks, such as Diffvg, heavily depend on the handwritten gradients for a limited set of discontinuous patterns. These patterns had to be manually derived by the developers, checked for correctness, and integrated into their framework. Therefore, while these frameworks are correct and efficient in their individual domains, their expressiveness stems from the vast number of parameters, not program structure.

The next post will be a tutorial for using the Aδ code base.

Reference

[1] Aδ: Autodiff for Discontinuous Program — Applied to Shaders [project page]

[2] Systematically Differentiating Parametric Discontinuities [project page]

[3] Differentiable Vector Graphics Rasterization for Editing and Learning [project page]

[4] Unbiased Warped-Area Sampling for Differentiable Rendering [project page]

[5] Reparameterizing discontinuous integrands for differentiable rendering [project page]

[6] Path-Space Differentiable Rendering [project page]

--

--