Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs

SNU AI · Published in SNU AIIS Blog · 8 min read · Apr 2, 2022

by Sue Hyun Park

The rapid evolution of deep neural networks (DNNs) has been fueled by the support of deep learning (DL) frameworks like TensorFlow and PyTorch. DL frameworks allow users to build and execute DNNs through Python programming. The standard execution model in DL frameworks is imperative execution: the Python Interpreter executes a DL program just as it would a regular Python program.

Let us go over a simple DL program to grasp the concept. Here, we assume that the condition the Interpreter first evaluates is True. The Python Interpreter executes the program by interpreting each line of code, as marked from 1 to 4 in the figure. If the Interpreter encounters a DL operation (from the tf library), it requests the launch of a computation kernel on a DL accelerator (typically a GPU or TPU), and the accelerator executes the invoked kernel asynchronously.

Example of imperative execution
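The figure's sample code is not reproduced here, but a minimal sketch in the same spirit could look like the following (the operation names, the condition, and the `file_write` helper are our assumptions based on the description, not the figure's exact code):

```python
import tensorflow as tf

def file_write(value, path="loss.txt"):
    # Plain Python side effect; not a DL operation, so no kernel launch.
    with open(path, "a") as f:
        f.write(f"{float(value)}\n")

def forward(x, cond):
    if cond:                     # (1) the Interpreter evaluates the condition (True here)
        y = tf.add(x, 1.0)       # (2) DL operation: a kernel launch is requested
    else:
        y = tf.subtract(x, 1.0)  #     not executed in this iteration
    file_write(y)                # (3) ordinary Python runs line by line
    return y                     # (4)

result = forward(tf.constant(2.0), True)  # 2.0 + 1.0 = 3.0
```

Every line here runs under the Python Interpreter; only the `tf.add` call is dispatched to the accelerator.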

Thanks to the Python Interpreter’s line-by-line execution, users of imperative execution can confidently use any Python syntax. This convenient programming interface is what makes imperative execution the mainstream execution model of DL frameworks.

However, imperative execution cannot obtain the whole structure of the main DL computation, commonly represented as a symbolic graph. This is a significant drawback because it misses optimization opportunities for DL operations; if a symbolic graph is constructed, graph compilers like XLA or TVM can be applied to speed up DL program execution. The execution model that executes a symbolic graph is called symbolic execution.

How can we combine the usability of imperative execution with the optimized performance of symbolic execution?

In a paper that appeared at NeurIPS 2021, we proposed Terra, an imperative-symbolic co-execution system that fully achieves the advantages of each execution model. In this blog, we will first give an overview of previous works that convert an entire imperative program into a symbolic graph. Addressing the limitations of these prior approaches, we will then describe how we harmonize the two execution models at runtime.

Prior Works: Imperative Program to Symbolic Graph

Previous works attempt to convert an entire imperative DL program into a symbolic graph and exploit symbolic execution with the converted graph. Here we illustrate the process using the same sample code. The Python Interpreter converts the forward function into a symbolic graph and compiles the generated graph. Then a separate graph executor executes the graph, which ultimately speeds up a DL program compared to normal imperative execution.

Example of converting an imperative program to symbolic graph entirely

Broadly, there are two methods to convert an imperative program into a symbolic graph: single path tracing and static compilation.

1) Single Path Tracing

e.g., torch.jit.trace, JAX, tf.function

The single path tracing approach imperatively executes a single iteration of a program and records the executed DL operations. The recorded linear chain of executed DL operations is called a trace, which becomes the symbolic graph (see the blue area in the figure below). However, single path tracing can yield incorrect program behavior. First, it cannot capture the dynamic control flow (e.g., if-else, for, while) of an imperative program. In the example below, the symbolic graph does not contain the DL operation that should execute when the condition is False (i.e., subtraction), because the trace was recorded only along the path taken in the initial iteration (i.e., True). Moreover, Python features other than DL operations (e.g., file_write()) are left out of the conversion as well.

Example of single path tracing
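The pitfall can be demonstrated with `torch.jit.trace`, one of the tracing APIs listed above (the sample function is our own sketch, not the figure's code):

```python
import torch

def forward(x, cond):
    if cond:
        return x + 1.0
    else:
        return x - 1.0

# torch.jit.trace records only the operations executed for the sample
# inputs. The Python `if` is evaluated once during tracing, so only the
# taken branch (addition, since cond is True here) enters the graph.
traced = torch.jit.trace(forward, (torch.tensor(2.0), torch.tensor(True)))

# The subtraction branch was never captured, so the traced graph silently
# returns the wrong answer when the condition changes:
print(traced(torch.tensor(2.0), torch.tensor(False)))  # tensor(3.), not tensor(1.)
```

PyTorch even emits a TracerWarning here, flagging that converting a tensor to a Python boolean may make the trace input-dependent.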

2) Static Compilation

e.g., TorchScript, JANUS

The static compilation approach parses the abstract syntax tree (AST) of the imperative program and directly converts each AST node into a corresponding symbolic operation. While this guarantees the correctness of the program, the conversion fails if even one tree node has no proper symbolic representation.

Example of static compilation.
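A sketch of this approach using TorchScript's static compiler, `torch.jit.script` (our own illustrative code):

```python
import torch

@torch.jit.script
def forward(x, cond: bool):
    # Scripting compiles the AST, so BOTH branches are preserved in the
    # graph and the program behaves correctly for either condition.
    if cond:
        return x + 1.0
    return x - 1.0

print(forward(torch.tensor(2.0), False))  # tensor(1.) -- the else branch works

# However, scripting fails outright if any AST node lacks a symbolic
# counterpart, e.g., a function built around a Python generator:
def numbers():
    yield 1
# torch.jit.script(numbers)  # would raise: generators are unsupported
```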

1 + 2) AutoGraph, the static-compilation-and-tracing approach

To deal with the limitations of static compilation, TensorFlow AutoGraph emerged as a state-of-the-art system that combines static compilation with single path tracing. AutoGraph generates new program code by analyzing AST nodes and converting dynamic control flows (e.g., if-else) into proper DL operations (e.g., tf.cond). It then carries out single path tracing on the new code, so the system closely follows the performance of symbolic execution.
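The rewriting step can be observed directly with `tf.autograph.to_code`, a public TensorFlow API (the sample function is our own):

```python
import tensorflow as tf

def forward(x, cond):
    if cond:
        x = x + 1.0
    else:
        x = x - 1.0
    return x

# AutoGraph rewrites the Python `if` into a graph-compatible form
# (dispatching to tf.cond when `cond` is a tensor). The generated
# source can be inspected as a string:
print(tf.autograph.to_code(forward))
```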

Although AutoGraph enables the static compilation and single path tracing approaches to work in a complementary manner, there remain many Python features that neither approach can support: generators, try-except, and the None type, to list a few. We attribute the root cause of these failures to the underlying assumption of previous approaches: that imperative execution must be replaced entirely by symbolic execution.

Terra: Imperative-Symbolic Co-Execution

If depending on one execution model is imperfect, using both execution models simultaneously could be a promising solution. So, we came up with a new approach, which we call imperative-symbolic co-execution. The key concept is to concurrently execute two components of an imperative DL program:

  • The Python Interpreter imperatively executes the program without performing DL operations.
  • The graph executor executes the symbolic graph of the decoupled DL operations.

With the concurrent execution, our approach achieves full coverage of Python language features and high performance of symbolic execution at the same time.

Illustration of our imperative-symbolic co-execution approach
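To make the idea concrete, here is a toy sketch (ours, not Terra's implementation) of the two sides running concurrently: a Python side that defers DL operations to a graph-executor side through queues:

```python
import queue
import threading

# Communication channels between the two runners.
to_graph = queue.Queue()
from_graph = queue.Queue()

def graph_runner():
    # Stands in for the symbolic graph executor: receives operation
    # requests and returns computed values.
    while True:
        op, args = to_graph.get()
        if op == "stop":
            break
        if op == "add":
            from_graph.put(args[0] + args[1])

t = threading.Thread(target=graph_runner)
t.start()

# The "PythonRunner" side: ordinary Python executes here, while the DL
# computation (the addition) is delegated to the graph executor.
to_graph.put(("add", (2.0, 1.0)))
result = from_graph.get()
to_graph.put(("stop", None))
t.join()
print(result)  # 3.0
```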

We realize this approach by our proposed system, Terra. The overall co-execution of Terra consists of two phases, the tracing phase and the co-execution phase, as shown in the workflow below.

  1. Starting with a normal imperative execution, GraphGenerator in the tracing phase collects a trace of the initial iteration and generates a symbolic graph of the given DL program.
  2. Terra moves on to the co-execution phase afterwards, where PythonRunner executes all program components except DL computations while GraphRunner executes the generated symbolic graph. For every iteration in this phase, PythonRunner checks if the symbolic graph that GraphRunner is running expresses all main DL operations.
  3. If PythonRunner encounters a DL operation missing from the graph, Terra cancels GraphRunner’s current execution and falls back to the tracing phase. GraphGenerator records a new trace in the following iteration and merges it with the previously collected traces to re-generate a more comprehensive graph.
  4. Once the trace of the latest iteration is fully covered by the graph, Terra can continue the co-execution for the remaining iterations.
Workflow of Terra. For details on the algorithm for symbolic graph generation, check our paper.
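The phase transitions above can be sketched as a toy loop (illustrative names only; real traces are graphs of DL operations, not sets of strings):

```python
def terra_loop(iterations):
    # Each element of `iterations` is the list of DL ops one training
    # iteration executes; the "graph" is the set of ops recorded so far.
    graph = set()
    phases = []
    for ops in iterations:
        if set(ops) <= graph:
            phases.append("co-execution")  # the graph covers the trace
        else:
            phases.append("tracing")       # fall back to the tracing phase
            graph |= set(ops)              # merge and re-generate the graph
    return phases

# Iteration 1 traces; iteration 2 hits a new op (subtract) and falls
# back; iteration 3 is fully covered, so co-execution continues.
print(terra_loop([["add"], ["subtract"], ["add", "subtract"]]))
# ['tracing', 'tracing', 'co-execution']
```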

Realizing the Co-Execution

In order to seamlessly maintain the co-execution, Terra communicates two types of information between imperative execution and symbolic execution:

  • PythonRunner ↔ GraphRunner: data, when a data dependency exists (e.g., printing a loss value resulting from a DL operation, or receiving a Python primitive value required for a DL operation)
  • PythonRunner → GraphRunner: the execution flow of the program determined by the Python Interpreter (e.g., dynamic control flow)

We insert custom symbolic operations, InputOperation and OutputOperation, into the symbolic graph to realize such communication. Take a look at the left figure. When GraphRunner executes InputOperation, the Python Interpreter sends the evaluated result to InputOperation, which determines the DL operation to execute next. The actual computation (in this case, addition) is launched after PythonRunner confirms that the DL operation the Interpreter is looking at is also contained in the graph. Similarly, for file writing, PythonRunner receives the tensor data from GraphRunner through OutputOperation (right figure).

Left: PythonRunner sends a Python primitive value to GraphRunner and validates the DL operation. Right: GraphRunner sends the computed tensor data to PythonRunner.
Annotations for the illustrations above
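A toy model of these two operations, using queues as stand-ins (our own illustration; these are not Terra's actual classes):

```python
import queue

class InputOperation:
    # Feeds Python-side values (e.g., an evaluated condition) into the graph.
    def __init__(self):
        self.q = queue.Queue()
    def send(self, value):   # called from the PythonRunner side
        self.q.put(value)
    def run(self):           # executed by the GraphRunner
        return self.q.get()

class OutputOperation:
    # Hands tensor results from the graph back to Python (e.g., for file writing).
    def __init__(self):
        self.q = queue.Queue()
    def run(self, value):    # GraphRunner pushes the computed value
        self.q.put(value)
    def receive(self):       # PythonRunner reads it
        return self.q.get()

# PythonRunner sends the evaluated condition; the graph side uses it to
# pick the next DL operation, then returns the result.
cond_in, loss_out = InputOperation(), OutputOperation()
cond_in.send(True)
x = 2.0
result = x + 1.0 if cond_in.run() else x - 1.0
loss_out.run(result)
print(loss_out.receive())  # 3.0
```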

Evaluation

We evaluate Terra in the following two aspects:

  1. Can Terra execute imperative DL programs that AutoGraph cannot execute?
  2. How much does Terra speed up imperative DL programs?

We implemented Terra on TensorFlow v2.4.1. Our evaluation baselines are TensorFlow imperative execution and AutoGraph. AutoGraph serves as the benchmark for the optimized performance of symbolic execution, as it shares TensorFlow’s graph executor with Terra. For the experiments, we use ten imperative DL programs that cover various DL workloads.

Imperative Program Coverage

AutoGraph fails to execute five of the ten imperative DL programs, and the reasons for failure fall into four categories: dynamic characteristics of the DL model, use of a third-party library, the need for a DL tensor value during conversion, and Python attribute mutation.

In contrast, Terra executes all programs without code changes, meeting the usability standards that imperative execution provides.

Training Throughput

For the five programs that AutoGraph can execute, Terra’s performance is comparable to that of AutoGraph. Recalling that AutoGraph closely follows the performance of symbolic execution, this shows that Terra largely achieves symbolic execution’s optimized performance.

The training speed-up results of Terra and AutoGraph relative to TensorFlow imperative execution. The dotted line presents the training throughput of the imperative execution.

We further examine Terra’s potential performance improvements when XLA, an optimizing graph compiler, is applied. Compared to imperative execution, Terra with XLA improves the performance of seven programs by up to 1.73x.
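For reference, in recent TensorFlow releases XLA compilation can be enabled for a traced function with the `jit_compile` flag (this is general TensorFlow usage, not Terra’s interface; TensorFlow v2.4 used the older `experimental_compile` name for the same flag):

```python
import tensorflow as tf

@tf.function(jit_compile=True)  # compile the traced graph with XLA
def step(x):
    return x * 2.0 + 1.0

print(step(tf.constant(3.0)))  # tf.Tensor(7.0, shape=(), dtype=float32)
```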

The training speed-up results of Terra and AutoGraph relative to TensorFlow imperative execution. The dotted line presents the training throughput of the imperative execution.

Conclusion

Imperative-symbolic co-execution is a novel approach that can handle any imperative DL program while achieving the optimized performance of symbolic execution. To realize this, our new system Terra generates a symbolic graph only from the DL operations of a program, and concurrently executes the remaining program components alongside the symbolic graph. Not only does Terra offer high programmability, but it also guarantees correct and fast execution of imperative DL programs.

We believe Terra will accelerate new discoveries from those who use or develop imperative DL programs. In addition, we hope that our research lays the foundation for further improvements in DL frameworks.

Acknowledgment

This blog post is based on the following paper:

  • Taebum Kim, Eunji Jeong, Geon-Woo Kim, Yunmo Koo, Sehoon Kim, Gyeong-In Yu, Byung-Gon Chun. “Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs.” 35th Conference on Neural Information Processing Systems (NeurIPS 2021), December 2021. (NeurIPS Proceedings, paper, supplementary materials)

We would like to thank Taebum Kim for providing valuable insights to this blog post.

This post was originally published on our Notion blog on January 3, 2022.



AIIS is an intercollegiate institution of Seoul National University, committed to integrating and supporting AI-related research at Seoul National University.