Mixed Interactive Debugging of Dynamic Languages and Native Code

Published in graalvm · 10 min read · Dec 13, 2019

A traditional approach to tackling performance issues in dynamic languages such as Ruby, R, Python, or JavaScript is to rewrite critical parts of a program in C/C++ or Fortran. In contrast to dynamic language code, which is usually interpreted by the runtime, compiled native code is executed directly by your machine. Note: For the sake of clarity, we will refer to code written in C/C++ or Fortran as native code.

One of the negative consequences of calling native code is that it makes the resulting program difficult to debug. Interactive debuggers are indispensable tools for software development, and there are sophisticated and powerful debuggers available for dynamic languages. Unfortunately, these debuggers usually do not support interactive debugging of both dynamic languages and C code at the same time and in one tool, which makes it hard to understand what’s happening in code that uses native extensions.

The goal of this article is to demonstrate a unique ability of GraalVM to debug mixed applications written in R, Ruby, or Python together with C/C++ code in a single development environment (read: ordinary R, Ruby, and Python applications, because almost all of them use native extensions). We’ll use R and GraalVM’s R implementation, FastR, but a similar approach would work for all languages supported in GraalVM.

R

The R language is a popular tool widely used by statisticians and data scientists for manipulating and displaying data. Among its prominent strengths are the wealth of extensions available through the CRAN repository and its advanced plotting capabilities.

On the other hand, when speaking of R’s deficiencies, one must put its poor performance first. Programs written purely in R are notoriously slow. While this may not be an issue when analyzing small or medium-sized data sets, it definitely becomes a hurdle when processing large ones.

The R language specifies an interface for calling native code from R, which is accompanied by the R API for the manipulation of R objects in native code.

As with all dynamic languages, one of the negative consequences of calling native code from R is that it makes the resulting program difficult to debug. To debug their programs, R developers can use either the built-in debugger or RStudio. Unfortunately, these debuggers do not support interactive debugging of both R and C code at the same time and in one tool.

This limitation can be easily overcome by using FastR, an open source alternative R implementation that, apart from being compatible with GNU-R, puts emphasis on the performance of R execution, embeddability in Java and tooling support. FastR is part of the Oracle GraalVM multilingual virtual machine that, among other things, provides language-agnostic support for interactive debugging. Importantly, GraalVM can also be used to execute C/C++ code, which opens up new horizons with regard to interactive debugging of R and C/C++ code.

FastR and Native Code

FastR is a GraalVM based implementation of the R language focusing on improving the performance of R code. Being a GraalVM language, it can be embedded in Java applications and has the ability to interface with other GraalVM and JVM languages. Although it is reaching maturity, it is still in the experimental stage. FastR was designed to be efficient, polyglot, compatible, and embeddable, so you can run R in various contexts, such as Java. To get a high-level overview of FastR, read the article Faster R with FastR.

As mentioned above, a good deal of R functions are written in native code for speed. By default, FastR executes native code in the same way as GNU-R, i.e. it natively loads shared package libraries and executes functions exported from them. However, in this configuration, the performance may be far from optimal, especially in the case of frequent calls between FastR (on the JVM) and native code. For better performance, FastR can be configured to run the native code of selected R packages using the LLVM interpreter, which is just another GraalVM language. LLVM is a modular compiler infrastructure that is meant to be an alternative to GCC, the older compiler infrastructure. A key component of LLVM is LLVM bitcode, a low-level language similar to assembler that is used as an intermediate language when compiling high-level languages.

The native code is produced during the installation of a package by compiling C, C++ or Fortran source files included in the package using a compiler toolchain. In contrast to GNU-R, which uses the toolchain installed on the host, FastR uses the compiler toolchain supplied by GraalVM. In addition to the standard compilation to native code, the GraalVM compiler toolchain also compiles native source code to LLVM bitcode that is bundled with the resulting library artifacts. The fact that the native code is accompanied by the LLVM bitcode in a library gives a user the option to choose whether the functions in a library will be executed using the native code or the LLVM bitcode. Choosing the LLVM bitcode results in a number of interesting consequences, such as performance gains thanks to cross-language optimizations performed by GraalVM or seamless cross-language debugging by means of the GraalVM debugger.

GraalVM Debugger

GraalVM comes with a set of development tools, such as a debugger and a profiler. These tools are tailor-made for GraalVM and offer special features for GraalVM-specific capabilities like language interoperability. As the GraalVM Debugger implements the Chrome DevTools Protocol, GraalVM applications can be debugged using Chrome Developer Tools or Visual Studio Code. To learn more, please refer to the GraalVM Tools documentation.

Examples of Mixed Interactive Debugging

Let me illustrate the capabilities of the GraalVM debugger with two examples featuring samples of R and C/C++ code being debugged in Visual Studio Code (VSC). The prerequisites for the examples are:

GraalVM installed
FastR installed in GraalVM
Visual Studio Code installed
VSC R Plugin installed
Debugging examples cloned from GitHub
The examples folder added to VSC workspace

The source code and the installation instructions are available in the GraalVM examples repository in the fastr-mixed-debug subdirectory.

Debugging Simple Native Code

The goal of this example is to demonstrate how to debug simple R and C code using FastR, GraalVM debugger and VSC. Additionally, the reader will learn how:

to use VSC and the R plugin to debug the code
to use FastR’s LLVM backend to debug native code
FastR objects are displayed when debugging native code

In the course of this example we’ll be working with two source files, one written in C and the other in R. The C code is taken from the Writing R Extensions manual and contains a function emulating the functionality of the R lapply function:

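#include <R.h>
#include <Rdefines.h>

/* Native counterpart of R's lapply: calls fn on each element of list in environment rho. */
SEXP lapplyNative(SEXP list, SEXP fn, SEXP rho) {
    int n = length(list);
    SEXP R_fcall, ans;

    /* Build the call fn(<arg>) once; the argument slot is filled in the loop. */
    R_fcall = PROTECT(lang2(fn, R_NilValue));
    /* Allocate the result list; PROTECT guards it against garbage collection. */
    ans = PROTECT(allocVector(VECSXP, n));
    for(int i = 0; i < n; i++) {
        /* Set the i-th element as the call's argument, evaluate, store the result. */
        SETCADR(R_fcall, VECTOR_ELT(list, i));
        SET_VECTOR_ELT(ans, i, eval(R_fcall, rho));
    }
    /* Carry the element names over to the result. */
    setAttrib(ans, R_NamesSymbol, getAttrib(list, R_NamesSymbol));
    UNPROTECT(2);   /* balances the two PROTECT calls above */
    return ans;
}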

The R code is a simple wrapper invoking the lapplyNative C function:

lapplyNative <- function (x, fun, env = new.env()) {
    .Call("lapplyNative", x, fun, env)
}

The C code can be compiled using the following R CMD SHLIB command, which produces the lapplyNative.so shared library:

R CMD SHLIB -o lapplyNative.so lapplyNative.c

Now let’s try to invoke the lapplyNative function from the FastR REPL:

dyn.load("lapplyNative.so")
source("lapplyNative.R")
x <- list(a = 1:5, b = rnorm(10))
lapplyNative(x, sum)

The script should give a result like:

$a
[1] 15

$b
[1] -1.45445

To start debugging this code, FastR/GraalVM must be instructed to activate the debugger and also to use the LLVM bitcode bundled with the shared library to enable debugging of the C code:

R --inspect --inspect.Suspend=false --R.BackEnd=llvm --R.DebugLLVMLibs

The --inspect argument activates the GraalVM Debugger. By default, the debugger server will be listening on port 9229 for commands from a debugger, which is VSC in our case. The --inspect.Suspend=false argument just makes launching smoother as it prevents the debugger from suspending the execution on the first line of the code. The --R.BackEnd=llvm argument specifies that FastR will use the LLVM backend to load and execute native code and --R.DebugLLVMLibs permits debugging the LLVM bitcode of loaded libraries.

On startup, the debugger prints a URL to the console. This URL can be copied and pasted into Chrome to start debugging in DevTools.

Debugging in Visual Studio Code

At this point VSC can be launched and used to debug our code. Press the F5 key to attach VSC to the GraalVM debugger. Then switch back to the FastR REPL and execute the R code as already shown above. Then return to VSC, locate lapplyNative.R in the VSC Explorer, toggle a breakpoint in the R wrapper function and re-execute the lapplyNative function in FastR. The debugger should suspend the execution and VSC should grab the focus. The variables panel shows all parameters being passed to the function. You will notice, for instance, that all parameters are promises, some of them unevaluated.

At the moment of entering lapplyNative the variables panel displays mostly unevaluated promises

Looking at the stack panel, you can see the current stack. Clicking on an individual item in the stack panel makes the debugger UI switch to the corresponding context.

Now let’s move on to the C code. Locate lapplyNative.c in the VSC Explorer, toggle a breakpoint in the lapplyNative function and press F5 to resume the execution. The debugger should stop at the breakpoint. Look at the variables panel, specifically at the ans variable, which is the result of the function. It is an illustration of how R objects are seen from the perspective of the LLVM language. You can see the <foreign> tag next to a couple of variables indicating that these objects are foreign to the LLVM language. The term foreign stands for objects originating in another guest language. What it means for the LLVM interpreter is that it must communicate with such objects via a special protocol called “interop” that allows, for instance, retrieving members of objects as we see them in this panel.

Looking at the stack panel, you will notice that the stack consists of heterogeneous frames originating in different guest languages:

Inside the lapplyNative C function. The function arguments are R objects that are seen as foreign from the perspective of the native code.

The hybrid call stack consisting of R and C (or, more precisely, LLVM) stack frames

Debugging Complex Native Code

In the second example, I will deal with a more complex debugging scenario. In particular, this example covers:

Debugging a package containing Rcpp code
Stepping into and debugging Rcpp functions

The prerequisites for this example are:

Rcpp 1.0.0 installed from the unpacked source tarball:
R CMD INSTALL package-sources/Rcpp
The Gibbs sampler example installed:
R CMD INSTALL ./gibbs

Note: to debug package native code, the package must be installed from an unpacked source tarball.

Note: the Gibbs sampler code is taken from http://adv-r.had.co.nz/Rcpp.html#rcpp-package by Hadley Wickham

Debugging Rcpp Code

Let’s launch FastR in debug mode (with the same options as in the first example), load the gibbs package and execute the Gibbs sampler function, gibbs_cpp.

Then switch to VSC and attach it to the GraalVM debugger. Locate the gibbs/src/gibbs.cpp file and toggle a breakpoint in the gibbs_cpp function. Go back to FastR and run the gibbs_cpp function again. The VSC debugger should grab focus and you should see something like this:

The hybrid call stack when debugging gibbs_cpp

Looking at the data field in the variables panel, you can spot a certain limitation of the current FastR debugging support. Normally, it should display a foreign object and its members, but here, we see just a plain memory address. The reason is that under certain circumstances, foreign objects must be “nativized”, i.e. converted from their Java representation to the native one so that they can be stored in the native memory.

And now try to step into the rgamma function by pressing F11.

The debugger should land in the Rcpp rgamma function and you should see the values of the arguments in the variables panel.

As a final note, keep in mind that this code is currently being interpreted by the GraalVM LLVM interpreter, which means it is actually the underlying LLVM bitcode of the Rcpp library that is being debugged. And as the bitcode contains debugging information referring to the source code, it can give an impression of debugging the original C++ code.
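The role of debugging information can be seen in a small, self-contained sketch, written in C for brevity. The file name scale_demo.c and the function scale_values are hypothetical and unrelated to Rcpp or the gibbs package. The commands in the header comment assume a clang on the PATH. The -g option asks the compiler to emit debugging information, which shows up in the textual LLVM IR as metadata records such as DILocation that point back to lines of the source file.

```c
/*
 * scale_demo.c: hypothetical file used only to illustrate debug information.
 *
 * Assumed commands (clang on the PATH):
 *   clang -g -O0 -S -emit-llvm -o scale_demo.ll scale_demo.c
 *   grep -c DILocation scale_demo.ll   # counts source-location records
 *
 * Without -g the generated IR contains no DILocation records, so a
 * debugger has no way to map instructions back to lines of this file.
 */
#include <stddef.h>

/* Multiply each of the n elements of values by factor, in place. */
void scale_values(double *values, size_t n, double factor) {
    for (size_t i = 0; i < n; i++) {
        values[i] *= factor;   /* a natural line for a breakpoint */
    }
}
```

A debugger that interprets such bitcode can use these records to show the original source line for the instruction it is currently executing.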

Conclusion

This article has demonstrated how one can debug applications written in R, Ruby, or Python together with C/C++ code in a single development environment. In the presented R example, FastR serves as a debugger backend built on top of the GraalVM debugger, and VS Code provides a comfortable debugging UI. The mixed debugging of languages like R, Ruby, Python and native code is made possible by GraalVM’s capacity to load and execute the LLVM bitcode attached to shared libraries when they are built using the GraalVM LLVM Toolchain. The LLVM integration with GraalVM language implementations not only brings the mixed debugging capability, but also improves performance on the boundary between managed and native code: the native code is interpreted just like another GraalVM language, so the GraalVM compiler is able to perform optimizations across that boundary.

To try it, download GraalVM from graalvm.org/downloads. If you have feedback or feature requests, please create an issue in the GitHub repository or talk to us on Twitter: @graalvm.

References

Website: http://www.graalvm.org/
GitHub Repository: https://github.com/oracle/graal; https://github.com/graalvm/examples
Stay Tuned: https://twitter.com/graalvm; graalvm-announce@oss.oracle.com
FastR overview: https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb
GraalVM compatibility (can be used to check the status of a package): http://www.graalvm.org/docs/reference-manual/compatibility/