Ravi — a Lua dialect
Ravi is a dialect of the Lua programming language. It started off as a fork of Lua 5.3, and then borrowed some features from Lua 5.4. It adds the following features to Lua:
- Limited static typing to improve performance
- The ‘defer’ statement
- A Just-In-Time (JIT) native machine code generator
- Ability to generate native machine code Ahead-of-Time (AOT)
- A set of batteries i.e. useful libraries
- A debug adapter for Visual Studio Code
Almost all Lua 5.3 programs are valid Ravi programs. In interpreter mode, Ravi can run almost all Lua 5.3 programs; the only exceptions are Lua functions that create more than ten dozen local variables. This limitation stems from the fact that Ravi uses an extended set of bytecodes, and in order to encode the larger set it has one bit less space available for recording the identity of virtual machine registers. In practice, this doesn’t seem to pose a huge problem.
Ravi borrows features such as the generational garbage collector from Lua 5.4, and some other internal improvements. However, as a programming language it is largely a superset of Lua 5.3. Lua 5.4 introduced some new syntax and features — such as <const> and <close> annotated variables; these are not supported by Ravi.
Language Enhancements in Ravi
The main language enhancement in Ravi is the support for limited static typing. Limited because the goal of the static typing is to help improve performance, rather than to support programming at scale. Moreover, Ravi builds on the existing types within Lua, and does not try to add a fundamentally different type system on top of Lua.
Ravi originated in an attempt to provide a faster scripting language for financial applications. This legacy explains the focus on improving performance of primitive integer and floating point values, as well as arrays of these primitive types.
A unique feature of Ravi’s type enhancement is that it is implemented in the single-pass Lua compiler, which compiles Lua code to bytecode without creating an intermediate Abstract Syntax Tree (AST). The design of the Lua parser and compiler poses a challenge when adding support for static types; Ravi manages to add static type information without giving up the extremely efficient nature of the Lua parser/compiler.
Having said that, more recently an additional compiler framework has been added to Ravi. This new framework does utilize traditional compilation techniques such as creating ASTs. In the future, this may allow a more sophisticated type system to be added as an option.
Type Annotations in Ravi
Local variables and function parameters can be optionally annotated with types in Ravi. Here is a simple example that sums elements of an array:
function sum(arr: number[])
  local n: number = 0.0
  for i = 1, #arr do
    n = n + arr[i]
  end
  return n
end
print(sum(table.numarray(10, 2.0)))
The syntax is quite conventional: the variable name is followed by a colon : character and then by the type name.
Function return values cannot be annotated at present.
The following type annotations are available.
- integer — an integer value
- number — a floating point value
- integer[] — array of integers
- number[] — array of floating point values
- table — a Lua table
- string — a string value
- closure — a function value
- Name[.Name]* — a user defined type that has a meta-table registered in the Lua registry. This allows userdata types to be asserted by their registered names.
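To illustrate, here is a small sketch of my own (not taken from the Ravi documentation) that exercises several of these annotations; it assumes the table.intarray constructor that Ravi provides alongside the table.numarray constructor used earlier:
local count: integer = 0
local price: number = 99.5
local names: table = { "a", "b", "c" }
local greeting: string = "hello"
local ints: integer[] = table.intarray(5, 1)   -- 5 integers, each initialized to 1
local adder: closure = function(x: integer) return x + count end
In Ravi, assignments to annotated variables are type-checked, which is what allows the compiler to trust the annotations.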
A full description of how these types can be used and their semantics is out of scope for this overview. The interested reader can check out the Ravi documentation on type annotations.
The presence of type annotations helps the compiler generate more efficient type specific bytecodes. These bytecodes translate to more efficient native code when JIT or AOT compiled.
Later on in this article, I present a couple of examples of the performance improvements obtained by using these type annotations.
The ‘defer’ statement
Ravi adds the ‘defer’ statement to the language. It was first popularized by the Go programming language, and similar constructs are now available in many other programming languages.
The form of the defer statement in Ravi is as follows:
defer
  block
end
Where block is a set of Lua statements.
The defer statement creates an anonymous closure that will be invoked when the enclosing scope is exited, whether normally or because of an error.
Example:
y = 0
function x()
  defer y = y + 1 end
  defer y = y + 1 end
end
x()
assert(y == 2)
defer statements are meant to be used for releasing resources in a deterministic manner.
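For example, here is a minimal sketch of my own (not from the Ravi documentation) that uses defer to guarantee a file handle is released:
local function read_all(filename)
  local f = assert(io.open(filename, "r"))
  defer f:close() end    -- runs when read_all exits, whether normally or due to an error
  return f:read("*a")    -- even if this call raises an error, the file is still closed
end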
JIT Compilation
The raison d’être for Ravi was to create an extension language for a financial services application. The idea was to provide users a scripting language for implementing derivative pricing algorithms while still achieving good performance. For example, when interest rate curves are constructed, it requires various instruments to be repeatedly priced as the solver tries to converge to a curve that will bring the overall net value of all instruments to zero. This requires hundreds or thousands of calls to the pricing routines. Allowing a scripting language means users can be empowered to create any instrument type they wish, instead of being limited to predefined types for which pricing has been already implemented. This use case required good performance for numeric arrays and arithmetic operations involving numbers.
I initially tried to use LuaJIT for this use case. But there were two primary issues with LuaJIT.
- LuaJIT doesn’t do very well when there is frequent context switching between the backend code and Lua code. It works best when you can write all your code in Lua, as the tracing JIT can then inline function calls.
- The second issue was that I needed a solution that I could support myself; LuaJIT is an extremely complicated implementation that requires multiple person years of effort to get a good understanding of its code base. This was not an option open to me.
As a result, as an experiment, Ravi was born. Initially the only types added to the Lua language were integer, number, integer[] and number[]. The first JIT native code generator was implemented using LLVM.
Multiple JIT Backends
I tried several different JIT compiler implementations between 2015 and 2020.
- The first JIT backend used LLVM to generate native code. I used the C++ LLVM API, initially starting with version 3.5 of LLVM. Over the years I updated the implementation to support new versions of LLVM. The final implementation used ORC v2 APIs in LLVM 10.0. The LLVM backend was retired in 2020.
- I also implemented a JIT backend using libgccjit, which is part of gcc. This implementation did not offer any performance advantages over LLVM, and its compilation speed was slower because libgccjit uses the gcc backend under the covers, which relies on intermediate files. This implementation was retired in 2017, as it was too much work to maintain in parallel with the LLVM backend.
- Next I tried using NanoJIT, a JIT engine that was created by Adobe and used for a short while by Mozilla. This backend was too difficult to use on its own, as it did not support aggregate types such as C structures, so I had to adapt a C front-end to allow the JIT compiler to use C as the intermediate language. Unfortunately, NanoJIT turned out not to be a good fit, as its optimization capabilities are minimal.
- In 2017 Eclipse OMR became available, with a JIT compiler that is used in the IBM Java runtime. It turns out that the OMR JIT compiler is quite general and can support a large subset of the C language; enough to allow Ravi to use it. Eclipse OMR generated much better code than NanoJIT, however it did not achieve the performance results I was getting with LLVM or libgccjit. Its principal advantage was that it is a much smaller implementation compared to LLVM or libgccjit. I created a cut-down version of Eclipse OMR that only contains the compiler code, thereby cutting down its size further. This backend was retired in 2020.
- The final and current implementation of the JIT backend uses the MIR JIT engine being developed by Vladimir Makarov, a gcc maintainer at Red Hat. This JIT backend generates native code on par with Eclipse OMR while being a fraction of its size. Moreover, this is the first JIT implementation that is practical for a small language such as Lua: the engine is so small that it can be compiled in less than 10 seconds, yet it achieves great performance.
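For completeness, here is a hedged sketch of how a script can request JIT compilation of a specific function. It assumes the ravi.compile() and ravi.iscompiled() helpers described in the Ravi documentation; exact names and signatures may differ between Ravi versions and JIT backends:
local function dot(a: number[], b: number[])
  local r: number = 0.0
  for i = 1, #a do
    r = r + a[i] * b[i]
  end
  return r
end

if ravi and ravi.compile then
  ravi.compile(dot)                        -- ask the active JIT backend to compile this function
  print("compiled?", ravi.iscompiled(dot))
end

print(dot(table.numarray(3, 1.5), table.numarray(3, 2.0)))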
Performance Comparisons between JIT Backends
The following tests were done in 2019. All tests used type annotations in Ravi. The tests were run on RHEL 7.7 on an Intel Xeon x86-64 processor. Smaller values are better.
Ahead Of Time Compilation / New Compiler
One of the challenges with converting Lua/Ravi bytecodes to native code is that the bytecodes operate against the Lua VM stack, which is a heap-allocated structure. This runtime stack is dynamic: it grows and shrinks during program execution, since the Lua VM tries to keep the stack at the minimum required size.
As bytecodes are executed, any operation that can result in a function call may cause the stack to be reallocated. This means that, from the JIT compiler’s point of view, the stack is a heap structure that has to be reloaded often. When values such as numbers are manipulated on this stack, it takes a very clever optimizer to spot the opportunities where a value can be replaced by a hardware register or placed on the C stack.
LLVM’s optimizer is able to perform this degree of optimization provided we give it some help in the form of type-based alias analysis (TBAA) metadata.
An alternative approach is to use a different intermediate representation that is designed to allow the use of C stack variables where possible. In particular, in Lua/Ravi, this technique can be used for primitive values that do not escape as up-values, as well as for intermediate primitive values used in arithmetic calculations.
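The distinction can be seen in a small sketch of my own: in a function like the one below, the accumulator never escapes and is a candidate for a C stack slot, whereas a value captured by an inner closure must stay on the Lua VM stack.
function counter(n: integer)
  local acc: integer = 0                    -- never captured: eligible for a C stack temporary
  for i = 1, n do
    acc = acc + i                           -- intermediate results can also live in C temporaries
  end
  local total: integer = acc
  local get = function() return total end   -- 'total' escapes as an up-value, so it must
                                            -- remain on the Lua VM stack
  return get
end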
The new compiler implementation is designed to allow this. For instance here is the intermediate representation (IR) for the sum function shown earlier.
define Proc%2
L0 (entry)
TOFARRAY {local(arr, 0)}
MOVf {0E0 Kflt(0)} {Tflt(0)}
MOV {1 Kint(0)} {Tint(1)}
LENi {local(arr, 0)} {Tint(2)}
MOV {Tint(2)} {Tint(3)}
MOV {1 Kint(0)} {Tint(4)}
SUBii {Tint(1), Tint(4)} {Tint(1)}
BR {L2}
L1 (exit)
L2
ADDii {Tint(1), Tint(4)} {Tint(1)}
BR {L3}
L3
LIii {Tint(3), Tint(1)} {Tbool(5)}
CBR {Tbool(5)} {L5, L4}
L4
MOV {Tint(1)} {Tint(0)}
FAGETik {local(arr, 0), Tint(0)} {Tflt(1)}
ADDff {Tflt(0), Tflt(1)} {Tflt(2)}
MOVf {Tflt(2)} {Tflt(0)}
BR {L2}
L5
RET {Tflt(0)} {L1}
All of the Tflt, Tint and Tbool operands will be created as temporaries on the C stack. Numeric constants get inlined.
While the NanoJIT experiment failed, one useful outcome from that implementation was the use of C intermediate code generation. The AOT compiler is an extension of this idea, and works by generating all of the Lua data structures needed for the VM to think that the compiled native code is just another Lua/Ravi function.
Here are some performance numbers from a more complex benchmark. This benchmark uses an implementation of Dantzig’s Simplex method for linear programming and runs it on a large data set from netlib. A type-annotated version of the Lua code was used for Ravi. Smaller values are better.
The Ravi bytecode JIT is referred to as Ravi BC JIT, and the JIT using the new intermediate representation is referred to as Ravi New JIT.
The native code generated by MIR JIT when using intermediate C code is about 2.6x slower than the AOT-compiled gcc -O2 output. The native code generated via the bytecode JIT is about 6.5x slower than the AOT output.
Limitations of JIT/AOT compiled code
There are some limitations imposed by the native code generation.
- The compiled code can only be used in the main Lua/Ravi thread; coroutines always execute in interpreted mode.
- Additionally, the new compiler framework does not support features that rely on Lua/Ravi bytecodes, such as the debug API.
Future Directions
Clearly the new compiler framework generates more efficient code than the older JIT implementation that translated Ravi/Lua bytecodes. However, the benefits of JIT and AOT compilation are only available for arithmetic operations and numeric arrays, and then only if the source code is annotated with types.
There are many areas that need improvement for the AOT/JIT code to perform well in the general case.
- Support for inlining small Ravi functions defined within the same source file would help a great deal, because the function call overhead in Ravi is quite significant.
- Local variables can be converted to constants where the compiler can detect that the value does not change.
- The biggest challenge is the performance of objects when they are represented as tables or userdata types. Unlike JavaScript, Lua does not have a well defined class/object model; it is largely a do-it-yourself affair with some syntactic sugar from the language. This makes it harder to optimize based on usage patterns. Userdata objects have to be implemented in C, and have the benefit that their data layout can be controlled. However, accessing fields within a userdata object requires dynamic dispatch via a hash table lookup, and fields must be translated to objects that can be represented in Ravi. For instance, a string embedded in the userdata object needs to be converted to a Lua string at runtime, which has a significant performance overhead.
- Type inference could significantly reduce the need for type annotations. Unfortunately, in Ravi, function return types cannot currently be annotated. Moreover, since functions are values, it is not known at compile time what the true return values will be. If support for annotating return values were added, the compiler could make some optimistic assumptions when the called function’s definition is visible and it can ascertain that the call site will not change at runtime.
- Strings are often used as keys in table accesses; Ravi tries to optimize string-based key lookups when the target is known to be a table. More work is needed to generalize this to other situations; at least the first one or two hash lookups could be inlined. The downside of inlining, though, is an increase in code size, which may adversely impact performance.
- All evidence shows that the only way to generate optimized code is via type specialization. Even in the absence of type annotations and type inference, it should be possible to detect the types in use at runtime and specialize an entire function based on the data obtained this way.
- Ravi uses an embedded JIT backend that supports C as its input language. If a safe subset of C could be exposed in the language, it would enable some nice features, such as efficiently manipulating userdata objects.
Batteries
A well known issue with the Lua language is that its standard library is just a wrapper over the C standard library. This is because the Lua authors want Lua to be able to run on any platform that has a C compiler available.
The de-facto standard repository for Lua libraries is the LuaRocks repository. However the issue with this repository is that it is a collection of libraries with no particular purpose. Also, none of the libraries are optimized for or guaranteed to work with Ravi.
To make the use of Ravi more pleasant for users, I created the Suravi project. The goal of this project is to create a small set of curated libraries for Ravi, that are also tested and potentially optimized to work with Ravi. The project also aims to create a standard set of documentation.
A Debugger for Ravi in Visual Studio Code
In my experience lack of a debugger makes it much harder to write correct programs. Therefore creating a debugger for Ravi was one of the first priorities. A Visual Studio Code debug adapter for Ravi was created in 2016, and contains basic functionality for debugging Ravi/Lua scripts.
What’s next for Ravi?
I never got to use Ravi for the original use case it was created for, as I had to cancel the project. Nevertheless working on Ravi has been rewarding and I plan to continue making enhancements. Given that the work is done in my free time, progress is inevitably very slow.
A key question is whether Ravi can maintain compatibility with Lua. The answer appears to be no: Lua 5.4 has already introduced features, such as the <const> and <close> annotations, that are not supported in Ravi. Moreover, I think the ‘defer’ statement is preferable to <close>, and I provide patches that implement the ‘defer’ statement for Lua 5.3 and Lua 5.4.
In the future it is likely that Ravi and Lua will diverge further. I would like to add a default library for creating classes, as well as the ability to embed a safe subset of C in Ravi. Moreover, the new compiler framework no longer uses the Lua bytecode design, and it is likely that all future enhancements will focus on this framework. Not having to maintain compatibility with newer Lua features gives more freedom to evolve Ravi into an independent language in its own right, while still retaining the ability to run most Lua 5.3 compatible code.
