Using LLVM LIT Out-Of-Tree

Min-Yih Hsu
May 17, 2020


Lit is an end-to-end testing infrastructure used in the LLVM project. It's powerful, flexible, and most importantly, it's modular: the core of lit is simply a Python module that you can fetch from PyPI. This makes many people want to use it in their own projects, only to be pushed back by its non-trivial setup. In this article, I'm first going to show you when and why you should consider using lit for your tests. Then I'll demonstrate how to integrate it into your project without a ~20GB LLVM build sitting on your disk, or even putting your tests into the LLVM tree.

When and Why Should I Use lit?

To give you a flavor of how lit works, here is a simple motivating example:
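The snippet below is only a sketch in the same spirit, not an actual clang test; %clang_cc1 is clang's lit substitution for its frontend binary:

// RUN: %clang_cc1 -fsyntax-only %s
// RUN: %clang_cc1 -E %s | FileCheck %s

#define TWICE(x) x + x
int a = TWICE(21);
// CHECK: int a = 21 + 21;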

The real tests in clang's lexer regression suite look much the same: the entire file is nothing but normal C/C++ code, sprinkled with some odd-looking comments. Despite their special notation, those comments describe how to run the test case and how to validate its output.

The RUN directives in the first and second lines are the commands used to process this source file. Note that the result of the second line is piped into another program, FileCheck, which checks the output against the CHECK directives in the comments of the very same source file. As you might have guessed, CHECK directives are a set of validation rules written in a regex-like syntax.

The example above already shows some of the advantages of using lit over other end-to-end testing infrastructures:

  1. Test cases and runners are integrated together. Not only does this eliminate the need to write extra code to pull test cases (if they’re organized in files) into your test runner, it also makes individual tests easier to read.
  2. The command after a RUN directive is not tied to any specific (i.e. LLVM) tool: you can write whatever shell commands you want. For example: //RUN: echo "Hello World!" | grep -e "orld"
  3. Lit provides some handy substitutions. For example, %s expands to the current file path, and %t to a temporary file lit generates for you, so that you can pipe results between different RUN directives:
// RUN: echo "/etc/passwd" > %t
// RUN: cat %t | grep -e "root"

A full list of lit substitutions can be found in the LLVM testing documentation.

Of course, all of these design philosophies only work for tests that have input and output formats similar to the tools in LLVM. So I would recommend using lit in your project if:

  1. Your test targets are executables that have textual output.
  2. The executables have a command-line interface.
  3. (Bonus) You’re using CMake / Autoconf, or build systems that have boilerplate mechanisms (we will cover this later).

In the next two sections I'm going to show you how to use lit in your out-of-tree project. First, the Short Story section shows how to get started with lit in less than a minute, albeit with lots of rough edges. The whole purpose of that section is to demonstrate lit as a standalone tool even outside the LLVM tree. The following Long(er) Story section will fix those shortcomings and present a more realistic and (hopefully) more useful use case.

The Short Story

First, let's grab lit from the PyPI repository:

pip install --user lit

Unfortunately, the current version of lit does not contain an entry point in its main module that can be invoked with python -m, so we need to call the main function manually from a simple wrapper script, my-lit.py:

#!/usr/bin/env python
from lit.main import main

if __name__ == '__main__':
    main()

Here is the motivating example we're going to use: compile a simple hello world C program and validate its execution output:
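Something along these lines will do; it's a sketch that assumes a C compiler is reachable as cc on your PATH:

// RUN: cc -o %t %s
// RUN: %t | grep -e "Hello"

#include <stdio.h>

int main() {
  printf("Hello World!\n");
  return 0;
}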

According to the RUN directives on lines 1 and 2, the test runner will compile this source code into an executable (with a temporary name), run it, and validate its stdout output with grep.

Note that lit will mark a test as failed if any of its commands finishes with a non-zero exit code. So if the grep command doesn't find any match, which results in exit code 1, the test is considered a failure.

We also need some simple configuration. Lit's config files are plain Python scripts with some pre-populated variables.
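A minimal sketch looks like this; the suite name and suffix list just have to match your own setup:

import lit.formats

# Name shown in the test report.
config.name = "My Example"
# Run each test by executing its RUN lines through an external shell.
config.test_format = lit.formats.ShTest(True)
# Files with these extensions are picked up as test cases.
config.suffixes = ['.c']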

Put the above code snippet into either lit.cfg.py or lit.site.cfg.py, which lit searches for and treats as the marker of a test suite.

After putting the C file and config script in the same folder, run my-lit.py with the following command:

./my-lit.py -v .

And you would get something like:

-- Testing: 1 tests, 1 workers --
PASS: My Example :: naive-lit-example.c (1 of 1)
Testing Time: 0.03s
Expected Passes : 1

That's it! You just used lit with a few lines of code and without any LLVM dependencies! Of course, this setup has a lot of problems, and we're going to address those in the next section.

The Long(er) Story

If you're working on a larger project, using lit as in the previous section will run into the following problems:

  1. If you're testing a tool you wrote, you need the path to the built binary in order to reference it in the RUN directives.
  2. If your build folder is separate from the source folder (e.g. when using CMake) and you would like to invoke the tests from the build folder via commands like make test, you also need the path to the source folder, since lit needs the folder that contains both lit.cfg.py and the test case files.

The bottom line is that, before running any tests, we need to hand lit some environment values that are only known to the build script.

One solution can be found in the LLVM codebase itself, more specifically in how LLVM runs its regression tests. The diagram below shows how LLVM organizes its lit config scripts and how they are run.

The lit workflow for LLVM regression tests

One of the most important pieces is the lit.site.cfg.py.in file, which is used to set up the environment mentioned earlier. As the legend suggests, lit.site.cfg.py.in is just a template: it contains lots of strings enclosed by a pair of “@” characters. Part of the file content is listed below:

config.host_triple = "@LLVM_HOST_TRIPLE@"
config.target_triple = "@TARGET_TRIPLE@"
config.llvm_src_root = path(r"@LLVM_SOURCE_DIR@")
config.llvm_obj_root = path(r"@LLVM_BINARY_DIR@")
config.llvm_tools_dir = path(r"@LLVM_TOOLS_DIR@")
config.llvm_lib_dir = path(r"@LLVM_LIBRARY_DIR@")

These “@”-enclosed placeholders are replaced by the corresponding CMake variables at CMake generation time. For example, if my LLVM source folder is located at /home/rem/llvm and the build dir at /home/rem/llvm-build, then after CMake populates the build dir, there will be a lit.site.cfg.py (not a template file anymore!) inside /home/rem/llvm-build/test containing the following lines:

config.llvm_src_root = path(r"/home/rem/llvm")
config.llvm_obj_root = path(r"/home/rem/llvm-build")
config.llvm_tools_dir = path(r"/home/rem/llvm-build/bin")
config.llvm_lib_dir = path(r"/home/rem/llvm-build/lib")

To go a little deeper, the real magic is cast by the configure_file CMake command, which performs the aforementioned replacements. LLVM's usage of that command is buried inside one of LLVM's CMake functions for adding new lit test suites.

When we run the regression tests using the check target, it effectively runs the following command:

llvm-lit /home/rem/llvm-build/test

This takes the lit.site.cfg.py inside that path as the (initial) config file. After populating the environment configuration as mentioned above, it uses the load_config() Python function to load the second-stage config file, lit.cfg.py, inside the test folder of the source tree. That script does further bootstrapping using the variables set by lit.site.cfg.py, such as config.llvm_obj_root, before running all the tests.
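The hand-off itself is a single call at the bottom of lit.site.cfg.py.in, roughly like this (paraphrased, not copied verbatim from the LLVM tree):

# Let the main config in the source tree do the real work.
lit_config.load_config(
    config, os.path.join(config.llvm_src_root, "test", "lit.cfg.py"))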

Though this two-phase process sounds complicated and its usage in the LLVM tree looks scary, I'm going to show you that it's still possible to adopt this approach with just a few lines of script.

Say you're writing a simple (dumb) tool, called extra-protein, that doubles the number of iterations of each loop in the input C/C++ code. Here is your test case:
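The exact contents don't matter much; a sketch like the one below (saved as test/extra-protein-test.cc) is enough to show the idea, assuming the tool prints the rewritten code to stdout:

// RUN: %extra-protein %s | grep -e "i < 20"

void pump_some_iron() {
  for (int i = 0; i < 10; ++i) {
    lift();
  }
}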

You will put this file in the test folder under the source root. The %extra-protein substitution represents the path to your tool executable; we will cover that part shortly. The lit.site.cfg.py.in, also placed inside the test folder, looks something like this:
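Here is a minimal sketch; the config variable names (extra_protein_src_root and friends) are ones I made up, and it assumes your tool executable ends up under bin/ in the build folder:

import os

# Filled in by CMake's configure_file() at generation time.
config.extra_protein_src_root = r"@CMAKE_SOURCE_DIR@"
config.extra_protein_obj_root = r"@CMAKE_BINARY_DIR@"
config.extra_protein_tools_dir = r"@CMAKE_BINARY_DIR@/bin"

# Hand off to the main config in the source tree.
lit_config.load_config(
    config, os.path.join(config.extra_protein_src_root, "test", "lit.cfg.py"))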

And your lit.cfg.py, which sits in the same folder, looks something like this:
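Again a sketch rather than gospel; the important bits are test_exec_root and the substitution at the bottom:

import os
import lit.formats

config.name = "Extra Protein"
config.test_format = lit.formats.ShTest(True)
config.suffixes = ['.c', '.cc']

# Test cases live next to this file ...
config.test_source_root = os.path.dirname(__file__)
# ... but their outputs go into the build folder.
config.test_exec_root = os.path.join(config.extra_protein_obj_root, 'test')

# Teach lit what %extra-protein expands to.
config.substitutions.append(
    ('%extra-protein',
     os.path.join(config.extra_protein_tools_dir, 'extra-protein')))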

As you can see from the substitution setup at the bottom of lit.cfg.py, %extra-protein is replaced by the path to the tool executable. Note that it is recommended to point config.test_exec_root, which is the path for test outputs, at your build folder. Otherwise the default configuration will put an Output folder next to your test case sources, and your version control will keep complaining about it.

Finally, on the CMake side, add the following line to test/CMakeLists.txt:

configure_file(lit.site.cfg.py.in lit.site.cfg.py @ONLY)

And create a new check target (also in test/CMakeLists.txt) that will run the tests:

add_custom_target(check
  COMMAND my-lit.py "${CMAKE_CURRENT_BINARY_DIR}" -v
  DEPENDS extra-protein-tool)

Then you’re all set!

[/home/rem/extra-protein/build]$ ninja check
-- Testing: 1 tests, 1 workers --
PASS: Extra Protein :: extra-protein-test.cc (1 of 1)
Testing Time: 0.087s
Expected Passes : 1

Notes

An important hammer that is heavily used in LLVM's test suite, but barely mentioned in this article, is FileCheck. FileCheck is essentially a (much) more powerful version of the "grep" used in the earlier examples; it focuses on line-based pattern matching and supports variable bindings. Unfortunately, FileCheck is written in C++ and depends on other LLVM components (primarily Support and ADT), so it's pretty hard to extract into a standalone tool. But you should definitely take a look.
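To give you a taste of variable bindings, here is a hypothetical check (my-tool is just a stand-in for whatever you're testing) that captures a register name from one output line and requires the very next line to return it:

// RUN: my-tool %s | FileCheck %s
// CHECK: [[RES:%[a-z0-9]+]] = add i32
// CHECK-NEXT: ret i32 [[RES]]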
