Compiling Geant4 to WebAssembly

Saurav Sachidanand
May 7, 2018

Continuing from my previous post, we’ll now compile Geant4, another C++ physics library (a toolkit for simulating the passage of particles through matter), to WebAssembly. The process is slightly less straightforward than it was for PYTHIA.

You can download Geant4’s source code from the Geant4 website. I used version 10.04.

Compiling Geant4

Geant4 uses CMake, so we first run the following from its source directory:

$ mkdir build
$ cd build
$ emcmake cmake \
-DGEANT4_USE_SYSTEM_EXPAT=FALSE \
-DBUILD_SHARED_LIBS=OFF \
-DBUILD_STATIC_LIBS=ON \
../

Here,

  • We build the Expat XML parser bundled with Geant4’s codebase instead of linking with the system-installed one. A system library is compiled to native code, which Emscripten cannot link; every dependency must also be compiled with Emscripten to LLVM IR bitcode before the final WebAssembly target can be generated.
  • We switch the build from dynamic shared libraries (the default) to static libraries. Emscripten has limited support for dynamic linking, but we won’t use it here, since static libraries are much simpler to work with.

After that, we run

$ emmake make

This can take a while, roughly 20–30 minutes depending on your machine. It’s best to pass make’s -jN option (e.g. emmake make -j8) so that N jobs run in parallel.

The target library files will now be stored in build/BuildProducts/lib.

Running example B1

Next, we’ll compile the example program in examples/basic/B1. Again we need to run CMake, but first there is some more work to do.

The program needs an input (macro) file, passed as an argument to main, that describes the simulation to run. We’ll use exampleB1.in; run1.mac and run2.mac, in the same directory, can also be used.

The program also needs to read a few dataset files containing data about various particles and physical processes, and it expects certain environment variables to point to their locations. So first, download and unpack the following datasets from the Geant4 download page: G4EMLOW, G4ENSDFSTATE, G4NEUTRONXS, G4SAIDDATA and G4PhotonEvaporation. Then put all these dataset folders in a single directory, which we’ll refer to as <data-dir>.

Now, we need a way to

  • Pass the name of the input file as an argument to main
  • Set the environment variables that point to the datasets we downloaded.

For this, create a new file setup_env.js in the B1 directory with the following contents:

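Module.arguments = ['exampleB1.in'];
// preRun is already an array in Emscripten's default HTML shell;
// create it if it is missing so that push() cannot fail.
Module.preRun = Module.preRun || [];
Module.preRun.push(function() {
  // Dataset locations inside the virtual filesystem (preloaded at /data).
  ENV.G4LEDATA = '/data/G4EMLOW7.3';
  ENV.G4LEVELGAMMADATA = '/data/PhotonEvaporation5.2';
  ENV.G4NEUTRONXSDATA = '/data/G4NEUTRONXS1.4';
  ENV.G4ENSDFSTATEDATA = '/data/G4ENSDFSTATE2.2';
  ENV.G4SAIDXSDATA = '/data/G4SAIDDATA1.1';
});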

Module is a global JavaScript object in an Emscripten-compiled program; setting certain properties on it configures how the program runs.

  • One of them is arguments, the list of command-line arguments for main. Its strings are passed starting at argv[1], so we don’t include the program name.
  • Another is preRun, an array of functions that are called before main. We use it to set the environment variables that point to the datasets. Make sure the version suffixes match the datasets you downloaded. Note that these paths refer to the virtual filesystem inside the Emscripten program, not to the host system.

We’ll have Emscripten include this code ahead of the compiled program (with the --pre-js flag) so that it runs before main.
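Because the version suffixes in setup_env.js must match the dataset folders you actually unpacked, one option is to generate the file rather than edit it by hand. The shell function below is a hypothetical helper (not part of Geant4 or Emscripten), offered as a sketch. It assumes every dataset folder in <data-dir> is named after its prefix followed by a version number (e.g. G4EMLOW7.3), and that <data-dir> is preloaded at /data, as in the preload flags used in this post.

```shell
# Hypothetical helper (not part of Geant4 or Emscripten): prints a
# setup_env.js whose dataset paths match the versions actually unpacked
# in DATA_DIR. Assumes each dataset folder is named <prefix><version>,
# e.g. G4EMLOW7.3, and that DATA_DIR is preloaded at /data.
# Usage: gen_setup_env <data-dir> > setup_env.js
gen_setup_env() {
  data_dir=${1%/}
  echo "Module.arguments = ['exampleB1.in'];"
  echo "Module.preRun = Module.preRun || [];"
  echo "Module.preRun.push(function() {"
  # Each entry is "<environment variable>:<dataset folder prefix>".
  for pair in G4LEDATA:G4EMLOW G4LEVELGAMMADATA:PhotonEvaporation \
      G4NEUTRONXSDATA:G4NEUTRONXS G4ENSDFSTATEDATA:G4ENSDFSTATE \
      G4SAIDXSDATA:G4SAIDDATA; do
    var=${pair%%:*}
    prefix=${pair#*:}
    found=
    # Take the first directory named <prefix> followed by a digit.
    for candidate in "$data_dir/$prefix"[0-9]*; do
      if [ -d "$candidate" ]; then
        found=${candidate##*/}
        break
      fi
    done
    if [ -z "$found" ]; then
      echo "gen_setup_env: no $prefix* folder in $data_dir" >&2
      return 1
    fi
    echo "  ENV.$var = '/data/$found';"
  done
  echo "});"
}
```

Running gen_setup_env <data-dir> > setup_env.js from the B1 directory would then produce a file equivalent to the one above. If a dataset is missing, the function names its prefix on stderr and returns a non-zero status, leaving a partial file behind.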

Now we move on to modifying CMakeLists.txt in the B1 directory. Open it and find the following pair of lines:

add_executable(exampleB1 exampleB1.cc ${sources} ${headers})
target_link_libraries(exampleB1 ${Geant4_LIBRARIES})

and add the following two commands after them:

add_executable(exampleB1 exampleB1.cc ${sources} ${headers})
target_link_libraries(exampleB1 ${Geant4_LIBRARIES})
set_target_properties(exampleB1 PROPERTIES
LINK_FLAGS "-s TOTAL_MEMORY=270MB -s WASM=1 \
--pre-js ../setup_env.js \
--preload-file exampleB1.in@/exampleB1.in \
--preload-file <data-dir>@/data")
SET(CMAKE_EXECUTABLE_SUFFIX ".html")

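We set the suffix of the final target to .html because otherwise Emscripten produces only a .js file, which is meant to be run with Node.js; with .html it also generates a web page that loads the program. The LINK_FLAGS are passed to the final em++ command that produces our targets: TOTAL_MEMORY sets the fixed amount of memory available to the program, WASM=1 emits WebAssembly rather than asm.js, --pre-js prepends setup_env.js to the generated JavaScript, and each --preload-file src@dst packages a host file or directory into the virtual filesystem at dst.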

Now we can run CMake. From B1 directory, execute

$ mkdir build
$ cd build
$ emcmake cmake -DGeant4_DIR=<geant4-dir>/build ../

The example’s CMakeLists.txt calls find_package(Geant4), which needs the CMake configuration files that Geant4 generated in the first build directory we created, so we point Geant4_DIR at it.

Finally, we build and run the program:

$ emmake make
$ emrun exampleB1.html

Hosting on GitHub

As with the previous post, I tried hosting this example on GitHub Pages. However, exampleB1.data (the package of preloaded files) turned out to be around 250MB, and GitHub limits individual files to 100MB. I also tried Git LFS, but it ended in strange download errors. Even if hosting had worked, a 250MB download for a single page wouldn’t be very practical.

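To reduce the size, I compiled the same example to native code and ran it under strace, which logs every system call the program makes, including each file it opens. I passed the same exampleB1.in input (without an argument, B1 starts an interactive session instead). Then I extracted all the paths containing <data-dir>, so that I could preload only those:

$ strace ./exampleB1 exampleB1.in 2> syscalls.txt
$ grep <data-dir> syscalls.txt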

From this output, I narrowed the accessed files down to the following directories (relative to <data-dir>):

G4ENSDFSTATE2.2
PhotonEvaporation5.2
G4EMLOW7.3/brem_SB
G4EMLOW7.3/livermore/phot_epics2014
G4EMLOW7.3/livermore/rayl
G4SAIDDATA1.1
G4NEUTRONXS1.4
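The narrowing step above can be partly automated. Below is a rough sketch, a hypothetical shell function rather than the exact commands used here, that reduces the strace log to the distinct directories containing files the program opened successfully. It assumes strace’s default line format (optionally with the "[pid N]" prefix that strace -f adds). Because it stops at each file’s parent directory, its output may be finer-grained than the list above, which keeps some datasets whole.

```shell
# Hypothetical helper: reads strace output on stdin and prints, relative to
# DATA_DIR, each distinct directory containing a file that was opened
# successfully. Assumes strace's default format, e.g.
#   openat(AT_FDCWD, "/path/file", O_RDONLY) = 3
# Pipeline stages:
#   1. keep open()/openat() lines (optionally after "[pid N] ")
#   2. drop failed calls, which return -1 (e.g. ENOENT for probed paths)
#   3. extract the first quoted argument, i.e. the path
#   4. keep only paths under DATA_DIR and strip that prefix
#   5. reduce each file path to its parent directory, then deduplicate
# Usage: dataset_dirs <data-dir> < syscalls.txt
dataset_dirs() {
  data_dir=${1%/}
  grep -E '(^|[[:space:]])open(at)?\(' |
    grep -v '= -1 ' |
    sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' |
    awk -v d="$data_dir/" 'index($0, d) == 1 { print substr($0, length(d) + 1) }' |
    sed 's|/[^/]*$||' |
    sort -u
}
```

For example, dataset_dirs <data-dir> < syscalls.txt prints one directory per line, which you can then trim or merge by hand.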

Preloading only the directories listed above reduced exampleB1.data to around 50MB. As you might guess from the list, G4EMLOW7.3 was the biggest culprit: only three of its subdirectories are actually needed. You could probably shrink the file further by preloading individually each dataset file the program accesses.
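Turning such a list into linker flags is mechanical. The hypothetical helper below, a sketch rather than part of any tool, reads paths relative to <data-dir> one per line (directories like the ones listed above, or individual files if you want to go further) and prints one --preload-file flag per path, placing each under /data so the paths in setup_env.js stay valid. It assumes <data-dir> contains no spaces, since the flags end up in a single LINK_FLAGS string.

```shell
# Hypothetical helper: turns paths relative to DATA_DIR (one per line on
# stdin, directories or individual files) into em++ --preload-file flags
# that place each path under /data, the prefix setup_env.js expects.
# Usage: preload_flags <data-dir> < paths.txt
preload_flags() {
  data_dir=${1%/}
  while IFS= read -r rel; do
    # Skip blank lines.
    [ -n "$rel" ] || continue
    printf '%s\n' "--preload-file $data_dir/$rel@/data/$rel"
  done
}
```

For example, printf '%s\n' G4ENSDFSTATE2.2 G4EMLOW7.3/brem_SB | preload_flags <data-dir> | tr '\n' ' ' yields a single line that could replace --preload-file <data-dir>@/data in the LINK_FLAGS of CMakeLists.txt. Emscripten creates the intermediate directories (such as /data/G4EMLOW7.3) automatically, so G4LEDATA can still point at the dataset root.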

The demo is now hosted here.
