Integrating Cobol with JavaScript

Christoph Schobesberger
Oct 23 · 11 min read

This is a guest blog post by Christoph Schobesberger. He shares the results of a GraalVM-related project he did at Johannes Kepler University (JKU) Linz.

GraalVM supports a range of languages: JavaScript, Ruby, R, Python, and so on. It also includes Sulong, the LLVM bitcode execution runtime, which allows executing programs compiled to LLVM bitcode, usually from native languages like C or C++. By using a compiler like GnuCOBOL, which compiles COBOL code to C code, we can run COBOL programs on Sulong. The interoperability feature of GraalVM allows us to call COBOL code from other programming languages and vice versa. It’s really exciting to have such synergy between a technology created in 1959 and cutting-edge technology like GraalVM.

This blog post showcases how such interoperability can be done and explains some low-level details so that you can try this for your own program. We start with a program in COBOL, rewrite a piece of it in JavaScript, and then execute the original COBOL program making it call the JavaScript module which in turn can call COBOL functions.

SHA-3 Library written in Cobol

For the sample application we use an implementation of SHA-3 written in COBOL that is available on the GnuCOBOL website. There is one common module, KECCAK.cob, that performs the actual algorithm, and several modules for the different bit widths of SHA-3. The module TESTSHA3.cob calls the SHA-3 implementation modules in order to test them. We will rewrite the SHA3-256.cob module in JavaScript and call it from COBOL. The JavaScript code in turn has to call the KECCAK.cob COBOL module.

Prerequisites

  • First, GraalVM is needed (version 20.2 or higher). For this example we use the managed mode of bitcode execution in Sulong, which virtualizes the execution completely and provides better guarantees for running native code. For this you’ll need the Enterprise Edition with the LLVM toolchain; we used the JDK 8 based distribution. You can get it from the GraalVM website.

Setup

The first thing that needs to be done is to set up some environment variables so that we can use all GraalVM tools. The commands can be found in the script setup.sh, which looks like this:

export GRAALVM_PATH=/path/to/graalvm-ee-java8-20.2.0
# export path for lli
export PATH=$GRAALVM_PATH/bin:$PATH
export LLVM_TOOLCHAIN=$(lli --llvm.managed --print-toolchain-path)
# set path for gcc
export PATH=$LLVM_TOOLCHAIN:$PATH
# path to graalvm managed libraries
export GRAALVM_LIBRARIES_PATH=$GRAALVM_PATH/jre/languages/llvm/managed/lib
# path to directory containing libraries that are compiled for Sulong managed mode
export MANAGED_LIBRARIES_PATH=$(pwd)/bitcode-managed
# include cobc in search path
export PATH=$MANAGED_LIBRARIES_PATH/bin:$PATH

The environment variable GRAALVM_PATH has to be adjusted to the corresponding path in your environment. After executing source setup.sh, the commands which lli and which gcc should point to the GraalVM versions.

Additionally, we set the two variables GRAALVM_LIBRARIES_PATH and MANAGED_LIBRARIES_PATH so that we can reference them later on. Finally, since the GnuCOBOL executable cobc will be installed in the bin directory of MANAGED_LIBRARIES_PATH, we add it to PATH.

Now that all environment variables are set, we can compile GnuCOBOL for GraalVM in managed mode. GnuCOBOL requires either GMP or MPIR for mathematical functions, and either Berkeley DB (libdb), VBISAM or DISAM for file management. We use GMP and VBISAM.

To build GMP we change to the directory gmp in the provided GitHub repository and execute the following commands:

mkdir build-gmp-bitcode-managed
cd build-gmp-bitcode-managed
../configure --prefix=${MANAGED_LIBRARIES_PATH} --disable-assembly \
--host=x86_64-unknown-linux
make
make install

For more details on this see the article about the LLVM toolchain.

In order to build VBISAM we change to the directory vbisam and execute:

./configure --prefix=${MANAGED_LIBRARIES_PATH}
make
make install

Now we can change to the directory gnucobol and compile GnuCOBOL:

./configure CFLAGS="-I${MANAGED_LIBRARIES_PATH}/include" \
LDFLAGS="-L${MANAGED_LIBRARIES_PATH}/lib" \
--disable-nls \
--with-vbisam --prefix=${MANAGED_LIBRARIES_PATH}
make
make install

The command which cobc should now point to the just installed executable located in the directory with the managed libraries. You can also execute cobc -i to verify that everything is set correctly.

Rewriting Cobol in JavaScript

Now that everything is set up, we can start with the example. We want to rewrite the COBOL module SHA3-256.cob in JavaScript. The first thing we need to modify is the call to this module from the module TESTSHA3-256.cob. This module has the same code as TESTSHA3.cob but only calls the SHA3-256 function.

The SHA3-256 function is called like this:

First we need to get a pointer to the JavaScript function using the polyglot functionality of Sulong. For this we declare three additional variables in the working-storage section and call the function polyglot_eval_file in the procedure division.

Since polyglot_eval_file is a C function and C expects null-terminated strings, we use Z’…’ literals, which add the null termination. Now we can call the JavaScript function by using this function pointer, like this:

The JavaScript code looks like this:

Basically, we first have to evaluate the shared library containing the KECCAK function. Sadly, we cannot call the KECCAK function directly because the generated C code does not offer enough type information for Sulong. Instead of adjusting the interface of the KECCAK function and refactoring the entire library, we write a wrapper function and call it. The provided GnuCOBOL source code is modified so that the type information needed for this wrapper can be generated automatically.

This wrapper function takes a JavaScript object with member variables that correspond to the respective parameters of the KECCAK function.
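As a rough illustration of that idea, the sketch below packs the seven KECCAK parameters into one JavaScript object and hands it to a wrapper. The property names, the helper names and the assumption that the three incoming parameters are input, input length and output are all made up for this example; the real names have to match the items of the COBOL record. The rate, capacity and suffix are the standard SHA3-256 values from FIPS 202. In GraalVM the wrapper would come from the library evaluated through the polyglot API; here a stub stands in so that the sketch also runs in plain Node.js.

```javascript
// Hypothetical property names; the real ones must match the items of the COBOL record.
function buildKeccakArgs(input, inputByteLen, output) {
  return {
    rate: 1088,            // SHA3-256 rate in bits (FIPS 202)
    capacity: 512,         // SHA3-256 capacity in bits (FIPS 202)
    input: input,
    inputByteLen: inputByteLen,
    delimitedSuffix: 0x06, // domain-separation suffix of the SHA-3 hash functions
    output: output,
    outputByteLen: 32,     // a 256-bit digest is 32 bytes long
  };
}

// Sketch of the JavaScript replacement for the SHA3-256 module:
// three parameters come in, one object goes out to the wrapper.
function sha3_256(keccakWrapper, input, inputByteLen, output) {
  return keccakWrapper(buildKeccakArgs(input, inputByteLen, output));
}

// Stub wrapper (an assumption for this sketch): reports how many members it received.
function stubWrapper(args) {
  return Object.keys(args).length;
}

console.log(sha3_256(stubWrapper, "abc", 3, new Uint8Array(32)));
```

Keeping the object construction in its own function makes it easy to check the parameters without a COBOL runtime at hand.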

Now we can look at the wrapper function that we need:

This wrapper function takes the arguments as a record and then directly calls the KECCAK function with the members of the record. Since the arguments that we receive from COBOL are passed by reference, we get them as pointers. Accordingly, when we call the KECCAK function we have to pass these pointers directly instead of references to these pointers, hence they are passed by value.

For the reasons why a record is used and an explanation of what is going on behind the scenes see the Technical Details chapter.

Compiling the program

Now we can look at how to actually build this program. The provided source code includes a makefile that allows us to build the program by executing make TESTSHA3-256.

Let’s look at the necessary makefile lines:

The variable COBCOPTS sets some arguments for cobc. The argument -free tells GnuCOBOL to use the free format for the COBOL source code. The argument -Q is followed by an argument that is passed to the link phase of the C compiler. We use this to hard code the paths to the managed libraries and GraalVM libraries.

The KECCAK.cob module is compiled to a dynamically loadable module. The argument -fstatic-call is needed because dlopen and dlsym are not yet supported in Sulong managed mode as of version 20.2 (see the “How are COBOL CALLs compiled?” section below). The argument -G is followed by the name of a function for which type information is to be generated for the arguments that are COBOL records. Note that this functionality is not part of standard GnuCOBOL and was added as part of this project. To avoid declaring the polyglot function multiple times, which would lead to an error, we use -fno-gen-c-decl-static-call to tell GnuCOBOL not to generate C declarations.

The module TESTSHA3-256 is compiled to an executable with the argument -x. The executable includes the bitcode as well, so it can be executed by GraalVM.

Now we can run it with the following command:

lli --polyglot --llvm.managed TESTSHA3-256

Technical Details

Needed modifications of GnuCOBOL and libraries

The above example used a modified version of GnuCOBOL. Why did we not use an official version? In Sulong managed mode the syscalls are handled by the JVM, and not all syscalls are currently implemented. Luckily, GnuCOBOL uses the file configure.ac to check which syscalls are available and configures the program so that it only uses those. Thus, we can remove unsupported syscalls from this file and get a GnuCOBOL version that runs in Sulong managed mode. In this case it sufficed to remove the signal handlers and the syscalls fcntl, readlink and realpath. Furthermore, GnuCOBOL relies on undefined behavior in C by calling a function with too few arguments in at least one place. This works natively because only the passed arguments are actually used, but Sulong is stricter in this regard and does not allow it. Since the offending call only does some cleanup for the GMP library, it is currently simply commented out.

Similarly, the library VBISAM uses fcntl at one point. This call is also removed.

Furthermore, as mentioned above, some modifications are made to the source code of GnuCOBOL to improve the interoperability with Sulong.

C Code generated by GnuCOBOL

Since we do not directly interact with the COBOL code but with the C code that is generated by GnuCOBOL, we have to look at the generated interfaces to see how parameters are passed.

The KECCAK function has the following C interface:

int KECCAK (cob_u8_t *, cob_u8_t *, cob_u8_t *, cob_u8_t *, cob_u8_t *, cob_u8_t *, cob_u8_t *);

As you can see, every parameter is passed as a pointer to an unsigned char. Sulong uses the interface to extract type information, but this interface does not offer any useful information.

Ideally we would like an interface from which Sulong can extract all the information it needs. One possible way to do this is to use COBOL records because they are very similar to C structs. Records in GnuCOBOL are stored as a contiguous array of bytes, and accesses to member variables reduce to pointer accesses at an offset from the base address of the memory block. Since Sulong can extract type information from a struct declaration, we just have to generate a C struct that behaves like the COBOL record. Since GnuCOBOL is open source, we can modify it so that the generated source code does this.
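The "contiguous bytes plus offsets" view of a record can be tried out in JavaScript with an ArrayBuffer. The record below is invented for this sketch (a 4-byte integer followed directly by an 8-byte integer), and little-endian byte order is an assumption; neither is taken from KECCAK.cob.

```javascript
// Hypothetical record: a 4-byte integer at offset 0, then an 8-byte integer at offset 4.
const RATE_OFFSET = 0;
const LENGTH_OFFSET = 4;
const RECORD_SIZE = 12;

// Allocate one block of bytes and store both members at their offsets.
function makeRecord(rate, length) {
  const view = new DataView(new ArrayBuffer(RECORD_SIZE));
  view.setInt32(RATE_OFFSET, rate, true);                // true selects little-endian
  view.setBigInt64(LENGTH_OFFSET, BigInt(length), true); // unaligned offsets are fine for DataView
  return view;
}

// A member access is nothing more than "base address plus offset".
function readRate(view) {
  return view.getInt32(RATE_OFFSET, true);
}

function readLength(view) {
  return view.getBigInt64(LENGTH_OFFSET, true);
}

const record = makeRecord(1088, 32);
console.log(readRate(record), readLength(record));
```

A C struct that describes the same block has to place its members at exactly these offsets, which is why the layout of the generated struct matters.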

For the wrapper function that takes a COBOL record we generate a C struct that looks like this:

We need to add the attribute packed to stop the C compiler from aligning the member variables. The members of the struct have the same names as the items in the COBOL record with the exception that the names have to be made compatible with the C standard, e.g. the character ‘-’ is replaced by ‘_’. Now the only thing left to do is to modify the interface of the wrapper:

int KECCAK__Wrapper__struct (struct KECCAK__Wrapper__struct_LNK_KECCAK *);

These modifications suffice for Sulong to extract all type information.
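To make the two rules above concrete, here is a small JavaScript model of the name conversion and of the effect of the packed attribute. The item names and sizes are hypothetical, and the "aligned" variant assumes that each member is aligned to its own size, which is typical for simple integer types on x86-64 but is not guaranteed everywhere. Only the '-' to '_' replacement comes from the description above; real COBOL names may need further care.

```javascript
// Turn a COBOL item name into a C identifier (only the '-' to '_' rule is modelled).
function toCIdentifier(cobolName) {
  return cobolName.replace(/-/g, "_");
}

// Compute member offsets and the total size, with or without padding for alignment.
function layout(items, packed) {
  const members = [];
  let next = 0;
  let maxAlign = 1;
  for (const item of items) {
    const align = packed ? 1 : item.size;    // assumption: natural alignment equals size
    next = Math.ceil(next / align) * align;  // skip padding bytes if needed
    members.push({ name: toCIdentifier(item.name), offset: next });
    next += item.size;
    maxAlign = Math.max(maxAlign, align);
  }
  return { members, size: Math.ceil(next / maxAlign) * maxAlign };
}

// Hypothetical record items, in declaration order, with their sizes in bytes.
const items = [
  { name: "LNK-RATE", size: 4 },
  { name: "LNK-SUFFIX", size: 1 },
  { name: "LNK-OUTPUT-BYTE-LEN", size: 8 },
];

console.log(layout(items, true));  // packed: members follow each other directly
console.log(layout(items, false)); // aligned: padding is inserted before the 8-byte member
```

In the packed layout the offsets match the COBOL record byte for byte; in the aligned layout the last member moves, so a struct without the attribute would read the wrong bytes.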

Alternative way of extracting type information

While generating the C structs automatically is the solution with the least amount of code that needs to be written, it is also possible to call functions without generated C structs. In this case, however, we need to manually tell Sulong the type information. We can write a wrapper function for KECCAK like this:

Here, we need to call the functions polyglot_as_ in order to tell Sulong how to interpret the arguments. Sadly, this solution does not currently work because of one oddity in GnuCOBOL and one bug in Sulong. The odd part is that we have to set the number of parameters in JavaScript before we can call the function. The reason for this is that GnuCOBOL stores the number of parameters in a global variable before each call. The global variable has the value 3 from the call to the JavaScript function, and the wrapper function takes 7 arguments, which would lead to GnuCOBOL ignoring the last 4 arguments. The corresponding C code that sets the global variable is automatically generated by GnuCOBOL (see “How are COBOL CALLs compiled?”).

But since we call a COBOL function from JavaScript, we have to set this variable ourselves. We can do this by adding a function to the GnuCOBOL library libcob that allows us to set the number of parameters, loading the library libcob in JavaScript by calling Polyglot.evalFile with the path to the library, and finally calling that function with the argument 7. But a bug in Sulong causes the library to be loaded twice, and therefore the global variable cannot be accessed from JavaScript. This issue is fixed in the upcoming 20.3 release.

As a side note, strictly speaking we should set the number of parameters to 1 even for the wrapper function that uses only one record argument. In practice we do not have to, since the variable still has the value 3 from the call to the JavaScript function, and a value higher than the actual number of parameters does not cause any problems.
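The effect of the parameter count can be modelled in a few lines of JavaScript. This is a simplified sketch, not the libcob implementation: the runtime object stands in for the global variable, setCallParams is the kind of helper described above (its name is made up), and the callee simply treats every parameter beyond the recorded count as omitted.

```javascript
// Stand-in for the global variable that GnuCOBOL sets before each CALL.
const runtime = { callParams: 0 };

// Hypothetical helper: lets a caller that is not COBOL set the parameter count.
function setCallParams(count) {
  runtime.callParams = count;
}

// Model of a COBOL callee: parameters beyond the recorded count are treated as omitted.
function cobolCallee(...args) {
  return args.map((arg, index) => (index < runtime.callParams ? arg : null));
}

// COBOL called JavaScript with 3 parameters, so the count is still 3 ...
setCallParams(3);
// ... and a 7-argument call made from JavaScript loses its last 4 arguments.
console.log(cobolCallee(1, 2, 3, 4, 5, 6, 7));

// Setting the count first makes all 7 arguments visible.
setCallParams(7);
console.log(cobolCallee(1, 2, 3, 4, 5, 6, 7));

// A count that is too high is harmless: the single record argument is still seen.
setCallParams(3);
console.log(cobolCallee("record"));
```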

Call by value vs call by reference

COBOL parameters can be passed by value or by reference, the latter being the default. When we call the COBOL function from JavaScript we could pass some arguments by value. If we do this, the interface would look like this:

int KECCAK__Wrapper (cob_s32_t, cob_s32_t, cob_u8_t *, cob_u8_t *, cob_u8_t *, cob_u8_t *, cob_s32_t);

This interface is a bit better but is not ideal. First of all, GnuCOBOL gives us warnings when we do this:

KECCAK.cob:78: warning: handling of parameters passed BY VALUE is unfinished; implementation is likely to be changed
KECCAK.cob:79: warning: handling of parameters passed BY VALUE is unfinished; implementation is likely to be changed
KECCAK.cob:82: warning: handling of parameters passed BY VALUE is unfinished; implementation is likely to be changed
KECCAK.cob:84: warning: handling of parameters passed BY VALUE is unfinished; implementation is likely to be changed

Secondly, notice that the last parameter in the interface has only 32 bits, whereas the BINARY-DOUBLE requires 64 bits. Hence, we do not use this feature.
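Why the width matters can be seen with JavaScript BigInt values. The sketch below only shows what fits into a signed 32-bit slot; what the generated code actually does with a 64-bit value depends on GnuCOBOL and the calling convention, and the helper name is made up for this example.

```javascript
// Reduce a 64-bit value to what survives in a signed 32-bit parameter slot.
function passAsInt32(value) {
  return BigInt.asIntN(32, BigInt(value));
}

console.log(passAsInt32(32n));            // small values survive unchanged
console.log(passAsInt32(2n ** 32n + 5n)); // the upper 32 bits are lost
console.log(passAsInt32(2n ** 31n));      // the sign bit flips
```

For the small lengths used in the test program the two widths give the same result, but the interface would silently misbehave for large values.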

How are COBOL CALLs compiled?

COBOL CALLs are by default made dynamically. The following code shows a dynamic call to the function KECCAK:

Before the actual call, an array is filled with references to internal data structures that contain information for each variable. A reference to this array is stored in a global variable that can be accessed by the callee. If we start in another programming language and call the generated C function, GnuCOBOL realizes that this array is not set because GnuCOBOL keeps track of the active COBOL modules. However, in our above example we went from COBOL to JavaScript and then back to COBOL. Because of this, GnuCOBOL expects that this array and the variable cob_call_params are set up properly. Callees may use this array, e.g. to extract the length information of arguments of the type PIC X ANY LENGTH.
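The following JavaScript sketch models this bookkeeping in a very simplified form. The object cobolRuntime and its property names are invented for the illustration and do not mirror the actual libcob data structures: the caller publishes one descriptor per argument, and a callee with a parameter of unknown length reads the size from the descriptor.

```javascript
// Stand-in for the runtime's global state; the property names are made up for this sketch.
const cobolRuntime = { callParams: 0, paramFields: [] };

// Caller side: describe each argument (data plus size) and publish the array before the call.
function callWithFields(callee, ...buffers) {
  cobolRuntime.paramFields = buffers.map((data) => ({ data, size: data.length }));
  cobolRuntime.callParams = buffers.length;
  return callee(...buffers);
}

// Callee side: a parameter declared without a fixed length takes its size from the descriptor.
function anyLengthCallee(text) {
  const field = cobolRuntime.paramFields[0];
  return field.size;
}

console.log(callWithFields(anyLengthCallee, new Uint8Array(5)));
```

If a caller skips the publishing step, the callee reads whatever the previous call left behind, which is the situation the JavaScript code in the example has to be aware of.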

The function cob_resolve_cobol searches for the symbol KECCAK in the executable and in the dynamically loadable modules in the same directory using dlopen, and loads the function with dlsym.

A static CALL replaces cob_resolve_cobol and the following function call with the statement:

b_2 = KECCAK (b_8, b_9, b_12, b_13, b_10, b_14, b_11);
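The difference between the two kinds of CALL can be modelled in JavaScript. In this sketch a Map plays the role of the symbols that dlopen and dlsym would find, and the KECCAK entries are dummies that only report how many arguments they received; all names are assumptions for the illustration.

```javascript
// Stand-in for the symbols that dlopen/dlsym would find; KECCAK here is a dummy.
const loadedModules = new Map([
  ["KECCAK", (...args) => args.length],
]);

// Dynamic CALL: look the program up by name at run time, then call through the result.
function dynamicCall(name, ...args) {
  const target = loadedModules.get(name);
  if (target === undefined) {
    throw new Error("module not found: " + name);
  }
  return target(...args);
}

// Static CALL: the target is fixed when the code is generated, so no lookup is needed.
function keccak(...args) {
  return args.length;
}

function staticCall(...args) {
  return keccak(...args);
}

console.log(dynamicCall("KECCAK", 1, 2, 3, 4, 5, 6, 7));
console.log(staticCall(1, 2, 3, 4, 5, 6, 7));
```

The static variant has no lookup step that can fail at run time, which is what makes it usable when the dynamic loading functions are unavailable.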

Conclusion

In this article we demonstrated how you can run COBOL code on top of GraalVM by using Sulong and the GnuCOBOL compiler. We implemented a piece of functionality of a sample COBOL program in JavaScript and used the interoperability of GraalVM to execute the original program with just one part replaced by the JavaScript implementation, which is called from COBOL and calls other COBOL functions.

If you use GnuCOBOL, we would like you to try this and tell us your experience with it!

Also, this approach can be extended to any language that has a compiler that compiles the language to C code. For other languages using LLVM directly it could be even easier. So if you have any ideas, please reach out, for example by raising a discussion on GitHub or Twitter. If you would like to explore the possibilities of running COBOL applications with GraalVM Enterprise, please contact us too.

graalvm

GraalVM team blog - https://www.graalvm.org

Thanks to Shaun Smith

Christoph Schobesberger

Written by

I am a Computer Science student at Johannes Kepler University in Linz. Currently I am in the Master’s program with a specialization in Computational Engineering.
