Oracle Database Multilingual Engine

Bringing Modern Programming Languages to the Oracle Database with GraalVM

Bastian Hossbach
graalvm

--

tl;dr: GraalVM is a universal virtual machine for running a variety of programming languages with high performance on the same platform. It can easily be embedded into existing systems such as databases. By embedding GraalVM into the Oracle Database, we can provide many modern languages and their ecosystems for transaction processing, data analytics and machine learning. We only have to maintain a single runtime independent of the number of supported languages as opposed to embedding one runtime for each supported language. Basic components such as data type conversions or a server-internal SQL driver can be shared across all languages with GraalVM.

Introduction

In this article we will explore what happens when GraalVM and the Oracle Database come together. Readers familiar with the Oracle Database might know that it can be extended with user-defined functions written in PL/SQL. However, GraalVM embedded into the Oracle Database allows to write user-defined functions also in various mainstream programming languages such as JavaScript and Python. Let’s have a look at a short example session in SQL*Plus:

With the first command we create a new source object named hello.js in the database. It holds the source code of the JavaScript function greet(). The second command is needed to bridge the gap between the dynamically-typed world of JavaScript and the statically-typed world of the Oracle Database. As a result, we can call the JavaScript function greet() like any other user-defined function in the Oracle Database. For example, we can call it from within a SQL query as it is shown in the example above.

GraalVM — A Universal and Embeddable Virtual Machine

GraalVM is a universal virtual machine that runs applications written in a variety of programming languages (JavaScript, Python 3, Ruby, R, JVM-based languages, and LLVM-based languages) with high performance on the same platform. Supporting multiple programming languages allows GraalVM to share basic infrastructure such as JIT compilation, memory management, configuration and tooling among all supported languages. The sharing of configuration and tooling leads to a uniform developer experience in modern projects that make use of multiple programming languages.

Support for many languages is not the only benefit of GraalVM. Although it primarily runs as part of a JVM, GraalVM offers a solution for building native images written in the Java programming language. This native image feature allows to compile an entire Java application, possibly embedding multiple guest languages supported by GraalVM and including the needed virtual machine infrastructure, into a native executable or a shared library. The main advantages of a GraalVM native image are improved startup times and reduced memory footprint. This enables existing applications and platforms not written in Java to embed GraalVM as a shared library and benefit from its universal virtual machine technology.

Database Programming

When it comes to databases, SQL has been proven to be the language of choice for querying data over the last decades. The fact that basically all new query languages (e.g., for stream, graph, or document processing) are designed as dialects of SQL just underlines its success. However, the limitations of SQL are reached as soon as more complex business logic must be implemented. The database community acknowledged this early and extended SQL with procedural features that allow to combine multiple SQL statements and computations into arbitrarily complex data processing workflows. The most popular such extension is PL/SQL, Oracle’s procedural extension to SQL. Those procedural extensions to SQL are specific to the database domain. This explains why the communities around them are quite small compared to the communities of general-purpose programming languages such as JavaScript and Python. The pools of JavaScript and Python developers are significantly larger than the pool of PL/SQL developers. Therefore, it is much easier to find JavaScript or Python developers than it is to find PL/SQL developers. Moreover, popular modern languages come with huge ecosystems of readily available open-source libraries in public software registries (e.g., NPM, PyPI). In those registries, it is easy to find high-quality libraries for basically every task.

The attractiveness of modern general-purpose programming languages eventually pushed database systems to support them. Examples range from Java in the Oracle Database to JavaScript in Microsoft Azure Cosmos DB to Python in Amazon Redshift. Up to now, integrating a new programming language with a database system involved embedding a completely new runtime, with its own memory management, threading mechanism, and so on. This not only is a lot of work. It also significantly increases the complexity of the architecture and code base of database systems. GraalVM with its polyglot capabilities and its support for embedding provides a solution to this problem. Only a single runtime needs to be embedded to enable high-performance implementations of multiple programming languages in a database system.

Multilingual Engine

At Oracle, we are currently working on embedding GraalVM into the Oracle Database and MySQL. We call these extensions Multilingual Engine (MLE). In this article we focus on MLE for Oracle Database only. Oracle Database MLE is an experimental feature at the moment.

Besides embedding GraalVM into the Oracle Database, we also work on tools that make working with MLE as convenient as possible. For example, we are currently developing tools that automatically package an entire application and deploy it into the Oracle Database with a single command.

Another major area of work are the languages we are making available inside the Oracle Database. Languages need to be extended in order to become useful. For example, we need a conversion engine that can convert between database types and language types as well as a bridge between the SQL engine of the Oracle Database and a SQL API of a new language.

MLE offers two different ways for executing code written in by MLE supported languages. First, stored procedures and user-defined functions can be written in a MLE language. Second, a new PL/SQL package called DBMS_MLE for dynamic scripting, i.e., defining anonymous scripts at runtime and executing them, is provided. We first discuss dynamic script execution with DBMS_MLE in the next section. The subsequent sections then explain how to create user-defined extensions that can be used from SQL and PL/SQL.

Ad-hoc Execution of Scripts with Dynamic MLE

DBMS_MLE can execute scripts that are given in form of strings in PL/SQL. Data is exchanged with a script in both directions (in and out) via so-called bind variables. Lastly, a script can print messages that will be put in the output
buffer
of the database. Let’s have a look at a concrete example:

The anonymous PL/SQL block in the example uses the variable script_source to hold a snippet of JavaScript code. This variable is passed to the function DBMS_MLE.CREATE_SCRIPT() to create a new Dynamic MLE script, which can then be executed via the function DBMS_MLE.EXECUTE_SCRIPT(). Before we execute our script, we define and set a bind variable named hello via DBMS_MLE.BIND_VARIABLE(). Bind variables can be defined, set and read in both the surrounding PL/SQL program or the Dynamic MLE script. The script uses the built-in MLE SQL Driver (automatically available as mle.sql) to query the salaries of all employees in the EMP table. For demonstration purposes, we create a simple histogram for the salaries and put it into the output buffer (console.log()). Before transferring control back to PL/SQL, the script manipulates the bind variable hello. The PL/SQL block then extracts the value of the bind variable using the DBMS_MLE.VARIABLE_VALUE() function and prints it to the output buffer. To execute the PL/SQL block, we can send it as a single statement to the database from any client. For example, the entire block can be copied into a SQL*Plus session and executed by entering the slash character.

After the anonymous PL/SQL block above got executed (for example, in SQL*Plus), the database output buffer will have the following content (show with SET SERVEROUTPUT ON in SQL*Plus or use DBMS_OUTPUT.GET_LINE() to retrieve):

Of course, we could just as easily use Python to accomplish the same thing:

Executing this PL/SQL block would place the following lines in the output buffer:

Dynamic script execution in MLE can be used, for example, to bring JavaScript and Python to APEX.

Component Sharing Across Languages

In our first examples we introduced the MLE SQL Driver and showed how to use it from JavaScript and Python. It looks like a module implemented in the used language, but it isn’t. Instead of implementing a complete bridge from a language’s SQL API to the SQL engine of the Oracle Database for each language we add, we have to do the main part of the work only once thanks to the polyglot feature of GraalVM. In a nutshell, the polyglot feature enables a language that runs on GraalVM to access objects and call functions that belong to another language. Therefore, we implemented basic components needed by all languages such as data conversions and the MLE SQL Driver as new internal languages that can be used directly from all other languages. For implementing new languages, GraalVM provides the Truffle framework that we used for that purpose. We just added a thin language-specific layer on top of each MLE language to hide some internals and make them look truly native. The Truffle framework allows not only to implement sharable components, but also to fully leverage GraalVM’s speculative JIT compiler. In the context of databases, the latter is of utmost significance because data conversions are often the main cost factor.

MLE Stored Procedures

While running a script written in a modern language on-the-fly is convenient in many situations, it is not ideal for developing large and complex applications. Dynamic MLE requires a skeleton in PL/SQL and third party libraries cannot be used directly. Furthermore, code is best managed by the database similar to data. To unleash the full power of MLE, we allow to persistently store and maintain user code in the database in form of modules consisting of user-defined functions and stored procedures. For painless packaging and deployment of modules, we plan to provide external tools that do everything with a single command.

Stored procedures allow developers to run code that requires the execution of several SQL statements inside the database server process. This avoids expensive network round-trips between a database client — typically a middleware — and the database. Today, the Oracle Database allows developers to implement stored procedures in PL/SQL or Java. With MLE, developers can also implement stored procedures in JavaScript and Python.

Let’s assume we want to raise the salary of an employee but forbid non-managers from having a salary bigger than $10,000. We can start with a JavaScript function that updates the salary of an employee and returns the new salary:

Note that for improving security and performance we use bind variables in the SQL statement. In this particular case we set the values of the bind variables by giving an array of values. That means the position of a value in the array [raise, empno] determines the bind variable it substitutes (i.e., the first bind variable will be set to the value of raise and the second bind variable will be set to the value of empno). Alternatively, bind variables can be set by name.

Next, we can define a function to check if an employee is a manager:

With those two helper functions, we can now implement our business logic:

The assignment to module.exports is used to export the function salraise() to the database. Putting all together in a file named load_salraise.js, we can add additional code that does the deployment into the database:

We can now run the script that deploys the module code and registers the function salraise() as a stored procedure via Node.js:

The newly created JavaScript stored procedure can be called like any other procedure. For example, from SQL*Plus:

MLE User-defined Functions

MLE also allows developers to implement user-defined functions in JavaScript and Python. The difference is that a UDF can be used like any other SQL function whereas a stored procedure cannot be called from a SQL statement. The JavaScript community created a large set of software packages. MLE allows database developers to easily reuse code from software registries such as NPM.

For example, let’s assume that we want to filter out syntactically invalid email addresses from a large database table. For this, we would like to reuse the JavaScript validator package. Specifically, we would like to use the isEmail() function from it:

We can first download the validator package using the NPM package manager:

Since an MLE JavaScript module is always described as a single file that contains all code, we can use Webpack to create a bundle that exports all functions:

The following JavaScript code, stored in a file named load_validator.js for example, can be used to deploy the module and making the isEmail() function available:

We can then execute the deployment script via Node.js:

After the deployment, we can call the function as follows:

Conclusion

We looked at how the Multilingual Engine (MLE) allows you with the help of GraalVM to use JavaScript and Python in the Oracle Database, bringing their enormous ecosystems to your data-intensive computations. With GraalVM, not only can we rapidly bring new languages to the Oracle Database, but we also have a high-performance speculative JIT compiler at our fingertips. It can be used to generate efficient code for critical parts of queries such as data conversions at runtime.

Download the preview version of Oracle Database MLE based on Oracle Database 12.2 from the Oracle Technology Network. Experiment with the Multilingual Engine and check out the documentation.

Leave us any feedback in our OTN Community, all suggestions, ideas, or issues are welcome.

--

--

Bastian Hossbach
graalvm

Researcher at Oracle Labs. Working on the Multilingual Engine (MLE) for the Oracle Database. Opinions are my own.