AWS Lambda with Scala and GraalVM
In Scanye we’ve been developing a monolithic, containerized Scala web application running on AWS ECS, and it has served us well. But as we’ve been growing fast for a while now (and so has our traffic), we’ve decided to split the monolith and finally embrace AWS Lambda functions.
As a starting exercise, we’ve decided to go with something simple: a
cron function to regularly clean old and expired data from the database.
Such a function does not need to receive any events or integrate with API Gateway. All we need is to create a DAO object, run a single method and log the result.
There are two reasonable ways of running Scala code on AWS Lambda. One is to produce a JAR file and run it in the usual manner using Amazon’s Java runtime. The other is to use GraalVM’s ahead-of-time compilation and build a standalone binary.
AWS Lambda does not support Scala directly, so running Scala code in the Java runtime requires dealing with some “Javaisms” at the points of entry and exit. The built-in Lambda serializer doesn’t understand native Scala types, so we have to map Java types into Scala ones at the boundary, which is a pity. Moreover, as utilities of this kind are executed rarely, one at a time, and run for just a few seconds, one wouldn’t even begin to feel any efficiency boost from just-in-time optimizations.
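For illustration, here is roughly what an entry point for the Java runtime looks like. This is only a sketch, based on the RequestHandler interface from the aws-lambda-java-core library, with an example event type:

import com.amazonaws.services.lambda.runtime.{Context, RequestHandler}
import java.util.{Map => JMap}
import scala.collection.JavaConverters._

// The built-in serializer hands us java.util types rather than Scala ones,
// so the handler has to convert at the boundary.
class Handler extends RequestHandler[JMap[String, String], String] {
  override def handleRequest(event: JMap[String, String], context: Context): String = {
    val scalaEvent: Map[String, String] = event.asScala.toMap
    s"received ${scalaEvent.size} fields"
  }
}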
Therefore, building a binary seems to be a better approach, since we can:
- create an executable that is small (and thus lowers startup latency and unnecessary bandwidth usage), letting us reduce a 100 MB JAR file to a few tens of MB,
- make the code simpler (no Java(ish) code to handle the Lambda interface),
- make it run quickly (as no class loading or complex JVM initialization and warm-up is necessary),
- make it use as little memory as possible (again, no VM overhead).
Building a lambda
To start with, I created a new package with this simple program:
… and placed it in a new package, lambdas:
lambdas
└── src
    └── main
        ├── resources
        │   └── log4j.properties
        └── scala
            └── scanye
                └── lambdas
                    └── printoutCleaner
                        └── Main.scala
GraalVM executable files can be generated either from a class or from a JAR file. I went with the second approach and started by generating a fat JAR with all the dependencies and the project’s classes embedded. As we use Mill as a build tool in Scanye, I defined the build by adding a module to our existing build.sc.
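A sketch of what that module definition might look like (the internal-library dependency and the exact versions are illustrative):

import mill._, scalalib._

object lambdas extends ScalaModule {
  def scalaVersion = "2.12.11"

  // dependency on the internal library that provides the Database object
  // ("internal" is a placeholder for the existing module's name)
  def moduleDeps = Seq(internal)

  // main class baked into the manifest of the fat JAR
  def mainClass = Some("scanye.lambdas.printoutCleaner.Main")
}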
And in order to run it, I issued:
$ mill lambdas.assembly
The result is a 100 MB JAR file located in out/lambdas/assembly/dest/out.jar
$ ll out/lambdas/assembly/dest
total 102M
drwxrwxr-x 2 mati mati 4,0K lip 2 10:06 .
drwxrwxr-x 3 mati mati 4,0K lip 2 10:06 ..
-rwxrwxr-x 1 mati mati 102M lip 2 10:06 out.jar
Since our code uses a Database object from a complex internal library that brings a lot of dependencies with it, I ended up with a package far too big for AWS’s 50 MB limit on a zipped Lambda deployment package. I hoped GraalVM’s output would turn out to be much smaller.
GraalVM AOT compilation
The next step was to compile it ahead-of-time. Unlike typical (for Scala and Java) bytecode compilation that later requires a VM to run the program, AOT compilation creates a standalone executable. The produced binary does not require any Java VM to execute, because it actually embeds necessary components from a simplified, dedicated virtual machine called “Substrate VM”. As the GraalVM docs say:
Substrate VM is the name for the runtime components (like the deoptimizer, garbage collector, thread scheduling etc.).
In comparison to a regular Scala program running on a regular Java VM, the resulting application has faster startup time and lower runtime memory overhead, perfectly fitting our purpose.
For compiling JAR files into standalone executables, GraalVM provides a tool called native-image. I started simply:
native-image \
--no-server \
--no-fallback \
-jar out/lambdas/assembly/dest/out.jar \
printoutCleaner
Here:
- --no-server — tells native-image not to use the image-build server and to compile in the current process (which can then use up to 80% of RAM),
- --no-fallback — disables fallback to a regular JVM and enforces Substrate VM as the only runtime,
- -jar — specifies a path to the JAR file,
- printoutCleaner — is the name of the output file.
But the compilation failed:
Error: com.oracle.graal.pointsto.constraints.UnresolvedElementException: Discovered unresolved
type during parsing: com.codahale.metrics.MetricRegistry. To diagnose the issue you can
use the --allow-incomplete-classpath option. The missing type is then reported at run time
when it is accessed the first time.
Trace:
at parsing com.zaxxer.hikari.pool.HikariPool.setMetricRegistry(HikariPool.java:290)
Call path from entry point to com.zaxxer.hikari.pool.HikariPool.setMetricRegistry(Object):
at com.zaxxer.hikari.pool.HikariPool.setMetricRegistry(HikariPool.java:289)
at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:121)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:112)
Some applications do not have all of their dependencies on the class path, therefore we need to add the --allow-incomplete-classpath flag:
native-image \
--no-server \
--no-fallback \
--allow-incomplete-classpath \
--report-unsupported-elements-at-runtime \
--static \
-jar out/lambdas/assembly/dest/out.jar \
printoutCleaner
I also added two more flags:
- --static — links the executable statically, because we don’t know which shared libraries will be available in the AWS runtime environment,
- --report-unsupported-elements-at-runtime — reports usage of unsupported methods and fields at run time, when they are accessed the first time, instead of failing with an error during image building.
After that, the native image builds properly, and we can try to run it.
Unfortunately, it reports that the org.apache.log4j.Category
class cannot be found:
$ ./result
Failed to instantiate SLF4J LoggerFactory
Reported exception:
java.lang.NoClassDefFoundError
at org.apache.log4j.Category.class$(Category.java:118)
at org.apache.log4j.Category.<clinit>(Category.java:118)
at com.oracle.svm.core.hub.ClassInitializationInfo.invokeClassInitializer(ClassInitializationInfo.java:350)
at com.oracle.svm.core.hub.ClassInitializationInfo.initialize(ClassInitializationInfo.java:270)
...
Native-image does a lot of aggressive AOT optimizations and deletes classes it thinks will not be used. Classes loaded with reflection cannot be noticed by static analysis, so we have to tell the compiler which classes will be loaded reflectively and must be included in the resulting executable.
The config has a simple JSON structure:
$ cat reflect-config.json
[
  {
    "name":"org.postgresql.Driver"
  },
  {
    "name":"java.lang.Thread",
    "methods":[{"name":"getContextClassLoader","parameterTypes":[] }]
  }
]
… and should be placed in the PROJECT_DIR/resources/META-INF/native-image directory. As the docs say, the Native Image tool will “automatically pick up all configuration options provided anywhere below the resource location META-INF/native-image and use it to construct native-image command line arguments”.
Unfortunately, in my case it did not work, and I had to specify the path explicitly:
native-image \
--no-server \
--no-fallback \
--allow-incomplete-classpath \
--report-unsupported-elements-at-runtime \
--static \
-H:ConfigurationFileDirectories=lambdas/src/main/resources/META-INF/native-image \
-jar out/lambdas/assembly/dest/out.jar \
printoutCleaner
One naive way (the one I tried first) is to add the required classes one by one to the config. So we add the missing class to reflect-config.json:
...
{
"name":"org.apache.log4j.Category"
}
run the program again, let it fail on the next missing class, add that class to the config, and repeat. This is a very annoying and time-consuming approach: compilation takes a few minutes each time, and some libraries, such as logging packages, use reflection a lot. Even a simple log4j logger has to load dozens of classes dynamically, so we can end up adding the hundredth missing class with a name like LoggingCommandAppenderAwareAspectInstanceFactory :)
Native-image agent tracking
A better approach is to use the GraalVM agent, which analyzes our program at run time and records which classes are required. We can run the JAR file with this special agent attached, and it will create the config for us:
DB_NAME=postgres DB_HOST=... java \
-agentlib:native-image-agent=config-output-dir=lambdas/src/main/resources/META-INF/native-image \
-jar out/lambdas/assembly/dest/out.jar
As we can see, the agent generated more files in the directory, with settings that I probably wouldn’t have set myself:
META-INF
└── native-image
    ├── jni-config.json
    ├── proxy-config.json
    ├── reflect-config.json
    └── resource-config.json
To be sure all required classes are found by the agent, we can run the command multiple times, making the lambda function behave differently each time so that we traverse various code paths. The native-image agent has a special option for merging newly found classes into the current config: we just need to change config-output-dir into config-merge-dir.
Since the reflect config has to be created and updated many times, we can automate the process a bit by writing a simple bash script:
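A minimal sketch of such a helper, reusing the paths from the earlier steps (the first run creates the config, later runs merge into it):

#!/bin/bash
set -euo pipefail

# DB_NAME, DB_HOST etc. are expected in the environment, as in the manual run above.
CONFIG_DIR=lambdas/src/main/resources/META-INF/native-image
JAR=out/lambdas/assembly/dest/out.jar

# The first run creates the config; subsequent runs merge newly found entries.
if [ -d "$CONFIG_DIR" ]; then
  MODE=config-merge-dir
else
  MODE=config-output-dir
fi

java "-agentlib:native-image-agent=$MODE=$CONFIG_DIR" -jar "$JAR"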
Eventually, the binary should build properly.
Other problems
The last issue I encountered was a problem with loading resource files. I had many problems with setting up the logger properly: changes to the logging config file weren’t having any effect. To configure logging I used a simple log4j.properties file placed in lambdas/src/main/resources. It turned out the file was not included in the executable, even though I used a special flag, -H:IncludeResources=log4j.properties, which should embed the specified file from the resources dir. After some digging I found out that the file was not picked up because the listing goes through all the JAR files and directories and matches them against a relative path, which in our case is /log4j.properties. The fix was to change the flag into -H:IncludeResources=.*.properties [1]
AWS deployment
Having a standalone binary that works properly, we can deploy it to AWS. To do so, we need to build a zip file containing our executable and a bootstrap script. This script is required to tell AWS what should be run in the custom runtime:
$ cat lambdas/bootstrap
#!/bin/sh
set -euo pipefail

while true
do
  HEADERS="$(mktemp)"
  # Get an event. The HTTP request will block until one is received
  EVENT_DATA=$(curl -sS -LD "$HEADERS" -X GET "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")

  # Extract request ID by scraping response headers received above
  REQUEST_ID=$(grep -Fi Lambda-Runtime-Aws-Request-Id "$HEADERS" | tr -d '[:space:]' | cut -d: -f2)

  # Execute the binary
  RESPONSE=$(./printoutCleaner -Xmx128m -Djava.library.path=$(pwd))

  # Send the response
  curl -sS -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/response" -d "$RESPONSE"
done
The bootstrap script is executed when an instance of a function is created and is responsible for handling incoming events and returning results. A Lambda function handles many requests during its lifetime (it gets killed after about 10 minutes of inactivity), therefore we need a loop that will handle these incoming requests. The AWS runtime REST API is used to fetch request data and return results.
Then we can pack the files and upload them to S3:
zip -j printoutCleaner.zip printoutCleaner lambdas/bootstrap
aws s3 cp printoutCleaner.zip "s3://lambdas-deployment-packages/"
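The package alone is not a function yet; creating the function via the AWS CLI might look like this (runtime provided selects the custom runtime; the role ARN is a placeholder to substitute with your own):

aws lambda create-function \
  --function-name printoutCleaner \
  --runtime provided \
  --handler printoutCleaner \
  --role arn:aws:iam::123456789012:role/lambda-basic-execution \
  --code S3Bucket=lambdas-deployment-packages,S3Key=printoutCleaner.zip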
… and run this lambda on AWS. In my case it failed due to the following error:
org.postgresql.util.PSQLException: Could not find a java cryptographic algorithm: class configured for SSLContext (provider: SunJSSE) cannot be found..
at org.postgresql.ssl.LibPQFactory.<init>(LibPQFactory.java:182)
at org.postgresql.core.SocketFactoryFactory.getSslSocketFactory(SocketFactoryFactory.java:61)
at org.postgresql.ssl.MakeSSL.convert(MakeSSL.java:33)
at org.postgresql.core.v3.ConnectionFactoryImpl.enableSSL(ConnectionFactoryImpl.java:441)
at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:135)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:211)
The Postgres client tries to use SSL by default while connecting to the database. In my case, it failed because a cryptographic algorithm was missing. That’s also a result of the final executable optimizations: Java security services are not added by default, and we need to add them explicitly by setting a special flag: --enable-all-security-services
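For reference, this is the complete native-image invocation with all the flags gathered along the way:

native-image \
  --no-server \
  --no-fallback \
  --allow-incomplete-classpath \
  --report-unsupported-elements-at-runtime \
  --static \
  --enable-all-security-services \
  -H:ConfigurationFileDirectories=lambdas/src/main/resources/META-INF/native-image \
  -jar out/lambdas/assembly/dest/out.jar \
  printoutCleaner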
Conclusions
The resulting executable size dropped from 100 MB to 30 MB. Of course, it can be reduced further, for example by removing --enable-all-security-services and adding only the needed services explicitly.
We can compare execution times by running this on a local machine. With Scala 2.12 and GraalVM 20.1, the results I noticed are:
java -jar out.jar — 1.2s
./printoutCleaner — 0.02s
So even for a simple function the difference is significant, and for more complex functions it can be even more noticeable. (I tested it on my machine, and I don’t know how much the cold-start time differs between the Java and custom runtimes on AWS Lambda. It might be less.)