How We Built a Serverless Backend Using GraalVM, AWS Lambda and Astra DB (Part 1)
When the pandemic started, we set ourselves a learning goal: develop a backend using only serverless technologies. Initially, we set out to make this happen using technologies we were already familiar with, AWS Lambda and Java. But to spice things up, we decided to add some new technologies to the mix — GraalVM to eliminate the JVM Lambda cold start problem, and DataStax Astra DB as our serverless DBaaS.
So, we spent a couple of hours per week on building a serverless order processing API using Astra DB and AWS. Due to the pandemic, you could call it a distributed “hackathon” of sorts, in which we had three main challenges:
- Access Astra DB from within AWS Lambda
- Write automatic tests for our Astra DB client
- Set up the Lambda function to use the GraalVM native image runtime
In this first post, we will walk through the first two challenges and the technologies that helped us on the way, mainly Stargate and Testcontainers. In the second post, we are going to dive into how we put our serverless API in the cloud using AWS API Gateway, AWS Lambda and GraalVM. First, let’s take a look at the high-level architecture.
To give you a better understanding of what we were going for, here’s a (rather simple) overview of the target architecture.
Our end user accesses the API through an AWS API Gateway which is wired to our AWS Lambda function. The Lambda function in turn accesses the Astra DB Document API which is internally provided by Stargate.
Amazon API Gateway is a fully managed API service to create, publish, maintain, monitor, and secure APIs at any scale. Those APIs can be connected to a large number of different backend services.
AWS Lambda offers managed functions as a service (FaaS) based on micro virtual machines. To create a Lambda function you provide the code to execute, e.g. a Python script or a Jar file. The function can be invoked on demand based on a variety of triggers.
Astra DB is a multi-cloud database-as-a-service (DBaaS) based on Apache Cassandra™ that eliminates the overhead of installing, operating, and scaling your own database installation. Essentially, Astra DB helps developers reduce deployment time, costs, and nightmares. Astra DB also equips you with a few data APIs to build applications faster, which leads us to our next big player — Stargate.
Stargate is an open source data gateway and the official data API for Astra DB. In short, it allows developers to connect to all their data with the APIs and tools they are used to. You can create tables and schemas and query data without learning Cassandra Query Language (CQL).
Now let’s take a closer look at the two goals we set ourselves for part one of this series.
Access Astra DB from Java
First of all, we had to figure out how to access Astra DB from within AWS Lambda with minimal dependencies. Lambda functions should be able to start as quickly as possible and we wanted to avoid bloating our JAR file with unnecessary dependencies.
Additionally, Lambda functions should be stateless, given that the runtime can be paused or frozen without notice for a longer period of time, or even destroyed completely. Compared to other runtimes, such as Python, the Java runtime appears to stay up between executions, but this behavior should not be counted on. To keep things simple, we accessed the Document API via an Apache HTTP client.
Another problem with AWS Lambda is that you cannot easily perform database migrations. You have limited control over when your function is executed and how many instances are created. Also, if you migrate the schema on start, the first person to use your API has to wait for the schema migration to finish. This is why the Document API, which doesn’t require specifying a schema upfront, was our best bet for accessing Astra DB from AWS Lambda.
Test Astra DB client locally
Having the Java code to access Astra DB is great, but how do we test it without spinning up an entire Cassandra cluster along with Stargate? Luckily, Stargate offers a developer mode. In this mode, the Stargate node behaves like a regular Cassandra node and joins the ring with tokens assigned, so we can get started quickly without additional nodes or an existing cluster.
We can start a local Stargate node for our automated tests using Testcontainers. For the unfamiliar, Testcontainers is a Java library that provides lightweight, throwaway instances of common databases or anything that can run in a Docker container. This essentially makes it easier to run tests for things like data access layer integration, app integration, UI/acceptance, and more.
Getting into the code
The main functionality of our fictional API is to manage orders for an online shop. We need to save and retrieve orders. The class AstraClient encapsulates this functionality in two methods: one that saves an order, and getOrder, which retrieves one. Both methods interact with the Document API.
To access our orders collection, we need to pass the Astra DB base URL, the access credentials, as well as the namespace (also known as “keyspace” in the Cassandra realm).
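As a minimal sketch of how these three inputs fit together, the class below derives Document API URLs and attaches the token to a request. The class name DocumentApiConfig and its method names are hypothetical, and it uses the JDK's built-in java.net.http classes instead of the Apache HTTP client to stay dependency-free. It assumes the base URL ends right before the /v2 path segment and that the token is sent in the X-Cassandra-Token header.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical holder for the connection settings; not the project's actual class. */
final class DocumentApiConfig {
    private final String baseUrl;   // Astra DB or local Stargate REST base URL
    private final String token;     // access credentials
    private final String namespace; // "keyspace" in Cassandra terms

    DocumentApiConfig(String baseUrl, String token, String namespace) {
        // Drop a trailing slash so joined paths never contain "//".
        this.baseUrl = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        this.token = token;
        this.namespace = namespace;
    }

    /** URL of a collection, for example the orders collection. */
    URI collection(String collectionName) {
        return URI.create(baseUrl + "/v2/namespaces/" + namespace
                + "/collections/" + collectionName);
    }

    /** URL of a single document inside a collection. */
    URI document(String collectionName, String documentId) {
        return URI.create(collection(collectionName) + "/" + documentId);
    }

    /** Starts a request that already carries the auth token header. */
    HttpRequest.Builder authorized(URI uri) {
        return HttpRequest.newBuilder(uri).header("X-Cassandra-Token", token);
    }
}
```

Keeping URL construction in one place means the same client code can point at Astra DB in production and at a local Stargate node in tests by swapping only the base URL.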
Next, we implement a simple test case that saves and then retrieves an order. For this, we create a new test class AstraClientTest annotated with @Testcontainers so that the Testcontainers framework manages the @Container lifecycle. We also implement a small test extension that manages namespace and token creation and provides our test class with a preconfigured client.
Now, let’s dive into the Stargate container definition. We start it in developer mode to act as a DB node. We also use SimpleSnitch, since we do not need a particularly sophisticated snitch functionality.
By default, Stargate starts a CQL service on port 9042, a REST auth service for generating tokens on 8081, and an HTTP interface on port 8082. Since we use the Document API, we do not need to expose the CQL port.
Next, we implement a test method, shouldPersistAndRetrieveOrder, that persists and subsequently retrieves an order. Our test extension generates a client that points to our Stargate container and has working credentials. We then use that client to save an order and call getOrder in succession, validating that the retrieved order matches the originally stored one.
Before we dig into the details of the test extension, let’s cover the missing AstraClient functionality. To save and retrieve orders, we need a data class containing order data (let’s call it Order). To persist an order, we submit an HTTP POST request to the orders collection endpoint with the order JSON as payload. The response object contains the newly created document ID which we can use as our order ID.
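The save step could be sketched roughly as follows. This is not the project's actual code: the class name SaveOrderSketch is hypothetical, the JDK's java.net.http classes stand in for the Apache HTTP client, and a regular expression stands in for a real JSON library to keep the example dependency-free. It assumes the response body carries the generated ID in a documentId field.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the save step; names are illustrative only. */
final class SaveOrderSketch {
    // Assumption: the response looks like {"documentId":"<generated id>"}.
    private static final Pattern DOCUMENT_ID =
            Pattern.compile("\"documentId\"\\s*:\\s*\"([^\"]+)\"");

    /** Builds the POST request that stores the order JSON as a new document. */
    static HttpRequest saveRequest(URI ordersCollection, String token, String orderJson) {
        return HttpRequest.newBuilder(ordersCollection)
                .header("X-Cassandra-Token", token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(orderJson))
                .build();
    }

    /** Extracts the generated document ID from the response body, if present. */
    static Optional<String> documentId(String responseBody) {
        Matcher matcher = DOCUMENT_ID.matcher(responseBody);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

In a real client, a JSON mapper would be a more robust way to read the response than a regular expression.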
To retrieve an order, we submit an HTTP GET request to the document ID resource inside the orders collection. Our order will be wrapped inside a JSON object that contains the actual order in the data field. We model this wrapper in the OrderDocument class. The getOrder method returns an Optional<Order> which is empty in case the order doesn’t exist.
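The unwrapping logic might look something like the sketch below. The field productName on Order, the GetOrderSketch class, and the parser parameter are all hypothetical; the parser stands in for whatever JSON mapper the client uses. The sketch also assumes that a missing document is signaled by a 404 status (or an empty 204 response), which is worth verifying against the Stargate version in use.

```java
import java.util.Optional;
import java.util.function.Function;

/** Order data; the single field is a made-up placeholder. */
final class Order {
    final String productName;

    Order(String productName) {
        this.productName = productName;
    }
}

/** Wrapper returned by the Document API: the order sits in the data field. */
final class OrderDocument {
    final String documentId;
    final Order data;

    OrderDocument(String documentId, Order data) {
        this.documentId = documentId;
        this.data = data;
    }
}

final class GetOrderSketch {
    /**
     * Maps an HTTP response to an optional order.
     * The parser is a stand-in for a JSON mapper (assumption).
     */
    static Optional<Order> toOrder(int statusCode, String body,
                                   Function<String, OrderDocument> parser) {
        // Assumption: 404 (or 204) means the document does not exist.
        if (statusCode == 404 || statusCode == 204) {
            return Optional.empty();
        }
        if (statusCode != 200) {
            throw new IllegalStateException("Unexpected status " + statusCode);
        }
        // Unwrap the order from the surrounding document.
        return Optional.ofNullable(parser.apply(body)).map(document -> document.data);
    }
}
```

Returning Optional<Order> pushes the "order not found" case to the caller instead of hiding it behind null or an exception.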
At this point we can run our test and validate that the implemented functionality meets expectations. Now let’s take a look at the test extension. It implements the BeforeEachCallback interface, which tells JUnit to execute the beforeEach method before each test execution.
In beforeEach, we first generate an auth token. To do this, we POST the username and password to the Stargate auth endpoint via HTTP, which returns the auth token.
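A rough sketch of the token step, under a few assumptions: the auth service is reachable under /v1/auth on the auth port, it accepts a JSON body with username and password, and it answers with a JSON object containing an authToken field. The class name AuthTokenSketch is hypothetical, and the JDK HTTP classes are used here in place of the Apache client.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the token step in the test extension. */
final class AuthTokenSketch {
    // Assumption: the response looks like {"authToken":"<token>"}.
    private static final Pattern AUTH_TOKEN =
            Pattern.compile("\"authToken\"\\s*:\\s*\"([^\"]+)\"");

    /** Builds the POST request that exchanges credentials for a token. */
    static HttpRequest authRequest(String authBaseUrl, String username, String password) {
        // Simplification: assumes the credentials need no JSON escaping.
        String body = "{\"username\":\"" + username
                + "\",\"password\":\"" + password + "\"}";
        return HttpRequest.newBuilder(URI.create(authBaseUrl + "/v1/auth"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Reads the token from the response body or fails loudly. */
    static String authToken(String responseBody) {
        Matcher matcher = AUTH_TOKEN.matcher(responseBody);
        if (!matcher.find()) {
            throw new IllegalStateException("No authToken in response: " + responseBody);
        }
        return matcher.group(1);
    }
}
```

In a Testcontainers setup, the auth base URL would be built from the container host and the mapped port for 8081 rather than hard-coded.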
After passing the auth token to our AstraClient, we ensure the namespace (aka keyspace in Cassandra) exists. The production code assumes that the keyspace exists since we create it as part of our infrastructure provisioning code using the DataStax Astra Terraform provider. In the test case we simply create the namespace via HTTP.
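Creating the namespace over HTTP could be sketched as below, assuming the Stargate schema endpoint /v2/schemas/namespaces accepts a JSON body with a name field. The class name NamespaceSketch is hypothetical, and the JDK HTTP classes again stand in for the Apache client.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical sketch of the namespace step in the test extension. */
final class NamespaceSketch {
    /** Builds the POST request that creates a namespace (keyspace). */
    static HttpRequest createNamespaceRequest(String restBaseUrl, String token,
                                              String namespace) {
        // Simplification: assumes the name needs no JSON escaping.
        String body = "{\"name\":\"" + namespace + "\"}";
        return HttpRequest.newBuilder(URI.create(restBaseUrl + "/v2/schemas/namespaces"))
                .header("X-Cassandra-Token", token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Because this runs before each test, a real extension would also need to tolerate the case where the namespace already exists.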
With that we conclude the code for this part of the “hackathon.” So far we’ve successfully covered our first two goals: we implemented an AstraClient that uses the Astra DB Document API to store and retrieve orders. Then we tested our code using a custom JUnit 5 test extension along with the Testcontainers framework.
In the second part of this series we will show you how we implemented an AWS Lambda handler that accepts HTTP requests from AWS API Gateway, transforms them into Astra DB requests using our AstraClient class, and returns a response to the user. The handler is written to run in a GraalVM native runtime which minimizes those pesky cold start issues we always bumped into with the default Java runtime.
Stay tuned for the next post to continue our tour of the technologies, challenges, and workarounds involved in getting our serverless API into production!
In the meantime, you can poke around the source code for this project on GitHub. If you have any questions or want to know more about this project, head over to the DataStax Community and we’ll meet you there. To reach one of us in particular you can find us on Twitter @FRosnerd and @raffael.