TL;DR: With the release of AWS Aurora Serverless Postgres (with PostGIS support), building an entirely serverless map stack is now possible, including the database.
The world of serverless computing continues to expand. In my previous post, I discussed using AWS Lambda to run a vector tile server. Recently AWS announced the release of Aurora Serverless Postgres which presented an opportunity to continue the discussion around architecting an entirely serverless map stack.
Serverless Aurora Postgres
On July 9th, 2019 AWS released Aurora Serverless Postgres. Some high-level, relevant points:
- Postgres 10.7 installed with PostGIS 2.4.
- The database can scale down to 0 ACUs, meaning the database is no longer running. This is not the default, but it can easily be enabled.
- Data storage scales from 10GB to 64TB in 10GB increments. If you scale the database down to 0 Aurora Compute Units (ACUs), you will only be paying for storage.
- Currently available in the following regions: US East (N. Virginia), US East (Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo)
To be honest, I’m a bit skeptical of “Serverless Databases”, but I also like to explore new technology, so let’s see how this goes.
The vector tile map stack
When setting up a vector tile stack there are three core components:
- Vector tile server: responsible for listening to incoming tile requests and orchestrating data fetching, tile encoding, tile caching and responding to the request. The tile server may also handle geometry processing (clipping, simplification, makeValid, etc.).
- Data provider: Data providers house the data which will be served to the end-user. Data providers come in many forms, for example, Postgres + PostGIS, GeoJSON, Shapefiles, GeoPackage, etc.
- Tile cache: Generating vector tiles is resource-intensive so if your map data is not highly dynamic then it makes sense to implement a tile cache. Tile caches come in many forms, for example, AWS S3, Redis, filesystem, etc.
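To make the three components concrete, here is a minimal sketch of a tegola configuration wiring a PostGIS data provider and an S3 tile cache together. The hostnames, bucket, credentials, and table names are all hypothetical, and the keys follow tegola’s documented TOML format — consult the tegola docs for the authoritative reference:

```toml
[webserver]
port = ":9090"

# Tile cache: generated tiles are written to and served from S3
[cache]
type = "s3"
bucket = "my-tile-cache"        # hypothetical bucket name
region = "us-east-1"

# Data provider: Aurora Serverless Postgres with PostGIS
[[providers]]
name = "osm"
type = "postgis"
host = "my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com"  # hypothetical host
port = 5432
database = "osm"
user = "tegola"
password = "${DB_PASSWORD}"     # tegola supports env var substitution

  [[providers.layers]]
  name = "roads"
  # !BBOX! is replaced by tegola with the requested tile's bounding box
  sql = "SELECT gid, ST_AsBinary(geom) AS geom FROM roads WHERE geom && !BBOX!"

# Map: ties provider layers together into a servable map
[[maps]]
name = "osm"

  [[maps.layers]]
  provider_layer = "osm.roads"
  min_zoom = 0
  max_zoom = 14
```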
The high-level architecture can be visualized as:
Serverless vector tile servers
The section title sounds confusing, but the first tool you’re going to need is a vector tile server that has been designed to work with AWS Lambda. Here are two examples: the first, tegola_lambda, I have helped develop; the second, by Henry Thasler, I recently found.

tegola_lambda:

- Native geoprocessing and encoding
- Supported data providers: PostGIS, GeoPackage
- Written in Go

Henry Thasler’s tile server:

- PostGIS used for geoprocessing and encoding (ST_AsMVT, ST_Simplify, etc.)
- Supported data providers: PostGIS
- Written in Node.js (TypeScript)
For additional details around configuring and running these tile servers, visit their project pages.
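The PostGIS-driven approach to geoprocessing and encoding mentioned above can be sketched with a query along these lines. The table, columns, and envelope coordinates are hypothetical; ST_AsMVT and ST_AsMVTGeom are available as of PostGIS 2.4, which matches the version Aurora ships:

```sql
-- Encode all road features intersecting a tile envelope as a
-- Mapbox Vector Tile, entirely inside PostGIS.
WITH bounds AS (
  -- Tile envelope in Web Mercator (EPSG:3857); values are hypothetical
  SELECT ST_MakeEnvelope(-8238077.16, 4970241.33,
                         -8235631.04, 4972687.45, 3857) AS geom
),
mvt_geom AS (
  SELECT
    -- clip and quantize geometries into tile coordinate space
    ST_AsMVTGeom(r.geom, b.geom::box2d) AS geom,
    r.name
  FROM roads r, bounds b
  WHERE r.geom && b.geom
)
SELECT ST_AsMVT(mvt_geom.*, 'roads') FROM mvt_geom;
```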
Aurora Serverless Postgres
Spinning up an Aurora Postgres Serverless instance is remarkably easy. In the AWS console, you simply navigate to RDS, select Amazon Aurora, and then choose “Amazon Aurora with Postgres Compatibility”. If you have spun up an RDS instance before, then the majority of this setup will be very straightforward and familiar to you.
The Serverless nuances show up under the section titled “Capacity settings”. Here you will have the opportunity to configure the scaling options for your RDS instance. Scaling RAM is one of the interesting options here, but if you unfold the section “Additional scaling configuration” you will find an option titled “Pause compute capacity after consecutive minutes of inactivity”. Check this box and you can now configure scaling the database down to 0! That’s right, if the database is not being used for a configured amount of time, you can shut the entire database down and cease paying for the resources. As you will later see this comes with consequences, but for many situations, this might be entirely acceptable.
Once the database has been configured, you will receive a hostname to connect to, and then you’re ready to connect to the database like you would any other RDS instance. I was pretty impressed with how smooth everything was to set up.
The Cold Start
This post would not officially be about Serverless technology without a mention of “cold starts”. There are plenty of articles covering Serverless cold starts, but for the sake of this post, a cold start refers to the time cost of instantiating serverless resources.
Cold starts happen when:
- a function has not been invoked for some time (say 10 minutes)
- the platform scales out to handle increased concurrency
Lambda + tegola_lambda + S3 + Aurora Serverless
The following architectures reference using tegola_lambda for the vector tile server. While other vector tile servers could be used, their architectures may differ.
The Glacial Start
The following graphic outlines what I call the “Glacial Start”, an extreme situation when absolutely everything in the serverless stack is cold and the database is set to scale to 0.
Let’s run through the request flow:
- A request comes into API Gateway and is proxied to Lambda. If the function has not been invoked for a while, the function will have a cold start time of 0–2 seconds.
- When tegola_lambda is instantiated it will parse the config file and open a connection to the database. This process is only necessary during the cold start.
- Lambda functions live outside of VPCs, but typically you want your database to live within a VPC. Lambda creates a network bridge by dynamically instantiating Elastic Network Interfaces (ENIs). This process comes with a cold start time of 8–10 seconds. Note that on September 3rd, AWS announced improved VPC networking for AWS Lambda functions which is addressing this issue. This will take time to roll out across all regions but this is a big step forward for Lambda when leveraging a VPC.
- At this point, tegola_lambda is opening up a connection with Aurora Serverless Postgres but the database has scaled down to 0. In my tests, the initial cold start time for connecting to the database was 30–40 seconds. This is a nontrivial amount of time for a production environment, but for a development environment, this could be completely acceptable.
Some quick math and you can see that a cold start request takes between 38 and 52 seconds! API Gateway times out after 29 seconds, so our first request would inevitably fail. Not ideal, but again, this less-than-ideal experience may be acceptable in some situations.
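The quick math is just the sum of the three cold-start components discussed above:

```python
# Cold-start components of the "Glacial Start", in seconds.
# (low, high) bounds taken from the measurements discussed above.
components = {
    "lambda_cold_start": (0, 2),   # function has not run recently
    "vpc_eni_setup": (8, 10),      # dynamic ENI provisioning
    "aurora_resume": (30, 40),     # database resuming from 0 ACUs
}

low = sum(bounds[0] for bounds in components.values())
high = sum(bounds[1] for bounds in components.values())
print(f"total cold start: {low}-{high} seconds")  # 38-52 seconds
```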
A less glacial start
So while we’re waiting for improved VPC networking for AWS Lambda functions to be deployed to all regions, AWS has provided an alternative suggestion: don’t put your database behind a VPC, but instead use IAM roles to manage access to it. The request flow is very similar to the last architecture:
Let’s run through the changes to the request flow:
- The VPC line is now designated IAM. This indicates that the database is not deployed within a VPC but IAM policies are now the firewall. Notice that the ENI cold start time is gone.
- Aurora Serverless no longer has a cold start time associated with it. Rather than having the database scale to 0, this strategy runs the database at the minimum resource requirements. The pricing for Aurora Serverless is a combination of Aurora Compute Units (ACUs) at $0.06 per ACU-hour and storage at around $0.10 / GB / month. You can roughly estimate that it will cost a minimum of $100 a month to keep the database running.
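To sanity-check the monthly estimate above: assuming a minimum capacity of 2 ACUs (worth verifying for your region and engine) and a hypothetical 100 GB of storage, the math works out roughly like this:

```python
# Rough monthly cost of Aurora Serverless running continuously at minimum
# capacity. min_acus = 2 is an assumption about the Postgres minimum;
# prices are the figures quoted above and vary by region.
acu_price_per_hour = 0.06
min_acus = 2
hours_per_month = 730          # average hours in a month
storage_price_per_gb = 0.10
storage_gb = 100               # hypothetical dataset size

compute = min_acus * acu_price_per_hour * hours_per_month  # ≈ $87.60
storage = storage_gb * storage_price_per_gb                # ≈ $10.00
total = compute + storage
print(f"~${total:.2f}/month")  # ~$97.60/month
```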
Using IAM roles as the database firewall may be perfectly acceptable for some situations but I would prefer to keep my database inside a VPC. As you can see, this approach does get rid of the VPC ENI cold start hit, so depending on your requirements this may be an option.
Let’s add a Content Delivery Network
A Content Delivery Network does come with additional costs, but the end-user experience is greatly improved. Since the nature of serving up map tiles is lots of network requests, latency is an important consideration. Here are a couple of options that build on the architecture we have been discussing.
This option is the most straightforward. Essentially, you set up Cloudfront (or any CDN, for that matter) to point to the API Gateway endpoint that was previously configured, and you’re done. The request flow looks like:
Although this is quick and easy to set up, it has some downsides. For example, if you were to set the CDN to cache tiles for 24 hours, then as the tiles expire you’re going to invoke Lambda, which could incur the cold start costs previously discussed. Additionally, if the tile has not changed, then you’re being billed for running the Lambda function just to fetch the tile from the tile cache. We can do better!
This second architecture is one that I have borrowed from Henry Thasler (author of the other tile server mentioned above) and I think it’s a great approach. Here’s the request flow:
Let’s walk through the request flow:
- A tile request comes into the CDN; if it’s a CDN HIT, the request is done. If it’s a CDN MISS, the request is routed to the tile cache (S3), which has been configured for static website hosting. If the tile is already in the tile cache, then it’s returned to the CDN and the request is done.
- If the tile does not exist in the tile cache then S3 issues a 307 redirect to our API Gateway endpoint which will then process the tile request, store a copy in the tile cache (S3) and respond to the request.
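The 307 redirect above can be configured with an S3 static-website routing rule along these lines; the API Gateway hostname and key prefix here are hypothetical placeholders for your own endpoint:

```xml
<RoutingRules>
  <RoutingRule>
    <Condition>
      <!-- Fire only when the requested tile is missing from the bucket -->
      <HttpErrorCodeReturnedEquals>404</HttpErrorCodeReturnedEquals>
    </Condition>
    <Redirect>
      <Protocol>https</Protocol>
      <!-- hypothetical API Gateway hostname -->
      <HostName>abc123.execute-api.us-east-1.amazonaws.com</HostName>
      <!-- hypothetical stage/path prefix for the tile server -->
      <ReplaceKeyPrefixWith>prod/</ReplaceKeyPrefixWith>
      <HttpRedirectCode>307</HttpRedirectCode>
    </Redirect>
  </RoutingRule>
</RoutingRules>
```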
I love this design as it allows for a performant user experience that scales horizontally, and as your tile cache fills up, the user experience improves. Also, when you need to update data, you can purge parts of the tile cache, causing the stack to regenerate those tiles. And to top it off, with a Serverless stack (including the database), the infrastructure will scale up and down on demand.
Some nuances to consider:
- In order to have S3 return the correct headers (i.e. Cache-Control, CORS, etc.) you will need to set up static website hosting and make sure the tiles have the correct metadata associated with them.
- Pointing Cloudfront directly at an S3 bucket will not trigger the 307 redirect correctly. To trigger this behavior, point Cloudfront’s origin at the S3 bucket’s website endpoint.
- Cloudfront will cache the 307 redirects unless you set up your cache control headers correctly. This is done by setting the Default TTL on Cloudfront to 0 and then making sure that the proper Cache-Control headers are set on the S3 objects.
Would I use this for production?
My experience so far has been great but I have not stress tested this setup enough to be entirely confident in it. What’s great about the discussed architecture is that you can implement parts of the Serverless stack without needing to implement all of it and still run a very robust and performant vector tile stack. In summary here are a few recommendations:
- Use a CDN. A CDN does come with additional costs, but the end-user experience is greatly improved. Since serving map tiles involves lots of network requests, latency becomes noticeable.
- Use a tile cache. Unless your data is light and/or highly dynamic, a tile cache should be leveraged.
- Pre-seed the tile cache for lower zooms (0–10). Typically the lower zooms provide global context and the data does not change often. Pre-generate these zooms rather than waiting for users to request the tiles.
- Don’t scale Aurora Serverless to 0. The cold start time of 30–40 seconds is too much of a burden to push to an end-user. Don’t leverage this feature unless the situation can accommodate the hit (e.g. a dev environment).
- If you’re brave, don’t put the database in a VPC and use IAM roles. This avoids the ENI cold start problem, but then again that is going to be less of a problem in the near future.
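To put the pre-seeding recommendation in perspective: the tile count grows by a factor of four per zoom level, so seeding zooms 0–10 is a bounded, modest job compared to the deeper zooms:

```python
# Each zoom level z is a 2**z x 2**z grid, so at most 4**z tiles.
# Pre-seeding zooms 0 through 10 is therefore capped at ~1.4M tiles.
tiles = sum(4 ** z for z in range(11))
print(tiles)  # 1398101
```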
Well there you have it, the serverless vector tile map stack is entirely possible! Time to deploy the planet!
Frequently Asked Questions
Are there alternatives to API gateway?
On November 29, 2018, AWS announced support for ALBs to invoke Lambda functions. I have not yet fully deployed this architecture using ALBs, but technically it’s possible. There is an update slated to land in tegola_lambda v0.11 which will add support for being invoked via ALB triggers.
What about raster tiles?
I have not tried this with a raster tile server but the same architecture can be used for raster tiles. For example, check out Mapbox’s blog about using AWS Lambda with Rasterio.