Building & Monitoring a Fullstack Serverless App

Whether because it is cost-efficient, brings the organization closer to NoOps, or allows for faster iteration, serverless computing is increasingly relevant for specialized use cases such as powering ETL pipelines or acting as glue for DevOps tools.

This post summarizes my findings from building WikiRoam — a fullstack serverless architecture running on AWS and powered by the Serverless Framework. [Repo]

Disclaimer: WikiRoam is a proof-of-concept built as a weekend project to explore the consequences of serverless computing on monitoring approaches. It is not meant to be pretty, optimized, or secure. Use at your own risk. :)

Architecture

The application is structured in 5 layers:

  1. A static Angular frontend served directly from S3.
  2. A Lambda-based API layer exposed through API Gateway.
  3. An ETL pipeline built on Step Functions and triggered by SNS messages for asynchronous processing.
  4. A NoSQL database layer powered by DynamoDB.
  5. A turn-key monitoring infrastructure provided by Datadog, through its AWS integration.

The first 4 layers are configured and deployed through the Serverless Framework, via YAML files that describe the provider (AWS in this case), the functions, and the resources.

Additionally, the following Serverless plugins are used: serverless-client-s3 (to simplify the deployment of the frontend to S3), serverless-step-functions (see 3. below), and serverless-pseudo-parameters (for convenience).
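In ‘serverless.yml’, they are declared under the plugins key:

```yaml
plugins:
  - serverless-client-s3
  - serverless-step-functions
  - serverless-pseudo-parameters
```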

Let’s dive into each layer and discuss takeaways and lessons learned.

1. JavaScript Frontend

The Angular application is hosted directly on S3, in a bucket named www.wikiroam.com (for production), so that an ‘alias record’ pointing to the S3 bucket can be defined in Route 53. Note that the serverless-client-s3 plugin takes care of configuring the bucket for website hosting.
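With serverless-client-s3, the bucket is declared under the ‘custom’ key and the frontend build is pushed with ‘serverless client deploy’ — roughly:

```yaml
custom:
  client:
    bucketName: www.wikiroam.com  # must match the domain for the Route 53 alias record to resolve
```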

The Angular app simply lets the user search Wikipedia (through the MediaWiki action API). It then displays relevant articles as cards, and triggers an ETL workflow that caches articles (reactively) and their backlinks (proactively).

Basic styling courtesy of Angular Material. Reactive search provided by RxJS (see the sketch below):

debounceTime() ensures the frontend waits for the user to pause typing for 300 ms before calling the API.

distinctUntilChanged() checks that the new keyword is different from the previous one.

switchMap() ensures that only the response from the latest call is awaited; earlier in-flight calls are discarded.
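A minimal sketch of that pipeline, using pipeable-operator syntax (the searchWikipedia() call stands in for the actual MediaWiki service wrapper):

```typescript
import { Observable, Subject, of } from 'rxjs';
import { debounceTime, distinctUntilChanged, switchMap } from 'rxjs/operators';

// Hypothetical wrapper around the MediaWiki action API.
const searchWikipedia = (term: string): Observable<string[]> => of([term]);

// Each keystroke pushes the current keyword into this stream.
const searchTerms = new Subject<string>();

const results$ = searchTerms.pipe(
  debounceTime(300),                          // wait for a 300 ms pause in typing
  distinctUntilChanged(),                     // skip identical consecutive keywords
  switchMap((term) => searchWikipedia(term))  // discard responses from stale calls
);

results$.subscribe((articles) => {
  // ...render the article cards...
});
```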

2. Lambda-based API

The Serverless Framework makes it easy to publish an API endpoint backed by Lambda functions.

The handler points to the Lambda function named ‘get’, exported from the ‘api/wikipages.js’ module with the standard Node.js signature (event, context, callback).
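For reference, the wiring might look like this (the route path and the response payload are illustrative):

```yaml
functions:
  get:
    handler: api/wikipages.get
    events:
      - http:
          path: wikipages/{title}
          method: get
          cors: true
```

```js
// api/wikipages.js
module.exports.get = (event, context, callback) => {
  const title = event.pathParameters.title; // from the /wikipages/{title} route
  // ...look the page up in DynamoDB...
  callback(null, {
    statusCode: 200,
    headers: { 'Access-Control-Allow-Origin': '*' }, // the S3-hosted frontend calls cross-origin
    body: JSON.stringify({ title }),
  });
};
```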

The API Gateway endpoint should ideally be tied to a domain such as api.wikiroam.com, but AWS requires a certificate to do so. Due to time constraints, the URL for the API is currently hard-coded in the frontend instead. Fortunately, Serverless relies on a single CloudFormation stack (per stage, i.e. per environment) to create and update the resources and functions, so the URL doesn’t change between deployments.

3. ETL Pipeline

The caching logic is modeled as a Step Function: each step is a Lambda function, and the state transitions are driven by the state machine definition. While this is obviously overkill for such a simple use case, it resulted in cleaner code and logic than invoking Lambdas from within each other.

cacheStart() checks whether a specific article is already cached to decide whether to continue or end the workflow.

cacheInfo() inserts the article in a DynamoDB table and cacheBacklinks() inserts the corresponding links into another table.
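With the serverless-step-functions plugin, the state machine might be declared along these lines (the state names and the Choice variable are illustrative; the function ARNs rely on serverless-pseudo-parameters):

```yaml
stepFunctions:
  stateMachines:
    cacheArticle:
      definition:
        StartAt: CacheStart
        States:
          CacheStart:
            Type: Task
            Resource: arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:${self:service}-${opt:stage}-cacheStart
            Next: AlreadyCached
          AlreadyCached:
            Type: Choice
            Choices:
              - Variable: $.cached  # hypothetical flag set by cacheStart()
                BooleanEquals: true
                Next: Done
            Default: CacheInfo
          CacheInfo:
            Type: Task
            Resource: arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:${self:service}-${opt:stage}-cacheInfo
            Next: CacheBacklinks
          CacheBacklinks:
            Type: Task
            Resource: arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:${self:service}-${opt:stage}-cacheBacklinks
            End: true
          Done:
            Type: Succeed
```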

The workflow itself is triggered by an SNS message. However, due to a current limitation of Serverless, the Step Function can only be exposed over HTTP, so a Lambda acts as a wrapper: it receives the SNS message and starts the execution of the Step Function.
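A sketch of that wrapper, assuming the state machine ARN is injected as an environment variable:

```js
// Triggered by SNS; starts an execution of the Step Function.
const AWS = require('aws-sdk');
const stepFunctions = new AWS.StepFunctions();

module.exports.trigger = (event, context, callback) => {
  // The SNS message body (assumed to be JSON) becomes the execution input.
  const input = event.Records[0].Sns.Message;
  stepFunctions.startExecution(
    { stateMachineArn: process.env.STATE_MACHINE_ARN, input },
    callback
  );
};
```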

4. Managed NoSQL Database

The tables and their indexes are configured directly in the ‘serverless.yml’ file, with the TableName being dependent on the stage (i.e. the environment).

The DeletionPolicy setting ensures the data is not wiped on each stack update.

As of 2017, DynamoDB supports auto-scaling; the setting can be configured through a Serverless plugin.

Defining secondary indexes is required to maintain acceptable performance when the API needs to look up items by attributes other than the primary key, since such queries otherwise degrade into full-table scans.
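Putting those three points together, a table definition might look like this (the table and attribute names are illustrative):

```yaml
resources:
  Resources:
    WikipagesTable:
      Type: AWS::DynamoDB::Table
      DeletionPolicy: Retain  # keep the data across stack updates
      Properties:
        TableName: wikipages-${opt:stage, self:provider.stage}
        AttributeDefinitions:
          - AttributeName: title
            AttributeType: S
          - AttributeName: lastUpdated
            AttributeType: N
        KeySchema:
          - AttributeName: title
            KeyType: HASH
        ProvisionedThroughput:
          ReadCapacityUnits: 1
          WriteCapacityUnits: 1
        GlobalSecondaryIndexes:
          - IndexName: byLastUpdated
            KeySchema:
              - AttributeName: lastUpdated
                KeyType: HASH
            Projection:
              ProjectionType: ALL
            ProvisionedThroughput:
              ReadCapacityUnits: 1
              WriteCapacityUnits: 1
```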

5. Monitoring-as-a-Service

The integration between Datadog and AWS exposes default AWS metrics across a wide range of AWS products, including API Gateway, Lambda, and DynamoDB. Beyond that, Datadog is able to ingest metrics from specially-crafted CloudWatch log messages.

A simple helper script is all that’s needed to track custom metrics from within the Lambda functions.
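A minimal version of such a helper, following the structured log format Datadog documents for Lambda (the metric name and tags below are illustrative):

```js
// Emits a custom metric by printing a specially-formatted line to CloudWatch Logs:
// MONITORING|<unix_epoch_timestamp>|<value>|<metric_type>|<metric_name>|#<tags>
module.exports.monitor = (name, value, type = 'count', tags = []) => {
  const timestamp = Math.round(Date.now() / 1000);
  console.log(`MONITORING|${timestamp}|${value}|${type}|${name}|#${tags.join(',')}`);
};

// e.g. monitor('wikiroam.cache.miss', 1, 'count', ['stage:prod']);
```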

Datadog has an in-depth guide for how to effectively monitor a serverless stack. It includes suggestions for how to track both work metrics (e.g. throughput, latency) and resource metrics (e.g. throttles). In my case, the resulting dashboard tells a compelling story despite the decentralized nature of the architecture.

Around 10:40pm, the system experienced a spike in database latency above acceptable thresholds, as DynamoDB saturated its provisioned write throughput. In turn, the increased response time caused the cacheBacklinks() Lambda within the ETL pipeline to time out on several occasions, likely resulting in gaps in the cache. There is no indication that users were affected.

Pretty neat. The next frontier would be to expand the serverless monitoring approach from metrics (timestamp + counter or gauge) to logs (timestamp + message) and traces (timestamp + context). I’m guessing the Logmatic acquisition is meant to answer the first point. For the second, the APM product could be expanded to either:

  • Provide a library to ‘patch lambdas’ and cascade traces across invocations,
  • Import X-Ray traces through the AWS integration, or
  • Ingest arbitrary OpenTracing spans through CloudWatch.