Serverless beyond Functions

I like to play with technology. I think it is the best way to understand its pros, cons, and limits. Most of the time, when talking about serverless, people think of functions, such as those provided by AWS Lambda.

Functions can be triggered synchronously, waiting for the response, such as in the case of an API call coming through the Amazon API Gateway, or asynchronously, for example when a new file is uploaded to a repository such as Amazon S3.

Here I’d like to go beyond that, considering serverless in its broader definition of building applications “without thinking about servers”.

Over time, lots of triggers have been added to AWS Lambda. Using tools such as CloudWatch Events, you can react to almost any AWS API call by invoking a Lambda function. Leveraging this, we can easily enrich our application with other interesting “building blocks”, using other services to add ready-to-use functionality.

One of the great advantages of serverless development – and I never miss an opportunity to repeat myself here – is the possibility to “chain” multiple functions together, and design event-driven architectures.

In this way, you can decompose and distribute business logic in smaller components that follow the data flow of your application: if this happens, do that.

Applications built in this way are easier to keep under control, because our human minds are much better at looking for cause-effect relationships than at understanding a complex workflow.

Adding new features is also easier, because you don’t need to review all your code base to find the right spots to change, but you can start by thinking:

  • What would be the cause (trigger) of that?
  • Which would be the effects (what to trigger next)?

I learned over time, especially from our customers, that serverless applications can cover multiple use cases, such as mobile back ends, chat bots, or data processing.

A common scenario is web apps, and a quite standard approach there is to have web browsers download static assets (such as HTML, CSS, and JavaScript files) from a web-facing repository such as Amazon S3. To speed up things, and optimise costs, you can distribute this content via a Content Delivery Network (CDN) such as Amazon CloudFront.

The JavaScript running client-side, in the browser, can now call back end APIs that can be implemented using Lambda functions and exposed as web APIs via the Amazon API Gateway.

These Lambda functions should be designed to be stateless, and can use a persistence tier to read/write data. For example, to have a complete managed solution, you can use DynamoDB tables.
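
As a rough sketch of what “stateless” means here (this is not code from the demo): the handler below keeps nothing between invocations and delegates all persistence to a table-like object. The names make_handler and InMemoryTable, and the id and text fields, are assumptions made for illustration. The in-memory table only stands in for a real DynamoDB table so that the example can run locally.

```python
import json


def make_handler(table):
    """Build a Lambda-style handler bound to a table-like object.

    The handler itself holds no state: everything it needs to remember
    is written to the persistence tier passed in as `table`.
    """

    def handler(event, context):
        # API Gateway proxy events carry the request body as a string.
        body = json.loads(event.get("body") or "{}")
        note_id = body.get("id")
        if not note_id:
            return {"statusCode": 400, "body": json.dumps({"error": "missing id"})}
        # Hypothetical item shape: an "id" key plus a free-text field.
        table.put_item(Item={"id": note_id, "text": body.get("text", "")})
        return {"statusCode": 200, "body": json.dumps({"saved": note_id})}

    return handler


class InMemoryTable:
    """Local stand-in for a DynamoDB table, for illustration only."""

    def __init__(self):
        self.items = {}

    def put_item(self, Item):
        self.items[Item["id"]] = Item
```

In a real deployment, the table object could be a boto3 DynamoDB Table resource, which exposes a put_item(Item=...) method with the same shape.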

There are a lot of exceptions to this “standard” architecture. For example, you can use the Amazon API Gateway to “proxy” a native AWS API call, so that you can map your REST API straight to a service operation, such as adding data to a Kinesis Stream.

Here, I want to go beyond this approach, and build an application that is more “interactive” than a standard website. To do that, I’ll use other AWS services to provide additional functionalities.

HTTP, at least up to version 1.1, is a request/response protocol, and all communications to send data, or ask for data, start from the client (usually a web browser). If the web browser needs to know if something happened in the back end, outside of its control (for example, if there is new information available), it has to continuously poll the server. There are even specific integration patterns that came out of this, such as HTTP long polling.

The problem is that, with plain HTTP, the server is not able to push data to the client. This makes even simple applications, such as a web chat, cumbersome to implement and relatively slow to use. To overcome this limitation, during the long process that led to the HTML5 specification, WebSockets were introduced.

More recently, HTTP/2 Server Push tried to solve the problem at a lower level in the stack, and this new technology will probably coexist with WebSockets.

In the case of serverless architectures, we can add a WebSocket interface to Lambda functions using AWS IoT, a platform that would normally be used to connect physical devices and have them interact with cloud applications and other devices. It turns out that you can use AWS IoT without any physical device, but just for its features, for example:

  • Supporting long-term connections using multiple protocols, in this case we are interested specifically in WebSockets
  • Publishing, and subscribing, to a hierarchy of topics via MQTT
  • Using rules to process and act upon data published via MQTT

The Message Queue Telemetry Transport (MQTT) protocol uses hierarchical topics to let connected clients communicate via publish and subscribe.

The / character in topic names separates the levels of the hierarchy: for example, “a/b/c” defines a three-level hierarchy starting from “a”, then “b”, and finally “c”. When subscribing a device, or a rule, + is a wildcard that replaces a single level in the hierarchy (for example, “a/+/c”), and # is another wildcard that replaces all levels of the hierarchy from that point on (such as in “a/#”).

Topics starting with $ are for internal use, and are not included when subscribing to # (which by definition should otherwise mean “anything”). For example, AWS IoT uses the $aws topic namespace to broadcast information related to the platform, such as device connection lifecycle events, or to keep devices and their “shadow” in the cloud in sync.
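
To make the wildcard rules concrete, here is a minimal sketch of a topic matcher in Python. It is a local illustration of the rules described above, not how AWS IoT implements them, and the function name topic_matches is made up for this example.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    # A wildcard in the first level never matches topics starting with "$".
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: valid only as the last level of the
            # filter, where it matches everything from this point on.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        # "+" matches exactly one level; anything else must be equal.
        if level != "+" and level != topic_levels[i]:
            return False

    # Without "#", the filter and the topic must have the same depth.
    return len(filter_levels) == len(topic_levels)
```

For example, topic_matches("a/+/c", "a/b/c") and topic_matches("a/#", "a/b/c") are both true, while topic_matches("#", "$aws/events/x") is false because of the $ rule.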

Let’s make a highly interactive serverless application using WebSockets. Web browsers will be the devices connecting to AWS IoT, using topics and rules to exchange, and process, data. Let’s build a web chat.

Using WebSockets and AWS IoT, web browsers can receive data from Lambda functions, when those functions publish something on a topic the browsers have subscribed to. And when browsers publish data on a topic, AWS IoT rules can automatically react and do different things, for example:

  • Invoke a Lambda function, using the data published by the browsers as payload (event)
  • Write the data to a Kinesis stream that is consumed by a Lambda function, which processes the data more efficiently in micro-batches (for example, of 100–1000 records) but with a higher overall latency
  • Store the data in a DynamoDB table
  • Publish back the data in another topic

All those actions can also enrich the data sent from the client using built-in functions. For example, you can get the client ID of the publisher, or the current timestamp. AWS IoT uses IAM roles and policies to allow, or deny, access to its resources.
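
To give an idea of what this enrichment looks like, the sketch below pairs a possible rule query with a small Python function that emulates its effect locally. The query string, the chat/out topic filter in it, and the field names clientId and ts are assumptions for illustration; clientid() and timestamp() are the AWS IoT SQL functions that return the publisher’s MQTT client ID and the current time in milliseconds.

```python
import json
import time

# Hypothetical rule query: keep the whole payload, then add the
# publisher's client ID and the current timestamp in milliseconds.
RULE_SQL = "SELECT *, clientid() AS clientId, timestamp() AS ts FROM 'chat/out'"


def enrich(payload: bytes, client_id: str, now_ms=None) -> dict:
    """Local emulation of what the rule query above would produce."""
    message = json.loads(payload)
    enriched = dict(message)  # copy, so the original message is untouched
    enriched["clientId"] = client_id
    enriched["ts"] = int(time.time() * 1000) if now_ms is None else now_ms
    return enriched
```

The point is that the browser does not need to send its own ID or a trustworthy clock value: the rule adds both on the platform side before the data reaches a function, a stream, or a table.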

Let’s have a better look at how we can implement such a flow of data for a web chat.

For the web chat, I used the following topics and rules:

  • Each client can subscribe to the chat/in/${iot:ClientId} topic, where the final part of the topic name is a policy variable that is replaced by the actual MQTT client ID of the connection, and is unique for any client at any point in time.
  • There can’t be two clients with the same ID, so in our web chat any browser has a unique topic they can use to receive information from the back end (in this case, built using Lambda functions).
  • The chat/out topic is used by all browsers to send data to the back end, which can recognise each of them by their client ID embedded in the messages.
  • On their initial connection, browsers use chat/out to advertise themselves to the back end, and a Lambda function replies with custom code that is executed in the browsers using the JavaScript eval() function (now you understand why I said at the beginning that I was “playing with technology”: injecting code opens a lot of security concerns that should be carefully evaluated, and I’d like to hear your feedback on that).
  • Since the back end can inject code in the browsers, and add new functionalities, the initial JavaScript code that is provided to the browsers contains only the minimum capabilities required to connect, advertise themselves, and process the first message.
  • After their initial connection, the chat/out topic is used to publish messages in the chat, and, since my implementation is not authenticated, it turns out I don’t need a Lambda function to handle that, but I can use a republish rule to take the message, and publish it back on the chat/pub/${room} topic, where the final part is replaced by the rule with the chat room name extracted from the message payload.
  • Any browser can subscribe to any chat/pub/${room} topic, and receive messages published by other clients very quickly, as all communication and processing happens within the AWS IoT platform.
  • To protect communication, you can replace this republishing mechanism with a Lambda function that sends messages back securely on the chat/in/{clientId} topic of each device – but for the purpose of my tests the current approach was enough.
  • The browsers are not the only ones listening to the chat/pub/${room} topics: another rule takes all messages there and stores them in a DynamoDB table, so that at the initial connection a browser can retrieve the backlog of the chat room.
  • If there is a high throughput, and you want to optimise your use of Lambda functions, browsers can publish on the chat/stream topic, where a rule sends everything to a Kinesis stream consumed by the same Lambda function listening to chat/out, which handles the different syntax of the event payload and retains all the internal logic.
  • Finally, a Lambda function receives all events from the $aws/events/# topics, where you can monitor the lifecycle of device connections (I am actually just logging this information for debugging purposes).
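
The access rules implied by the topics above could be expressed in a policy document along these lines. This is a hedged sketch and not the policy used by the demo: the function name chat_policy is made up, the region and account ID are placeholders, and it assumes the ${iot:ClientId} policy variable is available in the kind of policy you attach. Note that policies use * as a wildcard, not the MQTT + and # characters.

```python
CLIENT_ID_VAR = "${iot:ClientId}"  # replaced by the MQTT client ID at evaluation time


def chat_policy(region: str, account_id: str) -> dict:
    """Build a policy document for a browser taking part in the web chat."""
    prefix = "arn:aws:iot:{}:{}".format(region, account_id)
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Any client ID may connect (assumption: no authentication).
            {"Effect": "Allow", "Action": "iot:Connect",
             "Resource": [prefix + ":client/*"]},
            # Subscriptions are checked against topic filters.
            {"Effect": "Allow", "Action": "iot:Subscribe",
             "Resource": [prefix + ":topicfilter/chat/in/" + CLIENT_ID_VAR,
                          prefix + ":topicfilter/chat/pub/*"]},
            # Receiving is checked against the actual topic of each message.
            {"Effect": "Allow", "Action": "iot:Receive",
             "Resource": [prefix + ":topic/chat/in/" + CLIENT_ID_VAR,
                          prefix + ":topic/chat/pub/*"]},
            # Browsers may only publish to the shared outbound topic.
            {"Effect": "Allow", "Action": "iot:Publish",
             "Resource": [prefix + ":topic/chat/out"]},
        ],
    }
```

With a policy like this, a browser can read its own chat/in topic and any chat room, but it cannot publish directly to a room or read another client’s private topic.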

Let’s review the flow sequence with a diagram (graphics courtesy of this website):

The first connection is to the Amazon API Gateway, which returns custom HTML pages for any visitor. Each browser then establishes a bidirectional connection (using WebSockets, via AWS IoT) to receive custom code to execute, and to exchange data (messages) with the back end and other browsers, using MQTT to publish and subscribe to topics that can have rules automatically reacting to what is published.
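
The back-end side of this flow can be sketched as follows. This is an illustration under assumptions, not the function from the demo: the message shape (the type and room fields, the “hello” and “welcome” values) is hypothetical, the reply is plain data rather than code to evaluate, and the publish argument is an injected callable so that the example runs without any AWS dependency.

```python
import json


def handle_chat_out(event, publish):
    """React to a message that a rule forwarded from the chat/out topic.

    `event` is assumed to already contain the publisher's client ID
    (for example, added by the rule). `publish(topic, payload)` is any
    callable able to send bytes to an MQTT topic.
    Returns True if a reply was sent.
    """
    client_id = event["clientId"]
    if event.get("type") == "hello":
        # Reply only to the browser that advertised itself, on its own topic.
        reply = {"type": "welcome", "room": event.get("room", "lobby")}
        publish("chat/in/{}".format(client_id), json.dumps(reply).encode("utf-8"))
        return True
    return False
```

Inside a real Lambda function, publish could wrap the publish(topic=..., qos=..., payload=...) call of the boto3 iot-data client.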

The DynamoDB table storing all messages, for all rooms, is using Auto Scaling to adjust its throughput to the actual workload, and, since my implementation is for demo purposes, Time To Live (TTL) to automatically delete messages older than 24 hours.
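
DynamoDB TTL works on a numeric attribute holding an expiry time in epoch seconds, so each stored message needs to carry one. A minimal sketch of how such an item could be built, assuming hypothetical attribute names (timestamp, clientId, text, expiresAt) and a timestamp sort key that the post does not specify:

```python
import time

TTL_SECONDS = 24 * 60 * 60  # messages expire after 24 hours


def build_message_item(room, client_id, text, now=None):
    """Build the item to store for one chat message."""
    now = time.time() if now is None else now
    return {
        "room": room,                         # partition key: the chat room
        "timestamp": int(now * 1000),         # hypothetical sort key, in ms
        "clientId": client_id,
        "text": text,
        "expiresAt": int(now) + TTL_SECONDS,  # TTL attribute, epoch seconds
    }
```

TTL deletion is a background process, so expired items can remain visible for a while; a reader that needs a strict 24-hour window would also filter on the expiry attribute.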

I find it fascinating that this simple application, using just a few hundred lines of code, is highly available and scalable, using multiple data centres for all tiers.

This is possible by combining “building blocks” such as, in this case, AWS Lambda, Amazon API Gateway, AWS IoT and Amazon DynamoDB, which provide high-level functionality, with built-in scalability and reliability, without the requirement to provision, scale, and manage any servers. This is the power of “serverless”, at least until we find a better term for that.

Reviewing the final architecture, the only scalability bottleneck I found is in the number of messages per second that a single chat room can handle, due to how I designed the data model: I used the chat room as the partition key on the DynamoDB table storing all messages. I don’t expect people to have a high throughput of messages per second in a single chat room, so this seems to be enough for this use case.

You can test the web chat here, write your name and a message:

You can create new chat rooms on the fly by changing the path of the URL, for example:

The code of this demo is available on GitHub:

Since this is a demo, I “forcefully” tried to avoid any external file dependency in the Lambda function, so that it could be easily reviewed and edited in the web console. On the other hand, you can obviously see that UX design is not my top skill :)

Looking forward to hearing your feedback!