Cloud Deep Dive — Serverless Pizza Oven
Organisations need to accelerate innovation and reduce operational costs. Cloud adoption has reached 88% in the UK and efforts like the Cloud Native Computing Foundation are having wide-reaching success. The latest frontier in cloud software development is serverless computing.
Serverless computing lets the cloud provider dynamically manage allocation and operation of machine resources. It involves a finer-grained deployment model for application code, sometimes called Functions-as-a-Service (FaaS), enabled by cloud platform services.
But what makes it so great?
Is it just about breaking up functionality into smaller chunks, focusing on business logic? Is it about not paying for idle resources, making spiky usage more affordable? Or is it about empowering developers with a wide selection of powerful managed services to provision along with their code?
To find out, let’s take a deep dive building out the YLD Cloud Pizza Place — a virtual restaurant in the cloud. We’ll start by creating the functionality for baking our pizzas…
We’ll be building with Amazon Web Services (AWS). Similar functionality is available with providers such as Microsoft Azure and Google Cloud Platform. As with most providers, setting up an account doesn’t cost anything, and we can get a substantial amount of resources free, some only for the first year, but many indefinitely.
We will only be using the tools and frameworks provided by AWS.
Access completed source code at github.com/yldio/serverless-pizza-oven.
AWS SDK and CLI
The AWS SDK provides a simple, well-documented interface to the AWS APIs, available as a library for most popular high-level languages: Java, .NET, Node.js, Python, Go, and more. Pretty much everything you can do in AWS has an API, which means everything is programmable.
Install the AWS Command Line Interface to access the same functionality from your terminal.
Identity and Access Management (IAM)
We can reduce our security risk by following the principle of least privilege, giving any execution environment access to only what is essential for it to function. AWS IAM policies allow for setting granular API access permissions, controlling who can do what, using which AWS resources, under which conditions.
CloudFormation
CloudFormation automates provisioning of AWS resources using text-based configuration files (JSON or YAML). By defining parameters, our CloudFormation configuration files can become templates for deployments in different environments, AWS accounts or regions.
These text-based files can be checked into source control systems along with source code, allowing for the implementation of security auditing and quality controls.
We’ll be using the AWS Serverless Application Model (SAM) extensions to simplify the definition of some resources.
Building the functionality
We’ll start by taking raw items (pizzas) and putting them in the oven.
The Bake Function
Lambda function execution is limited by duration (up to 5 minutes at the time of writing) and reserved memory allocation (up to 3 GB at the time of writing). Lambda functions are billed per invocation and for execution duration, priced according to the amount of reserved memory. Execution of the entire program is stopped after completion of the invoked function or on reaching the configurable duration limit. Idle time is not charged for.
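To make the billing model concrete, here is a rough sketch of the arithmetic. The function name `estimateLambdaCost` and the default rates are assumptions for illustration only; they do not come from this article, so check the current AWS price list before relying on them. Free-tier allowances and duration rounding are ignored.

```javascript
// Rough Lambda cost estimate: a per-request charge plus a charge per
// GB-second of reserved memory. The default rates are illustrative
// assumptions, not authoritative prices.
function estimateLambdaCost({
  invocations,
  avgDurationMs,
  memoryMb,
  pricePerMillionRequests = 0.2, // assumed USD rate
  pricePerGbSecond = 0.0000166667, // assumed USD rate
}) {
  const requestCost = (invocations / 1e6) * pricePerMillionRequests;
  // Duration is billed against the memory we reserve, not the memory we use.
  const gbSeconds = invocations * (avgDurationMs / 1000) * (memoryMb / 1024);
  const computeCost = gbSeconds * pricePerGbSecond;
  return { requestCost, gbSeconds, computeCost, total: requestCost + computeCost };
}

// Example: one million invocations of 200 ms each with 128 MB reserved.
const estimate = estimateLambdaCost({
  invocations: 1e6,
  avgDurationMs: 200,
  memoryMb: 128,
});
console.log(estimate);
```

Note that no term in the formula depends on how long the function sits idle between invocations, which is what makes spiky workloads affordable.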
AWS Lambda invokes a configured handler function within a deployed module. The module needs to be packaged up with its runtime dependencies and uploaded to an AWS S3 (Simple Storage Service) bucket to be deployed as part of a CloudFormation template. This can be done automatically by the AWS CLI. We can just specify the path relative to our own template file. E.g.
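$ aws cloudformation package \
--template-file ./template.yml \
--output-template-file template_packaged.yml \
--s3-bucket <your-deployment-bucket>
$ aws cloudformation deploy \
--template-file ./template_packaged.yml \
--stack-name <your-stack-name> \
--capabilities CAPABILITY_IAM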
For our function to log all its console output to CloudWatch Logs, we have to set up an execution role that allows only the relevant actions.
An execution role needs to contain a ‘Trust Policy’ that lets the AWS Lambda service create sessions with this role.
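A minimal sketch of such a trust policy, written here as a JavaScript object so it can be inspected. In the CloudFormation template the same document would be written in YAML or JSON as the role’s `AssumeRolePolicyDocument`; the variable name `lambdaTrustPolicy` is ours.

```javascript
// Trust policy letting the AWS Lambda service assume the execution role.
// The variable name is hypothetical; the document structure follows the
// standard IAM policy format.
const lambdaTrustPolicy = {
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Allow',
      Principal: { Service: 'lambda.amazonaws.com' },
      Action: 'sts:AssumeRole',
    },
  ],
};

console.log(JSON.stringify(lambdaTrustPolicy, null, 2));
```

The trust policy only says who may assume the role. What the role is allowed to do is defined separately, in its permission policies.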
Policy permissions relating to resources normally reference the corresponding Amazon Resource Names (ARNs), possibly including wildcards (‘*’). In this case the logging resources are created upon deployment of our function, so we have to use convention to predict what the ARN is going to be. We can use template parameters, pseudo parameters and the CloudFormation Sub function to generate it.
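The substitution can be sketched in JavaScript as follows. This mirrors what the Sub function would do with the `AWS::Region` and `AWS::AccountId` pseudo parameters and a function-name parameter. The helper names, region, account ID and function name are hypothetical, and depending on whether the log group already exists, `logs:CreateLogGroup` may also be required.

```javascript
// By convention, a Lambda function logs to the CloudWatch Logs group
// /aws/lambda/<function name>, so the ARN can be predicted before the
// log group exists. The trailing ":*" covers the log streams inside it.
function logGroupArn({ region, accountId, functionName }) {
  return `arn:aws:logs:${region}:${accountId}:log-group:/aws/lambda/${functionName}:*`;
}

// Least-privilege statement: only the logging actions, only for this group.
function loggingStatement(params) {
  return {
    Effect: 'Allow',
    Action: ['logs:CreateLogStream', 'logs:PutLogEvents'],
    Resource: logGroupArn(params),
  };
}

// Hypothetical values standing in for template and pseudo parameters.
const statement = loggingStatement({
  region: 'eu-west-1',
  accountId: '123456789012',
  functionName: 'bake',
});
console.log(statement.Resource);
```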
We’ll set up further permissions as we go.
The New Items Queue
We’ll use a Simple Queue Service (SQS) message queue for new items. That means we do not need to know where the items will come from, and the source does not need to know how we process them.
We can configure an ‘event source’ that would let AWS Lambda poll the SQS queue for available items, invoking our function for batches of up to 10 items at a time. Incoming messages to our function will look something like this:
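As an illustration, here is a trimmed example of the event a function receives for a batch of one message, together with a small helper that extracts the items. All field values are made up, and the item shape inside `body` (`id`, `bakingTime`) is a hypothetical one chosen for this sketch.

```javascript
// Trimmed example of the event Lambda passes to the handler for an SQS
// batch. Values are invented; real events carry a few more fields.
const exampleEvent = {
  Records: [
    {
      messageId: '059f36b4-87a3-44ab-83d2-661975830a7d',
      receiptHandle: 'example-receipt-handle',
      body: JSON.stringify({ id: 'pizza-1', bakingTime: 90 }),
      attributes: { ApproximateReceiveCount: '1' },
      eventSource: 'aws:sqs',
      eventSourceARN: 'arn:aws:sqs:eu-west-1:123456789012:new-items',
      awsRegion: 'eu-west-1',
    },
  ],
};

// Each record carries one message; the payload arrives as a string, so
// we parse it back into an object (assuming producers send JSON).
function parseItems(event) {
  return event.Records.map((record) => JSON.parse(record.body));
}

console.log(parseItems(exampleEvent));
```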
Our Lambda function’s execution role will need to contain policies with permissions that allow for the event source to listen for new messages in our SQS queue and delete messages that resulted in a successful execution. E.g.
- !GetAtt NewItemsQueue.Arn
We don’t want to lose items when our function gets throttled due to concurrency limits (there is a default of 1000 concurrent Lambda executions per account per region), or if some other temporary error occurs. The event source will retry delivery for the lifetime of the message, which is defined in the SQS configuration as a function of age and number of reads (attempts). We’ll set up a Dead-Letter Queue where undeliverable items will be sent.
So far so good
Our CloudFormation template would now reflect all of this:
Different items may need a different amount of time to bake in our oven…
A State Machine
As Lambda functions have a limited execution time (too short to bake a pizza), we need to use something else to manage the duration of the pizza in the oven.
AWS Step Functions provides a finite state machine as a service, configured using an Amazon States Language JSON definition. We can use state machines in situations where we need to co-ordinate relatively complex workflows, possibly over extended periods of time.
Each execution of a state machine takes a message as initial state and proceeds over the other states defined in the Amazon States Language JSON definition. Reaching a particular defined state can trigger different sorts of behaviour: conditionally selecting the next state to advance to, or executing Lambda functions, for example. A Wait State can be set to read the number of seconds to wait before proceeding to the next state. (Note: state transitions are relatively expensive, so you may want to use them judiciously.)
We will need to give our Bake Function’s execution role the states:StartExecution permission for our State Machine resource. Our states definition should reflect this very simple flow:
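A minimal sketch of what such a definition could look like, built as a JavaScript object that serialises to Amazon States Language JSON. The state name `Baking`, the input field `bakingTime` and the ARNs are assumptions for illustration; `Removed` is the state referred to later in this article.

```javascript
// Amazon States Language definition: wait for the item's baking time,
// then run the task that removes the item from the oven.
function ovenStateMachine(itemRemovedFunctionArn) {
  return {
    Comment: 'Keeps an item in the oven for its baking time',
    StartAt: 'Baking',
    States: {
      Baking: {
        Type: 'Wait',
        SecondsPath: '$.bakingTime', // seconds read from the execution input
        Next: 'Removed',
      },
      Removed: {
        Type: 'Task',
        Resource: itemRemovedFunctionArn, // Lambda function run in this state
        End: true,
      },
    },
  };
}

// Parameters the Bake function would pass to the StartExecution API call.
// The item itself becomes the initial state of the execution.
function startExecutionParams(stateMachineArn, item) {
  return { stateMachineArn, input: JSON.stringify(item) };
}

const definition = ovenStateMachine(
  'arn:aws:lambda:eu-west-1:123456789012:function:item-removed' // hypothetical
);
console.log(JSON.stringify(definition, null, 2));
```

Keeping the flow to two states also keeps the number of billable state transitions per item low.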
The Baked Items Queue
Similar to the queue we’ve defined before: we don’t know yet where exactly our baked items will end up. We can leave them in an SQS queue and allow for them to be picked up from there.
The Item Removed Function
The State Machine cannot directly output to SQS, so we’ll create a Lambda function to be triggered by the Removed state transition. The state machine will need to have an IAM role with the permission to invoke this function. The function will need an execution role with permission to sqs:SendMessage to our Baked Items Queue.
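The function itself can be very small. The sketch below injects the `sendMessage` dependency so that it stays independent of a particular AWS SDK version; in a real deployment it would wrap the SQS SendMessage call. The factory name, parameter names and queue URL are hypothetical.

```javascript
// Factory for the Item Removed handler. Step Functions passes the
// state's input (our item) to the function as the event.
function makeItemRemovedHandler({ sendMessage, bakedItemsQueueUrl }) {
  return async function handler(item) {
    // Forward the baked item to the Baked Items queue.
    await sendMessage({
      QueueUrl: bakedItemsQueueUrl,
      MessageBody: JSON.stringify(item),
    });
    return item;
  };
}

// Usage with a stand-in sender that just records what would be sent.
const outbox = [];
const itemRemoved = makeItemRemovedHandler({
  sendMessage: async (params) => { outbox.push(params); },
  bakedItemsQueueUrl: 'https://example.invalid/baked-items', // placeholder
});
```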
Our CloudFormation template would now additionally reflect all of this:
With great scalability comes great concurrency. State Machines can have up to 1 million concurrent State Machine executions, rate limited at 200 new executions per second. For that to be possible for our oven, it would either have to be really big, or the pizzas really tiny. No, let’s give our oven a realistic capacity limit.
In order to ensure we never insert items when the oven is full, we need to keep count of how many items we have in the oven at any time. We can use DynamoDB to persist our count.
DynamoDB is a distributed database, with data automatically replicated across multiple availability zones within a region. This means that reads are by default considered eventually consistent, meaning the data you read immediately following a write could be stale — it may take a short while to reflect recent writes. This would not work in our situation, where we will be receiving multiple items concurrently.
We need our read and write operations to be strongly consistent. The best way to achieve this is to do both the read and the write in a single atomic operation. DynamoDB provides conditional writes, which let us check whether we have available oven capacity, and update expressions, which allow incrementing a value without knowing its existing value, all in one request. As with database transactions in general, this comes with the trade-off of taking longer, but a bottleneck is exactly what we are trying to achieve.
Here’s a snippet from our DynamoDB update request:
UpdateExpression: 'SET occupied = occupied + :incr',
ConditionExpression: 'occupied <= :limit',
N: String(conf['BAKING_CAPACITY'] - size)
Should the update fail, we’ll know that there’s no space in our oven, otherwise we’re okay to insert the item.
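For context, here is a sketch of how the full request parameters might be assembled around the snippet above. The table name, key attribute and helper names are assumptions; only the two expressions and the `occupied` attribute come from the article. It also assumes the counter item already exists with `occupied` initialised to a number.

```javascript
// Builds UpdateItem parameters that reserve `size` units of oven capacity,
// but only if the oven still has room for them.
function reserveCapacityParams({ tableName, ovenId, size, capacity }) {
  return {
    TableName: tableName,
    Key: { id: { S: ovenId } }, // hypothetical key schema
    UpdateExpression: 'SET occupied = occupied + :incr',
    // occupied <= capacity - size  is the same as  occupied + size <= capacity
    ConditionExpression: 'occupied <= :limit',
    ExpressionAttributeValues: {
      ':incr': { N: String(size) },
      ':limit': { N: String(capacity - size) },
    },
  };
}

// True when DynamoDB rejected the write because the condition failed,
// i.e. the oven is full. Older SDKs expose the name as `code`, newer
// ones as `name`, so we check both.
function isOvenFull(error) {
  return (error.code || error.name) === 'ConditionalCheckFailedException';
}

const params = reserveCapacityParams({
  tableName: 'oven', // hypothetical
  ovenId: 'main',
  size: 1,
  capacity: 10,
});
console.log(params.ExpressionAttributeValues);
```

Any error other than a failed condition should be treated as a genuine failure rather than as a full oven.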
So what do we do with new messages when our oven is fully occupied? We can send them to another SQS queue, of course (assuming our pizzas will keep).
We’d want to get the order right. It’s only fair to bake the items in the order we receive them. That means we should always bake items from this queue before new items. It also means we should ensure that waiting items are baked in the order they were received. A standard SQS queue would not preserve message order, but a FIFO (First-In-First-Out) Queue would. We can configure the Waiting Queue to be FIFO.
Our Bake function can be updated to pull items from this queue, only deleting them from the queue if we manage to allocate capacity and successfully insert them into the oven.
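Sending to a FIFO queue needs a little more information than sending to a standard queue. The sketch below builds the parameters for such a send; the helper name, the group name `oven` and the assumption that every item carries a unique `id` are ours.

```javascript
// Parameters for sending a waiting item to a FIFO queue.
// FIFO queues preserve order within a message group, so using a single
// group keeps all waiting items in one sequence.
function waitingQueueParams({ queueUrl, item }) {
  return {
    QueueUrl: queueUrl, // FIFO queue names must end in ".fifo"
    MessageBody: JSON.stringify(item),
    MessageGroupId: 'oven', // hypothetical group name
    // Lets SQS discard accidental duplicate sends of the same item.
    MessageDeduplicationId: String(item.id),
  };
}

const waitingParams = waitingQueueParams({
  queueUrl: 'https://example.invalid/waiting-items.fifo', // placeholder
  item: { id: 'pizza-7', bakingTime: 120 },
});
console.log(waitingParams);
```

A single message group trades throughput for strict ordering, which suits an oven with a deliberately limited capacity.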
Closing the Loop
When items are removed from the oven, we have to reflect the available capacity in our counter. We can do this by decrementing the DynamoDB counter from our Item Removed function (without the need for a conditional write this time).
Freed capacity means that waiting items can be pulled from the queue. We know we can already do this by triggering our Bake function. Although it is possible to invoke this function directly using the AWS SDK, going through the New Items queue means we don’t have to think about concurrency or build our own retry handling. We’d have to introduce a special item, not meant to be baked, but only functioning to trigger a new SQS event (which in turn triggers our function). We’ll insert this item into the New Items queue. Our Bake function will simply ignore and discard these items.
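One way to sketch the special item is with a marker field that the Bake function checks for. The field name `type` and the value `trigger` are hypothetical choices for this example.

```javascript
const TRIGGER_TYPE = 'trigger'; // hypothetical marker value

// The special item sent to the New Items queue when capacity is freed.
function triggerItem() {
  return { type: TRIGGER_TYPE };
}

// Separates real items from trigger items in an incoming batch.
// Trigger items are dropped; `triggered` records that at least one was seen.
function splitBatch(items) {
  const bakeable = items.filter((item) => item.type !== TRIGGER_TYPE);
  return { bakeable, triggered: bakeable.length !== items.length };
}

const batch = [{ id: 'pizza-1', bakingTime: 60 }, triggerItem()];
console.log(splitBatch(batch));
```

Either way, every invocation of the Bake function would first try to drain the waiting queue, so a batch containing only trigger items still does useful work.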
Our CloudFormation template, and function code, now also reflect this:
All Together Now
We can write a script to see how items move through our solution. We’ll load 100 items, specifying random baking times, into our New Items queue, and check the number of items at every stage of our flow.
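The item-generation part of such a script might look like the sketch below. The item shape, the 30 to 120 second range and the helper names are assumptions; the batches are shaped as entries for the SQS SendMessageBatch call, which accepts at most 10 messages per request.

```javascript
// Random integer between min and max, inclusive.
function randomInt(min, max) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Generates `count` items with random baking times (in seconds).
function makeItems(count, minSeconds, maxSeconds) {
  return Array.from({ length: count }, (_, i) => ({
    id: `item-${i + 1}`,
    bakingTime: randomInt(minSeconds, maxSeconds),
  }));
}

// Groups items into batch entries of at most `batchSize` messages each.
function toBatches(items, batchSize = 10) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(
      items.slice(i, i + batchSize).map((item) => ({
        Id: item.id, // must be unique within a batch
        MessageBody: JSON.stringify(item),
      }))
    );
  }
  return batches;
}

const items = makeItems(100, 30, 120);
const batches = toBatches(items);
console.log(`${items.length} items in ${batches.length} batches`);
```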
Here is the result:
Check out the completed code at github.com/yldio/serverless-pizza-oven.