A Very Brief Serverless Introduction
There are plenty of blog posts and documentation that give introductions to serverless architectures in general and specific providers and technologies. That said, I’ll start this blog post with a very quick definition from Martin Fowler:
Serverless architectures are application designs that incorporate third-party “Backend as a Service” (BaaS) services, and/or that include custom code run in managed, ephemeral containers on a “Functions as a Service” (FaaS) platform.
One important point this definition highlights is that even though AWS Lambda and comparable offerings from IBM and Google (both called Cloud Functions), Azure (Azure Functions), and self-hosted options (Nuclio, Kubeless, Apache OpenWhisk, OpenFaaS) have gotten much of the attention in the developer community, serverless is not just Functions as a Service (FaaS). In fact, the catalog described in this blog post initially uses only Backend as a Service (BaaS) offerings from AWS, predominantly AppSync.
From this point forward, we’ll focus on AWS, since that’s the cloud we’ll be building our demonstration in. With AppSync and the other hosted AWS offerings we’ll use to build our catalog, configuration can be done via the AWS Management Console. However, manually configuring each application environment introduces several problems:
- Every environment must be built and configured manually
- It’s easy to introduce errors during configuration
- Potential differences between environments make testing non-deterministic
- Version control of environments isn’t possible
- Peer review of configuration is much more difficult
To solve these problems, several providers offer frameworks for configuring serverless applications in AWS and other clouds. These frameworks use YAML files or other descriptors that define the services to deploy and let you configure the same application differently in each environment. They also mesh neatly with CI/CD solutions like CircleCI. These offerings include Serverless Framework, Amplify, and AWS Serverless Application Model (SAM). As you might have guessed since we listed it first, we’ll be using Serverless Framework ;).
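To give a feel for what such a descriptor looks like, here’s a hedged sketch of a `serverless.yml` using the community `serverless-appsync-plugin`. The service name, table name, and stage setup are illustrative, and the exact keys vary by plugin version:

```yaml
# serverless.yml — a minimal sketch, not our actual configuration
service: product-catalog

provider:
  name: aws
  region: us-east-1
  stage: ${opt:stage, 'dev'}   # deploy the same app to dev/staging/prod

plugins:
  - serverless-appsync-plugin  # community plugin for AppSync (assumed here)

custom:
  appSync:
    name: catalog-api-${self:provider.stage}
    authenticationType: AMAZON_COGNITO_USER_POOLS
    schema: schema.graphql     # the GraphQL types and operations
    dataSources:
      - type: AMazon_DYNAMODB
        name: products
        config:
          tableName: products-${self:provider.stage}
```

With a file like this, `serverless deploy --stage staging` builds out an entire environment, and a different `--stage` value gives you a clean duplicate.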
Again, this has been covered in a lot of depth elsewhere, but I wanted to quickly list the pros and cons that led The Agile Monkeys to embrace serverless:
1. No need to manage infrastructure — Everything is hosted for you.
2. Simple and granular deployment — Using Serverless Framework, a single command deploys everything.
3. Cost (potentially)
   - Operates and scales on demand, resulting in cost savings — When you don’t need resources, you aren’t charged for them. When demand increases, you can easily scale.
   - Replication of production-like infrastructure — For testing or demonstration purposes, it’s simple to spin up duplicate environments and spin them back down when finished, which saves money versus permanent test environments.
   - We put “potentially” in parentheses because, depending on usage patterns, cloud provider, and other factors, serverless may not be the cheapest solution for every problem.
4. Easy integration with cloud services — AWS offerings like:
   - Cognito (security)
   - S3 (file storage)
   - CloudFront (CDN)
   - DynamoDB (document storage)
   - DynamoDB Accelerator (pass-through document storage cache)
And the cons:

- Still evolving — With new technologies like this, the landscape is constantly changing.
- FaaS cold starts — Care must be taken to ensure that there aren’t delays on startup of function instances while network connections are made or metadata is retrieved.
- More entry points for hackers — With granular architectures like this, there are more entities in the system, so there are more entities to be protected.
- Managing monitoring, documentation, and system interactions can become complex — Serverless makes it simple to add a new function or hosted service as needed. Over time this can make a system hard to reason about, and operations can become hard to handle if care isn’t taken to manage this complexity.
- Black box infrastructure and services — Obviously, if we’re not hosting or building the various infrastructure components of our application, then someone else is. Giving up control and visibility is always a bit uncomfortable.
Finally, Time for the Product Catalog
Okay, that introduction was a bit long… but hopefully we’ve still got your interest. So let’s get to the true purpose of this blog post, building a product catalog. Before we discuss the technical implementation, let’s go over the requirements we have for our product catalog.
The first thing we need is a way to create, store, retrieve, and update our product data. This means we’ll need a user interface for admins to enter and update data, a back end to interact with this interface, a datasource for storing this data, and a way to make sure we properly authenticate users so that only authorized users can modify our data.
In addition to maintaining product data, we need a way to categorize these products. We need a way to maintain category data and assign products to categories so that our user interface can navigate to particular categories and display relevant products.
In terms of data, the last piece we need to maintain is image data. This probably shouldn’t be stored in the same datasource as our product and category data. The ideal datasource for this is cheap and should allow us to easily apply caching on top of it since image data won’t change frequently.
Now that we have all of this data, we need a way to search, retrieve, and display it. We should be able to retrieve categories, individual products, apply filters, sort, search for keywords, etc.
Finally, we need a way of providing groups of users access to different parts of our catalog to allow admins to edit data and non-admins to view it.
A “Traditional” Microservice Architecture for a Catalog
It wasn’t that long ago that we were talking about decomposing monoliths into microservices (in fact, we still are!). But for the purposes of this blog post, I think the most apples-to-apples comparison we can make is between a microservice architecture for our catalog and our proposed serverless architecture.
Breaking down our domain into services we’d likely have the following:
- User Service — A service to keep track of our users
- Authentication Service — A service to authenticate users and grant them permissions to various functionality
- Product Service — A service to provide CRUD operations for products
- Category Service — A service to provide CRUD operations for categories
- Image Service — A service to allow users to upload and retrieve images
- Search Service — A service to search our catalog
- Integration Service — A service to take API calls from our user interface, fan out the calls to our other services, and compose payloads to return data to our user interface
That’s a lot of services! Now of course, we could probably combine some of these services, but then we might run into issues with scaling. If a single service has operations with different SLAs or criticality, we’d have to scale the whole service to meet the strictest SLA and might end up with unneeded capacity. A less critical operation could also cause service-wide problems and block a more critical operation from being available.
With all of these services, to avoid tight coupling we might need to employ design patterns like CQRS or Event Sourcing to denormalize data. We’d also need caching, something like Varnish or NGINX at the HTTP level, Memcached or Redis at the data level, and a CDN for static content and images. And of course we’d need a database of some sort to store our data. Even if all of this was hosted in AWS, we would still be on the hook for a tremendous amount of DevOps work to spin up VMs, handle monitoring, metrics, and alerting, deal with scaling, etc, etc, etc.
Once all of that was done, our application would look like this:
Our Serverless Architecture for a Catalog
Now let’s look at our serverless architecture for a catalog. This will be composed of a Serverless Framework application made up of an AppSync deployment, S3, a DynamoDB deployment, Cognito, and CloudFront.
Let’s go through each of the pieces and their setup:
AppSync is made up of the following entities:

1. Data Source — DynamoDB
2. GraphQL Proxy — The GraphQL engine for processing and mapping requests
   - Query — Read data
   - Mutation — Write data
   - Subscription — Push notification mechanism
     - Action — Notification to subscribers
3. Resolver — Function that resolves requests into responses using the data source
A GraphQL schema defining types and operations, resolver code for those operations, and data source configuration are all that’s needed to get up and running.
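For concreteness, here’s a minimal sketch of what such a schema might look like. The type and field names are hypothetical, not our actual catalog schema; `@aws_subscribe` is AppSync’s directive for tying a subscription to mutations:

```graphql
# schema.graphql — an illustrative slice of a catalog schema
type Product {
  id: ID!
  name: String!
  price: Float!
  categoryId: ID
}

type Query {
  getProduct(id: ID!): Product
  listProductsByCategory(categoryId: ID!): [Product]
}

type Mutation {
  putProduct(id: ID!, name: String!, price: Float!, categoryId: ID): Product
}

type Subscription {
  onProductChanged(id: ID!): Product
    @aws_subscribe(mutations: ["putProduct"])
}
```

Each `Query` and `Mutation` field gets a resolver that maps the request onto a DynamoDB operation, and that’s essentially all the “back end code” there is.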
- S3 — File storage that we can set up through our Serverless Framework application to store our images.
- DynamoDB — Our document store for product and catalog information, which is connected to AppSync and provides persistence for our data.
- Cognito — Security as a Service hosted by Amazon that ties in neatly with IAM and allows us to log in and permission users in a granular fashion.
- CloudFront — Our CDN, which we’ll use both to serve images and as an edge cache for our API.
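To show how Serverless Framework wires these pieces up, they can be declared as CloudFormation resources inside `serverless.yml`. This is a hedged sketch under illustrative names and attribute choices, not our actual configuration:

```yaml
# resources section of serverless.yml — illustrative only
resources:
  Resources:
    ImagesBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: catalog-images-${self:provider.stage}
    ProductsTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: products-${self:provider.stage}
        BillingMode: PAY_PER_REQUEST     # charged per request, not per hour
        AttributeDefinitions:
          - AttributeName: id
            AttributeType: S
        KeySchema:
          - AttributeName: id
            KeyType: HASH
    UserPool:
      Type: AWS::Cognito::UserPool
      Properties:
        UserPoolName: catalog-users-${self:provider.stage}
```

Because everything is parameterized by stage, the same file describes every environment, which is exactly what the manual-console approach couldn’t give us.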
With these pieces, our application looks like this:
Sooooo, Let’s Compare
Now that we’ve got our microservice and serverless applications designed, let’s lay out a few points of comparison:
- The serverless application is much simpler. This is because AWS gives us so much out of the box and handles a lot of the gotchas that would come with building our own services, storing our own data, and managing caching or denormalization of data.
- The amount of code we’d write is DRASTICALLY smaller for our serverless application. The only code needed is for our GraphQL resolvers. We don’t have to worry about persistence, HTTP requests, serialization/deserialization, and on and on and on. This saves us a tremendous amount of time when building and deploying our catalog.
- Deployment and configuration is much simpler with serverless. With a few hundred lines we can define all of our services, their interactions with each other, and their configuration.
- Our serverless app gives us push updates out of the box. Changes to our catalog will automatically be pushed to all GraphQL subscribers. So if a user is browsing our site and some product data changes, this change will immediately be broadcast to that user.
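As an illustration, assuming the schema exposes a subscription field like `onProductChanged` (a hypothetical name), a browsing client could subscribe with a query like this and receive the new payload whenever the underlying mutation fires:

```graphql
# Client-side subscription — field and id are illustrative
subscription OnProductChanged {
  onProductChanged(id: "prod-123") {
    id
    name
    price
  }
}
```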
The MVP of this catalog is relatively lean. Given that, there are quite a few areas that we want to focus on and improve:
- Benchmarking — How well does this application perform? If we can’t get performance comparable to a microservice architecture, then we’re doing something wrong (or AWS is).
- Testing — How do we test serverless? With so little code being written, testing that our system works and that changes to it don’t introduce regressions is a challenge. We’ll be looking at creating integration tests that will be able to accomplish this for us.
- More Caching — We can use another AWS hosted service, DynamoDB Accelerator, as a pass-through data cache, again easily configured with Serverless Framework, to reduce queries to our DynamoDB deployment.
- Monitoring/Metrics/Alerting — CloudWatch gives us a lot of data out of the box. But we need to spend time creating dashboards, gathering metrics, and generating alerts to prove that we have sufficient data to ensure our catalog runs smoothly in production.
- Event Sourcing — Here at The Agile Monkeys, we love event driven architecture as much as serverless architecture. We’d like to split our admin and user UIs and create events to update the data we serve to our user UI when admins change the underlying data.
- Image Resizing — One size doesn’t fit all when it comes to images. Here is a perfect place where we can introduce a Lambda to take image requests and resize images depending on the client requesting them.
- More Advanced Search — The better our search capabilities the better the catalog. We’ll look at technologies like Elasticsearch to build out this functionality.
Keep Watching This Space
This is the first of many blog posts to come. We didn’t get into our CI setup or the React app we’re using as our user interface. And as we increase the features and capabilities of our catalog and tackle the next steps above, we’ll publish new posts detailing our progress.