Cloud object storage services (AWS S3, GCS and, more recently, Azure Blob Storage) were among the first cloud products to free us from thinking about the servers the cloud runs on.
Critical enablers of the serverless paradigm, they are often taken for granted, and it can be argued that they are among the most under-appreciated components in the serverless landscape.
Let’s explore six powerful ways to use these services to decompose and decouple our applications, building more secure, more scalable applications faster and more cost-effectively.
6. Cross-cloud configurability
Object storage is one of the most mature services across most public clouds — AWS S3 was released in 2006 and Google Cloud Storage in 2010. Whatever the specific reasons for your data to be distributed across clouds — I generally advocate for simplicity and don’t typically recommend it — you can be fairly certain of parity in terms of basic features across these platforms.
To get an idea of the differences across clouds, I created a single, working, multi-cloud (incl. Azure) Terraform configuration:
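As a minimal sketch of what such a configuration might look like, here is one private bucket per provider; all names, regions and the resource group are placeholders, and the real configuration also needs provider blocks and credentials:

```hcl
# AWS: S3 buckets are private by default
resource "aws_s3_bucket" "store" {
  bucket = "example-multicloud-store" # placeholder name
}

# Google Cloud: a GCS bucket in a single location
resource "google_storage_bucket" "store" {
  name     = "example-multicloud-store"
  location = "EU"
}

# Azure: blobs need a storage account plus a container
resource "azurerm_storage_account" "store" {
  name                     = "examplemulticloudstore"
  resource_group_name      = "example-rg" # assumed to exist
  location                 = "westeurope"
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_storage_container" "store" {
  name                  = "data"
  storage_account_name  = azurerm_storage_account.store.name
  container_access_type = "private"
}
```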
With this configuration we get a privately accessible store in one region of each cloud provider, allowing us to implement our own processes for making copies of our data across providers. (Although it demonstrates these basic features across clouds, it doesn’t go much further than that, so YMMV. In the rest of the article we will be talking about features available in AWS.)
Keys to the kingdom
5. Granular permissions
Object stores are in essence key-value stores, which keeps both the set of actions that can be performed and the scope of objects those actions apply to relatively straightforward.
Each individual object has a unique reference that can be used in IAM permissions as a resource ARN, e.g.
arn:aws:s3:::bucket_name/key_name. This lets us leverage the platform’s powerful security features, such as role-based access control and the ability to generate temporary credentials.
We can also use wildcards (*) when scoping resources in IAM permissions, which means we can give access to groups of objects by prefix and/or suffix, e.g.
arn:aws:s3:::bucket_name/group_name/*.jpg matches keys under the group_name/ prefix with a .jpg suffix.
For access across many different key formats, even across buckets, we can use object tags. Tags are key-value pairs that can be used to categorize objects; an object can have many such pairs, and adding or removing specific tags can itself be controlled by IAM permissions. Object tags can then be referenced as conditions in object permissions to limit the scope of access to only objects that carry particular tags.
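As a sketch, here is how such a policy document could be assembled: a statement allowing reads on .jpg keys under a prefix, optionally gated on an object tag via the s3:ExistingObjectTag condition key. The bucket, prefix and tag names are placeholders.

```python
import json

def read_policy(bucket, prefix="", suffix="", tag=None):
    """Build an illustrative IAM policy allowing s3:GetObject on keys
    matching a prefix/suffix, optionally gated on an object tag."""
    statement = {
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/{prefix}*{suffix}",
    }
    if tag:  # e.g. {"status": "published"} -- a hypothetical tag
        key, value = next(iter(tag.items()))
        # s3:ExistingObjectTag/<key> matches tags already on the object
        statement["Condition"] = {
            "StringEquals": {f"s3:ExistingObjectTag/{key}": value}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}

policy = read_policy("bucket_name", prefix="group_name/", suffix=".jpg",
                     tag={"status": "published"})
print(json.dumps(policy, indent=2))
```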
4. User interfaces (incl. CLI)
Don’t underestimate the value of a point-and-click user interface in the workflows needed to provide the (rare, but usually critical) human operational support for your applications. Because the user interface is subject to the same security configuration as the rest of your data, including encryption, audit logging and granular permissions, you can still rely on the platform to protect your data to well-defined standards.
Command-line interfaces are also very useful in working with data for development and support purposes. Meaningful key prefixes are useful here too for getting to relevant data:
$ aws s3 ls s3://bucket_name/uploads/2019030
2019-03-01 18:36:29 3394016 uploads/20190301.txt
2019-03-04 12:31:16 6587151 uploads/20190304.txt
2019-03-07 22:19:56 6595295 uploads/20190307.txt
3. Event notifications
Write operations in object stores can generate notifications that enable complex serverless workflows. Depending on where you send the events, e.g. SNS, SQS or directly to AWS Lambda, you can achieve many different event distribution topologies.
Events are fired only upon successful operations, and messages contain only the metadata about the operation. This means you can grant access to object data to only the relevant parts of the subsequent workflow, and only if needed, allowing us to stick to the principle of least privilege.
Event notifications can be configured to have different destinations based on the key prefix or suffix. A simple example is where files with a specific extension trigger different workflows: when a video is uploaded, a video-processing workflow generates a sequence of thumbnails, whereas uploaded images are simply converted to a standard format.
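This kind of suffix-based routing can be sketched in a Lambda-style handler. The event shape below follows the S3 notification format, trimmed down to the fields used here; the workflow names are hypothetical.

```python
import os

# Hypothetical downstream workflows, keyed by file extension
WORKFLOWS = {
    ".mp4": "generate-thumbnails",
    ".jpg": "convert-to-standard-format",
    ".png": "convert-to-standard-format",
}

def route(event):
    """Pick a workflow for each record in an S3 event notification."""
    routed = []
    for record in event["Records"]:
        key = record["s3"]["object"]["key"]
        _, ext = os.path.splitext(key)
        workflow = WORKFLOWS.get(ext.lower())
        if workflow:
            routed.append((key, workflow))
    return routed

# A trimmed-down sample event in the S3 notification shape
sample = {"Records": [
    {"s3": {"object": {"key": "uploads/movie.mp4"}}},
    {"s3": {"object": {"key": "uploads/photo.jpg"}}},
]}
print(route(sample))
```

In practice you would let S3 itself do the first level of routing (per-prefix/suffix notification configurations) and keep handler-side dispatch like this for finer distinctions.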
The object store is not just for unstructured data: we can use it for data with well-defined schemas, such as CSV or JSON. (In fact, S3 Select and Athena even make it possible to run queries on the contents of objects, directly from S3, without provisioning any servers.)
Creating meaningful keys for such objects lets us expose just enough information about the contents of an object to take relevant action, e.g. using a schema identifier in the object key when you want to initiate different versions of processing based on the version of a schema.
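As an illustration, a schema version could be encoded as a fixed path segment in the key. The `v<N>` segment below is a hypothetical naming convention, not an S3 feature:

```python
def schema_version(key):
    """Extract a schema version from a key like 'orders/v2/20190304.json'.

    The 'v<N>' path segment is an assumed naming convention that lets a
    consumer choose the right parser without reading the object body.
    """
    for part in key.split("/"):
        if part.startswith("v") and part[1:].isdigit():
            return int(part[1:])
    return None  # no version segment found

print(schema_version("orders/v2/20190304.json"))  # 2
```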
2. Object (im)mutability
I generally prefer to use S3 as an immutable store, that is, the contents of individual objects should never change. That way you can use it as a historical record of the data that flowed through the system, and, more importantly, you avoid having to reason about the intermediate states that any of the many different client processes could leave objects in.
If you have long-running workflows and you want to be sure that the object you are reading is still in the state it was when it triggered an event, or since you last read it, you can either check the unique hash (ETag) of the contents of the object, or you can enable versioning on the bucket. Versioning lets you refer to exact historical versions of an object, even after it has been deleted.
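For simple (single-part, non-KMS-encrypted) uploads the ETag is the hex MD5 digest of the object body, so the consistency check can be sketched locally; for multipart uploads the ETag is not a plain MD5, so treat this as illustrative:

```python
import hashlib

def unchanged(body: bytes, expected_etag: str) -> bool:
    """Compare an object's bytes against the ETag recorded earlier.

    Holds for simple (non-multipart, non-KMS-encrypted) uploads, where
    the ETag is the MD5 of the body in hex. S3 returns ETags quoted,
    so the quotes are stripped before comparing.
    """
    return hashlib.md5(body).hexdigest() == expected_etag.strip('"')

body = b"hello world"
etag = '"5eb63bbbe01eeed093cb22bb8f5acdc3"'  # MD5 of b"hello world"
print(unchanged(body, etag))  # True
```

Alternatively, a conditional GET with the If-Match header lets S3 do this check server-side.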
Won’t that take up a lot of space? You certainly do not need to worry about running out: individual objects can be up to 5 TB in size, the total number of objects you can store is unlimited, and back in 2013 AWS already reported an unfathomable 2 trillion objects being accessed at peaks of 1.1 million requests per second.
But what about the cost? Even if the short-term storage cost is not going to be a problem for almost anyone, you may still want to avoid having objects around over the long term. For that, Object Lifecycle Management can be configured to either move your data to lower-cost archival storage, or to simply implement a policy for disposing of stale data on a schedule.
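As a sketch, a lifecycle configuration in the shape boto3’s put_bucket_lifecycle_configuration expects; the rule name, prefix and day counts are placeholders:

```python
# Hypothetical lifecycle rules: archive after 90 days, expire after a year
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire-uploads",  # placeholder rule name
            "Filter": {"Prefix": "uploads/"},     # placeholder prefix
            "Status": "Enabled",
            # Move objects to archival storage after 90 days...
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            # ...and delete them entirely after a year
            "Expiration": {"Days": 365},
        }
    ]
}

# Applied with e.g.:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="bucket_name", LifecycleConfiguration=lifecycle)
print(lifecycle["Rules"][0]["ID"])
```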
1. Web hosting: the original and best
Probably one of the oldest and most common uses for S3 is hosting static web content. With the wide adoption of SPAs, server-side rendering (SSR) has become a secondary implementation in lots of web applications. (You can see my post about an SSR implementation involving S3 here.)
Some of the key web hosting features in S3:
- Set the Content-Type header on individual objects to let browsers know how to display the content
- Specify CORS headers to let browsers apply cross-origin security policies
- Redirect requests to different URLs
- Specify index and error document object locations to present pages at the roots of sites and directories, and to present nicer alternatives for missing pages (404s)
You can also combine S3 with other AWS services:
- AWS Route 53 for a DNS service that can link directly to your S3 bucket
- AWS Certificate Manager for free SSL certificates that enable encryption on your own domains
- AWS CloudFront CDN for an international performance boost at a very reasonable price
And remember that you can use temporary credentials to allow uploads directly to S3 from a browser? Well, you can also upload to S3 with a simple HTML form POST, an old but prevalent way of doing uploads, just without the need to ‘own a server’: you generate a single-use pre-signed URL for the occasion.
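Under the hood, a browser POST upload is authorised by a base64-encoded policy document signed with a key derived via the standard SigV4 chain. A self-contained sketch of that signing step follows, with a made-up secret key and dates; in practice you would let an SDK call such as boto3’s generate_presigned_post do this for you:

```python
import base64, hashlib, hmac, json

def sign_post_policy(secret_key, policy, date, region):
    """Derive a SigV4 signing key and sign a base64-encoded POST policy.

    secret_key, date and region here are placeholder values; the
    key-derivation chain itself is the standard SigV4 sequence.
    """
    encoded = base64.b64encode(json.dumps(policy).encode()).decode()

    def h(key, msg):
        return hmac.new(key, msg.encode(), hashlib.sha256).digest()

    # SigV4 key derivation: date -> region -> service -> signing key
    k_date = h(b"AWS4" + secret_key.encode(), date)  # e.g. "20190304"
    k_region = h(k_date, region)                     # e.g. "us-east-1"
    k_service = h(k_region, "s3")
    k_signing = h(k_service, "aws4_request")
    signature = hmac.new(k_signing, encoded.encode(),
                         hashlib.sha256).hexdigest()
    return encoded, signature

# A minimal policy: which bucket, and which key prefix, the form may post to
policy = {"expiration": "2019-03-04T12:00:00Z",
          "conditions": [{"bucket": "bucket_name"},
                         ["starts-with", "$key", "uploads/"]]}
encoded, signature = sign_post_policy("fake-secret", policy,
                                      "20190304", "us-east-1")
print(len(signature))  # 64 hex characters
```

The browser then submits the encoded policy and signature as hidden form fields alongside the file.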
Whether you’re adopting containers or FaaS, wherever you are exploiting horizontal, elastic scalability: if storage, security, distribution or scalability is a problem, remember to think again about how cloud object storage may be the answer.