Amazon S3 is a cost-effective, flexible, and reliable object storage service for storing and retrieving any amount of data for applications. Objects are stored in containers called “buckets” as key-value pairs, with additional configuration options. To access a publicly readable object in a bucket, you just need to make a GET request to the URL of the resource.
By default, objects stored on S3 are accessible only to the owner of the bucket. The owner can then implement their own retrieval mechanisms to cater to various client needs. Making the bucket public and sharing its URLs would be the most straightforward implementation, but it comes with security risks such as brute-force attacks and misuse of publicly shared resources. Security becomes even more important when the stored objects are not just icons or website assets but highly private documents such as scanned verification cards and photo IDs.
So how do we mitigate these risks, while continuing to serve objects to the client?
There are two ways you could go about it:
- Build a server application that serves endpoints such as http://gets3objects.com/media/images/90290901, protected by proper auth codes. The application then streams the requested resource to the client. This approach can quickly lead to a very slow client application, since every request now has to travel from the client to your server to S3 and back, which may not make for a good user experience.
- Use S3 pre-signed URLs to give the client short-lived S3 URLs while keeping the buckets private. Once a URL is generated, it can be reused as many times as we want until the signed URL expires.
For our application, we did not want to bombard the server with too many GET requests for the same resource, so we used S3 pre-signed URLs to implement time-limited URLs. The core strategy revolved around storing the S3 relative path, such as /media/images/900092/, against the user/entity in the database.
To access the resource related to an entity, we created an entity-to-resource map, which tells the server what kind of document a particular request needs to access. For example, to request a user’s photo, the client would send the name of the entity, “user”, and the field in which the relative path of the resource is stored, “photo”. The server application would then fetch the relative path from the database, sign the URL, and send it back in the response for use by the client.
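The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `ENTITY_RESOURCE_MAP`, `FAKE_DB`, `resolve_relative_path`, and the `id_card` field are hypothetical, and an in-memory dict stands in for the real database.

```python
# Hypothetical entity-to-resource map: the fields of each entity
# that are allowed to hold an S3 relative path.
ENTITY_RESOURCE_MAP = {
    "user": {"photo", "id_card"},
}

# In-memory stand-in for the database (assumption, for illustration only).
FAKE_DB = {
    ("user", 42): {"photo": "/media/images/900092/"},
}


def resolve_relative_path(entity, entity_id, field):
    """Return the stored S3 relative path for entity.field.

    Raises KeyError if the entity/field pair is not in the map, and
    LookupError if no path is stored for that record.
    """
    allowed_fields = ENTITY_RESOURCE_MAP.get(entity)
    if allowed_fields is None or field not in allowed_fields:
        raise KeyError("no resource mapping for %s.%s" % (entity, field))
    row = FAKE_DB.get((entity, entity_id))
    if row is None or field not in row:
        raise LookupError("no stored path for %s %s" % (entity, entity_id))
    return row[field]
```

Rejecting anything that is not in the map keeps clients from asking the server to sign arbitrary columns or paths.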
Generating S3 Signed URLs
AWS provides the facility to use signed URLs for access to data stored in private buckets. The following explanation is taken directly from AWS documentation and describes the creation and use of signed URLs:
The Amazon S3 REST API uses a custom HTTP scheme based on a keyed-HMAC (Hash Message Authentication Code) for authentication. To authenticate a request, you first concatenate selected elements of the request to form a string. You then use your AWS secret access key to calculate the HMAC of that string. Informally, we call this process “signing the request,” and we call the output of the HMAC algorithm the signature because it simulates the security properties of a real signature. Finally, you add this signature as a parameter of the request by using the syntax described in this section.
When the system receives an authenticated request, it fetches the AWS secret access key that you claim to have and uses it, in the same way, to compute a signature for the message it received. It then compares the signature it calculated against the signature presented by the requester. If the two signatures match, the system concludes that the requester must have access to the AWS secret access key and therefore acts with the authority of the principal to whom the key was issued. If the two signatures do not match, the request is dropped and the system responds with an error message.
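To make the quoted description concrete, here is a minimal sketch of the sign-then-verify idea using only the Python standard library. It follows the shape of the legacy Signature Version 2 query-string scheme (HMAC-SHA1 over a newline-joined string) in simplified form; current S3 signing uses Signature Version 4 with HMAC-SHA256, which has more steps. The function names, the bucket name, and the secret are made up for illustration, and in practice the SDK does all of this for you.

```python
import base64
import hashlib
import hmac


def build_string_to_sign(method, expires, resource):
    # Simplified: HTTP verb, empty Content-MD5, empty Content-Type,
    # expiry timestamp, then the canonical resource path.
    return "\n".join([method, "", "", str(expires), resource])


def sign_string(secret_key, string_to_sign):
    """Return the Base64-encoded HMAC-SHA1 of string_to_sign."""
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(secret_key, string_to_sign, presented_signature):
    """Recompute the signature and compare it in constant time."""
    expected = sign_string(secret_key, string_to_sign)
    return hmac.compare_digest(expected, presented_signature)


# Example with placeholder values (not real credentials).
to_sign = build_string_to_sign("GET", 1175139620, "/example-bucket/media/images/900092/")
signature = sign_string("not-a-real-secret", to_sign)
```

Only a party holding the same secret key can produce a signature that `verify` accepts, which is what lets S3 trust a URL it did not generate itself.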
To implement pre-signed URLs in our app, we need to create an application endpoint that would:
- Authenticate the request based on proper auth codes
- Check the user’s permissions on the requested resource
- If both checks pass, generate a signed S3 URL and return it to the client.
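The three steps above can be sketched in a framework-agnostic way. Everything here is an assumption for illustration: `handle_signed_url_request` is a hypothetical name, and `authenticate`, `has_permission`, and `sign` are placeholders for whatever your auth layer, permission model, and URL signer actually provide.

```python
def handle_signed_url_request(token, relative_path, authenticate, has_permission, sign):
    """Return an (http_status, body) pair for a signed-URL request.

    authenticate(token) -> a user object, or None if the token is invalid
    has_permission(user, relative_path) -> bool
    sign(relative_path) -> signed URL string
    """
    # Step 1: authenticate the request.
    user = authenticate(token)
    if user is None:
        return 401, {"error": "unauthenticated"}

    # Step 2: check the user's permissions on the requested resource.
    if not has_permission(user, relative_path):
        return 403, {"error": "forbidden"}

    # Step 3: both checks passed, so sign the URL and return it.
    return 200, {"url": sign(relative_path)}
```

Passing the three collaborators in as arguments keeps the ordering of the checks easy to test without a web framework or a real S3 connection.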
Our “Signing Algorithm” is shown in the image below.
For our server application, we used boto, the Python SDK for Amazon Web Services. The basic implementation with this SDK is pretty straightforward:
First, we create a class called “AwsSignedUrls”, which implements a “sign” method for URL signing. This method takes parameters such as the relative path, expiry, headers, and https. In our implementation, we pass it the S3 relative path of the requested resource, which is generated and stored against the user/entity in the database. The “expiry” parameter sets how long the resulting URL remains valid.
Next, we import boto’s S3Connection to create a connection to the service, passing it our “aws-access key” and “aws-secret-key”. At this point, the variable “c” points to an S3 connection object. We then call the “generate_url” method on this connection to generate a URL for the requested “key”, which is the relative_path param. This returns the signed S3 URL, which we then return to the requesting client.
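A class along the lines described above might look roughly like this. It is a sketch reconstructed from the prose, not the original code: it assumes the legacy boto (version 2) package, and the constructor arguments, the `bucket` parameter, the default expiry of 300 seconds, and the injectable `connection` are all assumptions added for illustration.

```python
class AwsSignedUrls(object):
    """Generates time-limited GET URLs for objects in a private bucket."""

    def __init__(self, access_key, secret_key, bucket, connection=None):
        self.access_key = access_key
        self.secret_key = secret_key
        self.bucket = bucket           # bucket name (assumed; not given in the text)
        self._connection = connection  # optional injected connection, handy for tests

    def _connect(self):
        if self._connection is None:
            # Imported lazily so the class can be exercised without boto installed.
            from boto.s3.connection import S3Connection
            self._connection = S3Connection(self.access_key, self.secret_key)
        return self._connection

    def sign(self, relative_path, expiry=300, headers=None, https=True):
        """Return a signed URL for relative_path, valid for `expiry` seconds."""
        c = self._connect()
        return c.generate_url(
            expiry,                # lifetime of the URL in seconds
            "GET",                 # HTTP method the URL is valid for
            bucket=self.bucket,
            key=relative_path,
            headers=headers,
            force_http=not https,  # boto generates HTTPS URLs unless told otherwise
        )
```

With real credentials, usage would be something like `AwsSignedUrls(access_key, secret_key, "example-bucket").sign("/media/images/900092/", expiry=60)`, where the bucket name is a placeholder.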
The API layer then uses the “AwsSignedUrls” class with the appropriate relative path to return the generated URL (as shown in the image below).
This basic implementation provides a lot of flexibility to use different expiration durations across various clients and products, while providing a solid abstraction over access to resources in S3 buckets.