Dispatching S3 Files in Node.js

David Barral
Trabe
Aug 3, 2020

Some of our intranet backends use S3 storage and GraphQL APIs. It’s a common scenario nowadays. This story is about how we deal with file attachments in our schemas and how our client code can get hold of the real files.

A simple GraphQL server with JWT-based authentication

Let’s start with a very simple server:

  • We’ll use Koa to build the HTTP server and apollo-server-koa to integrate an Apollo GraphQL server with Koa.
  • The GraphQL schema allows querying for the files of the logged-in user.
  • It uses jsonwebtoken to authenticate users via JWTs. For the sake of simplicity it signs the tokens with a shared secret (a password) instead of a key pair. We also assume that those tokens are generated elsewhere. The token payload always contains the user login.
  • To resolve the files we ask our S3 endpoint for all the entries in a test bucket that match the “path” files/<login>/**. We use the listObjectsV2 function from Amazon’s aws-sdk package.

If you don’t have access to an S3 account, or you prefer to test all this stuff locally, you can use the fabulous minio. You can easily set up a local instance using docker and expose a folder through an S3-compatible API endpoint.

I’m not digging further into any of these concepts. Each of them deserves a story of its own.

Below you can see the tiniest implementation I can think of for this server:
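The gist with the full server is not embedded here, so the following is a minimal sketch of what it can look like. The secret, bucket name, port, minio credentials, and the lazy requires are assumptions for illustration; the author’s actual gist may differ.

```javascript
// A sketch of the server, assuming apollo-server-koa 2 and aws-sdk v2.
const SECRET = "secret"; // shared password used to sign/verify JWTs
const BUCKET = "test";

const typeDefs = `
  type File {
    name: String!
    url: String!
  }
  type Query {
    files: [File!]!
  }
`;

// Map an S3 listing entry to the schema's File type. For now the url is
// just the object key, not a real endpoint.
const toFile = ({ Key }) => ({
  name: Key.split("/").pop(),
  url: Key,
});

// The npm deps are required lazily, so the helpers above can be exercised
// without installing them.
function startServer() {
  const Koa = require("koa");
  const jwt = require("jsonwebtoken");
  const AWS = require("aws-sdk");
  const { ApolloServer } = require("apollo-server-koa");

  const s3 = new AWS.S3({
    endpoint: "http://localhost:9000", // e.g. a local minio instance
    s3ForcePathStyle: true,
    accessKeyId: "minio",
    secretAccessKey: "minio123",
  });

  const resolvers = {
    Query: {
      files: async (_, __, { login }) => {
        const { Contents } = await s3
          .listObjectsV2({ Bucket: BUCKET, Prefix: `files/${login}/` })
          .promise();
        return Contents.map(toFile);
      },
    },
  };

  const server = new ApolloServer({
    typeDefs,
    resolvers,
    // Verify the "Authorization: Bearer <token>" header; the token
    // payload always carries the user login.
    context: ({ ctx }) => {
      const header = ctx.request.headers.authorization || "";
      const { login } = jwt.verify(header.replace(/^Bearer /, ""), SECRET);
      return { login };
    },
  });

  const app = new Koa();
  server.applyMiddleware({ app });
  app.listen(3000);
}

// startServer(); // uncomment (with the deps installed) to serve /graphql on :3000
```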

With this in place we can make a GraphQL query in the playground at http://localhost:3000/graphql and get all the files.

Notice that, as I said before, you’ll need to get a token and set the Authorization header in the playground. You can use the following script to create tokens for testing purposes: new-api-token user1.

Our GraphQL output is not very useful yet. The url fields don’t contain real endpoints. Let’s fix that.

Getting the real files

We’ll change the resolver return statement to return a collection of S3 signed urls using the getSignedUrlPromise method. A signed url points to the S3 storage server and includes a temporary access token to control how and when a client can access a resource.
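A sketch of that reworked resolver, factored as a standalone function so the aws-sdk S3 client is passed in. The function name and bucket layout are assumptions; the 60-second expiration is the one used in the example.

```javascript
// Sketch: list the user's keys, then presign each one. "s3" is an
// aws-sdk v2 S3 client.
const signedFiles = async (s3, bucket, login) => {
  const { Contents } = await s3
    .listObjectsV2({ Bucket: bucket, Prefix: `files/${login}/` })
    .promise();
  // One extra async call per object: this is the n+1 penalty.
  return Promise.all(
    Contents.map(async ({ Key }) => ({
      name: Key.split("/").pop(),
      url: await s3.getSignedUrlPromise("getObject", {
        Bucket: bucket,
        Key,
        Expires: 60, // seconds until the signed url goes stale
      }),
    }))
  );
};
```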

Now we can visit the url in the url field returned by the query and download the file attachment.

This approach has several problems though:

  • We need to orchestrate and wait for several async calls to getSignedUrlPromise, which incurs a performance penalty (n+1 operations).
  • We don’t know whether the client will actually request any of the files, so preloading the signed urls could be a waste of time.
  • The signed urls have expiration times (60 seconds in the example). We could make them non-expirable, but for security reasons we may not want to. Preloading all the urls could make them stale before the client requests the files.

To solve these problems we are going to dispatch the files on demand.

Dispatching files

Let’s start by changing our resolver again to make the urls point to a new dispatch REST endpoint.
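The change can be sketched like this, assuming the dispatch endpoint will live at /files/<bucket>/<key> on our own server (the route shape and helper name are assumptions):

```javascript
// Sketch: build urls that point at our own dispatch endpoint instead of
// presigning anything up front.
const toDispatchFiles = (contents, bucket, baseUrl) =>
  contents.map(({ Key }) => ({
    name: Key.split("/").pop(),
    url: `${baseUrl}/files/${bucket}/${Key}`,
  }));
```

The resolver would then return toDispatchFiles(Contents, BUCKET, "http://localhost:3000"): no S3 round trips happen until a file is actually requested.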

Now we need to code the endpoint. We could do two things:

  • Read the data from the S3 bucket and stream it to the client.
  • Generate a signed url and redirect the client.

Let’s go with the second option. There’s no need to do ourselves what the S3 endpoint can do for us.

We need to add the REST endpoint before we set up the GraphQL server, so requests are intercepted before they reach the Apollo middleware. The dispatcher route matches any Bucket and Key. It authenticates the user via the JWT, then generates the signed url and redirects. The client must add the token as a token url param to the dispatch url.
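A sketch of such a dispatcher, with its collaborators injected so the handler stands on its own. The use of koa-route, the route shape, and the param names are assumptions; the real gist may wire this differently.

```javascript
// Sketch of the dispatch handler. verifyToken and s3 are injected; in the
// real server they would be jsonwebtoken and the aws-sdk client.
const makeDispatcher = ({ s3, verifyToken }) => async (ctx, bucket, key) => {
  // The client appends its JWT as a "token" url param.
  verifyToken(ctx.query.token); // throws on a missing or invalid token
  const url = await s3.getSignedUrlPromise("getObject", {
    Bucket: bucket,
    Key: key,
    Expires: 60, // short-lived: generated right when the file is requested
  });
  ctx.redirect(url);
};

// Wiring sketch, mounted before server.applyMiddleware({ app }):
//   const route = require("koa-route");
//   app.use(route.get("/files/:bucket/(.*)",
//     makeDispatcher({ s3, verifyToken: (t) => jwt.verify(t, SECRET) })));
```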

There’s a missing piece though. We are just checking that the token is valid. We are not doing any authorization at all. Using a valid user2 token we can access user1’s files 😱.

Improving security

Fear not. We can add authorization policies to our dispatcher.

Given that the dispatcher is pretty generic and matches any Bucket and Key, we can implement a registry of checkers. Each checker must say whether it applies to a request (the matcher function) and whether the user has permission (the check function).

We’ll traverse this registry to find the appropriate checker for each request. The last entry will be a default one that forbids any access.

We add this logic to the dispatcher, swapping lines 8–9 with the following:
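The gist with that swap is not embedded here. Following the description above, the registry and its traversal could be sketched like this; the checker shapes and the user-files rule are assumptions.

```javascript
// Sketch of the checker registry. Each entry says whether it applies to a
// request (matcher) and whether the user may proceed (check).
const checkers = [
  {
    // Files under files/<login>/ can only be read by that user.
    matcher: (bucket, key) => key.startsWith("files/"),
    check: ({ login }, bucket, key) => key.startsWith(`files/${login}/`),
  },
  {
    // Default checker: forbid anything no other checker claimed.
    matcher: () => true,
    check: () => false,
  },
];

// Apply the first checker whose matcher claims the request.
const isAllowed = (user, bucket, key) =>
  checkers.find(({ matcher }) => matcher(bucket, key)).check(user, bucket, key);

// In the dispatcher, after verifying the JWT:
//   const user = jwt.verify(ctx.query.token, SECRET);
//   if (!isAllowed(user, bucket, key)) ctx.throw(403);
```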

And that’s it for this story: dispatching S3 files on demand with JWT-based authentication/authorization. You can check the full server in all its big-one-file glory in this gist.

Summing up

When dealing with S3 file references in a GraphQL schema, it’s better to use an intermediate dispatcher that generates S3 signed urls on demand and adds authorization policies. This logic also applies to other kinds of APIs, not only GraphQL.

In this story we’ve used a very straightforward implementation, with coarse and naive error checking and tons of hardcoded stuff. Putting this in production will require more finesse 😅. Still, I think it will help you grasp the concept.

It’s quite easy to do all this stuff in Node with npm packages like koa, apollo-server-koa, jsonwebtoken and aws-sdk.

And yes, I think so too: the aws-sdk fixation on PascalCase parameter names is obnoxious 😑.
