Tutorial: Upload files to Amazon S3 from the server using pre-signed URLs

Maks Smagin
6 min read · Feb 16, 2023


What is the problem?

So imagine we need to create an API so a web app or mobile app can upload files to the server. It could be profile pictures, documents, videos: basically anything, depending on what our app does.

There are two ways to do it. The simplest one is the synchronous way. By this I mean the frontend sends a single request directly to the API, and the upload counts as successful only when the API has fully read the file from the request, saved it somewhere (a folder on the server or an S3 bucket, for example), and returned a response.

Here’s an example of how you can upload a file into a local folder or S3:

import Fastify from 'fastify';
import FastifyMultipart from '@fastify/multipart';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import fs from 'node:fs';
import { promisify } from 'node:util';
import { pipeline } from 'node:stream';
import dotenv from 'dotenv';

// read .env file with configuration
dotenv.config();

// promisified pipeline, used to stream the incoming file to disk
const pump = promisify(pipeline);

// create s3 client using your credentials
const s3 = new S3Client({
  region: process.env.AWS_REGION,
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  },
});

// create fastify app, and attach multipart/form-data parser
const app = Fastify({ logger: true });
app.register(FastifyMultipart);

// Here we receive a file and save it into a local folder
// by streaming it into a writable stream
app.post('/upload', async (req) => {
  const data = await req.file();

  const uploadPath = `${process.env.UPLOAD_PATH}/${data.filename}`;
  const writeStream = fs.createWriteStream(uploadPath);
  await pump(data.file, writeStream);

  return { ok: true };
});

// Here we do the same thing, but instead of writing to a local file
// we wait until the whole file is received into memory, and then upload it to the s3 bucket
app.post('/upload/s3', async (req) => {
  const data = await req.file();

  const putObjectCommand = new PutObjectCommand({
    Bucket: process.env.AWS_BUCKET,
    Key: data.filename,
    Body: await data.toBuffer(),
    ContentType: data.mimetype,
  });

  await s3.send(putObjectCommand);

  return { ok: true };
});

app.listen({ port: process.env.PORT }, () => {
  console.log(`Server listening on port ${process.env.PORT}`);
});

Here I am using Fastify instead of Express because it’s generally a much better and faster framework, and I recommend everyone who hasn’t tried it yet to do so. I’ll write an article about the differences later.

The synchronous way is the simplest: it’s fast to implement, and minimal work is required to integrate it on the client side. But there are downsides: we are limited by the quality of the internet connection between the user and our server. It will work fine in most cases if we only upload small files, a couple of KB or MB in size, but it gets a lot worse when we try to upload a bigger file, for example a video.

Let’s count. Say the average upload speed is around 20 Mbps, and we try to upload a 5 MB image: 20 Mbps is about 2.5 MB/s, so 5 MB / 2.5 MB/s = 2 s. So it will take around 2 seconds to upload. With a bigger file, let’s say a video 100 MB in size, it will take a lot longer: 100 MB / 2.5 MB/s = 40 seconds! And let’s not forget that network errors may occur as well, making file uploading even slower and less reliable.
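Just to make that back-of-the-envelope math reusable, here is a tiny sketch of the same calculation (the 20 Mbps figure is only an assumption):

// Rough estimate of how long an upload takes at a given connection speed.
// Assumes a steady connection and ignores protocol overhead and retries.
function estimateUploadSeconds(fileSizeMB, uploadSpeedMbps = 20) {
  const speedMBps = uploadSpeedMbps / 8; // 20 Mbps ≈ 2.5 MB/s
  return fileSizeMB / speedMBps;
}

console.log(estimateUploadSeconds(5));   // ~2 seconds for a 5 MB image
console.log(estimateUploadSeconds(100)); // ~40 seconds for a 100 MB video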

I think you already see where this is going, and with bigger file sizes it gets even worse. Our users will often see a 408 Request Timeout error, which in this case means our upload request took longer than the server is willing to keep the connection open, so the server closes it even while the upload is still in progress. From the other side, the client (for example the browser) can also close the connection if it takes too long.

For example, Chromium has a connection timeout of 5 minutes, and regardless of server settings it will close the connection if the request exceeds it. Firefox and Safari have similar connection timeouts.

Screenshot from Chromium repository
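On the server side these limits can at least be tuned; for example, Fastify lets you raise the socket timeouts through its factory options. The values below are arbitrary, this is just a sketch, and none of it helps with the browser-side limit:

import Fastify from 'fastify';

// A sketch: raising the server-side socket timeouts so long uploads are not
// cut off by our own server. The browser can still give up on its own.
const app = Fastify({
  logger: true,
  connectionTimeout: 10 * 60 * 1000, // close slow connections only after 10 minutes
  keepAliveTimeout: 60 * 1000,       // keep-alive timeout for reused connections
});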

How to solve it?

Well, to make our uploading process much more reliable, and also able to handle big files, we can use a technique called partial (multipart) upload. It basically means we are not going to upload the whole file in a single request; instead, we split the file into smaller parts (with S3, at least 5 MB each except the last one), upload each of them separately, and at the end combine them all into a single file. This way we make it possible to upload huge files, and avoid network issues and request timeouts in most cases.

We are going to use AWS S3 for this, as it already provides a great implementation of partial upload. So instead of one endpoint, we will have 3 endpoints:

  1. POST /upload/init — to create the upload entry in S3
  2. POST /upload/sign-part — to create a signed URL for a specific part of the file
  3. POST /upload/complete — to complete the upload :)

Here’s a simple example:

import Fastify from 'fastify';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
import {
  S3Client,
  CreateMultipartUploadCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand,
} from '@aws-sdk/client-s3';
import dotenv from 'dotenv';

// reading .env file with configuration
dotenv.config();

// setting up s3 client
const s3 = new S3Client({
  region: process.env.AWS_REGION,
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  },
});

// these endpoints receive plain JSON bodies, so no multipart parser is needed here
const app = Fastify({ logger: true });

// the first endpoint takes metadata of the file, like
// name and mimeType, and creates an entry in s3 for partial uploading
app.post('/upload/init', async (req) => {
  const { fileName, mimeType } = req.body;

  const command = new CreateMultipartUploadCommand({
    Bucket: process.env.AWS_BUCKET,
    Key: fileName,
    ContentType: mimeType,
  });

  const result = await s3.send(command);

  return {
    uploadId: result.UploadId,
    key: result.Key,
  };
});

// here we receive the part number and upload id from the client
// and generate a new signed url to upload that part
app.post('/upload/sign-part', async (req) => {
  const { key, uploadId, partNumber } = req.body;

  const command = new UploadPartCommand({
    Bucket: process.env.AWS_BUCKET,
    Key: key,
    UploadId: uploadId,
    PartNumber: partNumber,
  });

  const signedUrl = await getSignedUrl(s3, command, { expiresIn: 60 });

  return { signedUrl };
});

// and finally we finalize the upload here, by telling s3 to combine all
// uploaded parts into a single file
app.post('/upload/complete', async (req) => {
  const { key, uploadId, parts } = req.body;

  const command = new CompleteMultipartUploadCommand({
    Bucket: process.env.AWS_BUCKET,
    UploadId: uploadId,
    Key: key,
    MultipartUpload: {
      Parts: parts,
    },
  });

  await s3.send(command);

  return { ok: true };
});

app.listen({ port: process.env.PORT }, () => {
  console.log(`Server listening on port ${process.env.PORT}`);
});

So here we are doing really simple things:

  1. First we call POST /upload/init to generate a unique id for the upload.

  2. Then for each part of the file (on the client we split it into parts of at least 5 MB each, since that is S3’s minimum part size for everything except the last part) we generate a unique signed URL and upload the part through it. After each part is uploaded, the S3 API returns us an ETag header, a unique identifier of the uploaded part; we will need it later to combine all the uploaded parts together.

  3. Finally we send the array of uploaded parts to the POST /upload/complete endpoint to finalize the upload (there is a client-side sketch of the whole flow after the JSON example below). Notice we are sending a body like:

{
  "uploadId": "gKpLhyFiwefwefyvWp_wBLkaCglbvj0dHDyb0sdifubwefwef7TuWKtTESYC366UCaPImUdt.psodjfsdfdfgdfg-",
  "key": "image.jpeg",
  "parts": [
    { "ETag": "7993ad8b019d3345770d98a96ce1fca4", "PartNumber": 1 }
  ]
}
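On the client side, the whole flow could look something like this. It is just a sketch for the browser, using fetch and File.slice; the /upload/* paths and the 5 MB part size match the endpoints above, while the postJson helper is made up for the example:

// A sketch of the client side: split the file into parts, sign and upload
// each part, then complete the upload with the collected ETags.
const PART_SIZE = 5 * 1024 * 1024; // S3 requires at least 5 MB per part (except the last one)

async function uploadFile(file) {
  // 1. Init the multipart upload
  const { uploadId, key } = await postJson('/upload/init', {
    fileName: file.name,
    mimeType: file.type,
  });

  // 2. Upload every part through its own pre-signed URL
  const parts = [];
  const partCount = Math.ceil(file.size / PART_SIZE);

  for (let i = 0; i < partCount; i++) {
    const partNumber = i + 1;
    const blob = file.slice(i * PART_SIZE, (i + 1) * PART_SIZE);

    const { signedUrl } = await postJson('/upload/sign-part', { key, uploadId, partNumber });

    const res = await fetch(signedUrl, { method: 'PUT', body: blob });
    // S3 returns the identifier of the uploaded part in the ETag header
    parts.push({ ETag: res.headers.get('ETag'), PartNumber: partNumber });
  }

  // 3. Tell S3 to combine the parts into a single object
  await postJson('/upload/complete', { key, uploadId, parts });
}

// Small helper for JSON requests to our own API, assumed for this sketch
async function postJson(url, body) {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  return res.json();
}

Note that to read the ETag response header from the browser, the bucket’s CORS configuration has to expose it (ExposeHeaders), otherwise fetch will not see it.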

We can test it easily in Postman:

  1. Init the upload
  2. Get a signed URL for part number 1
  3. Upload the part using the signed URL (or the whole file, because here it’s just 100 KB)
  4. Call the complete upload method

As you see, there is a lot more interaction between client and server involved, but this makes our uploading process a lot more scalable and reliable 😉

Conclusion

I hope you guys can see from this simple file-uploading example how important software architecture is. Even an approach that looks super reliable at first glance, a single upload request, can cause many issues later in production. The best thing is to think about the system you are building in the long term and try to imagine the bottlenecks from the very beginning.

If you liked this article, clap and subscribe 👏😉.
