Upload files to Amazon S3 after validating their content using Lambda & API Gateway

Gourav Suri
Credit Saison (India)
3 min read · Jul 29, 2020

In this article, I’ll show how we built functionality to upload multipart files to an Amazon S3 bucket, validating the contents of the file along the way, using Lambda, API Gateway, Python & SAM (Serverless Application Model) templates.

Working at Credit Saison India has been a blessing, with all the challenges thrown at me. It has enabled me to explore newer technologies and to utilise more than the usual stack offered on AWS.

The challenge was to create an endpoint that validates the file content prior to sending it to the S3 bucket.

We are all familiar with the idea of storing raw files in an S3 bucket via API Gateway. Amazon provides a gateway configuration option to proxy all requests straight to S3 as the backing service. But we cannot use this approach here, as it cannot validate the file content; it would simply store the raw files as-is.

So how did we do it, you ask?

We used API Gateway to expose an endpoint for uploading files, AWS Lambda to validate the file content, S3 to store the validated files & SAM templates to automate provisioning of the whole architecture.

Cloud Architecture

Implementation

So let’s go step by step through how to achieve this.

SAM Template to create RestApi with multipart enabled.
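A minimal sketch of such a template, assuming illustrative resource names (`FileUploadApi`, `FileUploadFunction`, `UploadBucket`) and a Python 3.8 runtime:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  FileUploadApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      # Treat multipart payloads as binary so the body survives the proxy.
      # SAM requires "~1" in place of "/" inside MIME types here.
      BinaryMediaTypes:
        - multipart~1form-data

  FileUploadFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.lambda_handler
      Runtime: python3.8
      Policies:
        - AWSLambdaBasicExecutionRole
        - S3CrudPolicy:
            BucketName: !Ref UploadBucket
      Events:
        UploadFile:
          Type: Api
          Properties:
            RestApiId: !Ref FileUploadApi
            Path: /file
            Method: post

  UploadBucket:
    Type: AWS::S3::Bucket
```

The `BinaryMediaTypes` entry is what lets the gateway pass multipart form data through to the Lambda intact.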

Step 1

Create a REST API Gateway with the required resource & method.
Make sure its configuration is enabled to accept multipart form file data.
Attach the Lambda (to be created in Step 2) as a proxy integration on the REST API method.

Step 2

Create the Lambda with the AWSLambdaBasicExecutionRole & S3CrudPolicy policies defined in the SAM template. These allow the Lambda to access the S3 bucket.

Before writing the Python code, let’s first understand how the gateway sends the file to the Lambda.

API Gateway sends the file content to the Lambda as JSON, in the “event” parameter of the Lambda handler. A sample event looks like the following.

{
  "resource": "/file",
  "path": "/report",
  "httpMethod": "POST",
  "headers": {},
  "multiValueHeaders": {},
  "queryStringParameters": null,
  "multiValueQueryStringParameters": null,
  "pathParameters": null,
  "stageVariables": null,
  "requestContext": {},
  "body": "---------------------------048403183655214093472440\r\nContent-Disposition: form-data; name=\"file\"; filename=\"sample.csv\"\r\nContent-Type: text/csv\r\n\r\nName,Roll No.,Date"
}

The “event” JSON includes all the information about the file in its “body” field, as follows:

Content-Disposition, name, filename, Content-Type & actual File Content
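As an aside, the field name and filename can be pulled out of that Content-Disposition header with a small stdlib-only helper (the function name here is hypothetical, not from the original code):

```python
import re

def parse_content_disposition(header):
    """Extract the form field name and filename from a
    Content-Disposition header value such as
    'form-data; name="file"; filename="sample.csv"'."""
    name = re.search(r'name="([^"]*)"', header)
    filename = re.search(r'filename="([^"]*)"', header)
    return (name.group(1) if name else None,
            filename.group(1) if filename else None)
```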

Sample Validation Function

Step 3

Parse the “event” JSON & extract the contents of the file. We use the requests_toolbelt multipart Python library to decode the file contents. Once you have the file content, write whatever logic is required to validate it.

After validation, we need to send the file to S3 with all of its content restored. Use the MultipartDecoder to decode the file content, then convert it back to bytes with ‘utf-8’ encoding as shown in the code.

Main AWS Lambda Function

The above encoding & decoding of the file makes sure its content doesn’t get corrupted across multiple HTTP calls.

Step 4

Create an S3 bucket in the same region as the Gateway & Lambda. Add your “bucket_name” & “file_path” in the Lambda code.

You can use CloudWatch to inspect the response in case there is an error; otherwise the Lambda returns a success message & the file will appear in the S3 bucket.

That‘s all, folks! Thanks for going through the tutorial. Please feel free to drop comments and errata.
