Signing Multipart Uploads to S3 Buckets from Scratch

Published in Workday Technology, Feb 1, 2018

By Marcelo Costa, DevOps Engineer, Workday

Amazon Web Services (AWS) provides Software Development Kits (SDKs) for many popular languages, which facilitate the usage of the AWS API. However, there are cases where it is not possible to leverage such libraries or command line utilities. For example, environment restrictions may forbid installing such software. In such cases, the HTTP requests need to be tailored from scratch. This article shows how to do this for multipart uploads of large files to Amazon Simple Storage Service (S3), focusing on how to sign requests using the AWS Version 4 signature.

These guidelines are based on the need to provide a simple mechanism, free of any external dependencies. The example illustrates how to upload large files to an S3 bucket via an SSH tunnel and a SOCKS proxy with server-side encryption enabled. This was implemented in an environment where installing software such as the AWS SDKs was not permitted, hence the decision to do this via command line tools and Bash scripting.

A simple script that only interacts with the S3 service can use the AWS Signature Version 2 (AWS V2) without problems. However, the script was going to interact with both S3 and KMS (Key Management Service), and AWS enforces the usage of AWS Signature V4 for any request directed to the KMS service.

A Brief Overview of the AWS Signature Version 4 (AWS4) Authentication Model

Amazon introduced a new signature method to improve the security of the HTTP calls that target AWS services. The model requires multiple pieces of data: the content of the HTTP request (for example, the URI, query parameters, and headers), the secret key of the Identity and Access Management (IAM) user, the date, and other values. A SHA-256 hash is generated from the request data and its headers. That hash is then embedded in a “string to sign”, which is signed with a signing key derived from the user’s secret access key. The resulting signature is passed in the “Authorization” HTTP header (which starts with “AWS4-HMAC-SHA256”), along with the user’s access key ID. The secret key itself is never sent with the request.
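As an illustration of the signing key mentioned above, here is a minimal sketch of the Signature Version 4 key derivation using only openssl. The function names (hmac_sha256, derive_signing_key, sign_string) are hypothetical and are not taken from the Gist referenced in this article. Note that this sketch passes the secret key on the openssl command line, where it may be visible to other users in the process list.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative assumptions.

# hmac_sha256 KEYSPEC DATA
# KEYSPEC uses the openssl -macopt syntax: "key:<text>" or "hexkey:<hex>".
# Prints the HMAC-SHA256 of DATA as lowercase hex.
hmac_sha256() {
  printf '%s' "$2" | openssl dgst -sha256 -mac HMAC -macopt "$1" | awk '{print $NF}'
}

# derive_signing_key SECRET_KEY YYYYMMDD REGION SERVICE
# Chains four HMACs: date -> region -> service -> the literal "aws4_request".
derive_signing_key() {
  local k_date k_region k_service
  k_date=$(hmac_sha256 "key:AWS4$1" "$2")
  k_region=$(hmac_sha256 "hexkey:$k_date" "$3")
  k_service=$(hmac_sha256 "hexkey:$k_region" "$4")
  hmac_sha256 "hexkey:$k_service" "aws4_request"
}

# sign_string SIGNING_KEY_HEX STRING_TO_SIGN
# The result is the value that goes after "Signature=" in the Authorization header.
sign_string() {
  hmac_sha256 "hexkey:$1" "$2"
}
```

A call could look like `derive_signing_key "$AWS_SECRET" "$(date -u +%Y%m%d)" "us-west-2" "s3"`, where $AWS_SECRET is a placeholder for the IAM user’s secret access key.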

It is easy enough to abstract away the steps of the signing process and use existing code as a reference to implement your own version of the signing function. However, it is important to understand that the signature is calculated again on the server side once the request arrives at the AWS servers. If the signed content does not match the actual content of the request, if one of the mandatory headers is missing, or if the headers are not declared in the correct sequence, the request will fail, even when the difference is a single whitespace character or a typo of any kind, and return this error:

<Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message>…

Due to the nature of the AWS4 signature method, it can be challenging to determine what exactly went wrong with your request. The comparison between the hash that was sent and the hash that is calculated again on the server side, based on the contents of the request, cannot provide any details about where exactly the mismatch is.

This section provides some general guidelines to construct a well-formed HTTP request that can prevent issues with the AWS4 signature.
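To make the structure of a well-formed request concrete, the following sketch assembles the two strings that Signature Version 4 hashes and signs: the canonical request and the string to sign. The function names and the sample bucket, object, and date values are placeholders chosen for illustration, not values from the original script.

```shell
#!/usr/bin/env bash
# Sketch only: names and sample values are illustrative assumptions.

# sha256_hex STRING: SHA-256 of STRING as lowercase hex.
sha256_hex() {
  printf '%s' "$1" | openssl dgst -sha256 | awk '{print $NF}'
}

# canonical_request METHOD URI QUERY CANONICAL_HEADERS SIGNED_HEADERS PAYLOAD_HASH
# CANONICAL_HEADERS holds "name:value" lines (lowercase names, sorted by name).
# The blank line after the headers is part of the format.
canonical_request() {
  printf '%s\n%s\n%s\n%s\n\n%s\n%s' "$1" "$2" "$3" "$4" "$5" "$6"
}

# string_to_sign AMZ_DATE SCOPE CANONICAL_REQUEST
# SCOPE has the form YYYYMMDD/region/service/aws4_request.
string_to_sign() {
  printf 'AWS4-HMAC-SHA256\n%s\n%s\n%s' "$1" "$2" "$(sha256_hex "$3")"
}

# Placeholder request: GET of an object with an empty payload.
empty_hash="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
headers=$(printf '%s\n%s\n%s' \
  "host:s3-us-west-2.amazonaws.com" \
  "x-amz-content-sha256:$empty_hash" \
  "x-amz-date:20180201T000000Z")
creq=$(canonical_request "GET" "/example-bucket/example.txt" "" \
  "$headers" "host;x-amz-content-sha256;x-amz-date" "$empty_hash")
sts=$(string_to_sign "20180201T000000Z" "20180201/us-west-2/s3/aws4_request" "$creq")
printf '%s\n' "$sts"
```

Printing both strings while debugging makes it easier to spot a stray space or a header that is out of order before the request is ever sent.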

The code snippets presented in this section leverage the “assemble_aws_v4_signature” shell scripting function, which is available on Gist. The implementation was adapted from the “Examples of the Complete Version 4 Signing Process (Python)” page, which is part of the official AWS documentation.

Interacting with AWS services through the HTTP API

The following operations were included in the Bash script. Each operation has a specific set of headers according to the use cases involved in the overall objective:

  • Upload a small file (up to 5GB) with the KMS SSE enabled.
  • Upload a large file with KMS SSE enabled. You’ll need to break this down into 3 sub-tasks (using the multipart upload process):
      1. Initiate the multipart upload by interacting with the S3 Web Service with AWSV4.
      2. Upload all the parts of the large file.
      3. Complete the multipart upload.
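Before the parts can be uploaded, the large file has to be cut into pieces. A minimal sketch using the standard split utility follows; the function name and the part prefix are hypothetical. S3 requires every part except the last one to be at least 5 MiB, and allows at most 10,000 parts per upload.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and paths are illustrative assumptions.

# split_into_parts FILE PART_SIZE_BYTES PREFIX
# Writes PREFIXaa, PREFIXab, ... and prints one part path per line, in order.
split_into_parts() {
  local f
  split -b "$2" "$1" "$3" || return 1
  for f in "$3"*; do
    printf '%s\n' "$f"
  done
}

# Possible usage (placeholder paths, 5 MiB parts):
#   part=1
#   split_into_parts "$FILE" $((5 * 1024 * 1024)) "/tmp/upload.part." |
#   while IFS= read -r file_part; do
#     echo "part $part -> $file_part"   # sign and upload the part here
#     part=$((part + 1))
#   done
```

The part numbers start at 1 and must be remembered along with each part, because they are needed again when the upload is completed.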

In order to mitigate any issues that can result in a “SignatureDoesNotMatch” error, make sure you have a clear separation of 3 concepts:

  1. The payload (in this case, the file that you intend to upload).
  2. The request parameters (the query string).
  3. The headers (the HTTP headers, e.g., X-Amz-Date).

Whatever is sent through cURL must be signed using the AWS4-HMAC-SHA256 method.

Here are a few examples illustrating some of these operations along with some caveats involving the required data and its formatting.

Example 1 — aws s3api put-object

    host="s3-${REGION}.amazonaws.com"
    uri="/$BUCKET/$REMOTE_FILE"
    endpoint="https://${host}${uri}"
    assemble_aws_v4_signature "PUT" "s3" "us-west-2" "$host" "$uri" "" $FILE "$the_kms_key"

    # S3 curl - Uploading small file - URI must specify /BUCKET/REMOTE_FILE
    curl -vv -X PUT $OPTIONAL -T "$FILE" \
      -H "Host: $host" \
      -H "x-amz-content-sha256: $hashed_payload" \
      -H "X-Amz-Date: $x_amz_date_long" \
      -H "x-amz-server-side-encryption: aws:kms" \
      -H "x-amz-server-side-encryption-aws-kms-key-id: $the_kms_key" \
      -H "Authorization: AWS4-HMAC-SHA256 Credential=$AWSID/$x_amz_date_short/$region/$service/aws4_request, SignedHeaders=$signed_headers, Signature=$signature" \
      "$endpoint"

Any interaction with the S3 service requires the “x-amz-content-sha256” header. The value of this header is the SHA-256 hash of the payload (which is the file that is being uploaded). cURL sends the file itself with the “-T” parameter (-T, --upload-file <file>: this transfers the specified local file to the remote URL), so the hash must be calculated over exactly the same file.
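The payload hash can be calculated with openssl alone. The sketch below uses two hypothetical helper names; the only assumption is that openssl is available on the machine.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative assumptions.

# sha256_file PATH: SHA-256 of the file contents as lowercase hex.
sha256_file() {
  openssl dgst -sha256 < "$1" | awk '{print $NF}'
}

# sha256_string STRING: SHA-256 of a literal string (no trailing newline added).
sha256_string() {
  printf '%s' "$1" | openssl dgst -sha256 | awk '{print $NF}'
}

# Possible usage with the file being uploaded:
#   hashed_payload=$(sha256_file "$FILE")
```

For a request without a body, the hash of the empty string is used, which is always e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.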

Also, the SSE (Server-Side Encryption) headers must follow this order: x-amz-server-side-encryption first, then x-amz-server-side-encryption-aws-kms-key-id.
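The order above is what you get when the signed headers are sorted by their lowercase names, which is how Signature Version 4 expects them in the canonical request. The sketch below (hypothetical function names, not the Gist implementation) shows one way to do this in the shell. Sorting on the name only matters here: a naive sort of the whole “name:value” line would place the key-id header first, because “-” sorts before “:”.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative assumptions.

# canonical_headers: reads "Name: value" lines on stdin and prints
# "name:value" lines with lowercase names and trimmed values, sorted by name.
canonical_headers() {
  local line name value
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    name=$(printf '%s' "${line%%:*}" | tr '[:upper:]' '[:lower:]')
    value=$(printf '%s' "${line#*:}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
    printf '%s:%s\n' "$name" "$value"
  done | LC_ALL=C sort -t: -k1,1
}

# signed_headers: reads canonical header lines on stdin and prints the
# semicolon-separated list of names used in "SignedHeaders=".
signed_headers() {
  cut -d: -f1 | paste -sd ';' -
}

# Example with placeholder values:
printf '%s\n' \
  'x-amz-server-side-encryption-aws-kms-key-id: example-key-id' \
  'X-Amz-Server-Side-Encryption: aws:kms' \
  'Host: s3-us-west-2.amazonaws.com' | canonical_headers
```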

We recommend that you check the examples in the AWS documentation for additional guidance: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html#put-object-sse-specific-request-headers

Example 2 — aws kms list-aliases

    host="kms.${REGION}.amazonaws.com"
    uri="/"
    endpoint="https://${host}${uri}"
    assemble_aws_v4_signature "POST" "kms" "us-west-2" "$host" "$uri" "" "{}" ""

    # KMS curl - Requires content-type and x-amz-target:TrentService.ListAliases
    curl -X POST $OPTIONAL -d "{}" \
      -H "Content-Type: application/x-amz-json-1.1" \
      -H "Host: $host" \
      -H "X-Amz-Target: TrentService.ListAliases" \
      -H "X-Amz-Date: $x_amz_date_long" \
      -H "Authorization: AWS4-HMAC-SHA256 Credential=$AWSID/$x_amz_date_short/$region/$service/aws4_request, SignedHeaders=$signed_headers, Signature=$signature" \
      "$endpoint"

In this request, it is important to make sure that the payload ( {} ) is processed correctly by the function and that its resulting SHA-256 hash is placed on the LAST line of the “canonical_requests” string. Also, cURL must pass the exact same payload with the “-d” parameter. You may want to follow the pattern described in the documentation: http://docs.aws.amazon.com/kms/latest/APIReference/API_ListAliases.html#API_ListAliases_Examples

The KMS service doesn’t require the usage of the “x-amz-content-sha256” header.

Example 3 — aws s3api create-multipart-upload

    host="s3-${REGION}.amazonaws.com"
    uri="/$BUCKET/$REMOTE_FILE"
    endpoint="https://${host}${uri}?uploads"
    assemble_aws_v4_signature "POST" "s3" "us-west-2" "$host" "$uri" "uploads=" "" "$the_kms_key"

    # S3 curl - Initiating multi-part upload - URI must specify /BUCKET/REMOTE_FILE?uploads
    curl -X POST $OPTIONAL \
      -H "Host: $host" \
      -H "x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" \
      -H "X-Amz-Date: $x_amz_date_long" \
      -H "x-amz-meta-md5: $amz_meta_md5" \
      -H "x-amz-server-side-encryption: aws:kms" \
      -H "x-amz-server-side-encryption-aws-kms-key-id: $the_kms_key" \
      -H "Authorization: AWS4-HMAC-SHA256 Credential=$AWSID/$x_amz_date_short/$region/$service/aws4_request, SignedHeaders=$signed_headers, Signature=$signature" \
      "$endpoint" > /tmp/${REMOTE_FILE}.xml

Note that the query string here requires the “equals” symbol (“uploads=”) and that the endpoint has the full URL. Just to make it explicit: if the “uploads=” fragment is not passed in the “query_string” variable and is appended to the “canonical_uri” variable instead, it will not work. Note also that the payload is empty (7th argument) but we still pass the MD5 hash (encoded in Base64), so the function calculates it from the $FILE variable (which is exposed globally in the script).
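The “uploads=” rule is one instance of how Signature Version 4 canonicalizes the query string: every parameter is written as name=value (with an empty value if there is none) and the parameters are sorted by name. A minimal sketch of that rule follows. The function name is hypothetical, and the sketch assumes that names and values are already URI-encoded.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is an illustrative assumption.
# Names and values are assumed to be URI-encoded already.

# canonical_query RAW_QUERY
# Appends "=" to parameters without a value and sorts parameters by name.
canonical_query() {
  local param
  [ -n "$1" ] || return 0
  printf '%s\n' "$1" | tr '&' '\n' | while IFS= read -r param; do
    case $param in
      *=*) printf '%s\n' "$param" ;;
      *)   printf '%s=\n' "$param" ;;
    esac
  done | LC_ALL=C sort -t= -k1,1 | paste -sd '&' -
}

canonical_query "uploads"                      # prints: uploads=
canonical_query "uploadId=abc&partNumber=1"    # prints: partNumber=1&uploadId=abc
```

The URL given to cURL can keep the short “?uploads” form, as in the example above; only the string that is signed needs the canonical form.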

Example 4 — aws s3api upload-part

    # using same host and uri
    endpoint="https://${host}${uri}?partNumber=$part&uploadId=$UploadId"
    assemble_aws_v4_signature "PUT" "s3" "us-west-2" "$host" "$uri" "partNumber=$part&uploadId=$UploadId" "$file_part" ""

    # S3 curl - Uploading a part in a multipart upload - URI must specify /BUCKET/REMOTE_FILE?partNumber=$part&uploadId=$UploadId
    curl -vv -X PUT $OPTIONAL -D /tmp/$REMOTE_FILE.$part.head -T "$file_part" \
      -H "X-Amz-Date: $x_amz_date_long" \
      -H "X-Amz-Content-SHA256: $hashed_payload" \
      -H "Host: $host" \
      -H "Authorization: AWS4-HMAC-SHA256 Credential=$AWSID/$x_amz_date_short/$region/$service/aws4_request, SignedHeaders=$signed_headers, Signature=$signature" \
      "$endpoint"

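This example presents a clear separation of the 3 concepts mentioned previously:

  1. The payload: the file part in “$file_part” (7th argument of the function, and the file sent with “-T”).
  2. The request parameters: the query string “partNumber=$part&uploadId=$UploadId” (6th argument), which is kept out of the URI argument.
  3. The headers: X-Amz-Date, X-Amz-Content-SHA256, Host, and Authorization, each passed with “-H”.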
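The last sub-task, completing the multipart upload, needs the ETag that S3 returned for each part. Example 4 saves the response headers of every part with “-D”, so the XML body of the completion request can be assembled from those files. The sketch below uses hypothetical function names and assumes the header files are named PREFIX.PART.head, as in Example 4.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative assumptions.

# etag_from_headers HEADER_FILE
# Prints the ETag value, including its surrounding double quotes.
etag_from_headers() {
  tr -d '\r' < "$1" | awk 'tolower($1) == "etag:" {print $2; exit}'
}

# complete_body PREFIX PART_COUNT
# Reads PREFIX.1.head ... PREFIX.N.head and prints the CompleteMultipartUpload XML.
complete_body() {
  local part=1
  printf '<CompleteMultipartUpload>'
  while [ "$part" -le "$2" ]; do
    printf '<Part><PartNumber>%s</PartNumber><ETag>%s</ETag></Part>' \
      "$part" "$(etag_from_headers "$1.$part.head")"
    part=$((part + 1))
  done
  printf '</CompleteMultipartUpload>\n'
}

# Possible usage (placeholder part count):
#   complete_body "/tmp/$REMOTE_FILE" "$total_parts" > "/tmp/$REMOTE_FILE.complete.xml"
```

The resulting XML is sent as the body of a POST to /BUCKET/REMOTE_FILE with the query string “uploadId=$UploadId”. As with the other requests, the SHA-256 hash of that exact body is the payload hash that must be signed.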

If you need to interact with AWS services purely through HTTP requests, then these guidelines and examples will hopefully be of help. These instructions should also give you substantial insight to address bugs and make your troubleshooting efforts more effective.
