Stop paying for objects that you do not see

Two Lifecycle Policies Every S3 Bucket Should Have

Abandoned incomplete multipart uploads and expired current delete markers: what are they, and why you must care about them thanks to bad AWS defaults.

Jonathan Merlevede
datamindedbe


According to your AWS bill, this seemingly empty bucket might, in fact, be quite full.

There may be items in your buckets that you do not see but that adversely impact S3 costs and performance. This post explains what these invisible objects are and how you can remove them. AWS does not remove them for you by default, for reasons of backward compatibility and possibly also because of perverse incentives.

TL;DR version: The objects in question are parts of abandoned multipart uploads and expired object delete markers. I think every bucket should have a lifecycle policy to remove them. If you do not know what these objects are and are interested in knowing, read on.

Abandoned, incomplete multipart uploads

What are multipart uploads?

Uploading small objects to AWS S3 is possible using just a single PutObject operation. For larger objects, we use multipart uploads, and the flow is more involved:

  • Perform the CreateMultipartUpload operation. You specify an object key; AWS returns an upload ID.
  • Perform UploadPart operations, one for every part. You present AWS with the object key, the upload ID, a “part number”, and the part of the file you want to upload.
  • Perform the CompleteMultipartUpload operation. You specify the object key and upload ID; AWS then creates your object as the concatenation of all the parts you uploaded.

All of this is usually handled by your upload tool or library. For example, if you use the AWS CLI to upload files (using aws s3 cp), it will use multipart uploads by default for files larger than 8 MiB.
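To make the three operations concrete, here is a minimal sketch that performs them by hand with the low-level aws s3api commands. The function names parts_json and multipart_upload are made up for this illustration, the 8 MiB part size simply mirrors the CLI default, and the sketch assumes a configured AWS CLI and a small, non-empty file. It is not how aws s3 cp is implemented internally.

```shell
# Build the JSON body for CompleteMultipartUpload from lines of
# "<part-number> <etag-as-JSON-string>" read on stdin.
parts_json() (
  sep=""
  printf '{"Parts":['
  while read -r n etag; do
    printf '%s{"PartNumber":%s,"ETag":%s}' "$sep" "$n" "$etag"
    sep=","
  done
  printf ']}\n'
)

# Hypothetical helper: multipart_upload <bucket> <key> <file>
multipart_upload() (
  set -eu
  bucket=$1 key=$2 file=$3
  workdir=$(mktemp -d)
  trap 'rm -rf "$workdir"' EXIT

  # 1. CreateMultipartUpload: S3 hands back an upload ID for this key.
  upload_id=$(aws s3api create-multipart-upload --bucket "$bucket" --key "$key" \
    --query UploadId --output text)

  # 2. UploadPart: one call per 8 MiB chunk. S3 requires every part
  #    except the last one to be at least 5 MiB.
  split -b 8m "$file" "$workdir/part."
  n=0
  for part in "$workdir"/part.*; do
    n=$((n + 1))
    etag=$(aws s3api upload-part --bucket "$bucket" --key "$key" \
      --upload-id "$upload_id" --part-number "$n" --body "$part" \
      --query ETag --output json)
    printf '%s %s\n' "$n" "$etag"
  done > "$workdir/parts.txt"

  # 3. CompleteMultipartUpload: S3 assembles the parts into one object.
  parts_json < "$workdir/parts.txt" > "$workdir/parts.json"
  aws s3api complete-multipart-upload --bucket "$bucket" --key "$key" \
    --upload-id "$upload_id" --multipart-upload "file://$workdir/parts.json"
)

# Usage: multipart_upload "$bucket" "$key" /tmp/testfile
```

Note that if this function dies anywhere between step 1 and step 3, nothing cleans up after it: the parts uploaded so far stay in the bucket under the upload ID.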

What is problematic about multipart uploads?

If you never complete or abort an upload, the parts you already uploaded remain in your bucket forever. You pay to store them, even though they do not show up when you list the objects in your bucket.

To illustrate this, let’s create a 5 GiB test file and start uploading it to the cloud using aws s3 cp:

bucket=yourbucketname
key=tmp/testfile
dd if=/dev/urandom of=/tmp/testfile bs=1G count=5
aws s3 cp /tmp/testfile "s3://$bucket/$key" &
[1] 71717

After leaving the upload running for a while, kill the upload abruptly:

kill -9 71717

You can see that the multipart upload still exists:

aws s3api list-multipart-uploads --bucket $bucket
{
"Uploads": [
{
"UploadId": "gB8iBxnOiladG...",
"Key": "tmp/testfile",
"Initiated": "2023-11-17T14:31:24+00:00",
"StorageClass": "STANDARD",
"Owner": {...},
"Initiator": {...}
}
],
"RequestCharged": null
}

You can list associated parts using list-parts:

id="gB8iBxnOiladG..."
aws s3api list-parts --bucket "$bucket" --key "$key" --upload-id "$id"
{
"Parts": [
{
"PartNumber": 1,
"LastModified": "2023-11-17T14:39:17+00:00",
"ETag": "\"edeb946bf1303ed6350887c519041350\"",
"Size": 8388608
},
{
"PartNumber": 2,
"LastModified": "2023-11-17T14:39:17+00:00",
"ETag": "\"5c288b397c12981b40f19fde2387db5d\"",
"Size": 8388608
},
...
],
...
}

What can I do about abandoned multipart uploads?

If you know that an upload is abandoned and should be aborted, you can remove its uploaded parts by aborting the upload using the AbortMultipartUpload operation:

aws s3api abort-multipart-upload --bucket "$bucket" --key "$key" --upload-id "$id"

You can verify that the upload ID, and therefore also associated parts, are removed:

aws s3api list-multipart-uploads --bucket $bucket
{
"RequestCharged": null
}
aws s3api list-parts --bucket "$bucket" --key "$key" --upload-id "$id"
An error occurred (NoSuchUpload) when calling the ListParts operation: The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed.
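If a bucket has already collected many abandoned uploads, the same operation can be looped over everything list-multipart-uploads returns. The following is a sketch for a one-off cleanup; the function name abort_all_multipart_uploads is made up for this example. Be aware that it aborts every unfinished upload in the bucket, including ones that are still legitimately in progress.

```shell
# Hypothetical helper: abort_all_multipart_uploads <bucket>
abort_all_multipart_uploads() (
  bucket=$1
  tab=$(printf '\t')
  # Text output yields one "<key><TAB><upload-id>" line per unfinished upload.
  aws s3api list-multipart-uploads --bucket "$bucket" \
    --query 'Uploads[].[Key,UploadId]' --output text |
  while IFS="$tab" read -r key id; do
    # When there are no uploads the CLI prints a single "None" line
    # without an upload ID; skip it.
    [ -n "$id" ] || continue
    echo "aborting $key ($id)"
    aws s3api abort-multipart-upload --bucket "$bucket" \
      --key "$key" --upload-id "$id" < /dev/null
  done
)

# Usage: abort_all_multipart_uploads "$bucket"
```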

What can I really do about abandoned multipart uploads?

The above is highly impractical, as it requires you to monitor your bucket for abandoned uploads and manually abort them. Instead, tell AWS to abort multipart uploads automatically after a certain period expires using an (expiration) object lifecycle rule.

You can do this using the CLI or IaC tools like Terraform, but also from the AWS Console. While you’re there, tick the box “Delete expired object delete markers” too; we will explain what this does in the next section.

Expiration lifecycle policy expressing what should really be the AWS defaults.
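For those who prefer the CLI over the Console, a roughly equivalent configuration could look like the sketch below. The file location, the rule IDs, the helper name apply_default_lifecycle, and the 14-day period are illustrative choices, not anything AWS prescribes.

```shell
# Write a lifecycle configuration containing the two rules from this post.
lifecycle_file="${TMPDIR:-/tmp}/s3-default-lifecycle.json"
cat > "$lifecycle_file" <<'EOF'
{
  "Rules": [
    {
      "ID": "abort-incomplete-multipart-uploads",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 14 }
    },
    {
      "ID": "remove-expired-object-delete-markers",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "ExpiredObjectDeleteMarker": true }
    }
  ]
}
EOF

# Hypothetical helper: apply_default_lifecycle <bucket>
apply_default_lifecycle() {
  # Caution: this call REPLACES the bucket's entire lifecycle configuration,
  # so merge these rules with any rules the bucket already has.
  aws s3api put-bucket-lifecycle-configuration \
    --bucket "$1" --lifecycle-configuration "file://$lifecycle_file"
}

# Usage: apply_default_lifecycle "$bucket"
```

Afterwards, aws s3api get-bucket-lifecycle-configuration --bucket "$bucket" should show both rules.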

Are there any disadvantages to this?

Object expiration through lifecycle policies is free.

The only thing to account for is that the number of days you set in the policy caps how long a multipart upload can take, because uploads that are still in progress after that period are aborted too. Keep this in mind if you are uploading 5 TiB objects over a slow connection. Choose a period that is ridiculously long for a single upload but not so long that abandoned parts impact your storage bill, like 14 days.
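To get a feeling for how generous 14 days is, here is a quick back-of-the-envelope calculation of the sustained throughput needed to push a 5 TiB object through within that window. The variable names are only for illustration.

```shell
object_kib=$(( 5 * 1024 * 1024 * 1024 ))   # 5 TiB expressed in KiB
window_s=$(( 14 * 24 * 60 * 60 ))          # 14 days in seconds
min_kib_per_s=$(( object_kib / window_s )) # integer division, rounds down
echo "minimum sustained throughput: ${min_kib_per_s} KiB/s"
# prints: minimum sustained throughput: 4438 KiB/s
```

That is a bit over 4 MiB/s. A single upload that cannot sustain this needs a longer period; almost everything else will finish long before the rule kicks in.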

Expired delete markers

Other invisible objects that tend to linger around in buckets are called “expired object delete markers.” These objects only exist in versioned buckets.

What are object delete markers?

When you enable bucket versioning, every object key becomes associated with a stack of versioned items. The most recent item is the “latest” or “current” one. There are two types of versioned items: object versions and delete markers. Writing data to a key pushes an object version onto the stack; deleting an object pushes a delete marker. If the current item is an object version, this object version is visible as an object in your bucket; current delete markers remain invisible.

We can easily illustrate this using a couple of aws CLI commands. Before we upload an object, no version exists:

bucket=yourbucketname
key=tmp/testfile
aws s3 ls "s3://$bucket/$key" # verify that no object exists
aws s3api list-object-versions --bucket "$bucket" --prefix "$key"
{
"RequestCharged": null
}

After uploading an object, we see the object and a single object version:

dd if=/dev/urandom bs=1M count=2 | aws s3 cp - s3://$bucket/$key
aws s3 ls s3://$bucket/$key
2023-11-21 00:33:09    2097152 testfile
aws s3api list-object-versions --bucket "$bucket" --prefix "$key"
{
"Versions": [
{
"ETag": "\"18f78ecf07a0c41e8ec2defa200a5029\"",
"Size": 2097152,
"StorageClass": "STANDARD",
"Key": "tmp/testfile",
"VersionId": "bUWdb14EppQR13KLVapM699xySNlo9yR",
"IsLatest": true,
"LastModified": "2023-11-20T23:33:09+00:00",
"Owner": {...}
}
],
"RequestCharged": null
}

Deleting the object adds a delete marker:

aws s3 rm s3://$bucket/$key
aws s3api list-object-versions --bucket "$bucket" --prefix "$key"
{
"Versions": [
{
"ETag": "\"18f78ecf07a0c41e8ec2defa200a5029\"",
"Size": 2097152,
"StorageClass": "STANDARD",
"Key": "tmp/testfile",
"VersionId": "bUWdb14EppQR13KLVapM699xySNlo9yR",
"IsLatest": false,
"LastModified": "2023-11-20T23:33:09+00:00",
"Owner": {}
}
],
"DeleteMarkers": [
{
"Owner": {},
"Key": "tmp/testfile",
"VersionId": "kNig7WIYhADWCr47u_nRrQ8QYdeW4eIj",
"IsLatest": true,
"LastModified": "2023-11-20T23:34:59+00:00"
}
],
"RequestCharged": null
}

Listing objects using aws s3 ls returns an empty result. Removal of a delete marker would restore the object. You could do this as follows (but let’s not for now):

vid="kNig7WIYhADWCr47u_nRrQ8QYdeW4eIj"
aws s3api delete-object --bucket "$bucket" --key "$key" --version-id "$vid"

What are current and noncurrent object delete markers?

If a delete marker is the latest or current item on the version stack, we refer to it as a current object delete marker. Otherwise, we refer to it as a noncurrent object delete marker.

In the example above, the single delete marker at the key tmp/testfile is a current delete marker. Uploading another object to the same location creates a new object version:

dd if=/dev/urandom bs=1M count=2 | aws s3 cp - s3://$bucket/$key
aws s3api list-object-versions --bucket "$bucket" --prefix "$key"
{
"Versions": [
{
"Key": "tmp/testfile",
"VersionId": "l3_yEX5pQZCloovHVMsbIgbzP1pqZ4iU",
"IsLatest": true,
...
},
{
"Key": "tmp/testfile",
"VersionId": "bUWdb14EppQR13KLVapM699xySNlo9yR",
"IsLatest": false,
...
}
],
"DeleteMarkers": [
{
"Key": "tmp/testfile",
"VersionId": "kNig7WIYhADWCr47u_nRrQ8QYdeW4eIj",
"IsLatest": false,
...
}
],
"RequestCharged": null
}

At this point, the delete marker with version ID kNig7WIYhADWCr47u_nRrQ8QYdeW4eIj is still there but has become “noncurrent”, as indicated by its property IsLatest with value false.

What are expired object delete markers?

Expired object delete markers are delete markers at a key where no object versions remain: every object version has been removed, and the delete marker is all that is left.

We can turn the delete marker from our example into an expired object delete marker by removing the object versions at the same location:

aws s3api delete-object --bucket $bucket --key $key \
--version-id l3_yEX5pQZCloovHVMsbIgbzP1pqZ4iU
aws s3api delete-object --bucket $bucket --key $key \
--version-id bUWdb14EppQR13KLVapM699xySNlo9yR
aws s3api list-object-versions --bucket "$bucket" --prefix "$key"
{
"DeleteMarkers": [
{
"Key": "tmp/testfile",
"VersionId": "kNig7WIYhADWCr47u_nRrQ8QYdeW4eIj",
"IsLatest": true,
...
}
],
"RequestCharged": null
}

The delete marker with version ID kNig7WIYhADWCr47u_nRrQ8QYdeW4eIj is now an expired object delete marker.

Why are expired object delete markers bad?

The problem is that expired object delete markers can be current and can remain in your bucket forever unless you do something about them. Lingering markers can slow down list requests and clutter the output when you list object versions.

When enabling bucket versioning, you typically implement an expiration policy for noncurrent items, as not doing so often quickly becomes prohibitively expensive. With such a policy, all delete markers eventually become current, expired ones. Your expiration policy will not remove these from your bucket. In our example, the delete marker at tmp/testfile will never be automatically removed by a policy expiring noncurrent versions.

How can I remove expired object delete markers?

You can remove delete markers manually:

id=kNig7WIYhADWCr47u_nRrQ8QYdeW4eIj
aws s3api delete-object --bucket $bucket --key $key --version-id $id
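Before removing markers by hand, you first have to find them. The sketch below lists current delete markers at keys that have no object versions left, which are the candidates for removal. The function name current_markers_without_versions is made up for this example, it lists every version in the bucket (slow on large buckets), and it does not check whether older, noncurrent delete markers still sit below a reported marker.

```shell
# Hypothetical helper: current_markers_without_versions <bucket>
# Prints "<key><TAB><version-id>" for every current delete marker whose
# key no longer has any object version.
current_markers_without_versions() (
  set -eu
  bucket=$1
  workdir=$(mktemp -d)
  trap 'rm -rf "$workdir"' EXIT

  # One key per line for every object version in the bucket.
  aws s3api list-object-versions --bucket "$bucket" \
    --query 'Versions[].[Key]' --output text > "$workdir/versions.txt"

  # One "<key><TAB><version-id>" line per current delete marker.
  aws s3api list-object-versions --bucket "$bucket" \
    --query 'DeleteMarkers[?IsLatest].[Key,VersionId]' --output text \
    > "$workdir/markers.txt"

  # Keep markers whose key does not appear among the object versions.
  # The $2 check drops the bare "None" line printed for an empty result.
  awk -F '\t' '
    FILENAME == ARGV[1] { has_version[$1] = 1; next }
    $2 != "" && !($1 in has_version)
  ' "$workdir/versions.txt" "$workdir/markers.txt"
)

# Usage: current_markers_without_versions "$bucket"
```

Each line of output gives you the key and version ID to pass to the delete-object command shown above.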

How can I really remove expired object delete markers?

As with multipart uploads, the best way to remove expired object delete markers is through an explicit (expiration) lifecycle policy targeting expired object delete markers. One way to do this is by ticking a box in the Console (see the screenshot above). (You should probably use an IaC tool, though.)

Are there any disadvantages to this?

In the improbable event that you would like to have an account of which objects existed in the past and when they were removed but do not need the ability to restore said objects, consider not removing expired object delete markers.

As with aborting multipart uploads, removing delete markers through lifecycle policies is free. Delete markers only exist in versioned buckets, but having a policy to remove them is never harmful.

Conclusion

We have seen what incomplete multipart uploads are and why you should abort the abandoned ones. We have also seen what expired object delete markers are, that they can be current and that you should remove them.

AWS does not abort or remove anything for you by default. Therefore, whenever you create a bucket, create an expiration lifecycle policy that aborts multipart uploads after some days and removes expired object delete markers.
