Migrating Salesforce Files to S3 with AWS Lambda

Evan Koch
Salesforce Architects
3 min read · Apr 24, 2024


File Migration

One of our clients recently shared their concerns about their Salesforce file storage usage and asked us to design and implement a solution to automate the migration of files from Salesforce to Amazon S3. While an opportunity was active, the related files needed to be stored in the standard ContentDocument/ContentVersion format to ensure interoperability with a managed package. After the opportunity was closed, however, those files could be moved to S3 for long-term archival, and some of those files exceeded 12.5 MB in size.

As I evaluated options for executing the migration of the files, I came across AWS Lambda functions. Since the client was already using AWS S3 and the Satrang Drag, Drop & Upload Files to Amazon S3 managed package, it felt like a natural solution. The diagram below shows, at a high level, the two platforms communicating through REST APIs.

Diagram shows the products and technologies involved in the process of migrating Salesforce files to AWS S3
Image 1 — Level 2 Diagram

The solution can be distilled into a few steps:

  • Initiate the process and tell the AWS Lambda function which file to migrate.
  • Download the Salesforce file and create the S3 object.
  • Create the CloudDocument record in Salesforce and delete the original ContentDocument/ContentVersion record.

Now, let’s break the steps down into further detail.

The first step can be managed easily enough: when the Opportunity reaches a Closed stage, retrieve the IDs of all ContentVersion records associated with the Opportunity, turn them into JSON messages, and post them to the Function URL for the Lambda function. I also suggest including an APIKey value in the request header to provide a basic layer of security.

Diagram shows the connection between Salesforce and AWS for transmitting the JSON message that initiates the migration of Salesforce files to S3
Image 2 — Level 3 Diagram
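
To make the message concrete, here is a minimal sketch of the kind of request the Function URL would receive. The payload fields, header name, and environment variables are illustrative assumptions rather than the exact contract from the client's implementation, and in the real solution the request originates from Salesforce (for example, an Apex callout fired when the Opportunity closes) rather than from a Node.js script.

const axios = require('axios');

// Hypothetical helper showing the shape of the JSON message and the APIKey header.
async function requestMigration(contentVersionId, opportunityId) {
  await axios.post(
    process.env.LAMBDA_FUNCTION_URL,                    // the Lambda Function URL
    { contentVersionId, opportunityId },                // one JSON message per file to migrate
    { headers: { 'x-api-key': process.env.API_KEY } }   // shared secret the Lambda verifies
  );
}

On the Lambda side, the handler can simply reject any request whose header value does not match the expected key before it touches Salesforce or S3.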

Once the Lambda function receives the ContentVersion ID, it needs to authenticate with Salesforce and download the file.

Diagram shows file retrieval and pickup
Image 3 — Level 3 Diagram

Below is a code block that uses the JSforce and Axios libraries, along with the AWS SDK, to authenticate with Salesforce, download the file, and upload it to S3.

// Dependencies: jsforce, axios, and the AWS SDK v3 S3 client
const jsforce = require('jsforce');
const axios = require('axios');
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

// Authenticate with Salesforce using username + password + security token
const conn = new jsforce.Connection({
  loginUrl: loginURL
});
await conn.login(username, password + securityToken);

const accessToken = conn.accessToken;

// Download the file contents from the ContentVersion's VersionData endpoint
const fullRestURL = RESTApi + '/services/data/v47.0/sobjects/ContentVersion/' + contentVersionID + '/VersionData';

const { data } = await axios.get(fullRestURL, {
  headers: { Authorization: 'Bearer ' + accessToken },
  responseType: 'arraybuffer', // keep the binary payload intact
  timeout: 30000               // 30 seconds, in milliseconds
});

// Upload the downloaded bytes to S3
const input = {
  Body: data, // bytes we just downloaded from Salesforce with Axios
  Bucket: bucket,
  Key: s3FileName,
  ContentType: contentType
};

const command = new PutObjectCommand(input);
const client = new S3Client({});
await client.send(command);
console.log('S3 complete');

After retrieving the contents of the file and storing it in S3, there’s still cleanup to do in Salesforce. I opted for a platform event here, as this was not a scenario where I needed to keep history. I used the JSforce library to create a platform event record with the details of the S3 object and its relationships (OpportunityID, ContentVersionID, S3 bucket, AWS region, and the full file name), which invokes a platform event-triggered flow to manage the final cleanup.
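
Reusing the JSforce connection from the earlier snippet, publishing the platform event is just an insert on the event sObject. The event API name and custom field names below are assumptions for illustration; use whatever platform event definition exists in your org.

// Publish the platform event that kicks off the Salesforce-side cleanup.
// 'S3_File_Migrated__e' and its fields are hypothetical names.
await conn.sobject('S3_File_Migrated__e').create({
  OpportunityId__c: opportunityId,
  ContentVersionId__c: contentVersionID,
  S3_Bucket__c: bucket,
  AWS_Region__c: process.env.AWS_REGION,
  File_Name__c: s3FileName
});
console.log('Platform event published');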

When Salesforce receives the platform event, a flow is triggered that retrieves the details of the original ContentVersion record, creates the new CloudDocument record pointing to the S3 object, and deletes the ContentVersion record. This reduces the amount of Salesforce file storage consumed and eliminates the manual process of moving files to S3.

While this use case focuses on S3, the same pattern can be applied in other scenarios where you need to process large volumes of data outside of Salesforce’s platform limits.
