Salesforce Apex Optimization: Large Strings vs Heap Size and CPU Time

Pushing the Apex Heap Size almost four times over its limit by optimising our code results in some very interesting behaviour…

Justus van den Berg
11 min read · May 30, 2024

Is the Apex Heap Size limit secretly 24MB instead of 6MB? In my quest to find the most efficient way to create very large strings, I have been experimenting and found some interesting behaviour regarding the heap size.

Clickbait subtitle aside, there probably is a good reason for my findings and I am going to take you through the most efficient way I found to create large strings and the impact it has on the Heap Size and CPU Time.

There often is a bit of a negative vibe around Apex limits, especially from people who are new to the Salesforce platform. Yet it never fails to surprise me just how much we can actually do within these limits. Today’s subject is no exception. Let’s dive in!

Large strings

What do I mean when I say large strings? Think of use cases like generating CSV files. I wrote a CSV upload tool to ingest CSV data straight into Data Cloud directly from any Salesforce Org. I boldly claimed that these CSV files created on the platform could get surprisingly large. So let’s take that as our example and dive in a bit deeper.

We have one truly “realistic” limit to deal with: the Blob size. This might come as a surprise, but the Blob size has a hard limit of 6MB synchronously and 12MB asynchronously. So no matter how big we can make our strings and our heap, our Blob size cannot exceed 6MB without a Blob size limit exception.
I call this a “realistic” limit as there is no real use case other than files that would require you to create and handle strings larger than 6MB. If there is, I would seriously question whether what you’re doing should be on platform.

In order to create files we will need to convert our data to a Blob, so we’re stuck with this limit. API request bodies cannot exceed these limits either.
But please do correct me if you have a use case, as there probably is one out there.

Creating a test string efficiently

Let’s write some Apex (finally)… We start with some synchronous Apex code to generate a long string, something that comes close to 6MB. I am going to use a 31-character test string and add it to 25 columns over 7,500 rows. If we add a comma between the values and a line break at the end of each row, every value plus its separator takes exactly 32 bytes, and we end up with a Blob size of exactly 6,000,000 bytes (32 × 25 × 7,500).
32 bytes per value is easy to calculate with, and it seems a sort of realistic average field size for a CSV file. It might be fewer characters on average, but I think it is not a bad start.
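Note that the test string used in the snippets below is actually 31 characters long, so that with the one-byte separator each value accounts for exactly 32 bytes. A quick sanity check on the arithmetic:

// 25 values of 31 characters, 24 commas and 1 line break per row
Integer bytesPerRow = (31 * 25) + 24 + 1; // 800 bytes
System.debug(bytesPerRow * 7500);         // 6,000,000 bytes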

A common approach would be concatenating strings using + or the += operator. Something like this:

// Example of how a string generation method is used directly in a method, like getting the size of the blob
System.debug('Blob size: ' + Blob.valueOf(generateConcatStringWith('')).size());

public String generateConcatStringWith(String output){

    // 31-character test string (32 bytes per value including the separator)
    final String testString = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ01234';

    // Generate rows and columns for our CSV, add a return at the end of each row
    // If the column is not the last column, add a comma
    for(Integer rowI=0; rowI < 7500; rowI++){
        for(Integer colI=0; colI < 25; colI++){

            // Add the value
            output += testString;

            // Add a comma, except for the last value
            if(colI < 24){
                output += ',';
            }
        }
        // Add a line break at the end of the row
        output += '\n';
    }

    return output;
}

Let’s run it… and the result… Oh oh… “System.LimitException: Apex CPU time limit exceeded”. A CPU timeout already? I only created a loop with 187,500 iterations (7,500 × 25). That does not look good.
Let’s look at the debug log:
Maximum CPU time: 15,152 out of 10,000 ******* CLOSE TO LIMIT
Maximum heap size: 3,338,783 out of 6,000,000 ******* CLOSE TO LIMIT

Ok, let’s reduce the number of rows to find the threshold at which it works. I found it does not crash at 1,750 rows:
Maximum CPU time: 11,767 out of 10,000 ******* CLOSE TO LIMIT
Maximum heap size: 2,800,511 out of 6,000,000

So now we are not crashing, but we are still exceeding the CPU time limit by almost 20%. At this point we haven’t even executed any other logic like queries, nor escaped the values as safe CSV strings. Long story short: if we add the String.escapeCsv() method to the logic, we can have only 1,500 rows before reaching the CPU timeout. Wow, that is pretty rubbish. The likely culprit: Apex strings are immutable, so every += copies the entire intermediate result into a new string. So how can we make it better?
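A minimal illustration of where that time goes (a synthetic sketch, not from the article’s gist):

// Apex strings are immutable: every += builds a new string and copies
// the old contents across, so n appends cost O(n^2) character copies.
String s = '';
for(Integer i = 0; i < 5; i++){
    s += 'x'; // copies the i existing characters, then appends one
}
// After 187,500 appends averaging a ~3MB intermediate string, those
// copies add up to hundreds of gigabytes of character movement,
// which is why the CPU limit is hit long before the heap limit.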

Instead of concatenating strings, we are going to use a list that holds the column values of each line, and a list that holds all the joined lines. We join all the column values with a “,” into a single string, and then join all the lines with a “\n” line break character. It will look like this:

// Example of how a string generation method is used directly in a method, like getting the size of the blob
System.debug('Blob size: ' + Blob.valueOf(generateJoinedListString()).size());

public static String generateJoinedListString(){

    final String testString = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ01234';

    // Create a list to hold all the lines in our CSV
    // I add an empty line, else the file size is 5,999,999
    // It's also easier to debug...
    String[] lines = new String[]{''};

    for(Integer rowI=0; rowI < 7500; rowI++){

        // Create a list that holds all the columns, representing a single line
        String[] line = new String[]{};

        for(Integer colI=0; colI < 25; colI++){

            // Instead of concatenating strings, we add them to a list
            line.add(testString);
        }
        // Join all the column values of a single line with a comma
        lines.add(String.join(line, ','));
    }

    // Join each full line with a line break and return the result
    return String.join(lines, '\n');
}

Running this code only takes 539ms of CPU time, and the heap size is just 100 bytes. The total heap size of running this entire code snippet is 1,163 bytes. So somewhere, something is happening in the background.

I don’t want to start stating the obvious yet, but this is an insane difference. With very little effort, we have just created a 6MB string. The output of the debug statement is “Blob size: 6000000”. So now you see why I sized the test value so that each cell comes to exactly 32 bytes…
Now, if we add the “escapeCsv()” method, the CPU time is 1,372ms. So that method has a big impact, but it is important in this scenario to have safe string values.
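As a side note on what escapeCsv() actually does (a minimal sketch; the exact quoting rules are in the Apex String class documentation):

// escapeCsv() leaves plain values untouched, but wraps values that
// contain commas, double quotes or line breaks in double quotes,
// doubling any embedded double quotes.
System.debug('plain'.escapeCsv());
System.debug('needs,quoting'.escapeCsv());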

To validate my Blob limit statement, let’s run the code with 7,501 rows.
“System.LimitException: BlobValue length exceeds maximum: 6000000”. There it is :-) So even though we are miles away from the heap size limit, we are still bound by the maximum size of a Blob. But hey, we now have a method that generates a 6MB chunk, so we can have some testing fun.

Let the Interesting Heap Size Behaviour Commence

First of all, let’s see if we can create 6MB strings in a loop, and how many iterations we can make before we hit a CPU timeout.
In our last test it took 539ms of CPU time and a heap size of 100 bytes to create a 6MB string. So for easy maths, let’s start at 10 iterations and see what happens. I would assume roughly 10 × 539ms of CPU time and 10 × 100 bytes of heap, which, give or take, should be easy enough to handle.
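The test loop itself is just the previous method called repeatedly (assuming generateJoinedListString() from the earlier snippet is in scope in the same Anonymous Apex session):

for(Integer i = 0; i < 10; i++){
    // Each iteration builds a fresh 6MB string and discards it
    // right after measuring the Blob size
    System.debug('Blob size: ' + Blob.valueOf(generateJoinedListString()).size());
}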

Three iterations later: “System.LimitException: Apex heap size too large: 12,023,832”. Wait, what? How is a single iteration 100 bytes and three iterations 12MB?! I don’t know… and I will not give you the answer later either; I truly don’t know. On a positive note, the CPU time is 1,900ms, which is not bad for successfully creating 18MB of data.

Let’s try something: instead of a loop, I just repeated the code statement 4 times. That has exactly the same heap size and the same effect. I then set the loop to run 3 times and no errors occur. Beautiful, this works: I now have 18MB worth of data, while the heap size shows as 1,177/6,000,000 and the CPU time as 1,735/10,000. Something along the lines of WTF comes to mind.

Let’s see if I can squeeze out a 4th file by lowering the number of rows.
If I lower the rows to 7,100, I can successfully generate 4 strings, each with a size of 5,680,000 bytes. So there seems to be a hidden heap size limit that is not visible through the Limits class, and it hovers around 22.72MB (4 × 5,680,000 bytes).

Ok, I am confused now. But also delighted! This means we can generate about 22.72MB of data and still have 80% of our CPU time left. This is important, as creating a CSV with 2 loops and synthetic data is nice for sizing purposes, but in real life you get your data from a different source, and that will all take CPU time and heap space.

To come back to my statement about Apex limits never failing to amaze me: this is a lot of performance we have here, and we are still just in a synchronous context.

Asynchronous behaviour

Asynchronous limits are twice the size: for heap size we are talking about 12MB, and for CPU time we have 60 seconds now. I am not too worried about CPU time since we only used about 2 seconds, but I am curious to see if we can make bigger files. So let’s go. I created an async class with a future method and upped the rows to 15,000.
I created a run method to keep the logic the same as the anonymous code. Use “Async.run(1, 15000);” to execute the code.

public with sharing class Async {

    @future
    public static void run(Integer numberOfStrings, Integer numberOfRows){
        // Number of strings to create here
        for(Integer i=0; i < numberOfStrings; i++){
            System.debug('Blob size: ' + Blob.valueOf(generateJoinedListString(numberOfRows)).size());
        }
    }

    // Added numberOfRows variable to make testing easier
    public static String generateJoinedListString(Integer numberOfRows){

        final String testString = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ01234';

        String[] lines = new String[]{''};

        for(Integer rowI=0; rowI < numberOfRows; rowI++){

            String[] line = new String[]{};

            for(Integer colI=0; colI < 25; colI++){
                line.add(testString);
            }
            lines.add(String.join(line, ','));
        }

        return String.join(lines, '\n');
    }
}

Let’s run with a single string to begin with and see if we can make a 12MB file here. “Apex heap size too large: 24,046,320”. Ok, that is twice as much as the limit. Let’s lower the number of rows and find the threshold for the async string size. Async, we can have 14,860 rows with a size of 11,888,000 bytes, and a maximum of 2 files. So that is 23.77MB worth of data, divided over 2 strings.

Quick tip: These future methods can take a long time to run, a quick way to check the status is using the below query:

SELECT CreatedDate, Status, ExtendedStatus 
FROM AsyncApexJob
WHERE MethodName = 'run' AND JobType = 'future'
ORDER BY CreatedDate DESC

Let’s try 3 files now and see where that brings us. Three files end up at 5.94MB per file, probably because the total doesn’t divide evenly. For the final test we check 4 files… but by now we can almost predict it: the threshold for 4 files async is 7,485 rows, or 5,988,000 bytes per string, or 23.95MB in total.
This is the absolute largest total string size I was able to generate.

Sync vs Async Conclusion

It turns out there is no difference between the sync and async heap size at all. It seems we have a magical heap size limit of about 24MB that is the same for both synchronous and asynchronous Apex executions.
There is, however, a difference in Blob size limits: async, we can have a Blob size of up to 12MB. Even though this is twice the size, we can still only have 2 of these files before we hit the 24MB limit.

In both sync and async we can have a total of 24MB worth of Blobs, divided over 4 files, or more smaller files if you wish. So I can only conclude that there is no real difference in the heap size limit, and that the heap size limit is actually larger than advertised.

I am probably missing something here, but I’d love to hear your thoughts on why the heap sizing works like this. In the end I might be all wrong, but I am happy I found out how these limits actually work and what to think about when working with large files.

Real life scenarios

Generating large strings with a few loops and the “String.join()” method works really well in straightforward scenarios like handling query data or structured web service responses.
The reality, however, is often not as simple, and you will need extra calculations or some sort of selective logic. This means you need to add data to your string in multiple steps through multiple methods.

A good way to achieve this is to create an orchestration method that takes an empty string list as a parameter. The orchestration method passes that same list to the various methods that collect the data, and each of those methods appends to the list as it goes. At the end, the orchestration method returns the result of String.join().

Avoiding large strings as class variables and declaring them at method level instead is a heap size best practice, as per the documentation.

For example:

// I call the orchestration method here 4 times for testing purposes
// Note that we pass a new String list as a parameter to prevent any
// storage at class level.
for(Integer i=0; i < 4; i++){
    System.debug('Blob size: ' +
        Blob.valueOf(
            orchestrationMethod(new String[]{})
        ).size()
    );
}


// The orchestration method collects the data from multiple methods
// In this example I create the CSV using multiple methods and the
// orchestration method joins all the lines together
public static String orchestrationMethod(String[] lines){
    generateJoinedListString(lines);
    generateJoinedListString(lines);
    generateJoinedListString(lines);
    generateJoinedListString(lines);
    return String.join(lines, '\n');
}

// Instead of creating a new list, we take the list from the orchestration
// method and populate that. Note: because we run this 4 times in the
// orchestration method, I lowered the size to 1,500 rows instead of 7,500.
public static void generateJoinedListString(String[] lines){

    final String testString = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ01234';

    for(Integer rowI=0; rowI < 1500; rowI++){

        String[] line = new String[]{};

        for(Integer colI=0; colI < 25; colI++){
            line.add(testString);
        }
        lines.add(String.join(line, ','));
    }
}

Method of measurement

To calculate the execution times and the heap sizes, I logged the start and end values of the CPU time and heap size around each iteration, summed up all the deltas to calculate the totals at the end, and output the result to the debug logs. The results do vary very occasionally, but in this gist they are pretty consistent.
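A minimal sketch of that measurement approach, using the Limits class (hypothetical variable names; the full version is in the gist linked below):

// Snapshot the counters before the code under test...
Integer cpuStart  = Limits.getCpuTime();
Integer heapStart = Limits.getHeapSize();

// ... run the code under test here ...

// ...and log the difference afterwards
System.debug('CPU time used (ms): ' + (Limits.getCpuTime()  - cpuStart));
System.debug('Heap size used (B): ' + (Limits.getHeapSize() - heapStart));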

You can get the full Anonymous Apex Code gist here: https://gist.github.com/jfwberg/08ae86748889db940f4a2cd8d269477c

The output looks like this:

Debug details output in the debug log
Aggregation of the heap size at the end of the debug logs

Conclusion

  • String concatenation with + and += is really slow at large scale
  • String lists are very fast, and the String.join() method works really well even on large lists
  • Heap size is not black and white, nor hard-limited at 6MB or 12MB. Something is happening behind the scenes that neither the Limits class nor the debug logs seem to pick up on. Whether that is a maximum of 6MB at any single time with the heap counter resetting when variables are cleared, there still seems to be a ceiling around 24MB that you simply can’t get past.
  • If you write your code well and follow the best practices, you can build great things well within the Apex governor limits. And if you can’t, you should probably check if what you’re doing should be on platform…
  • Each piece of code is different; this is just a synthetic test and not truly representative of real-life scenarios, but it should give a good idea of how to implement large strings properly.

Final Note

At the time of writing I am a Salesforce employee; the above article describes my personal views and techniques only. They are in no way, shape or form official advice; this is purely informative.
Nothing in this article is by definition the view of Salesforce as an organization. Always speak to a certified implementation partner before implementing anything.
