How to Scale Test on Salesforce

Salesforce Architects
17 min read · Aug 24, 2021


No battle was ever won according to plan, but no battle was ever won without one.

- Dwight D. Eisenhower

When you build a Salesforce application, you want to know that it is scalable. The alternative — letting your users discover scalability issues in production — is a recipe for losing trust and potentially revenue.

Consider a scenario in which you have architected an application, completed performance testing, and deployed it. Everything runs smoothly, at least for a while. Then something happens — a merger, a marketing campaign, a large-scale reorganization — and usage of your application increases exponentially. Suddenly, the classic signs of scalability issues are apparent: the application slows to a crawl, you start seeing concurrency errors, and governance limits are breached consistently.

Clearly, this is not a good situation. Situations like these, however, can be avoided with effective scalability testing.

It’s important to understand that performance testing is not the same as scale testing. Performance testing is primarily about identifying how efficient and responsive an application is for a certain number of users; on Salesforce this means verifying that the application has reasonably efficient SOQL, efficient web pages, and optimized Apex code for a limited number of concurrent users. Scale testing, in contrast, is about identifying platform limits and application implementation limits so that steps can be taken to proactively remediate any issues that would result from predetermined increases in user traffic, data volume, or business transactions.

When to scale test

Performance testing is a must for all implementations; scale testing is always a good idea, but it is essential when you are seeing or expect to see:

  • Seasonal demand that creates traffic spikes (including holiday seasons and tax seasons, for example)
  • Rapid expansion of the business, such as an acquisition that will result in increased throughput
  • New business functionality that significantly increases traffic
  • New Salesforce implementations for which throughput is expected to be large
  • New marketing campaigns that are likely to drive spikes in demand

The scale testing lifecycle

Diagram detailing the five phases of scale testing.

The scale testing lifecycle has five phases focused on planning, workload creation, workload execution, workload analysis, and reporting.

1. Scale test planning

Proper planning of your scale test journey is crucial and can help you avoid ambiguous and inconsistent test results.

Test plan creation
Time spent in planning can reduce false positives — results that suggest there is an issue when in reality there is none — that may arise during scale testing. Here are some important objectives for your scale test plan:

  • Define your testing goals: Get a sign-off from business stakeholders on the testing use case and the detailed business goals for the test. Pick a repeatable and critical use case for conducting your scalability tests throughout the application lifecycle.
  • Define a concurrency model and throughput requirements. Throughput, in this context, means the number of transactions that take place per user in a given interval of time. In Salesforce, these transactions could be saves or loads of particular entities, or an interaction between a user (or an API) and Salesforce. One way to measure throughput is to count the total number of XMLHttpRequests (XHRs) as Salesforce transactions for the end-to-end business use case: the Salesforce throughput is then X/T per user, where X is the total number of XHRs and T is the total time. In the case of API calls, throughput calculations are fairly straightforward. Consider, for example, an application generating 100 cases per hour per user. The business throughput here is simply 100 cases/hour per user. If the end-to-end business process involves 10 XHRs and 2 API transactions each hour, the Salesforce throughput will be 10 + 2 = 12 transactions per hour per user (see the sketch after this list). The concurrency and throughput model you define with these calculations will help you set success criteria for scale testing and can be used to notify Salesforce Customer Support of your test plans.
  • Note on measuring XHRs. You can see the number of XHRs for a webpage in Chrome by selecting Inspect and then setting the XHR filter option on the Network tab, as shown here. Alternatively, you can use the Salesforce Community Optimizer tool, which shows associated XHRs on its Insights tab. XHRs can also be counted by inspecting event logs after executing an end-to-end business scenario for a single user.
Screenshot showing the Network tab in Chrome developer tools to measure XHRs.
  • Define scalability criteria and end goals. To define scalability criteria based on business requirements for a peak season or peak hour, you first need to determine the current throughput for the organization as it is deployed. You then use this as the baseline for the scaled workload. Consider, for example, an organization that averages 1,000 cases/hour per user normally but is expected to average 10,000 cases/hour per user during peak season. In this case, you'll need to test with 10X the normal throughput, and the success criterion for the scale test is the ability to generate and sustain that level of throughput.
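
To make the throughput arithmetic concrete, here is a minimal Python sketch of the model just described. The 10 XHRs and 2 API calls per hour mirror the worked example above; the 500-user concurrency figure is an assumed illustration, not part of the original scenario.

```python
# A minimal sketch of the throughput model above.

def salesforce_throughput(xhr_count: int, api_count: int, hours: float) -> float:
    """Salesforce transactions per hour per user: (XHRs + API calls) / time."""
    return (xhr_count + api_count) / hours

per_user = salesforce_throughput(xhr_count=10, api_count=2, hours=1.0)
print(f"{per_user:.0f} transactions/hour per user")  # 12

concurrent_users = 500  # assumed concurrency-model figure
print(f"{per_user * concurrent_users:.0f} transactions/hour platform-wide")
```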

Salesforce readiness and metadata creation
In preparation for scale testing, use the following list to ensure readiness:

  • Create a Full sandbox with the same release as you have in production.
  • Confirm the metadata is refreshed with production sharing and visibility settings.
  • Disable debug logs on the application while running tests.
  • Check that the application server's version is consistent with the one in production (recommended, but not essential).
  • Notify Salesforce Customer Support at least two weeks in advance of your testing date. For the details required in submitting this request, use the guidelines described in this post.

Licensing and provisioning

As part of your test plan, consider licensing and provisioning implications.

  • Make a list of tools needed to run your scale tests. Depending on the use case, you’ll want to double-check that the appropriate licenses and infrastructure for running those tools are in place.
  • Browser-based testing tools such as Selenium and TruClient are good options, as they simulate actual user behavior in a browser, consuming CPU and memory just as a production user would.
  • A less costly alternative is the open source Apache JMeter. With easy scripting, JMeter works well for scale testing, but less so for Salesforce application performance testing: it has limited support for processing JavaScript or Ajax, which may affect the accuracy of your results. If you are simulating a Lightning Experience-based workload using JMeter, make sure to analyze and compare your workloads to ensure parity between your actual production load and the corresponding JMeter workload (Eagle Eyes/Salesforce Customer Support can assist in providing this information).
  • Depending upon your load generation machine’s availability and testing infrastructure, a mix of browser-based and throughput-based tools (to directly invoke XHRs or API calls) can also be used.
  • Your team will need to learn how to write and execute load scripts for your chosen tools unless they already have that knowledge.
  • Take time to build and configure your testing environment (including load testing hardware) in advance.

2. Scale workload creation

Workload creation spans the creation of synthetic data, design of test cases, and development of load scripts.

Synthetic data creation
Ideally, you would use production data for scale testing. Since this is often not practical, you'll likely need to generate synthetic data using tools such as Snowfakery. As you create your synthetic data, pay close attention to both data quality and quantity to ensure effective scale testing.
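
As a minimal illustration of the idea (Snowfakery recipes are the more complete route), the following Python sketch uses the Faker library to produce a Data Loader-ready CSV. The field names and row count are assumptions; match them to your org's actual schema and volume.

```python
# A hedged sketch of synthetic data creation using the Faker library as a
# lightweight stand-in for Snowfakery.
import csv
import random

from faker import Faker  # pip install faker

fake = Faker()
INDUSTRIES = ["Technology", "Finance", "Healthcare", "Retail"]

with open("accounts.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Name", "Industry", "AnnualRevenue"])
    writer.writeheader()
    for _ in range(10_000):  # quantity matters as much as quality
        writer.writerow({
            "Name": fake.company(),
            "Industry": random.choice(INDUSTRIES),
            "AnnualRevenue": random.randint(100_000, 50_000_000),
        })
```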

Test case design
When designing test cases for scale testing, focus on those use cases that will be executed during periods of peak usage for each persona. Here are some important considerations to take into account as you design:

  • Application flow. Ensure all automations are present and enabled when tests run. Make sure the test scenarios cover all the major application use cases and exercise all customizations in the application as well as all major code paths.
  • Integration traffic. Does the application rely on API integration? If so, be sure to create the necessary stubs to mock API responses so that all such integrations are captured as part of your workload simulation.
  • Reports and dashboards. Determine whether reports or dashboards will be part of the test case. Include any reports or dashboards that are a high priority for the business. Identify the maximum concurrency of the major reports that are part of the workload and when the reports will be run.

Load script creation
Once the test case has been designed, the next step is to create scripts to generate the load. This is typically done using tools that record and later play back user actions. During script creation, consider the following (a minimal script sketch follows the list):

  • Think time. Scripts should not play back actions as fast as possible, but rather at a speed that mimics the think time that a user would typically spend as they work through the use case. Otherwise, the artificially accelerated pace of the test case may generate false positives.
  • Pacing. Similarly, the load script should produce a load that is in line with the throughput (transactions per unit time per user) defined in the test plan.
  • Parameterization. Parameterization is a technique used in developing load scripts to improve test quality by varying the values (for example, values for user IDs, passwords, URLs, account IDs, and so on) that are sent from the client to the server.
  • Correlation. Correlation is a technique used in developing load scripts to capture dynamic values such as the session ID or entity ID that are returned in server responses. After capturing these values, the load script passes them as parameters in subsequent requests.
  • Logins. Ensure the script does not log in and log out for each transaction to avoid an unrealistically high number of logins. Perform the use case multiple times before logging out or use an existing valid session.
  • Errors. Ensure that script execution does not produce any unwanted errors.
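
The sketch below illustrates several of these techniques using Locust, an open source Python load tool chosen here purely for illustration (the tooling considerations above apply equally). The /login and /cases endpoints and their payloads are hypothetical placeholders, not real Salesforce APIs.

```python
# A hedged Locust sketch showing think time, session reuse, parameterization,
# and correlation against placeholder endpoints.
import random

from locust import HttpUser, task, between  # pip install locust


class CaseAgent(HttpUser):
    # Think time: pause 5-9 seconds between tasks, as a real user would,
    # keeping the script's pacing in line with the planned throughput.
    wait_time = between(5, 9)

    def on_start(self):
        # Logins: authenticate once per virtual user and reuse the session,
        # rather than logging in and out around every transaction.
        self.client.post("/login", json={"user": "test-agent", "pwd": "***"})

    @task
    def create_case(self):
        # Parameterization: vary the values sent on each iteration.
        subject = f"Scale test case {random.randint(1, 1_000_000)}"
        resp = self.client.post("/cases", json={"subject": subject})
        # Correlation: capture a dynamic ID from the response and reuse it.
        case_id = resp.json().get("id")
        self.client.get(f"/cases/{case_id}")
```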

After creating your load script, make a table (in a spreadsheet, for example) of all the test case steps for every scenario that is part of the load. Use this table to track issues at different steps during script execution.

Image depicting example test case steps for a scale test script.

3. Scale workload execution

After the workload has been created, the focus shifts to execution.

Single-user assessments
The first step in workload execution is performing an optimized single-user transaction to establish a baseline. The objective is to measure (manually, if necessary) the application's responsiveness to end-user actions. Only after this is done should you move on to multi-user tests to identify scale-related issues. You can use Selenium or a similar tool to measure response time — for example, the duration between the user entering a URL in their web browser and the page being fully displayed.

It’s a good idea to run single-user tests for at least 10 iterations and verify that all transactions are within expected limits, for example by identifying 90th percentile response times. If the median value of all response times does not meet your SLA requirements, then take the time to perform a root cause analysis by breaking the response time down into its server time, client time, and network time components. The Salesforce Community Optimizer tool or Salesforce support teams can assist with this. Once you determine which layer is consuming the largest percentage of the response time, perform tuning to bring the overall response time within required limits.
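
As one hedged illustration, this Python sketch drives Chrome through Selenium and reads the browser's Navigation Timing API to collect single-user load times over 10 iterations; the URL is a placeholder, and a local Chrome driver is assumed.

```python
# A hedged Selenium sketch for single-user baseline timing.
import statistics

from selenium import webdriver  # pip install selenium

URL = "https://example.my.salesforce.com/lightning/o/Case/list"  # placeholder

driver = webdriver.Chrome()
samples_ms = []
for _ in range(10):
    driver.get(URL)
    # loadEventEnd - navigationStart = full page load time in milliseconds.
    samples_ms.append(driver.execute_script(
        "const t = window.performance.timing;"
        " return t.loadEventEnd - t.navigationStart;"
    ))
driver.quit()

print(f"median: {statistics.median(samples_ms):.0f} ms")
print(f"90th percentile: {statistics.quantiles(samples_ms, n=10)[-1]:.0f} ms")
```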

A screenshot of Salesforce Community Optimizer showing XHR response times for the Case Management application.
XMLHttpRequest (XHR) example output.

In the Salesforce Community Optimizer analysis of XHR response times shown here, you can see that:

  • XHR response time totaled 569 ms.
  • Of that 569 ms, 180 ms was network time.
  • Total server time was 389 ms (569 − 180). Of the 389 ms, two actions (getComponent and getMetadataInitialLoad) together took 292 ms, and of that 292 ms, 43 ms was database time.
  • The remaining time, 97 ms (the difference between the server time and the sum of the action times), was due to application server processing tasks such as serialization and deserialization.

You can then use the Actions tab of Salesforce Community Optimizer to identify the Actions that require the most processing and take appropriate steps to remediate any issues.

A screenshot of Salesforce Community Optimizer showing individual Actions for the Case Management application.

Make it a point to remedy all issues identified during single-user assessments before moving on to multi-user testing.

Calculating the number of virtual users you need
It's important to determine how many concurrent users (or threads) are required to achieve the scaled business throughput.
The number of concurrent users required to generate a specific transaction rate can be calculated using a mathematical relationship from queuing theory called Little's Law. The relationship can be defined in Salesforce terms as follows:

Total number of users = Total time spent per transaction * transactions per second.

When designing scale tests, the total time for a transaction is equal to the sum of the aggregate response time, think time, and wait time. Therefore, the total number of users required is given by:

Total number of users = (aggregate response time + think time + wait time) * transactions per second

Image depicting the steps for scale testing a sample e-commerce application.

Consider, for example, the e-commerce application shown here. Although each step can have a different response time, for the sake of simplicity you can estimate each response time at the SLA limit of two seconds or less, so for six steps the total response time is 12 seconds. Similarly, assuming five seconds of think time between steps gives a total think time of 25 seconds, and the wait time between iterations is 10 seconds. Therefore, the total time spent by a single user in completing the end-to-end business transaction is 12 + 25 + 10 = 47 seconds. So, if the business requirement is to have 100 transactions per second and a single user takes 47 seconds to complete one transaction, the total number of users required is 100 * (12 + 25 + 10) = 4,700.
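
A quick sanity check of that calculation in Python:

```python
# Little's Law as used above: users = total time per transaction * TPS.
def users_required(response_s: float, think_s: float, wait_s: float,
                   tps: float) -> float:
    return (response_s + think_s + wait_s) * tps

# Six steps at the 2 s SLA (12 s), 25 s think time, 10 s wait, 100 TPS.
print(users_required(response_s=12, think_s=25, wait_s=10, tps=100))  # 4700.0
```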

Smoke testing
Smoke testing checks the application under test and the test cases to ensure readiness for scale testing.
Smoke tests can typically be performed with 50–100 users, which is usually sufficient to verify the load script, the business scenario throughput, and the total number of users required to complete the full scalability test.

Scalability testing
The main objective of multi-user scalability tests is to assess the scalability of the application and identify concurrency bottlenecks. As you begin testing, start with a small number of users, enough to generate 1–10% of the required throughput target. From there, gradually increase the throughput (the number of users) in regular increments across a series of tests until you either achieve the target throughput or you encounter errors or constraint limits. To determine how big these increments should be, consider a spectrum with the baseline throughput at one extreme and the target throughput on the other; divide this spectrum into three to five intervals and conduct your tests at each interval to get a good understanding of scale behavior and trends.
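
As a minimal sketch of this staging arithmetic (the baseline and target figures are assumed examples, and four increments sits within the suggested three-to-five range):

```python
# Divide the span between baseline and target throughput into evenly
# spaced test stages.
baseline_tph = 200    # transactions/hour at baseline (assumed figure)
target_tph = 2_000    # scaled target (10X, as in the earlier example)
stages = 4            # within the suggested three-to-five intervals

step = (target_tph - baseline_tph) / stages
plan = [round(baseline_tph + step * i) for i in range(1, stages + 1)]
print(plan)  # [650, 1100, 1550, 2000]
```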

Testing in stages
Use a spreadsheet like the one below to track the results from your test iterations. Each row represents one iteration, or one stage, of the complete test sequence. Before proceeding to the next stage, the results for all metrics should be within the limits set in your test plan. For example, consider a scenario in which you execute a test at the baseline throughput of 200 cases/hour, and your test plan's benchmark is zero transactions with response times greater than the SLA, zero failed transactions, and zero errors. If you achieve these benchmarks in the first stage, proceed to the next stage with increased throughput. If not, identify the root cause of the issues or errors (reach out to Salesforce support teams for assistance if needed) and resolve them before moving on. Repeat this process until the maximum throughput target is achieved.

Monitoring during scale testing
As you test, keep a close eye on metrics available through your tools, including the number of passed and failed transactions, response times, error rates, and so on. Any deterioration or anomalies in these metrics should be investigated and resolved before testing continues. On the Salesforce side, support teams can provide some insights into metrics that are not visible from the customer’s perspective.

Throttling
Because Salesforce is a multi-tenant platform, your scale testing may be throttled to safeguard platform stability. If your application starts encountering "503 Service Unavailable" errors during scale tests, throttling may have been triggered by a breach of Salesforce limits.

One of the goals of scale testing is to proactively identify the throughput level at which throttling takes place. At that point, you'll need to perform root cause analysis and take steps to optimize the application through Apex tuning, SOQL tuning, and similar efforts. Once the necessary optimizations are complete, whether at the platform end or the application end, the chances of triggering throttling are minimized.

In some cases, despite your best efforts to tune the application, throttling may still occur due to the capacity limits of sandbox environments. Even so, your tuning efforts may be enough to ensure scalability in the production environment. You will likely want to discuss your sandbox scale test results with Salesforce support teams, who can explore any further modifications or remediation required in the production environment to achieve the target business throughput.

Test reliability and predictability and general best practices
For scale testing to be successful on an ongoing basis, follow these practices:

  • During the course of testing, change only one variable (for example, the code, use case, or data volume) at a time, and be as methodical as possible. While it may seem time consuming to change a single variable at a time in a large software environment, this approach will save time in the long run: if you change multiple variables all at once and performance worsens, you have to backtrack through all the changes to find the problem. Make sure every change you make is clearly documented and includes the reason for the change.
  • Execute baseline tests in the same timeframe as the current test. For example, don't use baseline data that is six months old. On multi-tenant platforms, hardware and configuration can change, affecting scalability and performance metrics. To minimize this effect, always rerun your baseline tests right before the current test.
  • Keep track of the total number of Data Manipulation Language (DML) operations for the use case under test with each major application release; total DMLs affect scalability characteristics because the rate of DML operations rises in proportion to the amount of concurrency. You can count DML operations by running a single-user flow in debug mode and inspecting the debug logs manually, by using this online Apex Timeline app, or by asking the Salesforce support team and providing the information needed to identify the test.
  • After the user load ramp up is completed, maintain the load in steady state for 30 to 60 minutes. Report metrics from only this steady state part of the test.
  • Repeat all the tests three times to ensure consistency of results.

Comparing test and production footprints
In order to produce valid scale test results, the characteristics of the workload you use for testing should match those of your production workload. To evaluate how well the workloads match, examine each workload's footprint using metrics such as DML operations/hour, XHRs/hour, or API calls/hour. To make the comparison meaningful, normalize each measurement against the corresponding measurement for the key entity in the data model (for example, Opportunity or Account).

For example, consider an organization in which Account is the key entity in the data model. You’ve measured the number of DML operations for Account in a given timeframe at 200 operations; you similarly measured the number of operations for object A as 100 and for object B as 400. The normalized value for each object is the measured value for the key entity divided by the measured value for the object. So, for object A the normalized value is 200/100 = 2, and for object B the normalized value is 200/400 = 0.5.
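
In Python, the same normalization and the deviation comparison described below might look like this (the figures are the example values above or assumed illustrations):

```python
# Footprint normalization: each object's measurement is normalized against
# the key entity's measurement (Account, in the example above).
def normalize(key_entity_value: float, object_value: float) -> float:
    return key_entity_value / object_value

account_dml = 200                   # DML ops measured for the key entity
print(normalize(account_dml, 100))  # object A -> 2.0
print(normalize(account_dml, 400))  # object B -> 0.5

# Deviation between production and test footprints for one metric
# (2.0 vs. 1.6 are assumed illustrative values).
prod_norm, test_norm = 2.0, 1.6
deviation = abs(prod_norm - test_norm) / prod_norm * 100
print(f"{deviation:.0f}% deviation")  # 20% -- aim to minimize this
```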

Calculate normalized values across production and test workloads for DML operations, XHRs, API calls, SOQL queries, and other relevant metrics. Then compute the difference or deviation for each metric as shown in the table below. The goal is to have the minimum possible deviation between test and production, which will ensure the most realistic scale tests.
This analysis should be performed any time your implementation changes, either due to a change in your Salesforce organization or a change in the Salesforce release.

Test data removal
In some instances, tests generate a high volume of transactional data that needs to be purged after every execution. Ensure you delete only the data generated as part of testing by following these steps (a scripted sketch follows the list):

  • First, identify the test data set, including relevant child object data.
  • Prepare a backup so that you can restore any data that you inadvertently delete.
  • Use Data Loader or a similar tool to perform the hard delete. If you use Data Loader you can create a mapping file with the IDs of the rows to be deleted.
  • Allow sufficient time between test iterations for the Salesforce internal processes that run after hard deletes to finish.
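
For teams that prefer to script the cleanup, here is a hedged sketch using the simple_salesforce Python library as one alternative to Data Loader. The credentials and the Test_Run__c marker field are placeholder assumptions, and hard delete requires the "Bulk API Hard Delete" permission.

```python
# A hedged test-data cleanup sketch with simple_salesforce.
from simple_salesforce import Salesforce  # pip install simple-salesforce

sf = Salesforce(username="user@example.com.sandbox", password="***",
                security_token="***", domain="test")

# 1. Identify only the data generated by the test run (marker field assumed).
rows = sf.query_all("SELECT Id FROM Case WHERE Test_Run__c = 'scale-2021-08'")
ids = [{"Id": r["Id"]} for r in rows["records"]]

# 2. Back up the IDs so inadvertently deleted data can be traced and restored.
with open("deleted_case_ids.csv", "w") as f:
    f.writelines(f"{row['Id']}\n" for row in ids)

# 3. Hard delete via the Bulk API.
sf.bulk.Case.hard_delete(ids)
```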

4. Scale workload analysis

Workload analysis can be carried out offline (once the tests have completed) to identify scalability issues and concurrency bottlenecks.

You may want to use Eagle Eyes, which is based on event monitoring, to gain insights into performance metrics. You can use a table like the one here to track different metrics for test assessment, including DB time, application CPU time, and concurrency errors.

5. Reporting and deliverables

The reporting of scale test results will depend on stakeholder requirements. Executive stakeholders may be interested only in a high-level summary, whereas developers and analysts may want more detail on throughput, response times, and other KPIs. Reports should state to what degree the scalability goal was met and describe any deviations from the target.

To jumpstart report creation, you can use the Sample Scale Engineering Report Template.

It's a good idea to include a chart showing throughput versus users (or load), with Experience Page Time (EPT) remaining within the SLA. In an ideal scenario the chart is linear, indicating an absence of governor limit breaches and other scalability errors.

A chart showing a linear relationship between target throughput and target user load.

Aside from reporting, one further vital step is to ensure that any enhancements to application code and any optimizations (for example, the creation of skinny tables or indexes) that were done in the sandbox are replicated to production and clearly documented.

Conclusion

This post covered the full end-to-end scale testing lifecycle for Salesforce, from planning, to workload creation, execution, and analysis, and finally reporting. By following the framework and guidelines outlined in this post, you can identify — and then remediate — performance bottlenecks. The result: Salesforce applications that perform at scale and reliably deliver excellent customer experiences even on peak usage days.

For more background on Salesforce performance testing, see Sam Holloway’s Introduction to Performance Testing on the Developers’ Blog.

About the Authors

Anand Vardhan is a Lead Engineer for the Frontier Scale team at Salesforce, working on designing and strategizing scale testing for large and complex customer implementations and scaling customers to achieve business needs. Anand specializes in application design, Lightning, API design, data architecture, large data volumes, and caching. Anand can be reached on LinkedIn.

Shainesh Baheti is a Lead Engineer for the Frontier Scale team at Salesforce, working on designing and architecting workload simulations and scale engineering and helping large customers to meet complex business goals. Shainesh has deep expertise in performance and scale engineering, server-side optimizations, handling large data volumes and workload simulations, and automating performance engineering processes. Shainesh can be reached on LinkedIn.
