Optimizing Lightning Performance on Citrix Virtual Apps and Desktops
Citrix Virtual Apps and Desktops enables users on any device to share client-side computing resources hosted in a data center. Many Salesforce customers access the application from Citrix environments. When these customers transition from Classic to Lightning, some users experience slow page loads due to various factors that can lead to low Octane scores.
Earlier this year, Jim Tsiamis published a helpful post on the Salesforce Developers Blog titled Improving Lightning Performance on Virtual Desktops. Key takeaways from that post include:
- CPU on the client machine is the single most important lever for improving Lightning performance.
- Analysis has shown that newer CPU generations with fewer cores improve performance.
- Performance tests must be run with the same user density and usage patterns expected in production.
Salesforce and Citrix recently ran joint performance lab experiments on both Amazon Web Services (AWS) and on-premises (bare metal) Citrix deployments to corroborate these key takeaways. In this post, we share our methodology and results, which you can use to size your own virtual environment for Lightning Experience and other enterprise SaaS single-page applications.
Goals of the lab
Our joint goals in establishing the Lightning Performance lab are threefold:
- Draw attention to and substantiate the importance of CPU
- Help customers upgrade their virtual environments without forfeiting scalability
- Model an effective client-side performance testing approach
Draw attention to and substantiate the importance of CPU
Many Salesforce customers are unaware of the effect an under-powered CPU can have on Lightning performance. The rich, responsive UI of Lightning Experience depends on JavaScript’s single-threaded runtime. Because there is only one thread, the runtime maintains a queue of work, and each unit of work owns the thread until its synchronous logic completes. When the processor struggles to keep up, the thread stays blocked and page rendering slows.
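As a minimal illustration (plain JavaScript, not Salesforce code), a single long-running synchronous task stalls everything else in the browser, and a slower CPU holds the thread longer:

```js
// Any CPU-bound synchronous work monopolizes the one JavaScript thread.
// While blockThread() runs, the browser cannot render, scroll, or respond
// to clicks; a slower CPU simply takes longer to drain the same work.
function blockThread(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait: nothing else on the page can execute here
  }
}

blockThread(2000); // the page is frozen for ~2 seconds
console.log('rendering and event handling can resume now');
```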
Octane is a JavaScript benchmark developed by Google. It measures the CPU’s ability to execute a variety of JavaScript programming constructs. The Octane test stresses the CPU only; its score does not vary with other factors such as memory, hard disk, network connectivity, or bandwidth.
Therefore, the lab was set up to illustrate:
- The relationship between CPU clock frequency and Octane score
- The relationship between CPU core count and Octane score
- The relationship between user density and Octane score
- The relationship between Octane score and Experienced Page Time (EPT)
Nick Rintalan, Citrix Consulting Lead Architect, is an expert on virtual machine (VM) scalability. He detailed the importance of CPU in his post Citrix Scalability — The Rule of 5 and 10. His team can easily estimate the maximum user density for Citrix Virtual Apps and Desktops based on the number of physical cores on the server. As Nick puts it, “…our (Citrix) specialized VDI and RDSH-based workloads are CPU-bound 99.9% of the time! That’s right — CPU is the scalability bottleneck or limiting factor these days as opposed to memory, disk/storage, or network…it really boils down to CPU.”
So “The Rule of 5 and 10” roughly applies to any app or workload, but we wanted to validate it, with precision, for Lightning Experience usage. We are guided by the words of American statistician W. Edwards Deming, who inspired Japanese industry after WWII: “Without data, you are just another person with an opinion.”
Help customers upgrade their virtual environments without forfeiting scalability
When IT organizations must upgrade virtual environments to provide Lightning users with the required performance, it can mean higher cost per user. Virtual machines in the market generally fall into these categories, which offer trade-offs in terms of scalability, manageability, and security:
- Single-session OS: used to deliver VDI (Virtual Desktop Infrastructure) desktops for one user at a time
- Multi-session OS: used to deliver published (server-hosted) applications or desktops for multiple concurrent users
Writing for TechTarget in 2014, virtualization industry analyst Brian Madden noted that published desktops historically supported higher user density (meaning better scalability and lower hardware cost) than VDI but were more difficult to manage. By the time hypervisor advancements improved VDI user density, easy-to-manage published apps had become a viable alternative for many IT environments.
With that in mind, these two environments comprise our Lightning performance lab currently:
- Citrix Virtual Apps published desktops deployed on AWS EC2 — Salesforce is a Citrix Virtual Apps customer, with over 5,000 users accessing Lightning Experience via the virtual platform globally. Automated testing from various machine types is helping to optimize this environment.
- Citrix Virtual Apps published apps deployed on a bare-metal on-premises server — Citrix designed a proof of concept to maximize user density while sustaining a 30,000+ Octane score. In this case, the published app is the Chrome browser.
When sizing a virtual environment for Lightning, IT organizations should first set a target minimum Octane score. While the published requirement for Lightning console apps is 30,000, a higher Octane score provides better performance. Some use cases may demand 35,000–40,000 while others can get away with 20,000–25,000. It’s important to find the maximum user density where the target Octane score can be sustained across all sessions.
Model an effective client-side performance testing approach
Samuel Holloway, Success Architect Director at Salesforce, recently blogged about performance testing on the Lightning Platform. There are a few topics from that noteworthy series to highlight and expound upon in the context of shared virtual client environments:
- Measuring the Salesforce user experience — Whether your tests measure EPT or a similar metric via Real User Monitoring, the Octane score of the client rendering the page is a critical factor.
- Think time — Usage patterns, of both Salesforce and other applications accessed from the client machine, have an impact on maximum user density and must be considered when designing tests. More intensive usage patterns, with less think time or many records accessed simultaneously, reduce the availability of cores on the machine.
- Establishing baselines — In customer organizations, baselines should be established following Salesforce releases and customer releases. Baseline tests should be run with only one user session on the client machine.
- Collecting user interface metrics can be tricky — In the context of single-user testing on virtual desktops, load generators are concerned with emulating production client-side load rather than generating server-side load. If CPU is underutilized in the virtual environment during tests relative to production, teams may conclude that overall performance is better than it actually will be. In production, Octane scores may be lower, resulting in higher EPT.
Salesforce lab on AWS
Lab design and methodology
Performance lab processes must be trusted, secure, automated, and repeatable. To satisfy these requirements, we automate Google Chrome (via the W3C WebDriver protocol) in a Citrix multi-session OS environment on AWS. Note that we do not yet have the ability to automate published app sessions in the AWS lab.
The lab consists of two Windows Server EC2 instances:
- The Driver machine, where WebDriver automates the workload and collects the measurements
- The Experiment machine, where n pre-existing Citrix virtual desktop sessions each host ChromeDriver, a standalone server that listens for commands and drives the Chrome browser
The WebDriver script on the Driver machine contacts the n ChromeDrivers on the Experiment machine and instructs them to execute the workload. The goal of the automation is to collect measurements (e.g. Octane, EPT) for n concurrent users accessing Salesforce from a single client machine. We can readily change out the EC2 machine type of the Experiment machine in order to identify the best fit for the workload.
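Each desktop session on the Experiment machine runs its own ChromeDriver on a unique port. As a hedged sketch (not the lab’s exact tooling, and assuming the chromedriver binary is on the PATH inside each session), the launch amounts to:

```js
// Start ChromeDriver on a session-specific port so the Driver machine can
// address each session's browser independently.
const { spawn } = require('child_process');

function startChromeDriver(sessionIndex) {
  const port = 4444 + sessionIndex; // user1 -> 4444, user2 -> 4445, ...
  const proc = spawn('chromedriver', [`--port=${port}`], { stdio: 'inherit' });
  console.log(`ChromeDriver for user${sessionIndex + 1} listening on port ${port}`);
  return proc;
}

startChromeDriver(0); // run inside the first desktop session
```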
Initially, the lab tested against three EC2 instance types that cost less than $0.30 per hour: m5.large, m5.xlarge, and c5.xlarge.
The automation performs the following workload. No other user or process accesses the virtual environment for the duration of the testing, and the desktops are limited to running Chrome and accessing Salesforce. Note that in production, users on full virtual desktops will not be so constrained in their activity, and this can impact performance.
- Open the Chrome Browser
- Log in to the Salesforce instance with a prepopulated data set (we don’t want empty list views or sparsely populated records)
- Switch to a console app
- Open the list view of Opportunities (for Sales Cloud) or Cases (for Service Cloud)
- Open 11 separate records in 11 console tabs. After each record:
- Collect the EPT
- Go back to the list view and click the next record
Workload automation
To automate the workload and produce the data analysis, we used Node, a simple JavaScript file, and the WebdriverIO library. In this section, we provide code snippets to illustrate the pattern. Tools such as Selenium, JMeter, and LoadRunner can also be used for this type of automation.
In the snippet below, we create the remote object that connects to five sessions (simulating five concurrent users) on the Experiment machine. The Multiremote API allows you to run multiple WebDriver sessions in a single test. For each session, ChromeDriver is started on its own port, increasing monotonically: here we assume User1 has ChromeDriver running on port 4444, User2 on port 4445, and so on.
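A minimal sketch of that setup with WebdriverIO’s standalone Multiremote API (the hostname is illustrative):

```js
const { multiremote } = require('webdriverio');

// Build one config entry per concurrent user; each entry points at the
// ChromeDriver already listening in that user's Citrix session.
async function createRemote() {
  const config = {};
  for (let i = 0; i < 5; i++) {
    config[`user${i + 1}`] = {
      hostname: 'experiment-machine',   // hypothetical Experiment machine host
      port: 4444 + i,                   // user1 -> 4444, user2 -> 4445, ...
      path: '/',                        // ChromeDriver serves at the root path
      capabilities: { browserName: 'chrome' }
    };
  }
  // A single object that fans every command out to all five sessions
  return multiremote(config);
}
```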
We now have a remote object that can be used to start n browsers simultaneously, log in, and more.
The following few lines of code perform the Salesforce login. The remote object enables the browsers to connect to the supplied URL, wait for the login page to load, fill in the username and password, and then click the Login button.
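A hedged sketch of that flow (the element IDs match the standard Salesforce login page; adjust them if your org uses a custom login):

```js
async function login(browser, url, username, password) {
  await browser.url(url); // every session navigates to the login page
  const userField = await browser.$('#username');
  await userField.waitForDisplayed(); // wait for the login page to load
  await userField.setValue(username);
  await (await browser.$('#password')).setValue(password);
  await (await browser.$('#Login')).click(); // click the Login button
}
```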
To collect EPT, execute the following code, which injects JavaScript into the browser, attempts to get the EPT, and then returns it along with any error and the current URL.
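A minimal sketch, where window.__eptValue is a hypothetical stand-in for however your instrumentation exposes EPT on the page:

```js
async function collectEPT(browser) {
  // execute() injects this function into each browser session
  return browser.execute(() => {
    const result = { ept: null, error: null, url: window.location.href };
    try {
      result.ept = window.__eptValue; // hypothetical EPT accessor
    } catch (e) {
      result.error = e.message;
    }
    return result;
  });
}
```

With these pieces in place, hypothetical glue for the 11-record workload might look like:

```js
(async () => {
  const browser = await createRemote();
  // illustrative sandbox URL and credentials
  await login(browser, 'https://test.salesforce.com', 'user@example.com', 'secret');
  for (let record = 0; record < 11; record++) {
    // ...open the next record from the list view in a new console tab...
    console.log(await collectEPT(browser));
  }
})();
```

In Multiremote mode, a single call returns a result for each session, so one pass through the loop records EPT for all n concurrent users.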
Lab results
While capturing EPT, we also calculate Octane scores as each session is added to the system. An additional sampling session runs the consistent-octane command line tool (to reduce run-to-run variance) while the other sessions carry out the workload detailed earlier. Based on these test results, we can expect the Octane score to drop below 25,000 with six concurrent sessions on a c5.xlarge machine.
Octane is a key metric, but customers ultimately need their pages to load within a target EPT. These targets can vary with page complexity or usage pattern. For example, a periodic planning process may not have the same EPT requirement as daily task management.
User density versus EPT tests on the three different EC2 instance types show that:
- The lower initial Octane score (18K) of m5.large corresponds to 25% and 67% longer EPT at the baseline (one user) relative to m5.xlarge and c5.xlarge, respectively. Note that m5.large and c5.xlarge each have 8 GB of memory.
- On m5.large, EPT skyrockets when the ratio of concurrent users to vCPU exceeds 2:1.
- Compute-optimized c5.xlarge provides an EPT reduction of 10–25% relative to m5.xlarge.
When we plot the median, 75th percentile, and 90th percentile EPT for a given number of concurrent users on c5.xlarge, we see:
- Median EPT at five concurrent users is just 33% higher than the baseline.
- At seven concurrent users, however, median EPT is double the baseline and 75th percentile EPT is double the median.
- Performance degrades significantly when the ratio of concurrent users to vCPU exceeds 1.5 to 1.
With a suitable instance type identified and maximum user density calculated, incremental page design optimizations can be tested to further improve EPT for the workload.
Note that the lab’s page design remained unchanged across these initial tests. In the future, we expect to show that performance optimizations and regressions have a nonlinear impact on EPT along the Octane scale. For example, consider an optimization that moves a relatively heavy component from the default tabset tab to a non-default tab on a record page. If this reduces median EPT by 0.5s for that page on a 30,000 Octane client, it could reduce median EPT by 2s on a 15,000 Octane client.
Lab takeaways
In summary, these are the steps to establish an automated Lightning performance lab for virtual desktops:
1. Select two or three compute-optimized instance sizes
2. Select a test automation tool to run on the Driver machine
3. Install Citrix Virtual Apps and Desktops on the Experiment machine and create twice the number of desktop sessions as the number of vCPU on the instance
4. Acquire a sandbox with the Lightning app deployed and adequate data sets loaded
5. Work with the business to define a workload of the most common user actions and agree on target EPT(s)
6. Script the business workload to run on the Experiment machine
7. Find the maximum user density based on the target EPT(s)
8. Repeat steps 4–7 for the other instance sizes in order to maximize user density (and minimize cost)
9. Tune your operating system and virtual layer configuration for performance
10. Further optimize your Lightning pages for performance (check out Lightning Speed with Chris Marzilli for more on this)
While supporting such a lab environment and automation scripts requires investment, the benefits of a repeatable, data-driven process are significant. Without consuming business users’ time in testing or experimenting with their productivity in production, the lab enables customers to:
- Tune the virtual environment configuration
- Guard against performance regressions in each release
- Optimize Lightning page design for various usage patterns
Citrix on-premises lab
Lab design and methodology
Three user cohorts performed manual testing of various Case Management workflows from different bare-metal environment setups. The workflows included: global search, accessing a case, hovering over quick links, entering case comments, editing a case, saving changes, sending email, and opening a report.
Cohort 1 (15 users) accessed Salesforce concurrently from the legacy VDI hardware:
- Intel(R) Xeon(R) Gold 5115 CPU
- 2.40 GHz, dual 10-core
Cohort 2 (15 users) accessed Salesforce concurrently from a similar VDI host but on newer hardware with a much faster CPU and lower core count:
- Intel(R) Xeon(R) Platinum 8256 CPU
- 3.791 GHz, dual 4-core
Cohort 3 accessed Salesforce from Citrix Virtual Apps (1912 LTSR CU1) published Chrome sessions on the same newer hardware as Cohort 2. This round of tests started with 20 concurrent users across three Windows Server 2016 virtual machines, each with 8 vCPU and 32 GB of RAM. Additional users were added until the Octane score dropped below 30K at around 35 concurrent sessions.
Lab results
It’s important to remember that these results are based not only on the hardware and virtualization but also the workload and usage pattern of the test. Dedicating the hardware to published browser sessions limits competing activity on the machine. In order to further control Octane and better predict performance, restrict published browser access to Salesforce domains.
On the newer hardware, user density for Cohort 3 on published Chrome was at least double that of Cohort 2 on VDI. Cohort 3 attained the following session-to-CPU ratios while supporting 30K+ Octane:
- 1.5 to 1 session-to-virtual CPU ratio (roughly 35 sessions across 24 vCPU)
- 4.5 to 1 session-to-physical CPU ratio on the host’s 8 physical cores (whereas “The Rule of 5 and 10” expects 10 to 1 for published app sessions)
- 2 to 1 session-to-hyper-threaded physical CPU ratio (16 hardware threads)
CPUs with higher core counts generally run at lower clock frequencies. These results support the assertion that processors with fewer, faster cores provide better performance per core.
Conclusion
The performance labs established by Salesforce and Citrix demonstrate how to optimize Lightning performance on Citrix Virtual Apps and Desktops. The results confirm that CPU clock frequency and user density are critical factors when accessing Lightning Experience from virtual environments. At the same time, we’ve shown that upgrading for Lightning does not mean forfeiting the scalability and commercial advantages of virtualization. Customers can use these data points, and better yet, establish performance labs of their own to deliver an optimal Lightning experience for their users.
For more information on the topics covered in this post, see the following reference resources.
If you have any questions, please post them to the Customer Architect Community, @mention one of the authors, and include the topic “Solving for Lightning Performance on Virtual Desktops.”
References
- Improving Lightning Performance on Virtual Desktops — Salesforce Architects
- Lightning Console Technical Requirements — Salesforce Help & Training
- Learn JavaScript Core Concepts — Trailhead
- Introduction to Performance Testing & Performance Testing on the Lightning Platform — Salesforce Architects
- Citrix Scalability — The Rule of 5 and 10 — Nick Rintalan, Citrix Consulting Lead Architect
- Citrix Virtual App user density on AWS — Brian Martynowicz, Login VSI Director of Customer Services
- VDI has won the war against RDSH. Here are 4 reasons why — Brian Madden
- VDI vs RDSH doesn’t matter if users don’t need full Windows desktops — Brian Madden
Contributors
This post represents a much larger team effort across both Salesforce and Citrix. The following people contributed their time and expertise to this effort:
- Shaughn Harrod, Principal System Architect, Citrix
- Kevin Kovalcik, Director, Services Systems, Citrix
- Arun Costa, Senior Director, Performance Engineering, Salesforce
- Ram Choletti, Lead Infrastructure Engineer, Salesforce
- Pallav Saikia, Senior Manager, Infrastructure Engineering, Salesforce
- Pak Fong, Success Architect Director, Salesforce
- Christopher Marzilli, Director Platform Success, Salesforce
About the Authors
Zachary Kampel
Zachary has been part of the Customer Success Group at Salesforce since 2011 and a Certified Technical Architect since 2013. Zachary has helped many enterprise customers adopt Salesforce platform services successfully, and he currently leads a cross-functional initiative focused on optimizing Lightning performance.
Steven Bougon
Steven has spent the last six years in the Sales Cloud engineering group, focusing on front-end performance, optimizing the Lightning ecosystem, and creating methodologies to prevent regressions and improve the speed of each page.
Steven is also part of the W3C Web Performance Working Group, helping to build a faster and more measurable web.