An Approach to Application Performance Optimization

Harish Katti
Walmart Global Tech Blog
10 min read · Oct 28, 2020

In today’s world of microservices architecture, with an ever-increasing number of services and tight service SLAs, application performance is of utmost importance to ensure a seamless end-to-end experience for customers.

This article provides insights into evaluating the performance of an application and devising solutions to improve it, using specific Azure services and code optimizations.

Understanding the components of your Application

Most applications built today use one Dependency Injection framework or another and have multiple components embedded in them. Optimizing each of these components is of utmost importance for a highly performant application.

An application can include —

  • CRUD operations for RDBMS and NoSQL DBs.
  • APIs for processing business logic using different kinds of design patterns.
  • Caching.
  • Async processing.
  • Batch processing.
  • Scheduled data processing.

And many more.

Performance Optimization Approaches

Performance optimization is an iterative process that involves running multiple load tests on the application to understand the behaviour of every component. Below we look at the most commonly used components and how to find optimizations for each of them.

SQL DB Optimization
Most cloud providers offer performance monitoring tools to measure different metrics related to component performance. Microsoft Azure provides an intelligent performance tool that gives insights into query performance and helps optimize queries through recommendations. A few of them are —

  • Add or remove indexes as necessary.
  • Parameterize queries so that execution plans are cached, which reduces resource usage.
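To illustrate why parameterized queries allow plan caching, here is a minimal sketch (the `orders` table and column names are hypothetical, not from the article). With string concatenation the SQL text changes for every input, so the server compiles a fresh plan each time; with a parameter placeholder the text stays constant and one cached plan can be reused.

```java
public class ParameterizedQueryDemo {
    // Naive concatenation: the SQL text differs per input value, so the
    // database sees a brand-new query and builds a new execution plan each time.
    static String naiveQuery(int orderId) {
        return "SELECT * FROM orders WHERE id = " + orderId;
    }

    // Parameterized form: the SQL text is identical for every input,
    // letting the server cache and reuse a single execution plan.
    static final String PARAMETERIZED = "SELECT * FROM orders WHERE id = ?";

    public static void main(String[] args) {
        System.out.println(naiveQuery(42).equals(naiveQuery(43))); // false: no reuse
        // With JDBC, values are bound to the stable text per request:
        // try (PreparedStatement ps = connection.prepareStatement(PARAMETERIZED)) {
        //     ps.setInt(1, 42);
        //     try (ResultSet rs = ps.executeQuery()) { /* ... */ }
        // }
    }
}
```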

Memory Utilization Review
Before an application is production ready, its memory usage must be analyzed to ensure no memory leaks are hindering performance. While the application is running under high load, take a heap snapshot using the command —

jmap -dump:format=b,file=<filename.hprof> <pid>

Here <pid> is the process id of the Java process running on the server.

We can also connect to remote applications and inspect their memory usage with tools such as VisualVM. This may require firewall changes on the server side so that you can reach the application on the remote server.

  • Heap Utilization Review
    JVM heap utilization can be viewed using the VisualGC plugin for VisualVM. It shows how every GC affects the Eden and Survivor spaces before objects are moved to the old generation. Upward and downward slopes on the graph show memory utilization rising and falling over time; seeing both slopes indicates that memory utilization is healthy.
[Figure: Heap utilization graphs from VisualGC]

Full utilization of the Eden space, as shown above, indicates that object allocation is efficient and the heap is being used to its maximum extent.

  • Memory Leaks
    Memory leaks are among the biggest contributors to degraded application performance and can ultimately crash the application.
    VisualVM provides memory and CPU profilers and samplers. The generation counts in the memory profiler show which objects have survived the most GC cycles and can be an indicator of memory leaks. To be certain, further analysis is needed into why those objects survive garbage collection, but objects that live far longer than they are actually required to are the most likely cause of memory leaks.
  • Code Inspect with IntelliJ
    IntelliJ provides an elegant way to analyze code and flag issues related to memory and performance. We often fail to recognize that collections have default capacities, and heavy use of collections that hold little data can lead to higher memory utilization. IntelliJ’s Inspect Code utility surfaces recommendations to improve performance.
    We almost always fail to initialize the size of collections, which can lead to more frequent GCs as the collections repeatedly grow and re-allocate in memory.
    Another issue to address is the use of anonymous classes. An anonymous class poses the danger of dangling references that can never be garbage collected, leading to memory leaks.
    All such performance and memory recommendations should be reviewed before every code commit.
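The collection-sizing point can be sketched as below (a minimal illustration; the sizes are arbitrary). `ArrayList` starts at capacity 10 and copies its backing array on every growth, and `HashMap` starts at 16 buckets and rehashes whenever it crosses its load factor, so passing an initial capacity avoids that churn:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PresizedCollections {
    // Sized once up front: no intermediate array copies while filling.
    static List<Integer> squares(int n) {
        List<Integer> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) out.add(i * i);
        return out;
    }

    // Capacity chosen so n entries fit under the default 0.75 load factor,
    // avoiding any rehash while the map is populated.
    static Map<Integer, Integer> squareMap(int n) {
        Map<Integer, Integer> out = new HashMap<>((int) (n / 0.75f) + 1);
        for (int i = 0; i < n; i++) out.put(i, i * i);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(squares(5));          // [0, 1, 4, 9, 16]
        System.out.println(squareMap(3).get(2)); // 4
    }
}
```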
  • GC Logs
    Garbage collection logs help in identifying JVM heap utilization and issues related to memory, and can point to potential memory leaks.
    While building applications, we often leave the young-gen and old-gen memory allocations at the JVM defaults.
    The JVM, however, lets us tweak the ratio of young- to old-gen memory allocation using the -XX:NewSize and -XX:MaxNewSize flags. These flags help utilize the young-gen and old-gen spaces better when we know the application needs a larger young generation, owing to the inherent complexity of its data processing.
    GC logs can be analyzed using open-source tools such as https://www.gceasy.io/

GC logs can be obtained by appending the flags below to CATALINA_OPTS.

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCCause -Xloggc:<folderName>\<filename>.log
  • Thread Soft References
    Object mappers and async thread processing use soft references for faster retrieval of objects during request processing. Softly referenced objects can stay in memory for long durations, surviving major GCs; they are collected only when memory pressure reaches a threshold. This can lead to application crashes when you process large JSON documents with ObjectMapper or do significant async processing. There are two ways to mitigate this issue —
    — Disable the USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING feature on the ObjectMapper’s JsonFactory.
    — Tune the JVM flag -XX:SoftRefLRUPolicyMSPerMB.
  • Caching
    Caching is one of the most widely used techniques for storing static data that needs fast retrieval. While multiple caching libraries are available, and we can even use static objects, it is important to weigh the trade-offs between them and choose the most suitable way to cache data in the application.
    A major drawback of a static cache that is never refreshed and grows without bound is that it can lead to memory leaks. Caches should always have refresh/eviction criteria set so that we never hold references to objects that outlive the processing context of the application.
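As a minimal sketch of the eviction point above, the JDK’s own `LinkedHashMap` can be turned into a size-bounded LRU cache by overriding `removeEldestEntry` (this is a stdlib illustration, not the caching library the article’s systems use):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal size-bounded LRU cache built on LinkedHashMap's access ordering.
// Once maxEntries is exceeded, the least recently used entry is evicted, so
// the cache can never grow without bound and leak memory.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = order entries by access, not insertion
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict when over capacity
    }
}
```

A production service would more likely reach for a library such as Caffeine or Guava Cache, which also support time-based expiry; the point is simply that every cache needs an explicit bound or eviction policy.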

CPU Utilization Review

In any application, every line of compiled code consumes CPU when it runs, so the CPU should be utilized optimally. Applications spend CPU on creating resources such as threads and executors, on GCs, on computations, and so on.

A few things to consider while building an application or analyzing CPU performance:

  • Async Processing
    While designing or choosing algorithms for a use case, we should know when to apply async processing and async parallel processing in the application.
    Async processing should ideally be kept out of a transactional context, as it can create data inconsistencies when used for transactional work. IO-intensive operations that can sit outside the transactional context, such as calling dependent services or fetching data from SQL, are good candidates for async processing.
    Resources such as executors should be created once at application start-up and reused for async processing. Creating an executor is a CPU-intensive job, and creating one for every request is frowned upon as it severely degrades CPU performance.
    One way to create an executor in a Spring Boot application is as below:
@Bean(ServiceConstants.TEST_EXECUTOR)
public ExecutorService getTestExecutorService() {
    return MDCRetainingExecutor.wrap(
        Executors.newFixedThreadPool(
            ConfigService.getIntegerValue(ConfigConstants.TEST_EXECUTOR_POOL_SIZE)));
}

We use a flavour of ExecutorService that retains the MDC, so that tracking IDs and MDC context are passed from the main thread to the executor threads.
After an async thread finishes its work, the MDC should always be cleared so that its objects become eligible for GC.
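The actual MDCRetainingExecutor above is internal, but the idea can be sketched with the standard library alone. In this hypothetical version a single ThreadLocal tracking id stands in for SLF4J’s MDC map: the wrapper captures the caller’s context at submit time, installs it on the worker thread, and always clears it afterwards so the value is GC-eligible.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ContextRetainingExecutor {
    // Stand-in for the MDC: one tracking id per thread.
    static final ThreadLocal<String> TRACKING_ID = new ThreadLocal<>();

    // Capture the submitting thread's context, replay it on the worker,
    // and clear it in a finally block (the equivalent of MDC.clear()).
    static Runnable wrap(Runnable task) {
        String captured = TRACKING_ID.get();
        return () -> {
            TRACKING_ID.set(captured);
            try {
                task.run();
            } finally {
                TRACKING_ID.remove(); // make the context object GC-eligible
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        TRACKING_ID.set("req-123");
        pool.submit(wrap(() ->
            System.out.println("worker sees " + TRACKING_ID.get()))).get();
        pool.shutdown();
    }
}
```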

  • Batch/Bulk Processing
    Batch processing is used in multiple contexts: request processing in batches, data insertion into DBs (SQL and NoSQL), and so on. One experiment that bore fruit for us was to use bulk processing when a request carries a collection of data: we process all the data in bulk using collections and persist it to the DBs in batches.
    If you use JDBC prepared statements for query execution in the DAO layer, persisting data in batches is best, as it saves a lot of processing time compared to sequential writes. Spring Data JPA also provides interfaces for persisting data in bulk to SQL DBs.
    Microsoft Azure Cosmos DB provides a bulk executor library that offers a performant way to import documents into Cosmos DB across multiple partitions in a multi-master set-up.
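The chunking step common to all of these can be sketched as below (a generic helper, not the article’s code). Each chunk would then be persisted in one round trip — for example one `PreparedStatement.addBatch()` per row followed by a single `executeBatch()` per chunk, or one bulk-import call to Cosmos DB:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchWriter {
    // Split a large collection into fixed-size chunks so that each chunk
    // can be written to the data store in a single round trip.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) ids.add(i);
        System.out.println(partition(ids, 4).size()); // 3 batches: 4 + 4 + 2
    }
}
```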
  • Processing time analysis of each API step
    Even with an algorithm in place for every use case, it is always desirable to analyze the time taken by each step of request processing. This provides insights into the time spent in every dependent service call, data source call, and in-memory computation. The data helps in analyzing the code in minute detail and reworking it to reach the required SLAs. This activity also helps in devising new request processing strategies, such as firing independent service calls asynchronously, earlier in the flow than strictly required, for better CPU utilization and faster processing times.
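A minimal per-step stopwatch for this kind of analysis might look as below (a stdlib sketch; in production a metrics library such as Micrometer, or distributed tracing, would usually play this role):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Wrap each stage of request processing and record its elapsed time,
// so the slow steps stand out against the overall SLA budget.
public class StepTimer {
    private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();

    <T> T time(String step, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            elapsedNanos.put(step, System.nanoTime() - start);
        }
    }

    Map<String, Long> report() {
        return elapsedNanos; // step name -> elapsed nanoseconds, in call order
    }

    public static void main(String[] args) {
        StepTimer timer = new StepTimer();
        int result = timer.time("compute", () -> 2 + 2);
        timer.report().forEach((step, nanos) ->
            System.out.println(step + " took " + nanos / 1_000 + " µs"));
        System.out.println(result); // 4
    }
}
```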

Other Performance measures

  • Diagnosis Tools
    Microsoft Azure provides diagnostic tools to measure the memory and CPU performance of a Web App. These show how the app performs over longer periods of time, which is helpful during soak test runs.
  • Cost Optimization measures
    Cost is an inherent aspect of building an application; every component used adds to it. All the performance improvement measures above directly translate into better cost utilization.
    Platform components should be costed to understand the resource provisioning needed to match the expected TPS and SLAs in a production environment.
  • Monitoring Tools
    Every cloud provider offers multiple monitoring tools for its PaaS components that surface performance metrics such as p90, p95, and p99 request latencies over a given time frame, garbage collection metrics, heap metrics, and so on.
    We can also integrate tools like Prometheus and Grafana, and distributed tracing tools such as OpenZipkin.
  • Load Testing & Soak Testing
    Load/stress testing of applications is required to analyze performance under high stress. It helps identify the gaps and areas of improvement explained above so that the application can be optimized.
    Soak testing runs performance/stress tests against the application for longer durations. This is another way to unearth unknowns that can degrade performance over long runs.
    Both load and soak tests are mandatory before an application is production ready.
    Tests can be performed with tools such as JMeter, which lets us write test scripts that drive the application at the expected TPS.
  • Distributed Performance Testing
    An important measure of performance is the network latency between clients and servers. In the era of the World Wide Web, services should be deployed such that geographical boundaries do not define the experience for end users. It is therefore advisable to always run distributed performance tests: knowing the clientele distribution lets us optimize the network hops for requests where required and adopt distributed deployment strategies, so that network latencies stay minimal and do not hamper the customer experience worldwide.
  • SLAs/Response Time Requirements
    In the microservices world, the response time of each service defines the behaviour of the end-to-end system and ultimately the customer experience. Hence, it is vital to define SLA/response time requirements for every service interface.
    In our systems, the SLA is that the p95 response time of each interface stays below 500ms over the stress and soak testing periods. This ensures that the orchestration services behind the user interface receive all responses within 500ms, so they can deliver the required data in under a second.
  • Throughput calculations
    Throughput — TPS (transactions per second) or RPS (requests per second) — is a measure of the number of requests a given application server can handle within one second. It can be estimated in two ways.
    First, if we already know the total number of transactions the system must handle in a given time frame, we can work backwards to derive the TPS for every interface.
    Second, if we do not know that total, we can assume figures based on the realistic needs of the application and then work backwards to the TPS for every interface.
    In both scenarios, a peak-traffic assumption needs to be made, the request numbers adjusted accordingly, and the worst-case TPS requirements derived from that.
    Performance testing should always be done against the worst-case request scenario and then amortized to match the average TPS requirements.
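As a worked example of the back-calculation (all numbers here are illustrative assumptions, not figures from the article), suppose a system handles 10 million transactions a day and 20% of them arrive in a one-hour peak window:

```java
public class ThroughputEstimate {
    // Back-of-envelope peak TPS: share of daily traffic that lands in the
    // peak window, divided by the window length in seconds.
    static long peakTps(long dailyTransactions, double peakShare, long peakWindowSeconds) {
        return Math.round(dailyTransactions * peakShare / peakWindowSeconds);
    }

    public static void main(String[] args) {
        // 10M transactions/day, 20% in a 1-hour peak -> ~556 TPS
        long tps = peakTps(10_000_000L, 0.20, 3600);
        System.out.println(tps); // 556: the worst-case figure to load test against
    }
}
```

Load tests would then be sized against this worst-case figure and amortized down to the average TPS requirement.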
  • Resource Provisioning
    Resource provisioning for PaaS components goes hand in hand with performance testing. As the TPS requirements gradually increase, so does the provisioning. Any application hits a limit after a certain amount of optimization and must then be scaled horizontally to meet the SLA and TPS requirements. CPU and memory are the major factors driving horizontal scaling. Ideally an instance should never exceed 70% memory and 60% CPU utilization. The exact numbers are debatable, but such headroom ensures that any spike in CPU and memory usage from a surge in requests can be absorbed until auto scaling brings up extra resources, without compromising the customer experience.

Conclusion

In conclusion, there is a plethora of steps that can be taken to optimize an application’s performance. The review above only covers ways to optimize code and resource consumption and mitigate the risk of performance impact. As application developers, we should always keep in mind which trade-offs are acceptable and which are not, and act accordingly.


Harish Katti

Senior Software Engineer At Walmart Global Tech India, Health & Wellness Group