A four-step practical approach for making your web app load faster

Adi Ben Ishay
AT&T Israel Tech Blog
13 min read · May 9, 2022

(Or, how we managed to speed up our web app's load time)

“Get out of here! You can’t be serious with a home page that takes 15 seconds to fully load. I’ll switch to one of your competitors!”

This sentence is taken from a made-up conversation that took place only in my mind, involving an imaginary customer and our application Product Manager. This imaginary sentence led to the creation of a very real JIRA ticket “Performance optimization for the site”.

Per — For — What?!?!

Truth is, our customers are polite and calm. Both our customers and the dev teams work for the same organization. There aren’t any open-market competitors available for our customers to switch to, and the application being developed is for corporate employees’ internal use only. There aren’t any SLAs or time-sensitive flows defined in it. All in all, you could pretty much say that our customers are captive customers.

Nevertheless, as far from real-time as the application is, it's still composed of layered microservices that talk to one another, invoking multiple APIs per user flow, querying various databases, using a message bus, and harnessing all kinds of communication protocols. This architecture makes it regrettably easy for us developers to make the application perform ridiculously slowly, even against a “best-effort” SLA.

By any standard, a home page that takes 15 seconds to load is unreasonable. It can cause user frustration, reduced productivity, page abandonment, and negative product sentiment, all of which are bad for the organization. And that’s just the tip of the iceberg. That’s why we developers can never let the concept of performance leave our mindset.

Illustration of an unsatisfied customer

Step 1: Siri, what’s good performance?

Now that we understand and acknowledge why reasonable performance is important, the first step toward achieving it is to define our success criteria, i.e., what is considered adequate performance for our application. Here are a few examples of such criteria:

  • User satisfaction or customer experience survey score
  • Number of critical business flows completed per unit of time
  • Average/median/95th-percentile page-load time
  • Average/median/95th-percentile API response time

As mentioned, these are just examples, and there are many more criteria to be considered.

Based on customer feedback, we chose to focus on two of these criteria: user satisfaction and page-load time. How can we tell if users are satisfied with the application’s performance? Simple — just ask them. So we did. And they complained. That’s how we got to writing this blog :-)

How do we know if the page loads fast enough? SEO best practices define optimal load time to be up to 2–3 seconds. Our home page loads in 15 seconds. That’s far from good. We need to know why.

Step 2: O Bottleneck, where art thou?

Now that we’ve defined our success criteria and have a clear understanding of where we stand against them, we need to identify the bottlenecks that keep us from achieving them. So first, let’s look at a simplified application architecture diagram:

In this diagram, we can see that the application has four major players responsible for most of the application functions:

  • An Angular single-page application (SPA) rendered in the user’s browser
  • A web server that hosts our SPA files, enforces user authentication, and acts as a Backend-For-Frontend (BFF), forwarding requests to our backend APIs
  • A backend server crunching the business logic behind those APIs
  • A database

Each of these could potentially cause the performance lag we’re experiencing. So how do we know who’s to blame? We need to collect performance metrics. We can do this in various ways.

Option 1: The fully integrated way — APMs

An APM — Application Performance Management — is a tool that integrates into your application code in your deployed environment and collects usable metrics from it. This integration is usually done by running an agent that monitors your runtime environment using hooks or language-specific profiling protocols. APMs are generally multi-platform and cater to a variety of technology stacks, making sure all your application components are monitored.

Sounds great, doesn’t it? It is. But it has its costs. Licenses for these APMs aren’t free for corporate use, and integrating them involves a learning curve and developer and SRE effort. It might be overkill if all you want is a stopwatch measuring your code’s execution time. So, what other options do we have?

Option 2: The semi-integrated way — tailor-made metrics (AKA “quick and dirty”)

As we had prior knowledge of our architecture and tech stack, we had the option of tailor-making the solution to measure the metrics we need. This was our favorite option because we wanted to achieve fast results and didn’t want to integrate an entire APM ecosystem just to speed up the application.

As we wanted to decrease the page load time, we needed to know where our code spends its time during the page load flow. For this purpose, we chose to measure and observe the following:

  • The browser request/response timing. Assuming that the application gets stuck when loading data, this gives us a clear understanding of the browser’s network activity and whether it is related to the data loading.
  • Request/response timing in our BFF. This exposes whether our BFF server adds latency or blocks the data transfer between the FE and the BE.
  • Request/response timing in our BE. This helps us identify slow-running BE APIs. At this point, besides measuring time, we can also review the code and debug slow-running requests, as most of the logic is written and executed in the BE.
  • DB query timing. This pinpoints slow-running queries on our BE that cause the page to load slowly.

Measuring browser request/response timing

This is easily observed using the browser’s network activity log, one of its most useful developer tools. Specifically, we are interested in seeing which resources take time to load.

Here are our results:

Browser request timing and waterfall diagram

This diagram clearly shows three areas blocked by a request barrier (marked with a red arrow). In each of these areas, new requests aren’t transmitted because the code is still waiting for the response to a delayed request. Practically, this means that the application is stuck on a loader animation and the user can’t interact with it, resulting in a slow user experience. Just from looking at the network diagram, we can draw the following takeaways:

Takeaway 1: The second and third barriers are caused by slow-responding API calls. These APIs’ responses form the input for the phases that follow, so they will be analyzed in the BE timing phase and can hopefully be sped up.

Takeaway 2: The first request barrier seems to be justified and unrelated to the application logic, as it occurs before the application is even loaded: the browser waits until all the static resource files are loaded, and this takes over one second. One second seems too long for loading static files, and we’re certain it could be drastically reduced by harnessing the browser cache and response compression.

Takeaway 3: Assuming that, from time to time, there are slow-responding APIs, is the application really dependent on their responses? Is the loader experience really necessary while waiting for them? Or could we create a better user journey where, while a request is in flight, the user can still interact with the application? These are questions we might need to answer eventually.

Takeaway 4: We have too many API calls! More than 25 XHRs are issued from the FE application. Most browsers limit the number of concurrent requests to the same domain: Chrome has a strict limit of 6, and Firefox defaults to 6 as well, although this can be changed with a little effort. This clearly throttles the application while it waits for responses to return. Also, at the time of performing this optimization process, enabling HTTP/2 support on our BFF side seemed a little risky, as it wasn’t fully integrated into expressJS. Nowadays, it seems like a valid solution for this situation.

Measuring BFF request/response timing

In order to measure whether our BFF adds latency when acting as a forwarding proxy, we collected both HTTP server request timing and HTTP client request timing. This was done using our preinstalled Prometheus server, together with the nodeJS Prometheus plugin express-prometheus-middleware. Integrating Prometheus gave us insight into how much latency is added to each API call when our BFF acts as a forwarding proxy between our FE requests and our BE responses. We discovered that each request gets delayed by at least an extra 300ms, peaking at a 1-second delay. This means that our BE returns the response, but it takes an extra, non-negligible amount of time until it reaches our BFF and is then returned to the FE. Our takeaways from this analysis:

Takeaway 1: There is a network-related slowdown on the route between the BFF and the BE, involving some gateway. A support ticket was immediately created for the responsible network infrastructure team.

Takeaway 2: If the network delay isn’t going to be solved soon, or if the architecture allows direct communication between the BFF and the BE (they’re both part of the same Kubernetes cluster), we might want to consider using direct communication.

Measuring BE request/response timing

After analyzing our FE and BFF network behavior, we wanted to know which backend APIs are responsible for the slow performance. As our BE is written in Java over Spring Boot, this was hardly an effort — we just integrated Micrometer metrics with our Prometheus server and were good to go. We let it run for a few hours, collected some statistics, and ran the following query on the Prometheus graph:

http_server_requests_seconds_sum/http_server_requests_seconds_count

This query splits all the APIs by their URLs and calculates the average response time for each request type. We received a graph similar to this:

Our takeaways from this analysis:

Takeaway 1: There are two slow APIs that are called from the application home page. By slow I mean they take more than 7 seconds to return!

Takeaway 2: Reviewing the slow APIs’ code makes us think it’s less than optimal. They both retrieve too much data and run mutually independent queries serially rather than in parallel.
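
One note about the query above: dividing the raw _sum by the raw _count gives the average over the entire lifetime of the metrics. If you only care about recent traffic, a windowed variant of the same query (a sketch using the same metric names and an arbitrary 5-minute window) would be:

rate(http_server_requests_seconds_sum[5m])/rate(http_server_requests_seconds_count[5m])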

Measuring DB query timing

As you recall, our BE is built upon the Spring Boot framework, harnessing Spring Data and Hibernate as the repository and JPA implementation layers. In such a case, collecting statistics is very easy using the following properties:

spring.jpa.properties.hibernate.generate_statistics=true
logging.level.org.hibernate.stat=DEBUG

Combining this with Hibernate’s slow query log will give you a pretty good slow query discovery mechanism.
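
For reference, the slow query log itself boils down to a single property (a sketch, assuming Hibernate 5.4.5 or later); any query slower than the threshold, in milliseconds, gets logged:

spring.jpa.properties.hibernate.session.events.log.LOG_QUERIES_SLOWER_THAN_MS=250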

However, we didn’t need to use any of them, as we already had our own logging aspect that captures annotated classes and methods and prints out their names, invoked arguments, and execution time. It makes it easy for us to find slow queries and their respective Spring repository methods, so we used it to trace our repositories’ runtime. Collecting these log lines from a live application while loading the home page a few times got us lines looking like:

2022-02-17 14:47:01,834 INFO com.att.ocp.provisioning.repository.customer.CustomerRepository [ForkJoinPool-2-worker-9] #findByProfileStatusAndOcpState(['ADD', 'NEW']): [Customer […]] in PT3.215S
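
For context, here is a minimal sketch of what such a logging aspect can look like (hypothetical class, package, and marker-annotation names — not our exact implementation), assuming Spring AOP:

import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class ExecutionTimeLoggingAspect {

    private static final Logger log = LoggerFactory.getLogger(ExecutionTimeLoggingAspect.class);

    // Wrap every method of classes annotated with a (hypothetical) @LogExecutionTime marker annotation
    @Around("@within(com.example.logging.LogExecutionTime)")
    public Object logExecutionTime(ProceedingJoinPoint joinPoint) throws Throwable {
        Instant start = Instant.now();
        try {
            return joinPoint.proceed();
        } finally {
            // Duration.toString() yields ISO-8601 values such as PT3.215S, as seen in the log line above
            log.info("#{}({}): in {}",
                    joinPoint.getSignature().getName(),
                    Arrays.toString(joinPoint.getArgs()),
                    Duration.between(start, Instant.now()));
        }
    }
}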

After analyzing these log lines, we concluded the following takeaways:

Takeaway 1: Some queries take more than a few seconds to run. We can surely optimize these queries.

Takeaway 2: There are four mutually independent queries (querying different tables) that run serially and synchronously throughout the slow APIs. They could probably be converted to parallel execution to avoid waiting for each of them in turn.

Now that we’ve collected and analyzed some data, we can proceed to the solution-proposal step.

Step 3: Pick your battles

As we’re on a hunt for solutions that make a great impact on our application’s performance, we only wanted to implement solutions that are worth implementing. For this purpose, for each suggested solution we evaluated its foreseen implementation complexity and its expected speedup, and calculated a Return-On-Investment (ROI) index based on both.

The developer’s choice

Here are the solutions we could think of, and their estimated ROIs on a scale of 1–5:

Client speedup:

  • Allow client-side (browser) caching of static resources (fonts, images). It’s a simple change to our BFF’s response cache headers and will surely save network transmission time for those resources, so we gave it an ROI score of 5. (The headers involved in this item and the next are sketched right after this list.)
  • Allow response compression — some of our APIs return 1MB of raw text data, which might take 0.2–2 seconds to load depending on your VPN and internet connection speed. GZIP compression can make it 90% smaller, and as all our web frameworks support it, it’s merely a header and configuration change. Due to its implementation simplicity and relatively high speedup, we gave it an ROI score of 5.
  • Client request reordering — now, that’s much trickier. As the browser allows only 6 simultaneous requests in flight, we thought of reordering the requests into a more optimal order instead of the current business-logic order. It seemed like a hard nut to crack, with only some speedup to gain, so we gave it an ROI score of 2. If moving to HTTP/2 had been a valid option at the time, we would surely have given it a higher ROI, since HTTP/2 multiplexing would probably make a great impact.
  • Making the UX independent of the API responses. Meaning — make those APIs asynchronous and draw a responsive UI without a loader experience, showing a loader only when the data is absolutely necessary at the point it’s needed, and not ahead of time on page load. This seemed to us like a UI/UX redesign with high effort, so we gave it an ROI score of 1.
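
As a rough illustration of the first two items, these are the standard HTTP response headers involved (example values, not our exact configuration):

Cache-Control: public, max-age=31536000   (static resources: let the browser cache them for up to a year)
Content-Encoding: gzip                     (API responses: sent when the request carries Accept-Encoding: gzip)
Vary: Accept-Encoding                      (so intermediate caches keep compressed and uncompressed variants apart)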

API speedup

  • Allow internal connectivity between our BFF and our BE. As they’re both services in the same Kubernetes namespace in the same cluster, we can let them communicate without going through another (slowing) hop. It seemed like a mere configuration fix that could save us a few hundred milliseconds per API call, so we gave it an ROI score of 4.
  • Refactor the slow APIs’ service layer and DAL invocations so that independent long-running queries run in parallel instead of serially (see the sketch right after this list). As we had great test coverage for this code, we weren’t too worried about refactoring, and we thought it would give a nice speedup in return, so we gave it an ROI score of 3.
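
To illustrate the kind of refactoring we mean, here is a minimal sketch (hypothetical service, repository, and type names — not our production code) of running two independent repository calls concurrently with CompletableFuture instead of one after the other:

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical service showing the serial-to-parallel change; repositories and types are placeholders
public class DashboardService {

    private final CustomerRepository customerRepository;
    private final OrderRepository orderRepository;
    private final Executor queryExecutor; // a dedicated thread pool for blocking DB calls

    public DashboardService(CustomerRepository customerRepository,
                            OrderRepository orderRepository,
                            Executor queryExecutor) {
        this.customerRepository = customerRepository;
        this.orderRepository = orderRepository;
        this.queryExecutor = queryExecutor;
    }

    public DashboardData loadDashboard() {
        // Before: the two independent queries ran one after the other.
        // After: kick both off concurrently and wait for the results.
        CompletableFuture<List<Customer>> customers = CompletableFuture
                .supplyAsync(() -> customerRepository.findByProfileStatusAndOcpState("ADD", "NEW"), queryExecutor);
        CompletableFuture<List<Order>> orders = CompletableFuture
                .supplyAsync(orderRepository::findPendingOrders, queryExecutor);

        return new DashboardData(customers.join(), orders.join());
    }
}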

DB queries speedup

  • Generally speaking, the application inherited an entity-relationship (ER) model and a table schema from a previous system implementation. We were certain that some relationships weren’t defined in an optimal way and could be revisited. So the first suggestion was: hey, why don’t we review the entire ER model and redesign it optimally, including tables, primary keys, foreign keys, and indices? But when we began estimating the effort, we realized that we have more than 300 tables defined, dozens of which might need a redesign. It seemed like a never-ending effort, so we gave it an ROI score of 1.
  • Specifically, the two slowest APIs were querying data from approximately 10 tables, some of them full of data, processing it, and returning an answer to be presented on the user’s dashboard page. Both APIs return a limited number of records to be displayed on the client side, yet they retrieved all the candidate records in order to return just a few. It made sense to us to limit the amount of data being processed in the server logic, using Spring Data’s pagination support, and only if we don’t have enough records to fill the dashboard page should we retrieve more (see the sketch right after this list). As we had good test coverage, covering all of the application’s layers while running an actual Oracle database container, the effort and risk of such a change didn’t seem too high, but the speedup seemed promising because the queries could finish much sooner. So we gave it an ROI score of 4.
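
A minimal sketch of the pagination change (hypothetical repository and entity names, assuming Spring Data JPA): instead of loading every candidate record, fetch only the first page and ask for more only if the dashboard still isn’t full.

import java.util.List;

import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import org.springframework.data.jpa.repository.JpaRepository;

// Hypothetical Spring Data repository: adding a Pageable parameter makes Spring append a paging clause to the generated query
public interface CustomerRepository extends JpaRepository<Customer, Long> {
    Page<Customer> findByProfileStatusAndOcpState(String profileStatus, String ocpState, Pageable pageable);
}

// Service-layer usage: fetch only enough records to fill the dashboard page
class DashboardQueries {
    List<Customer> firstDashboardPage(CustomerRepository repository) {
        Page<Customer> page = repository.findByProfileStatusAndOcpState(
                "ADD", "NEW", PageRequest.of(0, 20, Sort.by("createdDate").descending()));
        // If post-processing filters out too many rows, request the next page via page.nextPageable()
        return page.getContent();
    }
}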

Now that we’ve reviewed the possible solutions, we’re ready for the implementation step.

Step 4: Don’t Stop ’Til You Get Enough

Finally, on to the fun part! In this step, we implemented our solutions in an iterative manner until we were satisfied, working in descending ROI order with hope in our hearts that we could stop before it got too nasty. When did we stop? When we made the home page load in less than 3 seconds. This happened only after implementing all the solutions ranked with an ROI ≥ 3. Only when we got the following client network diagram could we be satisfied:

The green arrow and rectangles mark our 3-second borderline, which the page load finally stays below.

Performance — A never-ending story

This blog post sums up the true story of a performance optimization done for an AT&T internally developed portal. But, as I hope you understood from my experience, performance isn’t an ordinary feature with a beginning and an end. It’s a way of thinking that should always be part of a developer’s state of mind. As new code, features, and requirements pile up in our system, such performance optimization processes are expected to happen more often.

Thank you for reading and I hope you enjoyed my journey of performance optimization.
