Investigating and Optimizing Client-Side API + Web Page Latencies to Near-Google Speed

I think the best way to really get into something is through a real-world example. So I would like to share a personal experience of mine (a war story, for those more dramatically inclined) where I had to optimize our API and web page for latency.

The Exposition

Without breaking my NDA, let's just say that we have a hybrid mobile app. For those of you unaware of what hybrid mobile apps are: it is essentially a mobile app that renders some parts of itself natively (i.e. the code actually runs on the phone) and some parts via a Web View (i.e. a web page cleverly made to look like part of the mobile app).

Whenever the user clicks a particular ingress (fancy speak for button/link), based on certain factors that are best left untold for the sake of my continued employment, we make an HTTPS call to one of our REST endpoints. (In simple terms: the user does something, and based on that something we call an API.)

The API always returns a URL, and the URL differs based on the user's action. Once the app receives the URL, it simply opens it in a Web View.

The Conflict

The API that returns the URL has a p90 of about 40–50ms (i.e. 90% of the calls complete in under 50ms). Our web page itself is a very basic page with minimal UI and no fancy styling, animations, etc. The entire page was designed to fit into a mobile screen without the user needing to scroll.
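For anyone unfamiliar, a p90 is computed by sorting the observed latencies and taking the value at or below which 90% of the samples fall. A minimal sketch using the nearest-rank method (the sample numbers here are made up):

```java
import java.util.Arrays;

public class Percentile {
    // Nearest-rank p90: sort the samples and take the value at rank ceil(0.9 * n).
    static long p90(long[] latenciesMs) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(0.9 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up samples: nine fast calls and one slow outlier.
        long[] samples = {42, 38, 45, 51, 47, 40, 44, 39, 48, 1000};
        System.out.println(p90(samples)); // → 51: 90% of calls ran at or under 51ms
    }
}
```

Note how one 1000ms outlier barely moves the p90, which is exactly why a healthy p90 can coexist with occasional multi-second experiences.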

Despite all this, it could sometimes take up to SIX seconds to complete the API call and render the page in the web view!

What follows is the series of investigations I performed to help bring the latency of the flow down to under a second.

Setting up a Test environment

The first thing I tend to do is set up a test app/environment of sorts that mimics prod conditions but is simple enough that I can quickly make code changes without worrying about best practices and code quality.

Only once I am able to narrow down the issue, and am reasonably sure of the changes needed, do I make the changes properly (dotting all my Is and crossing my Ts) in the production app.

My test setup was simple. The moment the app opens, I get three buttons:
 1) Button 1 calls my API, records the latency, and presents it to me in the form of a toast.
 2) Button 2 opens our URL in a Web View. Here too, the app reports the page load start time and page load finish time in the Web View.
 3) Button 3 simply renders www.google.com in the same Web View so that I would have something to strive for.
I chose google.com because their page is extremely simple and has a similar number of elements to our page.
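Under the hood, each button handler just takes a timestamp before and after the work. A minimal sketch of that measurement, with the real HTTPS call / page load stubbed out by a sleep:

```java
public class LatencyProbe {
    // Times an arbitrary action in milliseconds, the same way the test
    // app's buttons wrapped the API call and the page load.
    static long timeMs(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Stand-in for the real HTTPS call / WebView load.
        long elapsed = timeMs(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("elapsed ~" + elapsed + "ms");
    }
}
```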

Fixing the API

Observations

One thing I observed straight away: the first call to my API was always extremely latent.

In the backend our API took only 40–50ms to execute, so this was very strange.

The first API call took around 1 second to complete, but this ONLY happened for the first call. If I made a second call without killing the app, the latency fell to 200–300ms (even if the response was different).

Now, a part of this latency is due to my API being hosted on a separate continent, for which there is no short-term fix (trust me). Interestingly, if I waited 5 minutes and called the API again without killing the app, the latency would once again spike.

Conclusion

Somehow, somewhere, something was caching the network objects being used to call the API.

This held true even when the response returned was different, which led me to believe it was Android caching network objects (connections, TLS sessions, and the like) rather than some layer caching the request/response itself, since the reduced latency was observed even for different request/response pairs.
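For context, this is what "caching the network objects" buys you: if a single HTTP client instance is reused, its connection pool and TLS sessions survive between calls, so only the first call pays the full DNS + TCP + TLS handshake cost. A sketch, assuming `java.net.http.HttpClient` (illustrative only; not the actual stack our app used):

```java
import java.net.http.HttpClient;
import java.time.Duration;

public class ApiClient {
    // One shared client for the whole process: its connection pool and
    // TLS session cache outlive individual requests, so only the first
    // call pays the full DNS + TCP + TLS handshake cost.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();

    static HttpClient client() {
        return CLIENT;
    }
}
```

The roughly-5-minute window I observed is consistent with pooled connections being kept alive for a fixed idle timeout and then dropped.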

Fix

This was simple enough to remedy: the moment the user loads the app, make a “priming” call to the API in the background with some “dud” data. This serves to “warm up” the API, if you will.

I just had to make sure that on the backend we could differentiate between warm-up calls and actual user calls, so as not to skew (read: screw up) our metrics.
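A sketch of both halves: the priming request the app fires on startup, and the check the backend can use to keep warm-up calls out of its metrics. The `X-Warmup` header name is my invention for this example, not what we actually shipped:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;

public class Warmup {
    // The background "priming" request fired on app start. X-Warmup is a
    // hypothetical marker header; any marker the backend agrees on works.
    static HttpRequest primingRequest(String endpoint) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("X-Warmup", "true")
                .GET()
                .build();
    }

    // Backend side: drop warm-up calls from latency metrics so the
    // priming traffic does not skew the p90 numbers.
    static boolean countsTowardMetrics(Map<String, String> headers) {
        return !"true".equalsIgnoreCase(headers.getOrDefault("X-Warmup", ""));
    }
}
```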

Result

Now, once the user starts the app, as long as they click our ingress at least a second later, they see the benefit of the warm-up call. More than 99% of our users did actually take that much time to invoke the feature, so all's good.

I was able to bring the latency down from around 1 second to a p90 of 200–300ms.

There was still further scope for improvement, as our backend was hosted in a different region (a different continent, in fact) from our clients. But that is a larger problem to solve, mainly due to the number of downstream dependencies that would have to be migrated as well.

Fixing the Web Page

Observations

Similar to the API, the first load of the page in the Web View was far more latent than follow-up reloads within 5 minutes. This was true for both our website and google.com.

Unlike API calls, there wasn't really a way to warm up a Web View. I did try making a GET call to our web page using an HTTPS client and then rendering the page in a Web View (I was hoping Android might share the cached objects between the HTTP client and the Web View), but unfortunately this made no difference.

I could try preloading the page in a Web View kept invisible in the background, but this did not seem like a very elegant solution. I did not want to ship such a hacky approach to production, so I set it aside for now.

I could also try overriding the default Web View caching strategy to cache the heavier elements of our page on the device before the page even loads. Unfortunately, the same Web View instance is used by other teams as well, and changing the caching strategy, or caching items locally, would not fly with them or with the overall vision of keeping the app as light as possible.

Next, I hooked my WebView up to the Chrome debugger (https://developers.google.com/web/tools/chrome-devtools/remote-debugging/webviews).

Conclusions

This allowed me to profile my webpage and see exactly what was taking so long to load.

Here is what I found after looking at the network profile:

  1. favicon.ico was latent AF (what even is this?)
  2. The loading spinner was a 1MB GIF, which was taking a little time (not a lot)
  3. One of our JS libraries was 700-something KB in size
  4. Our web page was hosted in a different region from our clients

This gave me a starting point. To make the page load faster, I would need to reduce the page size and eliminate the unnecessary requests.

I then added page-grading extensions that would give my page a grade and tell me where I could improve it. These are easy to find as Chrome extensions; I personally used YSlow (http://yslow.org/).

Fixes

Favicon.ico
Favicon.ico is the small image of your site that shows up on the browser tab. Since our page only ever opens inside our Web View, the favicon.ico file had never been included.

The Web View still made a request for favicon.ico, and since we did not have one, we returned an entire 404 page instead.

The fix was simple: add the smallest favicon.ico file possible, a single pixel. You can easily find such an image via a Google search, or create your own.
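If you would rather generate one than hunt for it, a single-pixel image is trivial to produce. A sketch that emits a 1×1 transparent PNG (most modern browsers happily accept a PNG served under the favicon.ico name):

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.imageio.ImageIO;

public class TinyFavicon {
    // Encodes a single transparent pixel as a PNG; the result is only a
    // few dozen bytes.
    static byte[] onePixelPng() {
        try {
            BufferedImage px = new BufferedImage(1, 1, BufferedImage.TYPE_INT_ARGB);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(px, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Serve this file at /favicon.ico and the 404 page goes away.
        Files.write(Path.of("favicon.ico"), onePixelPng());
        System.out.println("wrote " + onePixelPng().length + " bytes");
    }
}
```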

Loading Spinner
We had a very simple loading spinner; we did not need to use a GIF at all. We simply used CSS to render the spinning loading wheel with a gradient. Easy peasy.

For those of you unfamiliar with CSS, here is a good starting point: https://www.w3schools.com/howto/howto_css_loader.asp

JS Library
The JS library we used was meant for large pages (imagine infinite-scroll-type pages) and was complete overkill for ours. So we removed the library and used CSS to mimic the look and feel ourselves for the limited number of elements on our page.

Page Grading
I implemented all the suggestions given by YSlow. These included basic things like moving CSS to the top, JS to the bottom, yada yada. Since the page was so simple, it was easy to implement all of the suggestions until we met every grading criterion and got a score of 100/100 (even google.com does not have that, though to be fair, their use case is more complicated).

Minification
As the last step, we minified our page's JS to make it even smaller. It was already pretty small, but minification comes at a very low cost, so why not.

CDN
One simple fix to the cross-region issue is to front the page with a CDN. I haven't really gotten around to this step yet, but it is something I do mean to take up. AWS CloudFront seems really promising, especially because its edge servers are exactly where our core user base is located.

Result

No single change made a huge impact, but all the small things added up, and we were able to bring the COLD-start latency of the web page down from 4–5 seconds to 1 second.

Warm page load latencies are below 500ms (on par with google.com).

The cold page load time can be improved further once we set up the CDN (or so I believe).

The Climax

With the above two sets of changes, the entire flow started executing in under 1.5 seconds (worst case), down from a whopping 6 seconds.

The entire page size was brought down to 17KB, from multiple MBs.

There is still scope for further improvement via a CDN, but that's for a later time.