Why web performance isn’t about performance

Parker Shepherd
The Entrepreneur Life
8 min read · Feb 25, 2019
Nobody likes a slow website

The internet has never been faster. Fiber optic cables cross the globe sending packets at roughly two-thirds the speed of light, Wi-Fi standards have evolved to transmit up to 3.5 Gbps in the home, hyper-optimized JIT compilers accelerate JavaScript-powered sites to near-native performance, and worldwide CDNs deliver cached assets faster than your 60 Hz monitor can refresh itself.

So why is it that Slack regularly takes 15+ seconds to load? Why are average website load times as high as 8.6 seconds? Why do I give up on so many web pages when browsing on a phone?

The internet has never been slower.

What if I told you the real reason performance is important has nothing to do with speed? Web performance is critical for one reason above all others: the longer your app takes to load, the more things can bring it to a screeching halt, and the more likely that is to happen when you least expect it.

Works on my machine

How many times have you written a query that executes in a few milliseconds locally, only to have it take dozens of seconds or worse in production? Have you ever built a feature that causes no harm locally, but after launch you start noticing timeouts on an E2E test platform? Have you spent at least one panic-filled late night diagnosing broken functionality, only to blame your hosting provider in the end? Even nastier: have you ever deployed a feature that worked fine in production for weeks, but months later brought your system down hard when a third-party dependency unexpectedly slowed down or crashed? Is performance a game of whack-a-mole where you work?

Performance failings are a sign of bad architecture.

By default, developers tend to only optimize things that are slow on their i7 MacBook Pros, where everything runs on localhost and the DB is 1% the size of production. If that's the case, is the answer throttling your connection speed in Chrome DevTools? Do you need to generate tens of thousands of fake rows in your DB? Do you need to spin up your environment in the cloud to force realistic latency, then manually test everything? Well, maybe, but a few core perspective changes can make a huge difference:

  • Dependencies are evil
  • Async everything
  • Assume everything will break

Dependencies are evil

The node_modules folder (which I will now refer to as nodules, due to the swelling growth it represents in modern-day applications) has garnered a reputation of epic proportions, but one key fact keeps getting overlooked: any team that installs an npm package is responsible for the install penalty, the runtime penalty, the size increase, and the security of not just the package they installed but every sub-nodule it drags down with it, and must manage all future updates for those packages as well.

If I told you that you could write slightly cleaner JS code, but you had to:

  • take responsibility for 300+ packages
  • permanently trust 145+ contributors worldwide
  • slow down your installs by 10 seconds
  • slow down your builds permanently
  • manage bug fixes and security updates for all 300+ packages indefinitely

Would you do it?

Statistically, it’s pretty likely you already have.

Avoiding dangerous or oversized dependencies can be easy:

  1. Don’t implement the feature
    There may be no dumb questions, but there are definitely dumb features. If you are asked to build a 3D-rendered rotating carousel of images as a minor side feature, try pushing back. You can cite stability, security, performance, and maintenance costs as negatives, and offer a more minimal alternative in its place.
  2. Write it yourself
    If you could write it in 3 lines, is it worth giving someone else leftPad-esque levels of control over you? Writing small helper functions is a practice that helps you design testable, readable code in the first place, and you get to build the API you want, not just pick from what's off the shelf (a minimal sketch follows this list).
  3. Search for smaller, simpler versions
    Using the first nodule that comes to mind may not be the best way forward for you. Sure, moment is popular, but date-fns is far smaller. Similarly, preact is a fraction of the size of react.
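
To make option 2 concrete, here's the kind of three-line helper I have in mind. It's only a sketch (and modern runtimes already ship String.prototype.padStart, which covers the same ground), but it's yours, it's testable, and it pulls in nothing:

    // Pads a value with a single-character fill until it is `length` characters long.
    function leftPad(value, length, fill = ' ') {
      const str = String(value);
      return fill.repeat(Math.max(0, length - str.length)) + str;
    }

    leftPad(7, 3, '0');   // "007"
    leftPad('npm', 5);    // "  npm"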

That's not to say every single project should be written from scratch, or that no large framework is worthwhile, but how much trust should you choose to place in hundreds of contributors you don't know? Besides, even with the fastest virtual DOM diffing algorithms around, Vanilla JS is still the fastest framework out there.

Async everything

Every time you fire up a web browser and navigate to a “typical” web app, dozens of connections are made to servers across the globe, hundreds of individual files are shuttled around network switches and vast lengths of cabling, your data is shipped off to a half dozen companies even if you don’t like cookies, and thousands of lines of code are funneled through high-speed processors at billions of operations per second.

Imagine if every one of those operations had to happen one. after. another. Including latency, and adding up all the machines involved, there could be minutes of activity behind a single web page. Intuitively, doing every operation sequentially feels like a huge waste, but it may seem daunting to pick apart something you have already built.
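
To make the difference concrete, here's a minimal sketch (the endpoints and function names are illustrative, not from any real app): awaiting three requests one after another pays three round trips of latency, while firing them together with Promise.all pays roughly one.

    const fetchJSON = (url) => fetch(url).then((res) => res.json());

    async function loadDashboardSequentially() {
      // each await pays a full round trip before the next request even starts
      const user = await fetchJSON('/api/user');
      const orders = await fetchJSON('/api/orders');
      const prefs = await fetchJSON('/api/preferences');
      return { user, orders, prefs };
    }

    async function loadDashboardConcurrently() {
      // all three requests are in flight at once: roughly one round trip instead of three
      const [user, orders, prefs] = await Promise.all([
        fetchJSON('/api/user'),
        fetchJSON('/api/orders'),
        fetchJSON('/api/preferences'),
      ]);
      return { user, orders, prefs };
    }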

One of the easiest ways to slow down a website is the unassuming <script> tag. By default, when the browser arrives at one, it slams on the brakes, queues the elevator music, snail-mails the server containing the script, then contemplates picking up a new hobby while it waits for MBs of JavaScript to be delivered.

The load-time concurrency problem is actually fairly easy to solve.

  1. Consider removing the script.
    Sure, that shiny social "like" plugin looks great, but have you weighed its value against the security and performance implications involved? Maybe you could roll your own or drop it altogether? Remember: a single blocking script tag can make your entire site take 30+ seconds to load if someone else's server is slow.
  2. Add async or defer to the script tag.
    async lets the browser keep parsing the HTML while the script downloads, executing it as soon as it arrives; defer also downloads in parallel, but waits until the HTML is fully parsed and guarantees your script tags execute in order.
  3. Programmatically load the resource only as needed
    If the script is only required on a handful of pages, consider loading it dynamically. Tools like webpack can help you do this automatically, or you can roll your own using document.createElement('script') with an onload handler so you know when it has finished loading (a minimal sketch follows this list).
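
For option 3, a roll-your-own loader can be as small as this. It's only a sketch (the chart library URL and renderSalesChart are made up), but the shape is what matters: nothing downloads, parses, or executes unless the page actually needs it.

    function loadScript(src) {
      return new Promise((resolve, reject) => {
        const script = document.createElement('script');
        script.src = src;
        script.async = true;                                   // don't block HTML parsing
        script.onload = resolve;                               // fires once the script has executed
        script.onerror = () => reject(new Error(`Failed to load ${src}`));
        document.head.appendChild(script);
      });
    }

    // Only pull in the heavy chart library on the one page that needs it.
    if (document.querySelector('#sales-chart')) {
      loadScript('https://example.com/vendor/charts.min.js')
        .then(() => renderSalesChart());                       // renderSalesChart is hypothetical
    }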

Assume everything will break

Just because you're using cloud architecture doesn't mean your site is invulnerable. When looking at uptime SLAs like 99.95%, it's easy to think to yourself "that's basically 100%," yet 99.95% uptime equates to almost four and a half hours of downtime per year. Even if that seems small enough to you, it is extremely likely that something will happen at least yearly, and that number likely doesn't include "service degradation," where the servers were technically up even though latency and packet loss were high.

Since you almost certainly don't manage and host every single part of your application in the same zone of the same provider, you also get to add in downtime for all of your third-party dependencies. Do you use an email provider? Add another couple of hours. Maps API? Add another hour or two. Social sign-on? Account for the inevitable. If you have a handful of external partners involved on your site, you probably have an outage a month to deal with, if not more.

(numbers are all calculated based on a hypothetical 99.95% uptime)
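
To put rough numbers on that, here's a back-of-the-envelope sketch assuming every provider hits exactly its 99.95% SLA and that their failures are independent:

    const HOURS_PER_YEAR = 24 * 365;                      // 8760

    const downtimeHours = (uptime) => HOURS_PER_YEAR * (1 - uptime);

    downtimeHours(0.9995);                                // ~4.4 hours/year for a single 99.95% service

    // Independent dependencies multiply: your effective uptime is the product of theirs.
    const deps = [0.9995, 0.9995, 0.9995, 0.9995];        // hosting, email, maps, social sign-on
    const combined = deps.reduce((a, b) => a * b, 1);     // ~0.998
    downtimeHours(combined);                              // ~17.5 hours/year across the stack

Four "reliable" dependencies already add up to roughly 17 hours of downtime a year, and that's before any real-world degradation.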

Designing high-availability applications requires spreading your infrastructure across as many zones as possible, but without building in explicit failure tolerance, all this does is increase the likelihood that something is broken at any given time. To build an application that is more stable than the platform it is built on, we must assume EVERYTHING will go down, perhaps even regularly.

Most DevOps engineers will take a server out of round-robin load balancing if it stops responding, but how many environments are set up to retry the original failed requests? You might have retries configured for your mail service, but if they aren't spaced out far enough and the service is down for a few hours, will those emails still get sent? You might have a few copies of your app running on multiple servers, but for many teams a single load balancer is still a single point of failure. Some of the most common DevOps issues could be mitigated by applying a few core principles:

  • Every piece of infrastructure should run in more than one zone
  • No excuses for treating load balancers as acceptable points of failure
  • Since everything will fail, (safe) requests need to be automatically retried

This principle is not only relevant to DevOps teams; it should shape your application logic as well. If every HTTP call or S3 file upload not only could fail but likely will fail, how should we design differently?

  • Every request not ESSENTIAL to your site being online should be moved to a job queue
  • Queued jobs need to be retried with exponential backoff, need to be assigned a priority, and need monitoring built in (a minimal retry sketch follows this list)
  • Every function call not CRITICAL to your core business should be wrapped in a try/catch
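
Here's a minimal in-process sketch of that retry idea (sendWelcomeEmail and the specific numbers are illustrative). A real job queue adds persistence, priorities, and monitoring on top, but the shape of the logic is the same: non-critical work gets retried with growing delays, and if it still fails, it never takes the core flow down with it.

    // Retries a task with exponential backoff before giving up.
    async function withRetry(task, { attempts = 5, baseDelayMs = 500 } = {}) {
      for (let attempt = 0; attempt < attempts; attempt++) {
        try {
          return await task();
        } catch (err) {
          if (attempt === attempts - 1) throw err;        // out of retries: surface the error
          const delay = baseDelayMs * 2 ** attempt;       // 0.5s, 1s, 2s, 4s, ...
          await new Promise((resolve) => setTimeout(resolve, delay));
        }
      }
    }

    async function onSignup(user) {
      try {
        await withRetry(() => sendWelcomeEmail(user));    // sendWelcomeEmail is hypothetical
      } catch (err) {
        // non-essential work: log it and move on instead of failing the signup
        console.error('Welcome email failed after retries', err);
      }
    }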

Performance is your responsibility

Performance is not about micro-optimizations done by super-smart code geniuses. Unless you have a "performance assistant" who follows you around and rewrites all of your code, it is your job to build things right the first time, or at least fix things as you go. Paying attention to performance problems and using them as an indicator of fragile parts of your system can pay huge dividends. Ensuring third parties can't take your site down, queueing up non-essential actions, considering costs before adding features, and writing more code yourself could substantially improve both site reliability and load times.
