Why Are Web Pages So Slow?


We have internet connections that can download megabytes every second, feeding machines that can execute billions of instructions in that same second. Yet the most popular web pages still take a noticeable amount of time to load.

Above is a simple table of load times in seconds for three very popular websites. The first blue “Raw” number blindly loads everything the website wants to serve. The green “Host block” measurement shows what happens when you don’t load resources from common ad and tracking servers (shortened to ATS for the rest of this article.) The third yellow “JS disabled” timing doesn’t load or execute any JavaScript. While disabling JS has the effect of nuking virtually all ads and tracking, it also breaks most modern sites.

Here are some nitty-gritty details about what this chart captures. If your eyes start to glaze over at the numbers, please feel free to skip to the next section.

Forbes’ quarter-minute raw load time was spent in part downloading 273 files totaling 4.67MB. Blocking ATS suppressed 22 primary files, permitting 168 to load, totaling 2.4MB. This means those 22 blocked files would have pulled in an additional 83 files, for a combined bandwidth savings of 2.27MB, or 49%. Disabling JS entirely loaded only 122 files totaling 1.53MB (two-thirds less than the raw total.)

Stackoverflow’s numbers are much less embarrassing across the board. The 28 files in the raw front page totaled 866K. Blocking four ATS resources also prevented loading five additional files, which dropped the size slightly to 786K, 9% less. Much more impressively, preventing that JS from executing reduced the total page load time by 70%. This example highlights the latency penalty for putting your website’s responsiveness at the mercy of someone else’s code and infrastructure. With JS disabled, the page loaded 11 files totaling 573K.

Thesaurus.com’s page design demonstrates one of the common failure modes of crappy web design when network resources are unreliable for whatever reason. The raw page contains 139 files totaling 5.14MB. Blocking nine ATS resources drops the file count to 68, a reduction of 71 files that saves 1.76MB. But because some of the page’s loading stalls until those non-essential resource requests time out, the page actually takes nearly twice as long to load. Nuking JS entirely reduces the page to 26 files for 1.35MB. This slimmed-down version of the page loads almost instantly, and works great.
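If you want to sanity-check the savings percentages above, they are simple arithmetic on the raw versus host-blocked totals. A minimal sketch in TypeScript, using the megabyte figures quoted in this article:

```typescript
// Minimal sketch: the bandwidth-savings percentages quoted above are just
// (bytes saved) / (raw bytes). Figures are the MB totals from this article.
function savingsPercent(rawMB: number, savedMB: number): number {
  return (savedMB / rawMB) * 100;
}

console.log(savingsPercent(4.67, 2.27).toFixed(0) + "%");  // Forbes: ~49%
console.log(savingsPercent(0.866, 0.08).toFixed(0) + "%"); // Stack Overflow: ~9%
console.log(savingsPercent(5.14, 1.76).toFixed(0) + "%");  // Thesaurus.com: ~34%
```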

A brief history of the web

How you experience any network-based resource boils down to two measurements: bandwidth and latency. Every asset a web page tries to load (html, image, css, js, font, flash, etc.) is at the mercy of your own pipe’s bandwidth and of each server’s bandwidth and responsiveness, or latency.

In the toddler days of the web, it was all about bandwidth. A fancy $200 modem could sustain a download speed of around 3KB/sec. Pages were generally some simple HTML and a few modest images. Downloading Trey and Matt’s 20-year-old classic 53MB 640×480 five-minute movie took five hours, assuming you downloaded absolutely nothing else the entire time. Web design was limited by this dribble of data.
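That dial-up math is easy to check. A quick back-of-the-envelope sketch in TypeScript, using the figures from the paragraph above:

```typescript
// Back-of-the-envelope: how long a 53MB download takes at dial-up speed.
const movieKB = 53 * 1024; // 53MB movie, expressed in kilobytes
const modemKBps = 3;       // ~3KB/sec sustained on a fancy $200 modem

const hours = movieKB / modemKBps / 3600;
console.log(hours.toFixed(1) + " hours"); // ≈ 5.0 hours
```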

As ADSL and cable modems started to take off, you could download the same five-minute movie in two minutes. This left plenty of spare bandwidth for more creative and elaborate web pages. Before long, hitting a server’s index.html might pull in a couple of JavaScript libraries, and maybe some CSS, in addition to the html and images you saw.

How did we get here

Loading CNN’s front page today (your results will differ by the time you read this) downloads nine web fonts, four style sheets, over 100 images, nearly 200 JavaScript files, and it still manages to find time to serve a little html. The 330-odd files loaded by the front page amount to over 7MB. A cursory glance behind the scenes of this hot mess shows that much of it is being loaded and executed on your local machine to show you ads and to track your behavior and interests across the internet. Blocking CNN’s ATS cuts the number of file loads from 330 to 108, and the bandwidth consumed from 7MB to 2.6MB.

This modern scenario raises the second half of the networking speed equation: latency. When CNN’s homepage starts to load, it doesn’t just seamlessly pipe all 330 files to you. Instead, your browser downloads, interprets, and executes those JavaScript files one at a time. As the ATS resources execute, they in turn download more JS and ATS files, as well as transmitting data about your real-time activity to whoever is watching on the other end. You can visualize this web of trackers yourself with useful tools like Firefox’s Lightbeam.
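The mechanics of that cascade are mundane: once a third-party script runs, it can append more script tags, each of which repeats the whole lookup-download-execute cycle. A simplified sketch (the domains are hypothetical, not taken from CNN’s actual page):

```typescript
// Simplified sketch of how one ad/tracking script pulls in others once it runs.
// Each injected script adds another DNS lookup, connection, download, parse,
// and execute step before the page finally settles down.
function injectScript(src: string): void {
  const script = document.createElement("script");
  script.src = src;
  document.head.appendChild(script);
}

// A hypothetical first ad loader referenced by the page itself...
injectScript("https://ads.example-network.com/loader.js");

// ...which, when it executes, typically injects its own partners in turn:
// injectScript("https://sync.example-exchange.com/match.js");
// injectScript("https://beacon.example-analytics.com/collect.js");
```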

Each of the steps in the preceding paragraph takes precious milliseconds, which very quickly add up to the multiple seconds that constitute most page loads today. And thoughtless or user-hostile page design amplifies this latency by preventing you from viewing or interacting with the desired content until all those ATS scripts finish their work.

Web speed in the news

Last week’s Facebook announcement of Instant Articles coincided with a couple of parallel threads of discussion about web speed. The most common angle seems to be that slow web pages drive away users, and that a vertically integrated mobile app will be inherently faster. The more developer-centric angle is that lazy developers (that is to say, smart ones) just import some big canned JavaScript library to access one small facet of its functionality, and web page load times pay the price.
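As an illustration of that developer-centric complaint (a hypothetical sketch, not taken from any of the sites measured here), a page that only needs one small helper often imports an entire utility library to get it:

```typescript
// Heavyweight: pulls in the whole lodash library just to debounce one handler.
import { debounce } from "lodash";

// Leaner: import only the single module that is actually needed.
// import debounce from "lodash/debounce";

const onResize = debounce(() => console.log("window resized"), 250);
window.addEventListener("resize", onResize);
```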

Facebook’s prime directive is to maintain user engagement with content that permits them to track your behavior and sell ads they force you to view. All their new publishing model changes is to funnel that formerly third-party ad and tracking infrastructure through their own content orifice. It is a pure power grab to increase their share of the ad revenue at the expense of their web competitors while enticing users with fast and exclusive content.

Regain control

You can short-circuit this web of trackers and ad dispensers on a Mac or PC pretty easily. Several internet Good Samaritans maintain HOSTS files (click for a good primer on how to use these.) Two of the most thorough examples may be found at

http://winhelp2002.mvps.org/hosts.txt

http://someonewhocares.org/hosts/zero/hosts

WARNING: You really, really need to understand the security implications of using these before you do so. Using a custom HOSTS file basically remaps the internet behind your back. If either source is hostile or is ever compromised, websites like your bank’s can be silently redirected. Don’t trust anything on the internet, including this article. Learn for yourself, and then decide the safest course of action.
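For context, the file format itself is trivial: each line maps a hostname to an IP address, and pointing a known ad or tracking domain at 0.0.0.0 (or 127.0.0.1) makes it unreachable. A hypothetical excerpt (these domains are made up for illustration; the real lists above are far longer):

```
0.0.0.0  ads.example-network.com
0.0.0.0  tracker.example-analytics.net
0.0.0.0  beacon.example-metrics.io
```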

The Ghostery browser extension is another powerful weapon in your arsenal. This has the additional benefit of targeting specific ad and tracking scripts vended from domains you can’t block outright without breaking the entire internet (hi Google.)

As an added bonus, you have also closed off one of the most common vectors for malware and viruses. Malicious content ends up reaching millions of machines courtesy of your favorite websites, which blindly serve you whatever files their ad networks vend. Shut off the spigot to content you never intended to load, and you are instantly safer online.

How can the web work without ad $

It’s a great question. However, in the context of this discussion, it is a red herring. If any of these web pages were designed to vend ads directly from their own servers, users would see them quickly and none of the above steps would block them. But the shadowy tentacles of the beast beneath the web aren’t appearing on so many web pages just to try to get you to click on a picture of something. This vast and elaborate network of cross-site scripting is about one thing: tracking each individual user’s behavior across the entire internet. In the old model, where the local website shows you an ad and redirects you to the advertiser only when you choose to click, none of these problems exist.

So if the real question is: “how can the web work without the revenue web pages reap while silently serving up their own readership to massive data farms and the NSA?” then the answer is: shut it all down. Anybody who sincerely endorses that hellscape deserves to live with it. The rest of us should find something better to do with our time.

Until the web falls apart under the weight of monetization, or is replaced by shiny apps, you can at least be aware of what is going on beneath the surface of today’s web. Hopefully the tools and techniques above can give interested users a way to do a bit more to improve the web for themselves.

Postscript: what about my iOS devices

Your iPhone is meaningfully slower than your Mac, and would benefit even more from blocking all this garbage. Unfortunately Apple has not bothered to include the ability to apply static network mappings to iOS devices. This seems like a reasonable addition to iOS configuration profiles.

A technically adept, patient, and masochistic user can work around Apple’s omission by building a dedicated dual-ethernet PC, installing something like the open-source pfSense firewall, configuring it to vend DNS to your LAN, and then setting up OpenVPN so your iOS devices can connect to your LAN while you’re out of the house. Sadly, a useful HOWTO for doing all of this is beyond the scope of this author’s abilities to deliver with any confidence. Really, a much better plan A is to bug Apple to make this easier.


Methodology

Benchmarking the web is a nightmare scenario, and no claims are made that any particular cited numbers are reproducible. Five runs per site, per configuration, were measured in rapid succession using Safari’s Timeline recorder, always with Safari’s cache disabled. The high and low times were discarded and the remaining three were averaged. The numbers are provided to illustrate the concepts in action and to give a very rough sense of the possible benefits of host blocking.
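In code, that averaging step is just a trimmed mean over the five runs. A minimal sketch in TypeScript, with made-up sample timings:

```typescript
// Trimmed mean of five timing runs: drop the fastest and slowest,
// then average the remaining three.
function trimmedMean(runs: number[]): number {
  const sorted = [...runs].sort((a, b) => a - b);
  const kept = sorted.slice(1, -1); // discard the low and high times
  return kept.reduce((sum, t) => sum + t, 0) / kept.length;
}

// Hypothetical load times in seconds for one site in one configuration:
console.log(trimmedMean([24.1, 26.8, 25.3, 31.2, 25.9]).toFixed(1)); // "26.0"
```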