HTML, JS, And State: A Challenging Way To Look At Web Performance
HTML byte size, streaming, GZip and caching
Some people complain that HTML rendering (a.k.a. "server-side rendering") requires the browser to download more bytes from the server. That is sometimes framed as a big problem and used to justify returning JSON with domain-specific definitions and writing code to re-create the HTML in the browser.
HTML uses repeated tags and attributes instead of square brackets for lists. Of course, that generates more raw bytes for the browser to download when the result is a huge list:
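As a hypothetical illustration (this is not the article's Example 1 or 2; the `products`-style field names are made up for this sketch), here is the same list in both formats:

```javascript
// The same three items, once as JSON and once as HTML markup.
const json = JSON.stringify([
  { id: 1, name: "Keyboard" },
  { id: 2, name: "Mouse" },
  { id: 3, name: "Monitor" },
]);

const html = [
  '<ul>',
  '<li data-id="1">Keyboard</li>',
  '<li data-id="2">Mouse</li>',
  '<li data-id="3">Monitor</li>',
  '</ul>',
].join('');

// The repeated <li> tags make the raw HTML larger than the JSON.
console.log(json.length, html.length);
```

The repeated tags cost extra raw bytes, which is exactly the complaint above.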
[…] Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. […]
— Donald Knuth, "Structured Programming with go to Statements", page 268.
Writing Front-End code to optimize HTML byte size is probably not worth the effort.
Also, there's GZip.
The GZip format uses DEFLATE, which combines LZ77 with Huffman coding. Both techniques work better when the input contains repetition: repeated sequences are replaced with short back-references, and frequent characters get shorter codes. That means GZip is likely to yield a better result for duplicated HTML tags than for a custom domain-specific data structure written in JSON.
If the "deflate" option is applied, using this site, to lines 12–20 of Example 1, it results in an output of 112 bytes out of 150, a reduction of 25.3%. There are too many tokens and characters that don't repeat, so GZip can't compress the data efficiently.
If the "deflate" option is applied, using this site, to Example 2, it results in an output of 118 bytes out of 766, a reduction of 84.5%. The compression is remarkably efficient because a lot of characters repeat.
Due to how GZip works, it will yield a better compression result for HTML than for a custom domain-specific data structure written in JSON.
HTML is streamable.
According to the specification, HTML parsing works like a state machine. Each character of the markup is fed to the parser from left to right, and each input can change the parser's state.
Let's say the parser is processing a button tag. The first step is to process the less-than sign (<). After that, it changes its internal state to read the next characters, b, u, t, t, o, n, which represent the tag name, until it finds a greater-than sign (>). Then the parser changes its internal state to process the text "Do something". Next, it finds another less-than sign (<), a slash (/), the tag name again, and a final greater-than sign (>) to mark the end of the tag.
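The real spec defines dozens of tokenizer states; as a drastically simplified sketch, a toy state machine that only distinguishes "reading a tag" from "reading text" looks like this:

```javascript
// Toy tokenizer: consumes markup one character at a time and switches
// state on < and >, the way the spec describes (vastly simplified).
function tokenize(markup) {
  const tokens = [];
  let state = 'text';
  let buffer = '';
  for (const ch of markup) {
    if (state === 'text' && ch === '<') {
      if (buffer) tokens.push({ type: 'text', value: buffer });
      buffer = '';
      state = 'tag';  // less-than sign: start reading a tag name
    } else if (state === 'tag' && ch === '>') {
      tokens.push({ type: 'tag', value: buffer });
      buffer = '';
      state = 'text'; // greater-than sign: back to reading text
    } else {
      buffer += ch;   // accumulate characters for the current token
    }
  }
  return tokens;
}

console.log(tokenize('<button>Do something</button>'));
// [ { type: 'tag', value: 'button' },
//   { type: 'text', value: 'Do something' },
//   { type: 'tag', value: '/button' } ]
```

Notice that the first two tokens are emitted before the closing tag has even been seen, which is the property the next paragraph relies on.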
Even before the closing tag arrives, the browser already knows enough to render the button, even if that means rendering a basic UI.
HTML is streamable by default. JSON is not.
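One way to see the difference is to cut both payloads in half, as if the connection had only delivered the first chunk. The partial HTML is still meaningful markup a browser can render; the partial JSON is unusable until the last byte arrives. A minimal sketch:

```javascript
const html = '<ul><li>One</li><li>Two</li><li>Three</li></ul>';
const json = '["One","Two","Three"]';

// Simulate a connection that has only delivered the first half.
const partialHtml = html.slice(0, Math.floor(html.length / 2));
const partialJson = json.slice(0, Math.floor(json.length / 2));

// A browser can already render the list items received so far.
console.log(partialHtml);

// JSON.parse needs the whole document; partial input throws.
try {
  JSON.parse(partialJson);
} catch (e) {
  console.log('partial JSON is unusable until the last byte arrives');
}
```

There are streaming JSON parsers, but they are extra code you ship and maintain; the HTML parser in every browser streams out of the box.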
And if the server adds new properties to a JSON response, it can break the website for browsers that still have the old response cached.
When the browser makes a GET request, by default it caches the response based on the URL and query string. Subsequent requests to the same URL and query string may cause the browser to use the cached response instead of downloading new content from the server.
There are also many misconfigured corporate proxies and open wifis out there, outside your control, that can cache responses in ways you can't predict.
Just because everything works in your local environment, that doesn’t mean it will work when somebody else tries to access your website in production.
At this point, you might start to hear excuses like "it works on my machine" or "just update the cache".
How can you avoid these problems?
A very common workaround is to append a unique identifier to the URL, such as the latest commit hash or a timestamp. Every deployment then produces a brand-new URL, so the browser never serves a stale cached response for it.
This is commonly known as "Cache Busting".
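A minimal sketch of the idea, assuming a build step that knows the current commit hash (the `BUILD_ID` value and the `v` parameter name here are hypothetical):

```javascript
// Hypothetical build identifier, e.g. the latest commit hash,
// injected at build time.
const BUILD_ID = 'a1b2c3d';

// Appending it to the URL makes every deployment a brand-new
// resource, so no client or proxy serves a stale copy of it.
function cacheBust(url) {
  const separator = url.includes('?') ? '&' : '?';
  return `${url}${separator}v=${BUILD_ID}`;
}

console.log(cacheBust('/app.js'));     // "/app.js?v=a1b2c3d"
console.log(cacheBust('/api?page=2')); // "/api?page=2&v=a1b2c3d"
```

Bundlers commonly automate the same trick by putting a content hash in the file name itself.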
However, that still doesn't help with misbehaving intermediary proxies. You want the application to remain in one piece even in unexpected environments.
What's the alternative?
If you return HTML from the server and the browser serves a cached response, it will simply render an old but working state of the website. When the cached response is refreshed on the client, the website will be rendered with the new content.
There will never be a broken state.
If you return HTML and the browser has it cached, it will always render a User Interface that works.
This is extremely powerful.
If you don't write code to re-create the HTML in the browser, you will never have to spend time fixing broken, partially rendered parts of the website. The browser will always render something usable, whether the response is cached or not.
Besides, if you're developing the website incrementally, you can ship the first version with the browser's default caching behavior. Later, after the MVP, you can decide to improve cache freshness using techniques like the ones described above.
If you do this, you'll never have to deal with angry customers that can't use the website anymore just because you've added a new property to your server-side JSON.
If you use HTML efficiently, there will be less friction, more things shipped, and better perceived performance.
HTML can be bigger than JSON. That has no meaningful impact when your bottleneck is probably somewhere else.
HTML is streamable so that the browser can start rendering the website without having to wait for the download to finish.
If the server returns HTML instead of JSON, the browser will never render a broken page when the contents are cached; it will always render an older but working version of the page.
Performance is what the user perceives, not what you believe the technical details say it is.
What the user perceives is a page that is fast and works.
Don't waste your time bikeshedding aspects of performance that make no difference. Look at web performance and maintainability problems for what they really are.
Understand the tradeoffs.
Don't reinvent the wheel, and don't prematurely optimize things you will have to come back and fix later. These are decisions that can have huge consequences.