The Terrible State of HTML

We need to go back to the basics.

William Belk
Published in Geek Culture
5 min read · Feb 26, 2023

A few months ago, I launched a free page analysis tool called Page Doctor.

Page Doctor grew out of a feeling that Google PageSpeed Insights and other tools were not giving me all of the information I wanted from a ‘first-pass’ analysis tool. So I did what any naive idiot would do: I built my own!

So far Page Doctor has run over 10,000 tests. It’s been great—obviously it’s not perfect and continues to evolve, but overall it’s been a fun and enlightening project.

One feature I added is an automated W3C Validator check on every page. I added it because the HTML on some tested pages is so bad that it breaks my page evaluation parsers, and there are a few cases I still can’t get my head around. It is all fun learning for me, because it forces me back to basics and reminds me that even I, master of the world’s 823rd most popular page analysis tool, sometimes forget to double-check my foundational code.

Below are a few things I’ve been noticing so far from reading too many Page Doctor reports.

1. Do developers even care about valid HTML anymore?

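The majority of the thousands of reports run through Page Doctor contain flagrant violations of HTML standards. Thankfully, browsers are forgiving: the HTML parser has built-in error recovery, so our pages still display when they contain invalid HTML. (“Quirks Mode” often gets the credit, but that is a separate behavior triggered by a missing or outdated doctype.)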

As a general rule, we need to be better about using the W3C Validator. It’s very helpful. It does often list errors that are borderline not fixable, but generally it’s a fantastic place to start when optimizing.

Invalid HTML can cause all kinds of rendering issues, like layout glitches and jumpy scrolling. It can also make pages harder to parse, and pages that are harder to parse can suffer in SEO, page speed, and all kinds of potentially mysterious ways.

The easiest way to check for invalid HTML is the W3C Validator. As I mentioned above, it is not a perfect tool, but it is very helpful.
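
As a rough local first pass before running the validator, unbalanced tags can be spotted with Python's standard-library html.parser. This is only a sketch under stated assumptions, not a substitute for the W3C Validator. The names TagBalanceChecker and check_balance are hypothetical, and the check is stricter than the HTML standard in one respect: it reports end tags that HTML allows us to omit (such as </p> or </li>).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Collects stray closing tags and tags left open (not a full validator)."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.problems = []   # human-readable findings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in self.open_tags:
            self.problems.append(f"stray </{tag}>")
            return
        # Pop back to the matching opener, reporting anything skipped over.
        while self.open_tags:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.problems.append(f"<{top}> not closed before </{tag}>")


def check_balance(html):
    """Return a list of tag-balance problems found in an HTML string."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    for tag in checker.open_tags:
        checker.problems.append(f"<{tag}> never closed")
    return checker.problems


if __name__ == "__main__":
    for problem in check_balance("<div><span>oops</div>"):
        print(problem)
```

An empty list only means the tags are balanced. Attribute errors, bad nesting rules, and duplicate ids still need the real validator.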

2. Throw in the kitchen sink with the rest of our 3k DOM elements!

What is the DOM? DOM stands for Document Object Model. It is basically a tree of objects that represents everything the browser needs to render a web page.

What is a DOM element? A DOM element is a node, or container, in that tree. Simply, we can think of HTML tags as DOM elements: <div></div> <p></p> <br> <script></script>

All web page rendering relies on the DOM. Web page rendering performance is affected by the number of DOM nodes.

Every time a page element is drawn (rendered), resized, searched for, animated, or scrolled, large parts of the DOM tree may need to be walked and have their styles and layout recalculated. This is extremely expensive for browser resources.

The more DOM elements we have, the more work the browser needs to do. This is a big deal. If a visitor has fifteen Chrome tabs open, a slow connection, or a low-power mobile device, it only gets worse.

What I’m seeing in thousands of Page Doctor reports is an incredible number of DOM elements compared to the simplicity of the pages: 2,500, 4,000, sometimes 10,000 DOM elements! It’s wild to see. I think this explosion of DOM elements mainly comes from HTML/CSS frameworks (like Bootstrap and React Admin) or from third-party applications and plugins. I talk more about this in #4 with a real-world example.

Google PageSpeed Insights recommends fewer than 1,500 DOM elements. I think that’s quite high for a standard page, such as an e-commerce page. Fewer than 800 DOM elements is probably ideal.
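
To get a feel for these numbers, here is a minimal sketch that counts the elements in a page's HTML source and compares the total with the two thresholds above. It assumes Python's standard-library html.parser, and the names ElementCounter, count_elements, and budget_status are hypothetical. It counts start tags in static markup only, so it misses anything injected later by JavaScript, which is often where the bloat comes from.

```python
from collections import Counter
from html.parser import HTMLParser


class ElementCounter(HTMLParser):
    """Counts start tags as a rough stand-in for DOM elements."""

    def __init__(self):
        super().__init__()
        self.by_tag = Counter()

    def handle_starttag(self, tag, attrs):
        self.by_tag[tag] += 1


def count_elements(html):
    """Return (total element count, per-tag Counter) for an HTML string."""
    counter = ElementCounter()
    counter.feed(html)
    counter.close()
    return sum(counter.by_tag.values()), counter.by_tag


def budget_status(total, ideal_below=800, recommended_below=1500):
    """Compare a count with the two thresholds mentioned in the text."""
    if total >= recommended_below:
        return "over the PageSpeed Insights recommendation"
    if total >= ideal_below:
        return "above the ideal"
    return "within the ideal"


if __name__ == "__main__":
    total, by_tag = count_elements("<div><p>a</p><br><p>b</p></div>")
    print(total, budget_status(total), by_tag.most_common(3))
```

For the live count on a rendered page, including injected elements, running document.querySelectorAll('*').length in the browser console is the more honest number.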

3. I’d like to order more page requests, please!

Page requests are expensive by definition (both in terms of browser resources and time risk).

A request gets data from a URL and makes it available to the browser. A request opens a connection (or reuses one that is already open), receives data, and passes its contents to the DOM or other browser controllers to either parse or execute.

The more page requests we make, the more we expose ourselves to natural network and server latency at every single step.

  • What if the DNS cache has been flushed? It takes longer to route to the server address.
  • What if the DNS layer has latency or is overloaded? It takes longer to route to the server address.
  • What if the CDN has latency or is overloaded? The CDN will fulfill the request much more slowly.
  • What if the CDN cache has been flushed or expired? The origin server must respond to the request in order to repopulate the CDN cache.
  • What if the origin server that serves the content or responds to the request is slow or overloaded? Now we are at the slowest possible level of the potential request tree.

If we keep the above in mind, and consider that some of us are executing 200–400 requests to load a simple page, it’s no wonder some of our pages are slow!
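
A quick way to see where requests come from is to list the URLs that a page's own markup asks the browser to fetch. The sketch below is a rough static estimate, assuming Python's standard-library html.parser and the hypothetical names RequestEstimator and estimate_requests. It only looks at script, img, and iframe src attributes plus stylesheet links, so requests made by scripts at runtime, fonts and images referenced from CSS, and lazy-loading behavior are all outside its view.

```python
from html.parser import HTMLParser

# Tags whose src attribute normally triggers a fetch.
SRC_TAGS = {"script", "img", "iframe"}


class RequestEstimator(HTMLParser):
    """Collects distinct URLs that the static markup asks the browser to load."""

    def __init__(self):
        super().__init__()
        self.urls = set()

    def _add(self, url):
        # Inline data: URIs carry their payload and need no network request.
        if url and not url.startswith("data:"):
            self.urls.add(url)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in SRC_TAGS:
            self._add(attr.get("src"))
        elif tag == "link":
            rel = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self._add(attr.get("href"))


def estimate_requests(html):
    """Return the sorted list of distinct resource URLs found in the markup."""
    parser = RequestEstimator()
    parser.feed(html)
    parser.close()
    return sorted(parser.urls)


if __name__ == "__main__":
    page = ('<link rel="stylesheet" href="/a.css">'
            '<script src="/a.js"></script><img src="/x.png">')
    print(len(estimate_requests(page)), "requests from markup alone")
```

The browser's network panel remains the source of truth. This only shows how many requests the HTML commits us to before any script has run.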

More things we need to look out for are blocking <script> tags and CSS <link> files. Both are requests that make the browser wait: a script without async or defer pauses HTML parsing until it has downloaded and run, and a stylesheet holds up rendering until it has loaded.
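
Blocking resources can be flagged straight from the markup. The sketch below assumes Python's standard-library html.parser and the hypothetical names BlockingResourceFinder and find_blocking. It treats an external script as blocking unless it has async, defer, or type="module", and it treats every stylesheet link as blocking. That second rule is a simplification, since it ignores the media attribute.

```python
from html.parser import HTMLParser


class BlockingResourceFinder(HTMLParser):
    """Lists external scripts and stylesheets likely to hold up the page."""

    def __init__(self):
        super().__init__()
        self.blocking = []  # (tag, url) pairs in document order

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("src"):
            # Module scripts are deferred by default, like defer.
            is_module = (attr.get("type") or "").lower() == "module"
            if not ("async" in attr or "defer" in attr or is_module):
                self.blocking.append(("script", attr["src"]))
        elif tag == "link" and attr.get("href"):
            rel = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                # Simplification: the media attribute is not considered.
                self.blocking.append(("link", attr["href"]))


def find_blocking(html):
    """Return (tag, url) pairs for resources that look blocking."""
    finder = BlockingResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.blocking


if __name__ == "__main__":
    page = ('<link rel="stylesheet" href="/a.css">'
            '<script src="/a.js"></script>'
            '<script src="/b.js" defer></script>')
    for tag, url in find_blocking(page):
        print(tag, url)
```

Adding defer or async to scripts that do not need to run during parsing is usually the cheapest fix this kind of report suggests.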

Just rethinking the number of requests we are making, and how those requests are made, can result in huge page speed benefits. It can even improve the validity of our HTML, as we see below in #4.

4. NEVER trust frameworks or third-party app developers.

As mentioned briefly in #2 and #3 above, our pages can get out of control with many DOM elements and page requests.

Third-party apps, libraries, frameworks, and plugins can be the Trojan horse that we never considered, let in just because it was easy to use or install.

Third-party apps and plugins can flood our pages with background requests and DOM elements.

Let’s use a practical example comparing Yotpo for Shopify with Rapid Reviews for Shopify. I built Rapid Reviews to be the fastest product reviews app for Shopify.

Yotpo

On this page, Yotpo injects 3,100 DOM elements with 30 page requests, totaling 541K of compressed data transferred!! Horrific.

Rapid Reviews

On this page, Rapid Reviews injects just 324 DOM elements with 1 single page request and just 12K of data transferred. Tremendous!

So in this case, Rapid Reviews has a far smaller impact on the page in every way: roughly 10X fewer DOM elements, 30X fewer requests, and about 45X less data transferred. And that is with more product review features available on the page, like deep search for reviews and questions.
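
The size of the gap can be worked out directly from the figures quoted above for the two pages. This is just the arithmetic, with the assumption that "K" means kilobytes. The function name impact_ratios is hypothetical.

```python
# Figures quoted above for the two review apps on the example pages.
# Assumption: "541K" and "12K" are kilobytes of compressed transfer.
yotpo = {"dom_elements": 3100, "requests": 30, "kilobytes": 541}
rapid_reviews = {"dom_elements": 324, "requests": 1, "kilobytes": 12}


def impact_ratios(bigger, smaller):
    """How many times larger each metric is, rounded to one decimal place."""
    return {name: round(bigger[name] / smaller[name], 1) for name in bigger}


ratios = impact_ratios(yotpo, rapid_reviews)

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: {ratio}x")
```

That works out to about 9.6 times the DOM elements, 30 times the requests, and about 45 times the data.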

Be careful with your ‘easy’ framework, theme kit, or third-party application. These tools can also introduce invalid HTML elements on the page. As I mentioned earlier about reviewing thousands of Page Doctor reports, some of these third-party applications even introduce fatal HTML errors into popular sites we all know. This is concerning and is worth additional care and consideration as we evaluate vendors.

I hope this was helpful. Now get out there and validate or run your pages through Page Doctor!

I build apps for Shopify like Rapid Reviews and Image Sitemap. I built Page Doctor, the practical page speed and SEO tool. Follow me on Twitter.
