The great world of open Web standards

Some old, grumpy web development veterans remember the good old days of Internet Explorer 6 hegemony. Fortunately, this dark era is over and now we live in the shiny world of open web standards, right?

You have no doubt heard about that super hot, trendy HTML5 thing. It’s probably the biggest buzzword in the history of web development, and it can mean almost anything. And that’s the source of nearly all the problems. Let’s think about what HTML5 really means.

The two standards

A long time ago the situation in web standards was clear: if something wasn’t recommended by W3C, then it wasn’t a standard. But then W3C started to believe in the XHTML 2 fiction, and WHATWG was born. Now we have two standards:

  • HTML Living Standard, being constantly improved by WHATWG,
  • HTML 5.x, described by MDN as “HTML Living Standard snapshots”, created by W3C; the current version is HTML 5.1.

What’s the difference between these two standards? The specifications themselves try to explain it briefly (and it’s funny to see that history from WHATWG’s perspective is slightly different from the same history from W3C’s perspective). To keep it short: the two organisations have “different goals”, and that’s the source of all the abominations that have arisen in web standards…

But that’s only the ideological difference. The practical one is more mundane: the HTML Living Standard describes more features than HTML 5.x and is constantly updated to stay in sync with browser implementations. HTML 5.x, on the other hand, includes only “stable” features, with more than one implementation and a “frozen” API. This is clearly visible when we start looking for a definition of the <dialog> element in the HTML 5.1 specification: it’s just not there. Meanwhile, finding it in the WHATWG specification isn’t especially difficult. However, the <dialog> element is present in HTML 5.2 (currently a Working Draft), probably because that version is still being developed and new implementations (in Firefox and/or Edge) could appear before it is frozen.

TL;DR: If you want to know all the new, hot features, read the HTML Living Standard; if you want to use “safe” HTML, read the HTML 5.x specification.

The tale of two validators

If we have two versions of a standard, we could expect that it is possible to create an HTML file that is valid HTML and invalid HTML 5. Fortunately, there are also two validators that can help us prove our theory: one for the HTML Living Standard and one for HTML 5.x. Now all we have to do is find an HTML element that differs between those two standards. The best candidate, for me, is… <main>.

According to the HTML 5.1 specification:

The main element represents the main content of the body of a document or application.
[…]
The main element is not suitable for use to identify the main content areas of sub sections of a document or application.

According to the HTML Living Standard:

The main element can be used as a container for the dominant contents of another element. It represents its children.
[…]
There is no restriction as to the number of main elements in a document. Indeed, there are many cases where it would make sense to have multiple main elements. For example, a page with multiple article elements might need to indicate the dominant contents of each such element.

It’s clear that the differences between these two versions of <main> are so big that they could be considered two distinct elements!

So if our theory is right, a page with two or more <main> elements will be valid HTML and invalid HTML 5. So let’s create such a page and check it! Some tests in both validators (W3C’s results and WHATWG’s results) and… yay, we were right! We’ve just created valid HTML that is invalid HTML 5.
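To make the experiment a bit more concrete, here is a rough sketch of such a test page, wrapped in a few lines of Python that count its <main> elements. This is not what the validators actually do: the page markup and the names PAGE, count_main and allowed are made up for illustration, and the W3C rule is reduced to “at most one <main>”, which is only a simplified reading of the text quoted above.

```python
from html.parser import HTMLParser

# Hypothetical test page with two <main> elements, one per article.
PAGE = """<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>Two mains</title></head>
<body>
  <article><main><p>Dominant content of the first article.</p></main></article>
  <article><main><p>Dominant content of the second article.</p></main></article>
</body>
</html>"""


class MainCounter(HTMLParser):
    """Counts <main> start tags. A stand-in for one rule, not a real validator."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == "main":
            self.count += 1


def count_main(html):
    parser = MainCounter()
    parser.feed(html)
    parser.close()
    return parser.count


def allowed(html, spec):
    """Assumed, simplified rule: "w3c" tolerates at most one <main>, "whatwg" any number."""
    if spec == "w3c":
        return count_main(html) <= 1
    if spec == "whatwg":
        return True
    raise ValueError("unknown spec: " + spec)


print(count_main(PAGE), allowed(PAGE, "whatwg"), allowed(PAGE, "w3c"))
```

For this page the sketch counts two <main> elements, accepts them under the “whatwg” rule and rejects them under the “w3c” one, which mirrors the outcome of the two real validators.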

So what?

Fun fact, isn’t it? Well, if the difference is just in the allowed number of <main> elements, then everything is fine, right?

Not exactly.

If we look again at WHATWG’s and W3C’s specifications and compare the two definitions, we will find that one very important thing is missing from the HTML Living Standard: the element’s implicit ARIA role.

What does it do? Well, we must remember that the browser represents every page as a tree: the DOM. But the DOM is not the only tree that represents a page. There is also something called the accessibility tree, which contains all the information about how each element should be presented to assistive technology. If the DOM says “hey, look at my sexy <a> element!”, the accessibility tree whispers “don’t trust him, it’s a link, not some ‘<a> element’”. And the accessibility tree knows this because every HTML element is associated with a default role. You can check what the mappings actually look like in the ARIA in HTML specification.

So the real issue now is: are implicit ARIA roles implemented for elements based on their description in W3C’s specification or in WHATWG’s specification? Let’s check: just scroll through that specification a little, find <main> and…

main role=main

According to the description of the main role, it represents the “main content of a document”. Things are starting to look pretty bad for the HTML Living Standard, aren’t they?

But not so fast! It’s just a specification, and we should check it in a real-world scenario. We can use the Accessibility Inspector in Chrome DevTools and test our findings on our valid/invalid HTML file.

OK, so we ended up with two “main” regions. That’s a real problem. Such an interpretation of the <main> element hinders accessibility in a spectacular way (“JAWS, get me to the main content of the page… Wait, JAWS, why do you say that there are MANY MAIN CONTENTS?!!!”).
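A toy model shows why that happens. The sketch below is not how a browser builds its accessibility tree; it only walks the markup and assigns roles from a tiny, hand-picked subset of the ARIA in HTML mappings. The names IMPLICIT_ROLES, RoleCollector, collect_roles and SAMPLE are hypothetical, and explicit role attributes are handled in the simplest possible way.

```python
from html.parser import HTMLParser

# Hand-picked subset of the implicit roles listed in "ARIA in HTML".
IMPLICIT_ROLES = {"main": "main", "nav": "navigation", "button": "button"}


class RoleCollector(HTMLParser):
    """Collects one role per element that has one, in document order."""

    def __init__(self):
        super().__init__()
        self.roles = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("role"):
            # An explicit role attribute overrides the implicit one.
            self.roles.append(attributes["role"])
        elif tag == "a" and "href" in attributes:
            # <a> only maps to "link" when it has an href.
            self.roles.append("link")
        elif tag in IMPLICIT_ROLES:
            self.roles.append(IMPLICIT_ROLES[tag])


def collect_roles(html):
    collector = RoleCollector()
    collector.feed(html)
    collector.close()
    return collector.roles


# Hypothetical page: navigation with one link, then two <main> elements.
SAMPLE = '<body><nav><a href="/">Home</a></nav><main>One</main><main>Two</main></body>'
roles = collect_roles(SAMPLE)
print(roles.count("main"))
```

Because the mapping looks only at the tag name, every <main> becomes a “main” landmark, no matter how many of them the page contains.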

Solution?

Actually, the solution would be very, very simple: let’s change the HTML Living Standard and restrict the use of <main> to the main content of the document… but that probably isn’t going to happen. The other viable solution is to change the implicit ARIA role and treat the <main> element as the main content of a document only if it’s a direct child of <body> (<header> and <footer> are very similar cases), but I’m pretty sure that’s not going to happen either. So the real solution is to use the <main> element just once per page and pretend that WHATWG’s definition doesn’t exist at all.

The source of the whole issue is, of course, the fact that we now have two HTML standards. They are compatible in most cases, but not in all, especially where accessibility or semantics are concerned. For example, W3C aligned the <address> definition with reality and wants to port it to WHATWG’s version (so far without success), and W3C allowed headings in fieldset > legend. While W3C is improving the semantics and accessibility of existing features, WHATWG seems to be busy introducing new things.
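The second idea, giving <main> its landmark role only when it is a direct child of <body>, could be sketched like this. It is purely hypothetical: no specification defines the mapping this way, the names VOID_ELEMENTS, ScopedMainRoles and main_roles are invented, and the parser assumes markup with explicit end tags. A <main> that sits anywhere else simply gets None here, meaning “no landmark role”.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not stay on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class ScopedMainRoles(HTMLParser):
    """Hypothetical rule: <main> is a landmark only as a direct child of <body>."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open tag names
        self.roles = []          # one entry per <main>: "main" or None

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            parent = self.open_elements[-1] if self.open_elements else None
            self.roles.append("main" if parent == "body" else None)
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        if tag in self.open_elements:
            while self.open_elements.pop() != tag:
                pass


def main_roles(html):
    parser = ScopedMainRoles()
    parser.feed(html)
    parser.close()
    return parser.roles


# Hypothetical page: one <main> directly in <body>, one inside an <article>.
DOC = ("<html><head><meta charset='utf-8'><title>t</title></head><body>"
       "<main>Page content</main>"
       "<article><main>Article content</main></article>"
       "</body></html>")
print(main_roles(DOC))
```

With such a rule only the first <main> would be announced as the main landmark, while the nested one would still be valid markup under the Living Standard’s looser definition.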

And here’s the difference that can tell us which specification we should be interested in! If you are a browser vendor or an early adopter who loves experiments, then go with the HTML Living Standard. If you are a normal web developer or are aware of accessibility issues, then HTML 5.x is your safe bet.

In an ideal world we’d have one HTML standard, or two divided into sensible parts: WHATWG would be working on “HTML infrastructure” (DOM, technical details of implementations) and W3C on semantics and accessibility. But we’re not in an ideal world, sorry.

So… who is starting to miss the good old days of Internet Explorer 6?

Story by Tomasz Jakut.