187. HTML semantics, Web Components, W3C and WHATWG, HTML5 Doctor, Prince XML, Web Monetization

August 7th, 2019

Published in

Web Standards

Vadim and Bruce.

HTML semantics use cases, Web Components, W3C and WHATWG, HTML5 Doctor, accessible PDF with Prince XML, Web Monetization API.

Vadim Makeev
Bruce Lawson

Topics

00:09:36 HTML semantics use cases
00:17:41 Web Components and semantics
00:22:03 W3C and WHATWG
00:28:42 HTML5 Doctor
00:35:50 Accessible PDF with Prince XML
00:49:22 Web Monetization API

Listen: iTunes, VK, Yandex.Music, Spotify, YouTube, SoundCloud, RSS.

Read: Twitter, VK, Facebook, Telegram.

Support: Patreon, Yandex.Money, PayPal.

Discuss in Slack.

Vadim: Hello, you’re listening to the 187th, special episode of the Web Standards podcast. I’m your host Vadim Makeev from HTML Academy, and today we have a very special guest, a good friend of mine, Bruce Lawson. Hello Bruce, and tell us a bit about yourself.

Bruce: Hello everybody. I’m Bruce Lawson, a handsome but middle-aged web standards guy in the U.K., United Kingdom. I’m sitting here, it’s lovely and sunny in my large palace, looking out on my garden with a small dog sitting on my feet. Vadim and I go way back. We were both in the developer relations team at Opera when Opera was a thing, and we coincided at many conferences and events in the last 10 years, and long may we continue to do so.

Vadim: Yeah. One of the reasons I asked Bruce to join us today is, interesting fact, last year I gave a talk called Semantics for Cynics. I gave it in Russian a few times, and I even gave it once at the Fronteers Jam Session, so I think there should be a video of me trying to speak English at Fronteers. I also saw Bruce’s article published some time ago, I think it was December last year? Yeah.

Bruce: Something like that.

Vadim: And, it surprised me, because we were expressing the same ideas, and talking about almost the same thing: semantics in HTML, and the reason it still exists, and the reasons we should care about it as developers. I was surprised to see that we shared almost the same ideas and even sometimes similar pictures from an Apple presentation. I’m going to ask you first, what was the reason for you to write this article? Just to summarize why you think it’s still important. And then, I’ll share my own view. Maybe it will be unnecessary, because you have the same-

Bruce: I can tell what you’re doing, Vadim. You’re trying to trap me into confessing that I copied your talk, and then you’re going to sue me for millions. I know how it works. I think it’s not surprising that we have a similar narrative in our talks, given our shared history and our shared interest in a web that is open to everybody, regardless of disability, et cetera. Insert your Tim Berners-Lee quote of choice here. But also of course, there are great reasons to care about semantics, but there are not actually that many reasons to care about semantics, and thus I think we both sort of came up with the same five or six reasons, which are: accessibility, SEO, and future proofing, so that when a new device like the Apple Watch turns up your site should already work well, if you have used correct HTML semantics. But yeah, I mean, I think 10 years ago telling everybody to care about semantics was largely ideological. It was hard to demonstrate a practical use.

Vadim: That’s literally how I started my talk. I showed slides from a talk I gave probably 10 years ago or something like this, and I said yeah, we used to see HTML5 as a “new hope” in a Star Wars way, like a new hope that something is going to come and save us, finally. And then, I say it failed us, because it restarted the web standardization process, it gave a new boost to it. But then, nothing really happened in developers’ minds. Nothing really happened for the web platform in general. We stopped caring about semantics shortly after it happened. Shiny, new HTML5 tags, yeah, whatever, but we still have no idea what’s the difference between section and article. So, it wasn’t that big of an impact, from my perspective. One of the main ideas of my talk was that you can be cynical about semantics and use it only for practical reasons.

Bruce: Yeah. Well, you say people stopped caring about semantics. I’m not sure whether that’s true. I think the people who never cared continued not to care, and the people who did care continued to care, and as new people came on board it got a lot easier to persuade them to use semantics. So, for example, the fact that the header element has the built-in role of banner, so people didn’t have to add the extra ARIA, made it a lot easier to say to people look, if you just use header, main, and nav, and footer you’re already doing a lot of stuff better without even having to try. I think the problem was that some of the elements are not properly, or not adequately specified. You’re asking what’s the difference between <section> and <article>, we’ve been asked that hundreds of times on HTML5 Doctor, and this conference season I’ve been asked it a few times.

Bruce: The answer, by the way, podcast listeners, is just don’t use <section>. Just don’t bother using it. It has no real semantic value, whereas article does. And, I think that’s because they got dreamed up very early on in the genesis of Web Applications 1.0, which subsequently got renamed HTML5, and subsequently HTML Living Standard. And then, they kind of fossilized, because I think that the cool kids who wrote the spec were more interested in groovy JavaScript APIs, and not actually that interested in semantic elements to mark up chunks of content. So, ignore the section element. The main element is an excellent thing, and Steve Faulkner had to fight quite hard to get that added to the spec.

Vadim: Oh yeah. I remember, it was missing from the WHATWG version for a while.

Bruce: For a long time, because he said there was no use for it. And, in fact, to be fair, I said there was no use for it, and luckily Steve Faulkner persuaded me that I was wrong and he was right, so now I’m an ardent supporter of the main element, because it allows somebody using assistive technologies to hit a button and jump straight to that main content, meaning that you don’t need to have a skip link, and for somebody who doesn’t need it, it doesn’t make the experience any different or worse.
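Editor's note: the point about header, nav, main, and footer carrying built-in roles can be illustrated with a small script. This is a minimal sketch in Python, not a real accessibility audit: the names `LANDMARK_ROLES`, `find_landmarks`, `SAMPLE_PAGE`, and `DIV_SOUP` are made up for illustration, and the sketch ignores the rule that header and footer only act as page-level landmarks when they are not nested inside sectioning content such as article.

```python
from html.parser import HTMLParser

# Implicit ARIA landmark roles of the elements mentioned in the conversation.
# Simplification: nesting is ignored, so a <header> inside an <article>
# would also be reported as "banner" here, which real browsers do not do.
LANDMARK_ROLES = {
    "header": "banner",
    "nav": "navigation",
    "main": "main",
    "footer": "contentinfo",
}


class LandmarkCollector(HTMLParser):
    """Collects the landmark roles found in a page, in document order."""

    def __init__(self):
        super().__init__()
        self.landmarks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in LANDMARK_ROLES:
            self.landmarks.append(LANDMARK_ROLES[tag])


def find_landmarks(html):
    """Return the list of landmark roles a page exposes through its markup."""
    collector = LandmarkCollector()
    collector.feed(html)
    collector.close()
    return collector.landmarks


# Hypothetical sample pages, for illustration only.
SAMPLE_PAGE = """
<header><h1>My blog</h1></header>
<nav><a href="/">Home</a></nav>
<main><article><h2>Post</h2><p>Text</p></article></main>
<footer><p>Contact</p></footer>
"""

DIV_SOUP = '<div class="header"></div><div class="content"></div>'

if __name__ == "__main__":
    print(find_landmarks(SAMPLE_PAGE))
    print(find_landmarks(DIV_SOUP))
```

The semantic page exposes four landmarks a screen reader user can jump between, while the div-based page exposes none unless roles are added by hand.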

Vadim: There’s one thing that really helped me to understand the meaning of the <main> element. Once in a while, when the internet connection is bad, the browser fails to load style sheets for the page, and sometimes I see a page without styles, because it takes a few minutes for the browser to give up and just show the HTML. And sometimes, you have to scroll five, six, seven screens before you get to the actual content of the page, because of navigation, banners, some sidebars, and nonsense like this, everything that surrounds the actual content of a page. So, when you see it, you start to realize that if you went through this link-by-link, paragraph-by-paragraph, list-by-list, you wouldn’t get to the content, because you would forget what you were looking for. So, if you have at least jump-to-content links, or better, both main and jump-to-content links, it would really help, because with styles it’s all compact, it’s all hidden somewhere in drop-down menus and things like that. But, when you see it without styles you can actually experience it as an assistive technology user would.

Bruce: Yeah. Yes. And, a little while ago I did a webinar with Léonie Watson, and I asked her to show the viewers how she, as a screen reader user, navigates and perceives the web. I asked her to use my website, not because I’m holding it up as an exemplar of brilliance, but because I didn’t want to name and shame anybody, and I’m perfectly happy to name and shame myself. Maybe you can post the link, I edited out the six minutes or something of her on my site showing how header, footer, nav and main practically help her to navigate. It’s really interesting to see a real live human being actually using the web with a screen reader, and the difference between a page that’s got semantic markup versus a page that doesn’t, it really resonates. I think you have to be a special kind of hard-hearted person not to care when you’ve seen a real human struggling.

Vadim: Oh yeah, it changes the way you think about it. It’s not just 1% of users, it’s real people.

Bruce: Exactly. It’s easy to forget that there are real human beings operating the browsers and consuming our content.

Vadim: It’s like hating IE11. You’re not hating a machine, or software, you’re hating actual people.

Bruce: Exactly. And, you say though it’s not just 1% of users. 1% doesn’t sound very much when you say 1%, but if you think 1% of seven billion people, that’s a lot of people.

HTML semantics use cases

Vadim: Right. So, we have this complete understanding of the accessibility part of things. I think, from my perspective, accessibility is one of the main reasons. Maybe for some people it’s the only reason left to care about semantics, and I get it. It’s important enough to be one of the main reasons to care about semantics. I think it’s much easier to talk about semantics this way. I care about it in many ways, and I think it’s a good thing for many applications, but I chose this way to explain semantics, and the importance of semantics in HTML, via accessibility and practice. So, that’s one thing. But, what are the other ways of pursuing semantics in HTML? What would be a good reason to do this, apart from accessibility?

Bruce: Well, it’s funny, because you said you chose to emphasize accessibility. I actually choose to de-emphasize accessibility. I mean, I don’t keep quiet about it, because I have accessibility needs of my own personally, but I find that if, for one reason or another, accessibility isn’t enough of a draw for the person listening to me, I point out things like the Apple Watch. The fact that I have article elements, and I have figure and figcaption, means that the Apple Watch apparently just shows my site properly. Obviously, I’m not beautiful or rich enough to be allowed to wear an Apple Watch, so I haven’t checked it out for myself.

Vadim: I did.

Bruce: Does it work okay?

Vadim: Yeah.

Bruce: Exactly. It’s good that I have a friend who’s beautiful and rich, and is allowed to have an Apple Watch. But, that’s good. The idea… I wrote that markup in 2008 when I was researching how to use HTML5. It was still very much in specification, and I wanted to see whether the specification made sense to me, and I gave feedback to the working group saying this doesn’t adequately define it, I don’t know what this is supposed to do, et cetera. But I wrote that markup 11 years ago, and last year the new version of watchOS came out, and my site just worked. Who knows what’s going to come out in 10 years’ time. But I acknowledge that a lot of web developers have no expectation their sites are going to be online in 10 years’ time, or they expect they will, but they don’t care, because they’re contractors.

Bruce: SEO, everybody cares about SEO. When I’m at a conference, and I say put your hand up if your boss tells you to make sure this website can’t be found by Google, or Bing, or Yandex, nobody puts their hand up. And, this Schema.org stuff really does help SEO. I think a couple of months ago Google produced a blog post with actual numbers showing how much extra search traffic, even conversions, you get if you’re using Schema.org stuff, whether it be microdata, or JSON-LD.

Vadim: Yeah, it’s interesting that you mentioned it, because applying Schema.org, or JSON-LD, or similar things like RDFa, would be so much harder for developers than just using semantic HTML elements, seriously. You have to go through the spec, I mean a serious spec, not as easy to read as the HTML spec. You have to find the use case for the information you have on your site, and try to fit it into the categories of Schema.org, and then you have to put in a lot of extra attributes that are not easy to read. So, it’s much harder to implement than just HTML tags.

Bruce: Agreed. But, it generally has a different purpose, because you’re marking up micro content. I find Schema.org pretty easy to read, actually. I don’t find the HTML spec particularly easy to read, and it’s not really meant for me, it’s meant for-

Vadim: Yeah. I guess I have more experience with HTML, so it looks easier for me.

Bruce: Yeah. But, that will lead us into a discussion of HTML5 Doctor soon. But, I do find Schema.org pretty easy to read. But then, of course my website is a blog, and there is a Schema.org vocabulary for blog postings, and that’s thus the only one I use. But, what I always say to people is, if you’re mucking around with your source code in order to add Schema.org microdata, in whatever incarnation or syntax you particularly like, you might as well then add the HTML5 elements while you’re doing it. I don’t know how many people do, but I think they’re pretty well used. I think main is quite prevalent on the web, in the wild.
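Editor's note: for readers who have not seen the JSON-LD flavour of Schema.org, here is a minimal sketch in Python of how a blog could generate a `BlogPosting` description. The function name `blog_posting_jsonld` and the sample values are hypothetical; `BlogPosting`, `headline`, `datePublished`, `author`, and `Person` are genuine Schema.org terms, but which properties a given search engine actually uses is not something this sketch claims to know.

```python
import json


def blog_posting_jsonld(headline, author_name, date_published):
    """Build a <script> element holding a Schema.org BlogPosting as JSON-LD.

    Hypothetical helper for illustration; a real site would likely add
    more properties (url, image, dateModified and so on).
    """
    data = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 date string
        "author": {"@type": "Person", "name": author_name},
    }
    # Escape "<" so a value containing "</script>" cannot close the element.
    # The \u003c escape is still valid JSON and decodes back to "<".
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return '<script type="application/ld+json">' + payload + "</script>"


if __name__ == "__main__":
    print(blog_posting_jsonld("Hello, semantics", "Example Author", "2019-08-07"))
```

Unlike microdata, this block sits in one place in the page and does not touch the surrounding markup, so it can be added alongside article, main, and the other HTML5 elements rather than instead of them.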

Vadim: Yep. Headings, main elements, and proper picture tags, I mean IMG tags, and alt attributes, they really serve as additional hints for search engines to parse the content. But, I think at some point Google, Yandex, and other search engines kind of gave up on markup, and they started to analyze the actual content of the page. So, they’re not judging the value of your page based on the HTML tags only, or HTML tags first. I think they take them into account second, or even third.

Bruce: I don’t think any of us know the magic. But, it seems to me that they all use a mixture of heuristics, HTML analysis, Schema.org, if they can find it, page rank, et cetera.

Vadim: When you have sites like twitter.com built on <div> soup, it’s really hard to rely on HTML tags only, so you have to analyze the actual content. So, these days a lot of developers don’t just misuse HTML tags, they just don’t use semantic ones. And, in this situation, I think search engines have to fall back on actual content analysis.

Bruce: Yes. And that <div> soup used to be <table> soup in the old days, but <div> soup still continues, and I think a lot of that is to do with the current fashion for monolithic frameworks, and I’m looking at you, React.

Vadim: But, Bruce, Bruce… React is a library.

Bruce: Ah, yeah. Yeah. Yeah, of course. And APM a framework now, apparently. But, the trouble is that there’s no reason at all why the nav component you’re getting off the shelf can’t be wrapped in a nav element. But, people don’t do it, and the trouble is that people are just taking these components off the shelf and using them without really thinking about what the structure of the elements is. Once these components are abstracted away people spend even less time wondering about the semantics of them.

Bruce: I had a look recently at some popular libraries, some of them are dreadful and some of them are really good. The Tenon UI library, for example, is a fabulous thing. It’s full of semantics, and I know that the people who wrote it actually tested it with people who use assistive technologies, and not just screen readers, and it’s free and open source. Whereas, React Bootstrap is just <div> soup, so I know which one I’d choose. And at some point, when somebody gives me a million pound grant, I’m going to go through and score all the different libraries.

Web Components and semantics

Vadim: Speaking of frameworks and libraries, web components are not a library or framework per se, but I think they’re kind of the future of the web, it’s something that’s going to happen to us sooner or later, like we’re going to start using components instead of just including some scripts, and libraries, and things like that. So, I wonder what the situation is. Is there any conflict between custom elements, for example, and web components, and HTML semantics? Is there a way to combine them and use them together?

Bruce: I think there is, but it’s trickier than I would like it to be. It used to be the case that you could extend an existing element. So, you could say <button is="my-fancy-button">, and then it would inherit button semantics, and the things a browser gives you for free from button. But, the Apple WebKit people were pretty vehement in their opposition to that. And, although I was grumpy at the time, I do see their point. The actual number of real world components that would extend or inherit from HTML elements is pretty low. But I’m quite excited by the AOM, the Accessibility Object Model, which will allow us to do a lot with web components, without having to pollute the markup with 28 trillion ARIA attributes.

Vadim: All right. So, during initialization they would not just apply attributes, but they would initialize the actual accessibility tree with some additional goodness.

Bruce: Exactly. I can send you a link to Léonie Watson, again, doing a talk in Singapore or Hong Kong about it, but it looks really exciting. And it’s actively being developed by, I think, Apple, Google, and Microsoft or Firefox, but it’s based on a Microsoft proposal. There are some big names involved with the specification. I really like the way… This is massively geeky, but I really like the way they split the specification into different phases, acknowledging the real-world challenges of specifying these kinds of things. But yes, it means that you will be able to set what are kind of like ARIA roles, et cetera, inside the component itself. Rather than you having to sprinkle lots of ARIA on top of your component, and knowing how to do that, and remembering to do that, the logic will actually be encapsulated in the component, so that when you do grab it off the shelf you really are grabbing something that works and you don’t have to add extra stuff. We all know that built in beats built on bigly.

Vadim: Oh yeah. And also, it’s not exclusive to web components, it could be applied to any library or framework.

Bruce: Yeah. Yeah. Yeah.

Vadim: So, I think there is some initial implementation in browsers already.

Bruce: Yeah.

Vadim: So, yeah, maybe behind flags, but still. I hope this is something that gets released probably even this year.

Bruce: I would think so, and I think that’s the next big thing for HTML. I mean, there’s new APIs, et cetera, being added all the time. But, this is foundational. It has the potential to make a lot of stuff better quickly. So, I’m excited by this. The one thing that people are worried about is it would be possible for a malicious website to tell whether somebody’s using an assistive technology, or potentially possible.

Vadim: Well, there are already some tricks to do this by detecting some behavior patterns, and things like that. So, there’s not really a way to hide it, I think, like a reliable way to hide it.

Bruce: I agree. But, luckily the web is full of cleverer people than me, so I’m pretty sure that we can reach a compromise. As always with the development of the web platform it’s about making sensible compromises.

W3C and WHATWG

Vadim: So, we were discussing in the beginning… The idea in developers’ heads, and the actual reality of HTML specs and browser implementations. There are two specs still, but this situation is slowly changing. As far as I understand, the W3C spec editors and the WHATWG editors found some middle ground, and they are willing to merge their specs? Or what’s going on there?

Bruce: Yes. I was… Am, was, not sure of what tense I should use, one of the editors of the HTML5.3 spec. We had a meeting in Amsterdam last June or July, and we were told that the W3C was effectively going to stop developing HTML and it would go to the WHATWG.

Vadim: Meaning that they’re not going to support their own version, but they’re going to still do some work on WHATWG, right?

Bruce: Yes. I mean, the people who were participating in the W3C spec are now participating in the WHATWG spec, and I think that the W3C will continue to maintain a snapshot version. Yeah. This is good and bad. It’s good that we’re going to have a single source of truth for the spec, because it was always confusing that there was a W3C spec and a WHATWG spec, and the two would say different things.

Vadim: Yeah, even conflicting things, not just slightly different, but sometimes… One of the main examples I remember is multiple <h1> tags on a page. Yeah.

Bruce: There were a number of differences. For example, the W3C spec only allowed one <main> element per page.

Vadim: Oh, yeah. Yeah, that’s true.

Bruce: Whereas, the WHATWG version allowed multiple <main> elements, because you might be swapping them in and out with JavaScript.

Vadim: And, I think in the W3C version, it’s allowed to have multiple <main> elements if the extra ones have the hidden attribute or display: none, which is basically the same thing.
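Editor's note: the rule being discussed, at most one <main> element that is not hidden, is easy to check mechanically. The following is a minimal sketch in Python under the assumption that only the `hidden` attribute counts; it does not look at CSS, so a <main> hidden with display: none would still be counted as visible. `MainCounter` and `has_valid_main_usage` are hypothetical names.

```python
from html.parser import HTMLParser


class MainCounter(HTMLParser):
    """Counts <main> elements with and without the hidden attribute."""

    def __init__(self):
        super().__init__()
        self.visible = 0
        self.hidden = 0

    def handle_starttag(self, tag, attrs):
        if tag != "main":
            return
        # A boolean attribute such as `hidden` arrives as ("hidden", None).
        if "hidden" in dict(attrs):
            self.hidden += 1
        else:
            self.visible += 1


def has_valid_main_usage(html):
    """True when the page has at most one <main> without `hidden`."""
    counter = MainCounter()
    counter.feed(html)
    counter.close()
    return counter.visible <= 1


if __name__ == "__main__":
    swapped_views = "<main>Inbox</main><main hidden>Settings</main>"
    two_visible = "<main>Inbox</main><main>Settings</main>"
    print(has_valid_main_usage(swapped_views))
    print(has_valid_main_usage(two_visible))
```

The first sample models the case Bruce mentions, where a script swaps views in and out and only one <main> is shown at a time.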

Bruce: Yeah, and there were things like the <hgroup> element, which wasn’t in the W3C spec, because it’s completely useless. There was a myriad of differences, and that reflected the different purposes of the organizations. I mean, the W3C spec showed what was actually implemented. The WHATWG spec has always been forward looking, and some parts of it are complete descriptions of reality, and other parts of it are basically people jamming and riffing ideas, and they’re implemented nowhere. The trouble is, if you’re a developer and you just want to know what you can use, maybe the WHATWG spec isn’t the best place.

Vadim: That’s true. Yeah. And also, the W3C version was full of interesting comments on accessibility and on implementation for developers. Some usage examples, not just for browser vendors, but for actual developers. It was the main reason for me to prefer the W3C version.

Bruce: It’s one of the reasons that I agreed to be the editor, or one of the editors for it, along with Shwetank Dixit, another old colleague of ours, because there would be real world examples for devs pointing out accessibility stuff. Mostly driven by Steve Faulkner, hat tip to Steve, the guy does so much for accessibility, et cetera. And, I know that he tried so many times to get that stuff added to the WHATWG specification, and his pull requests were overlooked, et cetera. So, I’m cautiously hopeful that the WHATWG people will start accepting more dev focused examples, et cetera. But, I do think there’s a fundamental difference in what the WHATWG spec was for, and what the W3C spec was for, and I worry that they won’t have… I worry that people will think that stuff that is super, super nascent is actually in browsers, and I worry that people won’t get the advantages of Steve’s and others’ accessibility advice. But, they haven’t made me king of the internet yet, so my worry is just a formless worry that I can do nothing about.

Vadim: Is it something that’s happening already? Or is it just a plan for the specs to… Well, not to merge, but to move development into WHATWG?

Bruce: It’s happened already.

Vadim: It’s already happened? So basically, if you’re a developer, is Bruce Lawson’s advice to use the WHATWG version now, or is it not quite there yet?

Bruce: Personally I would use, and I do use, both, because they haven’t diverged any more than they were already different, and there’s valuable advice in the W3C version of the spec, which I hope will be merged into the WHATWG spec. But yeah, HTML 5.3 is never going to be a candidate recommendation. There was talk about publishing it as a W3C note, but that’s just-

Vadim: Oh, so you’re saying that HTML 5.2 would be the last recommendation?

Bruce: I don’t know whether the snapshotted versions of the W3C… Of the WHATWG spec will be recommendations in the future. I just don’t know about the process. I’ve always found the W3C process to be somewhat opaque.

Vadim: It’s not that it’s super important for developers what is a recommendation and what’s not, because these days we mostly rely on browser implementations, because that’s what users use. That’s the main reason for us to care. But, the W3C version was my favorite for many years, and I got so used to it that it’s much harder for me to read the WHATWG version, because of the design and the navigation. But, I think I’ll have to deal with it.

Bruce: Yeah, I think so.

HTML5 Doctor

Vadim: So, you already mentioned the HTML5 Doctor project, it used to be a very valuable source of wisdom for me. We used to translate articles in the web standards community, like we used to have articles like the B and I elements, or the Figure and Figcaption elements, translated to Russian from HTML5 Doctor, and many others. I think we have five or six articles translated, or maybe even more. But, they are kind of… I wouldn’t say outdated, but they are forgotten, I would say, because no one’s really… No one cares what’s the difference between strong and B anymore. Just like I said in my Semantics for Cynics talk, there’s no real use in distinguishing between B and strong elements in your source code, because actual browsers, actual screen readers, they don’t care. They don’t have a way to tell if it’s B or strong. The articles are probably a bit outdated, so where is it now? Or what’s the place of the HTML5 Doctor website these days? Is it still valuable?

Bruce: I don’t know. I mean, it started off, like I mentioned earlier, in 2008, maybe 2009, I can’t remember. Instead of doing real work for Opera I was forcing my WordPress templates to use HTML5 and tweeting about it, and blogging about it. And I think that was quite valuable, because it allowed me to say to the authors, I’m a real developer, and I’m trying to work out the difference between these tags, and there’s not enough in the spec to help me decide. And they would go through and tweak the spec. I think it was Future of Web Apps, or Future of Web Design, I can’t remember which conference it was, and I’d said on Twitter, anybody want to meet up and have a chat about HTML5? Let’s do so. A group of us who’d been talking on Twitter, but had never actually met in real life, all went to the pub, and we decided to make HTML5 Doctor as a resource about HTML5 that wasn’t in spec language.

Bruce: We did it. It was pretty good. A lot of people came to visit it. And, as happens, people got married and had kids, people got jobs. I moved jobs and didn’t have the time to maintain it. And, crucially, better alternative documentation turned up on the web, and it seemed like there was no real reason to continue writing HTML5 Doctor stuff. So, when it went down last week I was on holiday, so I didn’t pay much attention. I sort of said, well is it time to… Is it time to take it offline? I think what we’ll do, if Richard, who actually does the hosting, agrees, is we’ll probably turn off all the comments, because there were trillions of spam comments, because it’s a WordPress blog, and just keep it alive, a bit like the webstandardsproject.org is still alive, even though we’re doing nothing more, because it is a part of the history of the web. It’s not a major part, but it is a little bit. Future archeologists might care about it.

Vadim: We still link to HTML5 Doctor articles from HTML Academy courses. So, it would be good if it stays online, and with the same link structure. I had some projects from 10 or 15 years ago, and most of them were based on WordPress, so what I did was take a static copy of everything I had, with Wget, or cURL, or whatever script I used to get all the content from the site. So, instead of running the whole WordPress with database, PHP and everything, I just took a static snapshot of the site. I checked all the links, so everything was intact, and I turned off comments, because there was no use for them, because no one is answering questions and everything. So, it’s much more reliable to have a website like this, and it’s much cheaper to host, because you don’t have to enable database and PHP support, so it could cost you almost nothing to host a static website. That’s my recommendation for you.
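Editor's note: as an illustration of the static-snapshot approach, here is a minimal sketch in Python that assembles a Wget invocation for mirroring a site. It only builds and prints the command and does not run it. The helper name `wget_mirror_command`, the URL, and the output directory are hypothetical; the flags themselves are standard GNU Wget options, though whether this exact combination matches what Vadim used is not stated in the conversation.

```python
import shlex


def wget_mirror_command(site_url, output_dir):
    """Return the argument list for a Wget run that mirrors a site to disk."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so the copy works offline
        "--adjust-extension",  # save pages with an .html extension
        "--page-requisites",   # also fetch the CSS, images and scripts pages need
        "--no-parent",         # never climb above the starting URL
        "--directory-prefix",
        output_dir,
        site_url,
    ]


if __name__ == "__main__":
    # Hypothetical site and directory, for illustration only.
    command = wget_mirror_command("https://example.com/", "snapshot")
    # Print a shell-safe version of the command for copy and paste.
    print(" ".join(shlex.quote(part) for part in command))
```

Because the result is plain files, it can be served by any static host, which is the point made in the conversation about cost and reliability. Checking that the old link structure survived is still a manual step.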

Bruce: Oddly enough, I was looking at that. I was chatting to Phil Hawksworth, who is one of Netlify’s DevRel people, and that’s another project, but there are plugins now for WordPress where basically it will just vomit out, as you say, flat HTML files with all the link structures intact. And then, you can just pull that over to Netlify, and they’ll host it for three beans and a cigarette, and there’s no attack surface, because there’s no database running, there’s no code running, it’s just a static version. You can just point the DNS at the Netlify domain. So, I’m going to suggest to the boys that we do that with HTML5 Doctor.

Vadim: The problem with Netlify, though, is that it wouldn’t be available in Russia. Seriously. We have serious problems with Netlify. My whole Twitter timeline is full of people screaming and crying over Netlify, like oh it’s the best thing that’s ever happened to the web, and it’s so much easier to use than any other hosting solution, but most of the sites hosted on Netlify are blocked in Russia, because their IPs are blocked for some reason by the Russian government.

Bruce: Probably an accident.

Vadim: Yeah, it’s an accident. Well, it wasn’t an accident, because they were trying to block Telegram Messenger, and it was hosting some parts of it on Netlify, so they blocked the whole range of IP addresses. I’m just telling you that hosting something on Netlify these days is like losing the audience from Russia, almost.

Bruce: I had no idea that Netlify was blocked in Russia.

Vadim: So yeah, that might be tricky. A static copy of HTML5 Doctor is a good idea, but hosting it on Netlify, I don’t know. I don’t know.

Bruce: Wow. I suppose I better move my Pussy Riot appreciation site off Netlify then.

Vadim: Yeah. It would be good to have it available in Russia as well. All right. So, I hope you’ll find an easier way to host it just for history’s sake.

Bruce: Yeah. I don’t like taking things off the web, because who knows what will be of interest to future archeologists. So, we’ll get that sorted out in my copious free time.

Accessible PDF with Prince XML

Vadim: Okay. Back to what you’re doing these days. Like, you used to work for Smashing Magazine, but you left, and what’s happening in your life? What are your current projects? I think I heard something about Prince XML and some others.

Bruce: I’m back as a consultant. So, I’m doing some work for Håkon Wium Lie on Prince XML, part-time, not full-time. For those who don’t know, Prince XML is a piece of software, very old, very mature, 17 years old, that will allow you to make a PDF from HTML, CSS, and SVG. We’ve been engaging in an exercise, in preparation for Prince 13 coming out, to allow developers to have more control over the mapping between HTML semantics and PDF semantics, because maybe not many people know it, but PDF has its own semantics that allow people with assistive technologies to navigate around PDFs.

Vadim: Oh, really? Because, most of the PDFs on the web, well not most of them, but many of them, are basically just JPEGs in PDF containers. So, I guess they’re not marked up properly. But, if you convert something like a Keynote or even an HTML page into PDF, it’s much lighter than images of the same content would be, and it’s accessible at the same time.

Bruce: It’s accessible, if you have a reasonably accessible source. Like anything else, if the source is well marked up it can then be translated into PDF accessibility. There are multiple good reasons why inaccessible PDFs live. You know, if you’re just preparing something to be printed it doesn’t need to be accessible, but a lot of governments… A lot of big organizations for various reasons offer their content as PDF, therefore it should be accessible, therefore it should be tagged, and Prince, in my opinion, has the best tagged PDF output. On the Mac, Prince runs on the command line, you just run prince on my-lovely-page.html, and then you give it the PDF profile that you want. You probably want the UA one, the universal accessibility one, and that will produce you an accessible tagged PDF that meets WCAG, which is good, because WCAG also applies to PDFs delivered over the web. It’ll do things like take your HTML lang="ru" and apply the RU language to the PDF, not translate it, but mark it as such so that the end user knows what language the PDF’s in. It adds all the structural PDF tags and it’s really cool.
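Editor's note: the command Bruce describes can be sketched as follows. This Python snippet only assembles the argument list and does not run Prince. It assumes that the output flag is `-o`, that the profile is selected with `--pdf-profile`, and that the universal accessibility profile is named `PDF/UA-1`, as described in Prince's documentation; these should be checked against `prince --help` for the installed version. The helper name `prince_command` and the file names are hypothetical.

```python
import shlex


def prince_command(input_html, output_pdf, pdf_profile="PDF/UA-1"):
    """Return the argument list for converting an HTML file to a tagged PDF.

    Assumption: the installed Prince version supports the given profile name.
    """
    return [
        "prince",
        input_html,
        "-o",
        output_pdf,
        "--pdf-profile=" + pdf_profile,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    command = prince_command("my-lovely-page.html", "my-lovely-page.pdf")
    print(" ".join(shlex.quote(part) for part in command))
```

As Bruce notes, the profile only helps if the source is well marked up: the lang attribute, headings, and structural elements in the HTML are what get mapped to the PDF tags.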

Vadim: Yeah. I used to use Prince for printing Shower presentations back in… Five years ago, or maybe even more. Actually, Håkon, inventor of CSS, and CTO of Opera, maybe he’s still-

Bruce: No. He’s not with Opera anymore.

Vadim: So, he actually influenced me in a way, in that I started this Shower project, because there used to be this Opera Show, or Opera Presentation, what was the format?

Bruce: Opera Show. Yeah.

Vadim: Opera Show. Yeah. So, I thought, yeah I’ll just make my own presentation using Opera Show. And I did, and then I thought now I think I need this, and that, and then I developed Shower. And since then, since 2009 or even… Yeah, maybe since 2010, I’ve been developing this presentation engine. There used to be a need for PDF export of my presentation, like back in 2010, I think it was a common thing to do, for conference organizers to request PDF from speakers. I think they still do those things. But, as a conference organizer, I accept HTML presentations as a source as well. I don’t ask speakers to export everything into PDF. I tried different solutions back then, and then I realized that Prince is the most spec-compliant and the most convenient tool to export HTML and CSS into PDF.

Vadim: But then, I struggled with some compatibility problems, because when you create a presentation, or basically anything for the browser, you expect this PDF engine to have the same level of support, the same number of features and even the same or similar bugs, and you expect the same results. But, Prince used to be behind on some technologies, because I remember there was no Flexbox implementation back then, and I’m not sure about the Grid implementation these days in Prince XML. And some others, like I remember not being able to create lines using gradients, because there was no way to have a transparent color in a background in Prince for a while. So, little details, and these days there’s a script, a command line utility called Shower, so you can type shower pdf, and then it will print your presentation into PDF using Puppeteer, basically Chromium with some extra APIs around it. So, that’s what I use myself, and that’s what I recommend for developers to use.

Vadim: The funny thing with printing, it’s not the number one priority for browsers, or for anyone really. I think Chromium and Prince… Well, I wouldn’t call Prince a browser, but Chromium and Prince are the two engines for HTML and CSS that support @page for specifying page size. There’s no way to specify page size in Safari, Firefox, Edge, anywhere, only in Chrome and Prince. So, there’s no way for you to properly print something, unless you use Chrome or Prince. That’s so funny.

Bruce: It is strange, isn’t it? I mean, I know myself, I can’t remember the last time I tweaked my print style sheet on my websites, because I don’t tend to print stuff out.

Vadim: I don’t have a printer at home.

Bruce: Whoa, you’re so 21st century. I do have a printer, but it doesn’t get used a great… It gets used as a scanner more than anything. But, there are still legions of use cases for printing out from the web. Going back to what you said about Prince, Flexbox is there now, CSS Grid is coming in a version, not sure if it’s in version 13, I haven’t looked at the roadmap for a couple of weeks. I used to work for the quasi-governmental organization that regulates lawyers in the U.K. and for various reasons every page of the website had a “print this as PDF” option to save it. A lot of more traditional organizations use dated PDFs as reference, I mean dated as in time and date stamped, rather than dated as in archaic. This is what the rules were on this day. And, that thing becomes a legal term of art, which is an important use case. Websites producing invoices, boarding passes, receipts, restaurant menus, and printing books is a significant use case for Prince.

Vadim: Yeah. I have Lea Verou’s book CSS Secrets, the source of it is in HTML and CSS. I also have your and Remy’s book HTML5, maybe it also originated in HTML and CSS.

Bruce: We wanted it to, but the publisher went, absolutely not, Microsoft Word.

Vadim: Yeah. Classics.

Bruce: Yeah. Yeah. I had to relearn how to use Microsoft Word for that one. Yeah. I mean, Håkon’s first book about CSS was written in HTML and CSS. And one of the use cases of Prince, which I need to write an article about after I write the one about Prince on Amazon Web Services, is the extra… I think it was GCPM, the Generated Content for Paged Media spec that Howcome wrote for CSS. They’re not in browsers per se, but they are… A lot of that stuff’s in Prince, so it allows you to add the extra things books have, like if for example you are printing a dictionary, probably in the top outside margin of each page you would have the first and the last word that’s defined on this page, so that people can leaf through it and find it quickly.

Vadim: Oh, really?

Bruce: That sort of thing is specifiable in Prince using, well, a CSS extension, basically, a vendor-prefixed one.

Vadim: CSS used to be heavily influenced by printing, what they have in printing. These days it’s not anymore, and it probably makes sense.

Bruce: Yeah, but there’s a lot of people who do need to print from the web, and if you’re going to make a PDF, make it accessible, kids.

Vadim: Oh yeah, yeah, that’s true. While we’re talking I was going through the Prince roadmap, and I see there are many things going on. I was wondering if it’s written from scratch as an independent engine, or if you’re taking some source code from other open source browsers. I have no idea how this works, so maybe you’d copy some code from the Chromium project? I don’t know.

Bruce: I asked them, because I was thinking of adding it to Can I Use. It is its own implementation. It’s not WebKit, it’s not Chromium, it’s not Gecko, it’s an independent implementation of the specs.

Vadim: And this is amazing, because there are not many independent implementations out there. It is good to have one more. And, it’s sad that it’s not listed anywhere, it’s not widely known that there is an implementation of modern specs, not for browsing, but for printing.

Bruce: Yeah. I contacted Alexis Deveria, I think, who runs Can I Use, and asked if we could list it there. But, he said entirely legitimately, if we start adding things that are not traditional browsers there are a heck of a lot of different things which would have equal claims to be listed there, and it would make the site just unwieldy. I acknowledged that browsers are the main use case. But, I’m probably going to add something like Can I Use, an easy at-a-glance view of what specs are supported and what specs aren’t. It’s all documented, but who reads these days?

Vadim: I heard that in science, those serious and smart people, they still use PDF a lot for scientific magazines and for sharing their papers, and things like that. So, one of the main reasons they still do this is that not every way of expressing scientific meanings is supported on the web. For example, MathML, there’s no way to use MathML freely on the web these days. And, I heard that… Who’s working on this? It was Igalia and I think they’re implementing it in Chromium.

Bruce: Yes. A friend of mine, Brian Kardell, he’s an Igalia DevRel and he’s working on MathML. I never, ever see mathematics papers, because I can barely count.

Vadim: Same here.

Bruce: But, there’s a huge community of people who need to be able to express their equations, et cetera, over the web, and yes, they’re using PDF, because you simply can’t put those on the web. It’s weird, because there used to be a really good implementation of MathML in Chromium, but then they took it out.

Vadim: One of the co-hosts of this podcast, Maria, she is a mathematician herself, and she gave a talk last year in Kiev, something about MathML, math on the web. She said that math is the language of the universe, and this is one of the big reasons for Chromium and other browsers to support it.

Bruce: That’s very poetic. Math is the language of the universe. She’s obviously a confirmed Pythagorean.

Vadim: Oh yeah. So, I wonder if you’re going to support MathML and expand the use of Prince for scientific community, for example.

Bruce: Well, I can’t answer that. But, I will, when I’m having a meeting with them, I will recommend they support MathML, because that’s one of the things I’m doing is recommending stuff for them to do. But, I can’t make that promise, because I have no power to make it happen, but I will be recommending it.

Web Monetization API

Vadim: Apart from Prince XML, what’s going on in your life?

Bruce: Well, oddly enough, only this morning a chum of mine in South Africa, Adrian Hope-Bailie, has submitted to the Web Incubator Community Group an explainer that he and I have been working on for a new browser API called Web Monetization, and I’m really, really hopeful that this takes off, because what we want to do is kickstart basically a new revenue model.

Vadim: I have an idea what web payment API is, like it’s the way to request, basically it’s payment request API, so you request payment. It’s in the name. But, what’s web monetization?

Bruce: Well, web payments are for discrete payments, you know, like $10 for a CD, or 100 rubles for this Pussy Riot album, or whatever. Web monetization is streaming tiny amounts of money, the unit is nano dollars, but streaming it constantly to a website.

Vadim: I have no idea how one could stream money. No, really, it sounds complicated.

Bruce: There is a payment provider; I’m consulting for Coil, who are a payment provider. Every payment provider can make their own business model. Coil’s business model is that you pay $5 a month to them, and they will stream money on your behalf to any monetized website that you visit. And, if they pay out more than five bucks, that’s fine, they don’t take any more from you, it’s a five bucks all-you-can-eat model. And, if you are a webmaster, and you want to monetize your website, you will go to your bank, or your digital wallet, or whatever, and ask for a payment pointer, which looks… It’s got a dollar sign in front of it, so it’s obviously about money, and it resolves to an HTTPS URL, which is your wallet.

Bruce: But, it’s safe to share, and it is human readable. So, you can read it over the phone to somebody. That was part of the design. You add a meta tag, meta name equals monetization, content equals your payment pointer. And then, anybody who comes to your site with, at the moment it’s a Firefox or Chromium extension, the Coil extension, but we’re proposing that it be an open standard, so that it can be a whole ecosystem of different payment providers, and you will get money coming into your wallet. Simple as that.
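
To make the two pieces Bruce mentions concrete, the payment pointer and the meta tag, here is a small Python sketch. The pointer $wallet.example/alice and both function names are hypothetical, and the fallback to a /.well-known/pay path for a pointer without a path is an assumption based on the Payment Pointers proposal as we understand it; check the current spec before relying on it.

```python
from html import escape


def resolve_payment_pointer(pointer: str) -> str:
    """Turn a payment pointer such as "$wallet.example/alice" into an HTTPS URL."""
    if not pointer.startswith("$"):
        raise ValueError("a payment pointer starts with a dollar sign")
    # Everything before the first slash is the host, the rest is the path.
    host, _, path = pointer[1:].partition("/")
    if not host:
        raise ValueError("a payment pointer needs a host name")
    # Assumption: a pointer without a path falls back to a well-known path.
    path = path or ".well-known/pay"
    return f"https://{host}/{path}"


def monetization_meta_tag(pointer: str) -> str:
    """Build the one-line meta tag that goes into the page's head."""
    resolve_payment_pointer(pointer)  # Reject malformed pointers early.
    return f'<meta name="monetization" content="{escape(pointer, quote=True)}">'
```

The tag carries the pointer itself, not the resolved URL; resolving it is the job of the extension or payment provider, which is why the pointer can stay short and readable over the phone.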

Vadim: So, basically this Coil company replaces what, like a payment provider? Like PayPal?

Bruce: No. It’s precisely not to replace anything. It’s to make a new ecosystem for people to get paid, because the trouble is, with the web, is number one, we trained people to believe content should be free, and then we polluted the web with advertisement.

Vadim: To support content creation.

Bruce: Exactly. And, you’ll remember, Vadim, that we worked for Opera, which was the first browser to have a built-in ad blocker. I used to get emails from people hating on us, saying I earned $1000 a month from ads on my websites, and I live in Croatia, or Belarus, and that basically paid for my life, and now you’ve cut off my revenue stream. I always felt guilty, because I used to have ads on my site before I joined Opera. So, this is an attempt to allow people to monetize creating content without having to have massive, heavy, intrusive, surveillance-capitalist ads all over their sites.

Vadim: I’m a communist, in a way. I mean, I don’t have any ads on projects I run. I believe in this thing. I don’t advertise anything. But, for the Web Standards project we have sort of, how would you call it, like native ads. Basically, we invite some guests to join our podcast, and that’s the way for them to tell their story, and for listeners to hear some interesting stories. Also, we have a Patreon page, and we have some income that we spend on equipment, and some stickers, and some prizes for our patrons, so we’re trying to monetize our project that’s free for everyone with some tools like that. I wonder if projects like the Web Standards community, and others like that, are a target audience for this spec and company.

Bruce: It absolutely is, because this would supplement your Patreon, and your sponsored content, if you like. I don’t see that kind of stuff as advertising. I mean, you’re lucky that you have a community big enough where people would want to give you money to come and talk about relevant projects to a relevant audience who won’t be turned off and bored. I’m thinking about not just web developers, but I’m thinking about the huge long tail of the web that probably you and I never see, the women who write parenting blog posts, or people who write about medical issues, I’m thinking about the forum that I used when I first got diagnosed with MS. These are things that don’t attract corporate sponsors, so people are reduced to horrible ad farms.

Bruce: I mean, when I had ads on brucelawson.co.uk before I joined Opera, it was because in 2003, 2004, when I started my website, web hosting was much more expensive than it is now, 16 years later. I had two young children, a low salary, and hosting was a non-trivial amount of money, and the ads paid for that. It wasn’t like I was sitting in jacuzzis full of champagne with supermodels on the money, it just meant that I was giving up my time to maintain a web standards blog, but I wasn’t actually out of pocket as well, and you’ve got the same thing, I think, with your projects. It’s nice to be able to do that without ads. Native ads are fine, but not the horrible, malignant things that track you all over the internet, and potentially give you malware. I’m glad we’ve blocked those at Opera, and I wish they would go away.

Bruce: So, web monetization is an attempt to provide a different revenue model, and that’s why we’re proposing it as an open standard, because we really want companies to compete against Coil offering different business models, because we want this to be a way for content creators to monetize their sites without ads, et cetera.

Vadim: But, so far, how was the feedback on your proposal? I wonder if there are any other companies, or spec editors, or web standards community members interested.

Bruce: Feedback, I don’t know, because it went live 30 minutes before we started this conversation, and obviously I’m not rude enough that I’d be looking at something else while-

Vadim: Is it published as a recommendation yet? Just kidding.

Bruce: It’s a proposal. It’s in enough shape that it’s open for discussion, because we obviously will change it with community feedback. I know a couple of the browser vendors are interested. I know there’s a lot of content creators we’ve spoken to who are interested. And I shall be persuading friends of mine to put this meta tag in their head, so they can get monetized up.

Vadim: We’ll be glad to be one of the first users of this meta tag, and mechanism.

Bruce: Groovy. And, any feedback you have on it is welcome. We designed it to be as easy as possible. You know, it’s one line that you add to your head. It’s a bit trickier to monetize things on YouTube, and SoundCloud, et cetera, but then of course not everybody makes written content in HTML. We’re figuring that out. Yeah, so that’s my humble wish at the moment, to make a brand new revenue model for content creators, because those are the people who have been writing all the fabulous stuff that we’ve been consuming and learning from for free for the last 15 years.

Vadim: Oh, wow. Bruce Lawson saying a revenue model.

Bruce: Yeah. We’re going to leverage it, synergize the ARPU or something.

Vadim: Oh, yeah.

Bruce: I have to go and wash my mouth out with bleach.

Vadim: What else is going on in your life and your work?

Bruce: Doing a little bit of consultancy with my good friends in Wix, a couple of hours, sanity checking every week with them, and working on a secret hush-hush project with the flame-haired FOSS love god, Stuart Langridge, called Swordcello.

Vadim: Swordcello. Really?

Bruce: Yeah. And he would kill me tomorrow when we go out for our weekly meeting stroke beer if I gave any details about it.

Vadim: Okay. So, we’ll see. We’ll see. But, the name is cool.

Bruce: The name is cool. And, if you go to swordcello.com you’ll hear that I’ve already made some introductory music based upon the classic zombo.com, because I needed to buy the domain name, and it just saddens me to have a domain name with nothing on it. If you’re going to have some content you need a song, don’t you?

Vadim: Oh, yeah. That’s true. That’s typical Bruce Lawson.

Bruce: Yeah. Yeah. I might grow older, but I don’t grow up.

Vadim: You were listening to the 187th special episode of the Web Standards podcast. Thank you for joining us today, Bruce.

Bruce: Rock and roll and web standards.

Vadim: Thank you for listening. See you next week in the regular Russian-speaking podcast. Cheers.

Bruce: Bye.

Written by Vadim Makeev