This post is going to challenge everything you know about the Web.
It all starts like this:
- Create a ".html" file and open in a browser.
- Later, grab a framework and return some data from a server in a custom JSON structure.
That’s the beginning of many people's career as a Web Developer.
Let's talk about one of them.
More precisely, let's talk about how the
text/html Content-Type can provide better foundations to build Application Programming Interfaces. The use case for this is when you want to expose the website's workflow for certain kind of non-visual clients when you already have a semantic website.
You can apply some of these fundamentals in a custom XML API. However, this post is not about XML. With XML, you create a new representation of your website for a different purpose in another URL that is different from the one you serve the UI.
You can also apply most of these fundamentals in a custom JSON API. However, besides the same point as with XML above, for a JSON API to be useful, you need to choose a hypermedia message format specification like Jasonette, Mason, Collection+JSON, HAL or JSON API, which has a cost.
There's no problem with XML or JSON. However, if you intend to expose the website's workflow, there are alternatives. This post is only for the use cases where you already have a website that provides visual data for a human to consume it. In that case, you also want to expose the very same workflow for machines.
HTML-like content types can provide better foundations to build APIs for the Web
Let's start at the beginning.
That understanding is critical.
What many developers don’t get is the HTML the server produces is merely a message, as in a piece of information that it transfers through the network. That information happens to be understood by the code which runs inside the browser. The HTML specification defines how the browser should render that message visually in a backward compatible manner.
However, HTML is not only for rendering. You can add metadata to the markup so that other kinds of non-visual clients can interpret it.
There’s a server somewhere, even if you don’t see it.
A long time ago, there was this Search Engine called Google which had a compelling idea to rank the relevance of websites. They did so by writing code that would read the HTML and look for hyperlinks (
<a>). Once they indexed how many websites were linking to each other, they could quickly check which ones were more relevant than the others. This way, when the user looked up for a search term, the most relevant websites would stay on top of the search results.
Google added a lot of additional heuristics since the first time they crawled the web. However, the fundamentals of crawling hyperlinks are still there. Somewhere.
You can effortlessly write software that can understand what the author of a website meant. Given how the HTML specification defines HTML and hyperlinks, the author of the crawling script and the author of the page don’t need to know each other; they only need to build code that can write and read the message from each computer in a standard format.
Given how HTML defines anchor tags, you can write code to crawl any website and follow links, as long as the author of the website can produce a meaningful message that follows the HTML specification of how to create anchors.
A Web Crawler is not the only type of client that doesn’t care about how the browser renders the HTML. The code for a headless test automation script, such as Selenium or Chrome Headless, only cares about how to identify specific elements of the page. The automation code doesn't care if the underlying test engine has to simulate full browser rendering or not.
For example, if you want to test how an authentication system works, you write a script that can identify the input fields and the form to submit. If you’re testing a website you control, then you can write attributes in the markup to identify those elements. This way, those attributes become the contract between client and server to identify the elements in the page for a test automation script. As long as they remain in the right places after you refactor the HTML, the code that depends on them never breaks.
If the server adds the authentication in other places, say the navigation menu, the code you write for the automation script doesn’t need to change.
You can enhance the website’s HTML to provide attributes that a test automation script can consume. That test automation script understands the website’s workflow.
Given all this, you may want to create specific attributes aimed towards test automation scripts. For example, a “test” attribute with the value “open product details,” or a “QA” attribute with the value “checkout.”
Keep in mind there are other clients besides a test automation script that might be interested in the workflow of the website. If you identify the elements using attributes relevant to what the website affords people to do, then other types of clients can consume it, not only test automation scripts.
Instead of a “test” attribute with the value “open product details,” prefer a "class" with the value “open product details.” Instead of a "QA" attribute in an anchor tag with the value "author," prefer a "rel" attribute with the value "author." This last affordance, the link rel attribute, is a standard for more than 20 years!
It’s impossible to change the name or value of an attribute once it starts being consumed by the code you don't control unless you create a new element. Martin Fowler calls this a PublishedInterface. The construction of test-specific attributes creates coupling between the test automation script and the attributes of the website. If you name the attributes with a terminology that is specific to test automation clients, that name will make no sense for a client that is not a test automation script.
Do not create HTML attributes that are test-specific
Here’s an example of another type of client besides a test automation script:
Say you work in a Social Network website with a bad UI. The company hires a new designer to redesign the website. The designer wants to know the triggers where users click to update their privacy settings.
In a real-life scenario, the most efficient solution is to get up from the chair and talk to the developers. However, everybody is working remotely in different time zones. Communication has to be asynchronous.
The developers decide to code the website in a different way to solve this problem. Some buttons that can trigger the user’s privacy settings are marked up with a class containing the prefix “company name” and “privacy.”
In the page which has the news feed of this Social Network, there’s an “always post as anonymous” shortcut button with the class “company privacy always anonymous.” In the “settings” page, under the “privacy” group, there’s a button with the same semantic. In both places, the functionality is the same. Therefore, it has the same class.
The developers write a script that the designer can add to their browser bookmark. Every time the designer logs into the website in a test environment, they can click on the script, and it highlights all the privacy controls on the screen.
The HTML attributes are not specific to a given client. Therefore, you can also read them from a test automation script. You don’t need to make changes to the website unless it’s to add a new identifier for another element on the page. All the scripts that deal with the existing elements always work, they survive the test of time without significant maintainability impact when you add a new element to the page.
Besides, you can release those attributes as a Public API. You can deploy everything to production so that third-party clients can effortlessly write the same kind of scripts to read the website’s workflow as a non-visual machine, not as a visual user.
You can enhance the website’s HTML to provide attributes that any script can use, including third-party consumers.
These principles are not just for HTML. You can do the same thing with raw JSON.
- You return a bunch of “link” properties that can point to other websites or pages.
- You design endpoints which can return custom JSON to represent your website’s data and write a test script for it.
However, just because you can, that doesn’t mean you should.
Imagine you’re writing code against a raw JSON structure that is specific to every page. There’s a high degree of coupling between the code you write and the message written in JSON. Ideally, you would write code to traverse the message structure and have a standard, or a language, of how to identify the attributes. That has a whole development overhead of its own.
If the server returns HTML and the client code uses a parser such as “query selector” to look for specific attributes to identify the elements, the server can change the whole structure of the HTML without breaking the clients. You also don't need the overhead to write any code to traverse the message; there's a standard language right there: HyperText Markup Language.
By default, the structure of the HTML is not specific to the website. You can add, rename or remove tags and the browser will render the page differently, it won't break. The “query selector” function understands the HTML specification. As long as the attributes are in the right elements, the clients won’t break either. If there’s a requirement that may drive you to rename an attribute, that should be considered a new element for backward compatibility.
The code which retrieves the latitude from the message above reads a raw JSON structure specific to the website. Therefore, it breaks when the server changes the structure. In other words, the domain-specific concepts are coupled with the message interchange format. That doesn’t happen if you use HTML and "query selector," because the domain is not coupled with the message format, as you can see in the examples below:
The HTML examples store the value of the latitude in an array. Say you have many latitudes for Sydney at different places, you can add a new "sydney latitude" element and the code still works:
The “query selector all” function understands the Content-Type
text/html and therefore can parse it. If the server changes the structure of the message, the code still works. With a hypermedia specification, you can use the code somebody else wrote, like "query selector" and be sure it never breaks.
If the server returns raw JSON as a response, clients have to write code that is tightly coupled to the structure. Any significant change to the structure break the clients.
If you want to use JSON and create a robust integration between two computers using HTTP for many different purposes, you need to use a standard hypermedia specification that can tell you where servers should put identifiers and where clients should look for them. You need to choose a hypermedia specification like Jasonette, Mason, Collection+JSON, HAL or JSON API.
You need a language.
That way, you can have an API as robust as HTML itself.
However, if the server uses no specification for their responses, you have to write specific code to interpret the messages only for that website. That creates coupling and a considerable cost of development.
The most common workaround for a fragile API without a language — where every small change can break clients — is to start sending versions in URLs or headers instead of versioning the message or the clients. Given there’s no specification and every change can break stuff, you need to add versions and increase the maintenance cost of the website.
Roy Fielding has had many rants about it in the past:
If you look at how the browser codes against HTML, you’ll see it has already solved the problem of API versioning. You can change the structure of the website, and the page won’t break, there's no need for website versioning. As long as the HTML syntax matches the DOCTYPE and the browser supports that element or attribute, you can render something correctly without breaking anything.
HTML is an API to the browser. What HTML has is a specification that can give you for free a lot of the benefits an API built with JSON without a hypermedia specification doesn’t have.
HTML is not just some magical thing to render a website. The server writes on that language to create an API the browser can consume and provide value to visual clients — the humans. It can also be enriched by the server to provide metadata for non-visual clients — the machines.
server -> HTML -> browser -> human brain
server -> HTML metadata -> Some HTML parser -> machine
You can write HTML in ways that can allow other kinds of clients to understand what the server is trying to say:
- Follow links to other pages or websites.
- Write test automation scripts that don’t need to change if the elements of the page change.
- Write bookmark scripts that can highlight certain elements.
- Expose a Public API for third-party consumers to understand the website's workflow.
Given there’s a significant cost to adopt the right hypermedia specification for JSON, if you already have a website that returns a
text/html API for visual clients, prefer as a sensible default to enhance the website’s HTML and leverage a battle-tested specification that is already there.
HTML is not only for rendering. It’s not only for the browser.
HTML is a great message format that can be enriched to serve as a foundation to build great APIs and expose information about the website’s workflow without significant effort and high Return On Investment. In this specific circumstance, you may not need JSON at all.
It’s time to challenge your assumptions.
Stop trying to fit a square wheel into a standard car.
Also, stop to reinvent the wheel, unless you can clearly show how the new wheel can be a better fit.
I said this post would challenge everything you know about the Web.
Do you feel challenged enough?