WebSub for AMP
Accelerated Mobile Pages come with a built-in mechanism to discover them from HTML documents. This is very convenient for companies like search engines and large “reader”-like applications, which are already crawling and analyzing millions of HTML URLs, but it’s not practical for organizations that do not have access to these massive sets of URLs.
Here’s a mechanism that would allow real-time discovery of AMP documents. It relies on WebSub and works if publishers (or AMP caches!) are willing to adopt the following strategy.
- Create an endpoint which always shows the latest AMP document published (or updated) by the publisher (for example /latest). This AMP document should of course include a <link> element with the canonical rel value to point to the HTML “version” of this AMP document (this is the default AMP/HTML discovery mechanism). Additionally, and you’ll understand why below, this AMP document needs to include another <link> element with the value rel="amphtml" which points to the actual AMP representation of the document.
- Enable WebSub on that endpoint so that subscribers can subscribe to it. This is achieved by making sure that the previously created endpoint (/latest) has a Link header with rel=hub which points to the publisher’s hub of choice and another Link header with rel=self which points to /latest. The AMP document could also use more <link> elements in the <head> part of the markup.
- When a new AMP document has been added (or has been updated), its publisher should update /latest and ping the hub. (non spec’ed)
- The hub will then notify the subscriber that /latest has changed and the payload will include the AMP representation of that latest document. The subscriber can then check the HTML document using rel="canonical" or get the URL of the actual AMP document using the rel="amphtml" link without the need to poll and parse the HTML document.
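The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and class names are made up for this example, the example.com-style URLs are placeholders, and the ping format (a form-encoded hub.mode=publish request) is a convention many hubs accept rather than something WebSub specifies.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def latest_link_headers(hub_url, self_url):
    """Return the two Link header values the /latest endpoint would send."""
    return [
        f'<{hub_url}>; rel="hub"',
        f'<{self_url}>; rel="self"',
    ]


def hub_ping_body(topic_url):
    """Build the form body for the publisher-to-hub ping.

    Assumption: the hub accepts the PubSubHubbub-style publish request.
    WebSub itself leaves this request unspecified.
    """
    return urlencode({"hub.mode": "publish", "hub.url": topic_url})


class LinkRelParser(HTMLParser):
    """Collect the href of each <link> element, keyed by rel value."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # A rel attribute may hold several space-separated values.
        for rel in (attr.get("rel") or "").lower().split():
            # Keep the first href seen for each rel value.
            self.links.setdefault(rel, href)


def links_from_notification(payload):
    """Extract rel -> href pairs from the AMP markup a hub delivered."""
    parser = LinkRelParser()
    parser.feed(payload)
    return parser.links
```

On the subscriber side, links_from_notification(payload)["amphtml"] would give the URL of the actual AMP document and ["canonical"] the HTML version, assuming the publisher included both <link> elements as described.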
WebSub comes with a signing mechanism to ensure the origin of the notification, based on a secret shared by the subscriber upon subscription.
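As a rough sketch of how a subscriber might use that secret: it is sent as hub.secret in the subscription request, and the hub then signs each notification body with an HMAC carried in the X-Hub-Signature header (in the form method=hexdigest). The function names and the way the request body and header reach these functions are assumptions made for this example.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hash methods WebSub allows in the X-Hub-Signature header.
ALLOWED_METHODS = {"sha1", "sha256", "sha384", "sha512"}


def subscription_body(topic_url, callback_url, secret):
    """Form body for the subscription request sent to the hub."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
        "hub.secret": secret,
    })


def verify_signature(secret, body, header_value):
    """Check an X-Hub-Signature value against the raw notification body.

    secret and body are bytes; header_value looks like "sha256=ab12...".
    """
    method, sep, hexdigest = header_value.partition("=")
    method = method.strip().lower()
    if not sep or method not in ALLOWED_METHODS:
        return False
    expected = hmac.new(secret, body, method).hexdigest()
    # Constant-time comparison to avoid leaking how many characters match.
    return hmac.compare_digest(
        expected.encode(), hexdigest.strip().lower().encode()
    )
```

A subscriber would typically discard any notification for which verify_signature returns False before parsing the AMP payload.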
Sub-optimal
- Discovering the /latest endpoint itself is done by humans. Options to fix: use of a /.well-known URL, or adding a rel="self" link pointing to it in AMP documents, but that would be very confusing.