On supporting content types other than HTML in OStatus
Article by Danny Rushyo Moules, 12th May 2017
Federated social networks such as Mastodon and GNUSocial communicate using a protocol known as ‘OStatus’. This is partly documented in an unfinished specification (the OStatus 1.0 Draft 2.0 document), and is otherwise defined by some parties as a non-living anti-standard which is ‘whatever it has been historically’, or the exact opposite ‘whatever it needs to be’, usually further qualified as ‘well, whatever my software, the one true use case, thinks it needs to be’.
The discussion of whether the protocol should be defined as an undying, unmoving, undefined carcass of a non-standard is beyond the scope of what I’m analysing here, but bears dismissing in any discussion about a hypothetical real-world functional ‘OStatus specification’.
For this article I’m going to assume an interpretative approach which analyses what specifications are codified, what the intentions of those specifications were, and look to develop potential solutions based on their likely practical impact.
As it stands, statuses sent across the fediverse between OStatus consumers are presently consumed, at least in most cases, as HTML. They include publishing metadata, such as mentions, hashtags, and formatting styles. This can create issues with non-HTML consumers looking to read other data, and I’ve encountered the problem twice for two distinct use-cases:
The #nobot flag, a consensus-agreed mechanism I proposed for users of the fediverse to indicate to a ‘bot’ user that they don’t want to be interacted with directly, involves the user simply putting #nobot in their profile description (bio). This relies upon the bot being able to parse the #nobot content. In plaintext, this is a trivial exercise, but in cases where bios contain HTML, the hashtag in #nobot is often (but not always) transformed into something like:
<a href="https://example.com/tags/nobot" class="hashtag">#<span>nobot</span></a>
This mechanism was invented from similar experiments on other social networks, which provide plaintext representations of bios or status content, and so was ill-equipped to deal with this scenario. #nobot consumers often fail to parse this successfully as ‘#nobot’, especially as this publishing format is utterly undefined, making it impossible to realistically anticipate future changes to this behaviour by the various software in the fediverse. The natural result? People expecting to be free of bot interactions getting pestered, and a lot of vitriol between users, bot developers, and platform developers as a result. A bunch of hotfixes later, and things are sort of holding together with duct tape and goodwill, but it’s a ticking time-bomb. Not that #nobot itself was ever anything more than a quick bandage for a wider socio-political problem with the network, but that’s off-topic here.
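To make the failure mode concrete, here is a minimal sketch of how a bot might check a bio for #nobot without caring how the platform decorated it: throw away every tag, keep the text, then search. The names bio_to_text and has_nobot are my own for illustration, not part of any fediverse software, and this is one possible workaround rather than a defined behaviour.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def bio_to_text(bio):
    """Return the bio with markup stripped (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(bio)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


# Match "#nobot" as a whole tag, so "#nobots" or "x#nobot" don't count.
NOBOT = re.compile(r"(?<!\w)#nobot(?!\w)", re.IGNORECASE)


def has_nobot(bio):
    """True if the bio, once reduced to text, contains the #nobot tag."""
    return NOBOT.search(bio_to_text(bio)) is not None
```

This copes with the markup shown above, but only because the hashtag text happens to survive tag-stripping intact. Nothing guarantees that the next platform update keeps it that way.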
The other case I’ve encountered, far more relevant right now, is in the software family I’m developing called TootCrypt. TootCrypt is a family of ‘end-to-end’ encrypted messaging protocols and tools for use over social networks. TootCrypt consumes information in both the bios of users (where keys or links to manifests might reside), and status/note messages themselves, consumed from a platform’s API (so it might speak to Mastodon, or GNUSocial, or Twitter, and then read/send the statuses from there).
In cases where the messages are HTML (which, in Mastodon, the reference case, they always are), it becomes necessary for TootCrypt to parse and break down the messages, hoping that their plaintext representation finishes in something machine-readable that hasn’t been corrupted by the addition of presentational markup. It is, of course, impossible for TootCrypt to reliably parse all possible hypothetical cases of this, and there will come a time where it fails to parse the content based on some undefined updates to the social network software, and hurls accordingly, preventing users from using encryption until hotfixes are put in place. This uncertainty will discourage people from using the tool, and is the only major stumbling block in the TootCrypt model of end-to-end tunneling over social media statuses.
In essence, then, OStatus as used in practice isn’t machine-readable, and this causes real-world issues.
So why doesn’t OStatus meet this need?
Let’s look at what the OStatus draft specification thinks of this problem:
The very first two requirements of OStatus are very explicit about what they hope the specification will achieve:
These are the parameters of the problem we wish to solve.
o An update may be represented with plain text in UTF-8 encoding.
o An update may be represented with HTML.
Great! So, clearly OStatus has a solution to our problem, by supporting both plaintext and HTML?
Well, no, it doesn’t. It simply aspires to. For the implementation details it defers to ActivityStreams.
Atom Activity Streams 1.0 uses the Atom format for defining messages.
The Atom specification is explicit (albeit a bit confusing) on how this all actually works:
In the atom:content element, the value of the “type” attribute MAY be one of “text”, “html”, or “xhtml”. Failing that, it MUST conform to the syntax of a MIME media type, but MUST NOT be a composite type (see Section 4.2.6 of [MIMEREG])
-RFC 4287 s4.1.3.1
atom:entry elements MUST NOT contain more than one atom:content element.
-RFC 4287 s4.1.2
It’s also clear that for text content:
Such text is intended to be presented to humans in a readable fashion. Thus, Atom Processors MAY collapse white space (including line breaks) and display the text using typographic techniques such as justification and proportional fonts.
-RFC 4287 s4.1.3.3(1)
Pretty damning all around to the idea of having machine-readable statuses alongside human-readable ones in OStatus. We can’t use multiple content elements, we can’t use composite types, and even if we did receive plaintext, we can’t expect it to be machine-parsable.
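As a rough illustration of those two restrictions, here is a sketch of a strict check over a single entry. The function name content_violations and the message strings are my own invention, and it only tests the two rules quoted above, nothing else in RFC 4287.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# MIME's composite top-level types; anything starting with these is
# a composite type as far as this sketch is concerned.
COMPOSITE_PREFIXES = ("multipart/", "message/")


def content_violations(entry_xml):
    """List which of the two quoted Atom rules an entry breaks (sketch only)."""
    entry = ET.fromstring(entry_xml)
    contents = entry.findall(ATOM + "content")
    problems = []

    # Rule 1: atom:entry MUST NOT contain more than one atom:content.
    if len(contents) > 1:
        problems.append("more than one atom:content element")

    # Rule 2: the type attribute MUST NOT be a composite MIME type.
    for content in contents:
        ctype = content.get("type", "text").lower()
        if ctype.startswith(COMPOSITE_PREFIXES):
            problems.append("composite type: " + ctype)

    return problems
```

Feeding it an entry with both a text and an html atom:content, or one with type="multipart/related", reports exactly the two dead ends described above.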
But that seems a bit short-sighted. Was it always that way? Well, no. Until the ‘last minute’ of the Atom spec (and becoming an IETF standard), it was being produced on the Atom Wiki, and looked VERY different.
HTML is often viewed as a form of content, but in reality it mixes content aspects with presentation aspects and perhaps even a bit of running code. This can pose a problem unless the recipient is very careful to filter out the undesirable bits. Such filtering poses a number of pragmatic implementation issues given the loose syntax rules for HTML and inconsistent implementation.
-An Atom Designer
The discussions at https://www.intertwingly.net/wiki/pie/MultipleContentDiscussion and https://www.intertwingly.net/wiki/pie/ContentDiscussion are quite winding and complexly inter-related, but the synopsis is:
There was a range of discussions on how to support multiple contents, and the draft specification supported multiple atom:content elements and composite types at various stages.
So what happened in the outcome?
As far as I can tell, each separate mechanic for achieving multiple content types was independently discussed and rejected on the grounds that there were multiple mechanisms for achieving it. Thus, proposals went in to remove each mechanism in turn, until eventually there weren’t any. Each one with the justification that the other ones existed. For example, in the case of using multipart/* (some composite MIME format) it was argued fine to remove support because multiple content elements would pick up the slack. In addition, the designers assumed that by the time the sort of use-case OStatus presents came along, their specification would have evolved and we’d have moved on beyond Atom 1.0 to something more developed. So did we truly come up short of any multiple content solution in the final Atom 1.0 spec?
Strictly speaking, no.
Since Atom is a syndication format, not strictly a content sharing format, one means to achieve this was kinda left in, more by accident of principled design than by intentional implementation for this use case: the src attribute of the atom:content element. An atom:content element can point its src attribute at a source URI document defined with type multipart/*, which behaves just like an embedded composite type would. However, this comes with its own problems, which I’ll come to shortly.
As one author of Atom put it:
“src=’…’ can specify a URL which returns an entity of type multipart/*, giving the same facility somewhat more readably (and allowing for ContentNegotiation, which would avoid having to do multipart/alternate)”
So what format would be appropriate at the server-side? Well, multipart/related, RFC 2387, is an option:
The Multipart/Related content-type addresses the MIME representation of compound objects. […] the application processing the compound object determines the presentation style for all the contained parts.
Pretty darn cool. And unlike multipart/alternative, defined in RFC 2046, it doesn’t need all the different types to have functionally equivalent content. Of course, since syndicating the content in this way uses HTTP Content Negotiation, we can flexibly handle this, giving the preferred MIME types to the requesting User Agent based on a negotiated agreement.
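For the content negotiation half of this, a server needs to pick a representation from the client's Accept header. The following is a deliberately simplified sketch of that choice: the function names are mine, it ignores media type parameters other than q, and a real server framework would normally do this for you.

```python
def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media = parts[0].lower()
        if not media:
            continue
        q = 1.0  # q defaults to 1 when absent
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # unparsable weight: treat as not acceptable
        prefs.append((media, q))
    return prefs


def quality(prefs, media):
    """q value for one media type, taken from the most specific matching range."""
    best_spec, best_q = -1, 0.0
    for pattern, q in prefs:
        if pattern == media:
            spec = 2  # exact match
        elif pattern == "*/*":
            spec = 0  # full wildcard
        elif pattern.endswith("/*") and media.startswith(pattern[:-1]):
            spec = 1  # type wildcard such as text/*
        else:
            continue
        if spec > best_spec:
            best_spec, best_q = spec, q
    return best_q


def negotiate(header, available):
    """Pick the available type the client likes best, or None if none is acceptable."""
    prefs = parse_accept(header)
    chosen, chosen_q = None, 0.0
    for media in available:  # ties go to whichever the server lists first
        q = quality(prefs, media)
        if q > chosen_q:
            chosen, chosen_q = media, q
    return chosen
```

A machine-reading client such as a bot could then send "Accept: text/plain, text/html;q=0.5" and get the plaintext form, while a browser-like consumer keeps getting HTML.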
But aren’t these standards pretty obscure? Not really. This concept is exactly how your email client handles mixed text/HTML messages, and the standards have been around and in use for over 20 years. This could rather be considered the accepted solution to this solved problem.
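Because this is the same machinery email uses, the tooling already exists in most standard libraries. As a sketch, here is how a server might assemble a multipart/related document carrying a plaintext and an HTML form of one status, and how a consumer might pull out the part it wants, using Python's email package. The function names are hypothetical, and the assumption that the plaintext part is the root part is mine, not something any spec quoted here requires.

```python
from email import message_from_bytes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_status_document(plain, html):
    """Serialise one status as multipart/related with two representations."""
    # RFC 2387 wants a "type" parameter naming the root part's media type;
    # this sketch assumes the plaintext part comes first and is the root.
    doc = MIMEMultipart("related", type="text/plain")
    doc.attach(MIMEText(plain, "plain", "utf-8"))
    doc.attach(MIMEText(html, "html", "utf-8"))
    return doc.as_bytes()


def pick_part(raw, wanted):
    """Return the decoded body of the first part of the wanted type, or None."""
    doc = message_from_bytes(raw)
    for part in doc.walk():
        if part.get_content_type() == wanted:
            charset = part.get_content_charset() or "utf-8"
            return part.get_payload(decode=True).decode(charset)
    return None
```

A consumer that only wants machine-readable text asks for text/plain and never has to touch the HTML part at all.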
Utilising this solution conforms to the Atom spec, the ActivityStreams 1.0 spec, and the OStatus spec. In principle.
Here comes the catch you’ve been anticipating.
In the Fediverse as it stands today this may come with an issue. The Atom spec is pretty clear that:
If the “src” attribute is present, atom:content MUST be empty
Do current OStatus implementations support src attributes to the exclusion of having embedded content? I'm unsure how well supported this is in practice.
I had a brief poke through the Mastodon source and didn’t come up with any indication it supports incoming src instead of HTML content, but that doesn't tell us much as I'm wholly unfamiliar with the codebase and could easily just be missing it.
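For what it's worth, supporting this on the consuming side doesn't look like much work. Below is a sketch of a reader that accepts either inline content or a src reference, and rejects an entry that tries to carry both. It is not taken from Mastodon or any other codebase, the function name read_content and the returned dictionary shape are my own, and it doesn't handle type="xhtml", where the inline content is child elements rather than text.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def read_content(entry_xml):
    """Describe where an entry's content lives: inline, remote, or absent."""
    entry = ET.fromstring(entry_xml)
    content = entry.find(ATOM + "content")
    if content is None:
        return None

    src = content.get("src")
    if src is not None:
        # Strict reading of "MUST be empty": any text or child element
        # alongside src is treated as an error (an assumption of this sketch).
        if content.text or len(content) > 0:
            raise ValueError("atom:content has both src and inline content")
        # The caller is expected to fetch src, ideally with an Accept header.
        return {"kind": "remote", "src": src, "type": content.get("type")}

    # No src: the content is embedded, and type defaults to "text".
    return {
        "kind": "inline",
        "type": content.get("type", "text"),
        "body": content.text or "",
    }
```

The interesting question is less whether this can be written and more whether consumers in the wild fail gracefully when they receive an empty atom:content they weren't expecting.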
Myriad use cases
Let’s now consider that there are other incentives to justify this work beyond machine-reading consumers.
One other big advantage of this solution is that implementing software would also have the ability to ship human-readable plain-text based on the raw content of what a user input. This would be great for high-security implementations in the ecosystem (such as TootCrypt clients, which really shouldn’t need to be an HTML renderer and parser, with all the surface area that entails!), console-based clients (which already exist and presumably have to munge the HTML into text too), or simply for interoperability when the presentation-coupled messages used otherwise break because they’re basically undefined. It also provides a fallback if the HTML were ever malformed or otherwise considered undesirable.
Another key reason for adopting this is that it provides a backward compatibility path for widely requested improvements to OStatus, such as Markdown support. It provides a mechanism for moving forward with new formats beyond those anticipated >10 years ago when Atom was written, when text or html seemed like a good place to draw the line for simplicity’s sake, and when Atom wasn’t expected to fit all the needs of the de-centralised federated social networks of the future.
Yet, all these use-cases that clamour for this one improvement ask us to risk breaking something.
We’re left with a number of options to meet these numerous use-cases that have appeared, with pros and cons:
- Implement the src attribute without any content beneath the atom:content element. This will break any hypothetical implementations that don't support src, instead relying on the content being in the Atom messages themselves, but it's unknown if any of these implementations actually exist. It appears conformant with every spec.
- Implement the src attribute with HTML content in the atom:content element as a fallback. This will most probably work just fine for blinkered OStatus consumers who assume everything is HTML, and allows permissive consumers to 'pick their poison' of the two options. The downside is this will break strict conformance with the defined IETF spec, and thus cause any strict Atom parsers to hurl entirely as it would look like an invalid message.
- Implement multiple atom:content elements. This was one of the options discussed in the creation of Atom, and for a long time was sort of the default solution to this problem in the drafts. It permits at least text and html types, and won't break most OStatus consumers who probably just take the 'first' child element of atom:content. However, again, the downside is this will break strict conformance with the defined IETF spec, and thus cause any strict Atom parsers to hurl entirely as it would look like an invalid message. It also isn't as flexible as using multipart/*.
- Just use the multipart/related MIME src attribute. This would break blinkered OStatus consumers who assume everything is HTML, break conformance with the spec, and wouldn't work out of the box with anything in the real world. (Tip: Don't do this)
- Some hybrid of the above (there are various permutations, for example using type="text" in the first atom:content, then having a second […])
- Wait for ActivityStreams 2.0, and either ensure this sort of support is baked in from the start, or take advantage of its extensions functionality (section 5) to force the issue in a way that is non-breaking. This will take significant time, effort, and might not be implemented by many OStatus consumers in the long term, let alone the short term. That’s if it ever happens at all, and all whilst providing no solution at the present. In short, this is a very idealistic solution.
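To make the first two options easier to compare, here is a sketch of what a producer might emit for each. The function make_entry and the example.com URL are purely illustrative, the entries omit the other elements a real Atom entry needs (id, title, updated and so on), and how the type attribute should be set when both src and a fallback are present is left open because nothing quoted above settles it.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialise with Atom as the default namespace (this is a process-wide setting).
ET.register_namespace("", ATOM_NS)


def make_entry(src, fallback_html=None):
    """Build a bare-bones entry whose content points at src.

    With fallback_html=None this is the first option: src and an empty
    atom:content. With fallback_html set it is the second option: src plus
    inline HTML, which a strict Atom parser would be entitled to reject.
    """
    entry = ET.Element("{%s}entry" % ATOM_NS)
    content = ET.SubElement(entry, "{%s}content" % ATOM_NS)
    content.set("src", src)
    if fallback_html is not None:
        content.text = fallback_html  # escaped automatically on serialisation
    return ET.tostring(entry, encoding="unicode")


# Hypothetical URL, standing in for wherever a server exposes the status.
STATUS_URL = "https://example.com/statuses/1/content"

option_one = make_entry(STATUS_URL)
option_two = make_entry(STATUS_URL, fallback_html="<p>Hello #fediverse</p>")
```

The difference on the wire is tiny, which is rather the point: the argument is about what existing consumers do with it, not about how hard it is to produce.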
Atom is a very frustrating format to use in a modern environment, filled with good intentions and always intended as a living standard. It doesn’t make for a very good static addendum to other standards, like some Adeptus Mechanicus-style holy writ, complete with schemas acting as Standard Template Constructs.
If the fediverse wants to evolve (and with hundreds of thousands of new users I will continue to argue it must be prepared to evolve) then solutions to these issues need to be considered. Hopefully this article articulates some of the nature of the problem, provides some potentially viable solutions, and presents their pros and cons in a way that enables a decent discussion of the implications without presuming I’ve necessarily understood every element of this very complex problem perfectly.
(…and if anybody informs me “oh, you haven’t even read ‘the’ spec, go read it” for the 100th time I’m now fully justified in setting you on fire) ❤ x
-Danny Rushyo Moules