Open Captioning

Photo by david laws on Unsplash

Definition: a caption that cannot be turned off, disabled, or hidden — usually implemented by flattening, embedding, or encoding the text into the media source. There is also a variation of this for live theater, described as passive assistance, where a projection of the text is visible to everyone in the context of the performance. Let’s focus on the former.

This is neither accessible nor inclusive. Like closed captioning, it is intended to make dialogue and audio more accessible to a deaf or hard-of-hearing audience. In the digital world, turning audio into text and displaying it as captions and/or transcripts is absolutely imperative (not to mention required by law in most contexts), as is providing alternative text for images. Turning images into text is a progressive enhancement. Turning that text back into images is the antithesis.

A simple [UX] lesson still necessary to shout from the rooftops: never flatten text into an image. Never. There are, of course, hundreds of reasons not to do this. The illusion of control is the only lingering reason I continue to hear from the opposing view. That particular illusion is rooted in the intent to display said text in a specific font, size, weight, and position over the image. It is a legacy of print, where those in the visual design space for digital products assume a direct translation, or an automatic transfer of skill, between these extremely different mediums. Let go. Embrace the completely dynamic nature of the digital space and the enormous range of ways things render. Most importantly, realize that this excludes people.
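If the pull toward flattening is typographic control, that same control is available without it. Here is a minimal sketch, assuming a hypothetical banner.jpg and illustrative class names; the text stays live: selectable, translatable, zoomable, and readable by assistive technology.

```html
<style>
  .hero { position: relative; }
  .hero img { width: 100%; display: block; }
  .hero figcaption {
    position: absolute;                 /* exact position over the image */
    bottom: 1rem;
    left: 1rem;
    color: #fff;
    font: 700 2rem/1.2 Georgia, serif; /* specific font, size, and weight */
  }
</style>

<figure class="hero">
  <!-- alt is empty only because the visible caption carries the message;
       describe the image here if it conveys meaning of its own. -->
  <img src="banner.jpg" alt="" />
  <figcaption>The text remains text.</figcaption>
</figure>
```

The rendered result matches the flattened version, but search engines, translation services, screen readers, and anyone who resizes text still get words instead of pixels.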

This exclusion isn’t only the result of poor visual design decisions; it is also being productized by publishing platforms. Oath just introduced “a new video product called ‘Slick’ to create vertical video at scale. Slick allows editors to ‘videofy’ text stories into a vertical-driven mobile experience that can be used for mobile web article pages, Facebook, Instagram, Snapchat and within Yahoo’s native apps.” Text into video. For whose experience? And what about the added data cost?

I would like to propose a new use of the phrase — to reclaim it. In the world of open (open source, open design, open licensing, open gardens), “open” generally means: transparent; participatory or collaborative; free of constraints; free to use; shared knowledge and shared responsibility. “Open captions” currently means closed. It is an archaic format that literally embeds the thing that needs to be open — text — into a thing that also needs to open up more — an image or video. To be open, text must be free. The web may no longer be 95% typography, but text is still a user interface, the universal format for all assistive technology and bots, and the enabler of translation.

My hope is that multiple [new] meanings emerge.

Meaning number one. As a community of web professionals, we should ensure inclusion — of all people of all abilities in all contexts, but also of all technology and future use cases. Open is the goal. It is the origin and purpose of the web. Open captioning could then describe the intentional application of any method of creating and providing a text description of any content or media that is not already text. Images have an alternative text feature, but it is horribly misunderstood, underutilized, and inconsistently applied. Figures have a caption element, and video has a track element (see the sketch below). How often do you use them, or see others use them? Some video has closed captioning and transcription service layers. Narrow artificial intelligence, machine learning, and neural networks now power several image recognition services that can add and read a description of an image, and can even caption live video.

The methods of generating text can and should continue to be many and varied. The methods of providing and displaying it should align with accepted semantic web and accessibility standards — understanding that those evolve as well. What we still need is a method of open. Perhaps an open source model, where any contributor — a human or a machine — can provide a missing caption or suggest an edit to an existing one. Sure, for the integrity of brands and publishers, and for the safety and security of the audience, these suggestions would need a model for moderation and approval, with many considerations, but so does every other open source model. The American Museum of Natural History (NY) launched an initiative called Project Describe that solicits public participation in both providing and reviewing alternative text for over 30,000 images on the Museum’s website. The openness of the web can be used to continuously improve access to and understanding of non-text content.
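For reference, a minimal sketch of the text alternatives HTML already provides. The file names and descriptions are placeholders; only the elements and attributes are standard.

```html
<!-- Alternative text: read by screen readers, indexed by bots,
     and shown if the image fails to load. -->
<img src="exhibit.jpg" alt="A model of a blue whale suspended from the ceiling" />

<!-- A visible caption, programmatically associated with its figure. -->
<figure>
  <img src="exhibit.jpg" alt="A model of a blue whale suspended from the ceiling" />
  <figcaption>The hall's centerpiece, photographed from the mezzanine.</figcaption>
</figure>

<!-- Captions as a separate, open text track (a WebVTT file),
     not burned into the video frames. -->
<video controls src="tour.mp4">
  <track kind="captions" src="tour.en.vtt" srclang="en" label="English" default />
</video>
```

Every one of these keeps the description as text a person, a bot, or a contributor could read, translate, or improve — which is exactly what flattening forecloses.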

The captions are open.

Meaning number two. The first time I saw a presentation from Lainey Feingold, she described each image in her slide deck as she went along. I thought it was a quirky style, or that perhaps she was particularly proud of them. Later, it hit me. She was speaking on accessibility and human rights, as she often does — while exercising them. She was openly captioning.

The images in this type of presentation are often supplementary. They add a little more context, a visual example of the point being made, or simply a metaphor for it. The intended meaning is lost on those who cannot see them. Sometimes the image or video clip is the only content being presented, assuming a universal understanding of the visual. That, of course, assumes it is clearly seen. Aside from the obvious gap for a blind and visually impaired audience, think of the hundreds of other contexts and scenarios where that content cannot be clearly seen, and so is never understood: the view from the back of a large conference room (think SXSW); the view from behind a tall person or other obstruction; low light; excessive light or glare; a poor viewing angle; an eyestrain headache; attention distracted by another screen or source; a webinar over an unstable or slow connection. The list is unending.

The point is, we should stop assuming that a picture is worth a thousand words. The currency in that equation is words. Use them. If you are presenting in any public setting, for any purpose, on any topic, and the carefully constructed visual media you selected to accompany your message includes images relevant to that message, then describe them. Openly caption them. Perhaps even explain why you selected the image or media. Open up the meaning.

The open is captioned.

Meaning number three. Articulating, defending, and advocating for design decisions seems to be a challenge for some and is omitted entirely by most. The typical documents, prototypes, and other artifacts of a UX process tend to include ample annotation describing each element or component of a design: its display behavior; visual properties; content or data source; layout changes across views and contexts; states; and so on. Missing from the majority of these visual, technical, and occasionally tactical notes are the reasons behind them. Why was this selected? Is there a strategic business advantage? Has it been usability tested against several other methods, options, and variations? Is it constrained by some other unseen aspect of the design system? The annotation states what, where, how, and when. Why not why? Adding this opens the captions. It further reduces ambiguity. It builds trust. It aligns to some higher purpose. Captioning the design process and criteria along with the product may not be popular or easy or tidy or valued by all stakeholders, but I believe it adds value. It certainly adds transparency. It is open.

The captioning opens.

I am not really suggesting that the phrase “open captions” take on all of these meanings, but all three practices should happen more. Additionally, the original meaning should be revisited and updated to a truly inclusive method. Text must be open. It would be lovely to also open minds, connections, and opportunities.