Captioning at Dark — Why and How

Ian Smith
Darklang

--

The magic here: captioning your content is easier and cheaper than you probably realize (and if you just want our ffmpeg script, it’s over here).

Why caption?

I interviewed at Dark in August 2018. When I scheduled my first call¹, Ellen emailed me a couple of demo videos to provide more context for the product and company. I was able to watch them — I have a cochlear implant, the videos were fairly short, and the audio was clear and in a dialect of English similar to my own — but being Deaf, I definitely noticed they weren’t captioned, and wished they were!

There’s a world, I think, in which we say that these are internal videos, not public, and the audience is small, so why caption? But that framing makes it harder for deaf and hard of hearing engineers to interview, and it assumes that there will be no deaf/HOH investors, advisors, or friends on our mailing list. Which is a bit of a self-fulfilling prophecy, right? I’m a fan of Stevie Wonder’s phrasing: “we need to make every single thing accessible to every single person with a disability”.

Stevie Wonder with Pentatonix, presenting a Grammy in 2016

There are also reasons to caption that aren’t about accessibility:
- Are you in a coffee shop (or open floor plan office) and don’t have your headphones with you? Captions!
- Want to jump around in the video? Captions!
- Have a pile of old videos, and aren’t sure which one had that awesome clip you want? Captions!
- SEO? Captions!
- Speaker has an accent you’re unfamiliar with? Captions!

So now you’re wondering how you can get this for your videos, right?

Generating captions


First, a don’t: don’t rely on YouTube’s autocaptioning; its quality is unreliable.² (It’s getting better … but still unreliable.)

Instead, pay for it! We use rev.com, which costs $1/minute. I’ve previously used them at other organizations — the Disability Intersectionality Summit and Project Alloy — and been happy with the results. They promise a turnaround time of 24 hours for most things, with shorter videos often coming back in an hour or two.

You will need to review the transcript for QA/editing, especially for jargon and capitalization (I replaced ‘react’ with ‘React’ a bunch, for instance). You can also provide a glossary of words and phrases at checkout time. Ours includes ‘Dark’ (because it’s capitalized), ‘endpoint’, and ‘UUID’.

You get back a standard SRT file. If you want to just view it, VLC will load subs if the subtitle file and the video are in the same directory and have the same filename (modulo extension). This is the stage where you want to do any proofreading and editing in your favorite text editor.
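If you haven’t seen one before, SRT is a simple plain-text format: numbered cues, each with a timestamp range and the caption text, separated by blank lines. (The content below is a made-up example, not from one of our videos.)

```
1
00:00:01,000 --> 00:00:03,500
Welcome to the Dark demo.

2
00:00:04,000 --> 00:00:07,000
Let’s create an endpoint.
```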

Embedding into a single file

There are two approaches to take here, hard subs and soft subs — sometimes called open and closed captions, respectively.

Soft subs are textual data with timestamps, and can be turned on and off in the player (you can even provide subtitles in more than one language). Because they stay text, they can also be searched or extracted later, or used to find a particular point in a video. The downside is that they’re somewhat less discoverable: people don’t necessarily expect a random video to have subtitles! (The mkv video format lets you mark a subtitle stream as ‘forced’, making captions opt-out rather than opt-in, but unfortunately QuickTime doesn’t support mkv out of the box.)
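To make the “searchable” point concrete: because soft subs are just text, finding the moment someone said a particular phrase is a one-liner. (The filename and contents here are hypothetical.)

```shell
# Hypothetical example: search a caption file for a phrase.
# The two lines of leading context (-B2) include the cue's
# timestamp, which tells you where to seek in the video.
cat > demo.srt <<'EOF'
1
00:00:01,000 --> 00:00:03,500
Welcome to the Dark demo.

2
00:01:12,000 --> 00:01:15,000
Now let's create an endpoint.
EOF

grep -B2 "endpoint" demo.srt
```

The match comes back with its cue number and timestamp, so you know to jump to 1:12 in the video.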

Hard subs are created by rendering text as an overlay on the video — you end up with a single video stream, and the subtitles are always present.

As such, to optimize compatibility and discoverability, we decided to render hard subs and embed a soft-subs track to preserve the textual data.

We use ffmpeg to merge subtitles into video, and we’ve written a bash script that takes as input a video and a subtitles file and outputs the finished product. You can also pass --soft-subs to skip the hard subs; in that case it burns a [CC] logo into the bottom right corner of the video.
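For the curious, a minimal sketch of the kind of ffmpeg invocations involved might look like the following. This is an illustration, not our actual script: the function name and filenames are made up, the [CC] logo overlay is omitted, and it assumes mp4 output (where soft subs need the mov_text codec).

```shell
# Sketch of a caption-merging wrapper (hypothetical; not Dark's script).
merge_captions() {
  local video="$1" subs="$2" out="$3" mode="${4:-}"
  if [ "$mode" = "--soft-subs" ]; then
    # Soft subs only: mux the SRT in as a selectable text track, no re-encode.
    echo ffmpeg -i "$video" -i "$subs" -map 0:v -map 0:a -map 1:s \
         -c copy -c:s mov_text "$out"
  else
    # Hard subs burned into the video stream, plus a soft track
    # so the textual data isn't lost.
    echo ffmpeg -i "$video" -i "$subs" -vf "subtitles=$subs" \
         -map 0:v -map 0:a -map 1:s -c:a copy -c:s mov_text "$out"
  fi
}

# The echo prints the command for inspection; drop it to actually run ffmpeg.
merge_captions demo.mp4 demo.srt demo-captioned.mp4
```

Burning hard subs re-encodes the video (the text becomes pixels), while the soft-subs-only path can copy the existing streams untouched, which is why it’s much faster.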

Captioning is everyone’s job

One final note — writing this script and documenting a process and a workflow is something I did at Dark because I know the ropes, and was motivated to get it done. Going forward, though, I’m not The Caption Guy at Dark; it’s just a standard part of our process for producing video content, and thus the responsibility of whoever is creating the video in question. It’s just a thing we do!

Often, deaf and disabled employees end up responsible for accessibility at their companies in ways that go beyond what’s necessary to meet their own needs, or to establish processes and policies. Typecasting this way can be unfortunate — while some of us do work in a11y, many of us have other interests, skill sets, career goals — and often results in second-shift labor. Making these tasks the responsibility of project owners avoids that, and further emphasizes that accessibility isn’t different from any other product requirement.

¹ My phone screens happened over VRS and Google Hangouts — because sometimes phone isn’t the best or most accessible medium.

² The Really Cheap Way

You can use it to do a rough cut on a private video, and then download the SRT file for further editing! But it will need editing, and the labor involved probably costs more than just paying for captions.

If you already have a transcript, and just need timestamps, YouTube can also do that! I have also heard that aeneas does this (and it is FOSS), but have not used it myself.
