Your Online Event Should Have Live Captions

8 min readJul 20, 2020

What if I told you there was a way to make your online conference or meetup more accessible to everyone, in a way that also improves the value of your video content in the long term, and is totally within your conference budget if you’re willing to fight for it?

An (in-person) game design conference I run provided live captions — by an actual human, typing incredibly quickly on a special stenotype keyboard in the same room as us — for the first time last fall. In our post-conference attendee survey, it was by far the most-commented-on change in our conference; everyone was universally grateful for their presence.

I now consider it a precondition of helping to organize any sort of meetup, event, or conference that we provide live captions.

As a result, I’ve now had a lot of conversations with other conference organizers who don’t understand why this is important, or are scared it’s going to be too complicated or expensive, or just generally don’t know where to get started.

My goal with this article is to answer all of the common questions I’m used to getting! Hopefully you’ll leave this realizing not just what live captions are and why your next event should have them, but also how to provide them without much effort.

Everything I’m saying also applies to in-person events, but I’m focusing specifically on the technical requirements of online events since, well, that’s what we’re all dealing with for the foreseeable future. I also think it’s generally easier to provide captions for online events; you have even less of an excuse not to!

An important note: I am not personally deaf or hard of hearing, nor do I consider myself part of the Deaf community. All of the advice I’m giving here is based on a combination of my personal experience as an event organizer (and having had to explain this stuff many times to other event organizers) and various conversations on Twitter with Deaf people. If you have additional questions, or have concerns about your specific event, I highly recommend finding a paid consultant to help you.

Why do I need live captions? Most of my attendees can hear just fine!

First of all: providing accommodations for those who are deaf or hard-of-hearing is itself reason enough to provide captions. Accessibility matters. If you don’t currently have many Deaf people in your community, have you considered that a likely factor is you’re not providing an accessible space for them?

That being said, captions benefit everyone, not just those who strictly need them in order to understand your speakers. Non-native speakers, people with neuroatypicalities such as ADHD that can lead to limited attention spans, even people who just get lost for a second and want to be able to catch up on the last sentence or two that were said: all benefit from live captions. Whenever I provide an event with captions, they are overwhelmingly popular with people who have full hearing capacity.

This is known as the Curb-Cut Effect: accessibility tools such as sidewalk curb cut-outs or live captions may be primarily intended to benefit a specific vulnerable group, but frequently end up being beneficial to all.

Additionally, if you plan on posting video or audio of your event online after the fact, having a transcript can greatly increase its value to the community. Being able to search through the content of a technical talk, or be able to read something at your own pace rather than have to watch it, can be incredibly valuable for everyone. Live captions are not the same thing as providing transcripts for uploaded recorded videos, but we’ll talk about that a bit later.

My videochat/presentation/streaming software offers built-in automatic captioning for free! Can I just use that?

The current state of the art of automatic captioning is… not good. People in Deaf and HoH communities often refer to these as “craptions”, and with good reason.

If you don’t have any other choice, it’s better to have robo-captions than not. But please know that, as of 2020, they are in no way a viable replacement for live human captions.

If you must have robo-captions: when streaming from OBS, there is a captioning plugin that can properly send closed captions to Twitch. Otherwise, Web Captioner is another popular option.

But again, know that by providing only automatic captions, you are not providing a truly accessible experience. Add them rather than having no captions if you have no other options, but please consider this a last resort.

What is the difference between “closed captions” and “open captions”?

Closed captions are captions that the viewer/listener can toggle on or off themselves, on their own device. Your TV’s closed captioning is a great example of this, as are the automatically-generated captions on YouTube.

Open captions are always visible for everyone. In context of online video, this means that the video content itself will contain text captions as part of the video stream.

For online video, closed captions are preferable since it offers more choice. That said, some online video platforms can make working with proper closed captions trickier, and you shouldn’t feel bad if you need to fall back on open captions.

If you can’t provide closed captions through the streaming platform you are using, but you are receiving captions from your captioner in a publicly accessible plaintext form (e.g. a StreamText website), I recommend directly giving your viewers that URL. Even if you still provide open captions in the video itself, providing the option to view a plaintext version of the captions can provide additional visual accessibility options for those who need it.

What about sign language?

Live captions and live sign language interpretation serve similar purposes. Many Deaf people prefer sign language to captions. For many, English is quite literally a second language, making reading English text much more difficult than reading sign language. Seeing somebody signing can also much more effectively convey tone and nuance than written English captions.

However, providing live sign language interpretation has downsides. Languages like American Sign Language and British Sign Language are far more different from each other than American English vs British English, and just providing ASL interpretation limits the audience compared to providing English captions. Interpreted sign language takes up more space on a screen than captions, and you also lose the benefits to non-Deaf people that captions provide.

In an ideal world, a fully-accessible stream would provide both. I’ve attended large tech conferences that offered both live captions and live ASL interpretation as options. But if you can only provide one, my current understanding — as somebody who is neither deaf / hard-of-hearing nor speaks any sign language — is that, depending on your specific audience, focusing on captions is likely the best way to provide the most accessibility for the most people.

How do I work with a live captioner?

In general, working with a captioner is straight forward. The setup is slightly more complicated for an in-person event compared to an online event, but for an online event you largely just need to make sure that you have a low-latency audio connection to your captioner (i.e. a Skype call or similar, instead of a public Twitch stream with 10–15 seconds of latency) and they’ll take care of the rest.

There may be some technical plumbing required depending on where you are streaming. As an example, Twitch’s closed-captioning API doesn’t currently easily support embedding closed captions coming from many common web-based captioning tools. I typically receive a public website from my captioner, which I embed in my stream using an OBS web view.

Before your event, you’ll also ideally want to assemble a list of specific technical jargon your speakers might use. This is optional, but will generally help your captioner provide more accurate captions.

Aren’t captions expensive?

In my experience, providing live captions for community events and conferences is very affordable! There are a lot of variables here that will affect cost, but in general, providing live captions for a 2–3 hour meetup typically costs me a few hundred dollars, while a 2-day single-track conference usually runs me closer to a few thousand dollars. Not cheap by any means, but well within the budget of many events.

Crucially, finding a dedicated captioning sponsor can be relatively easy compared to other conference expenses. If you run a tech event, tech companies are generally looking for ways to signal that they care about diversity and inclusion. Having their name attached to providing live captions is a very effective way to accomplish that, and you can take advantage of that while planning out your budget and fund-raising strategy.

How do I even find a captioner?

It depends a lot on what your event is! I personally work frequently with White Coat Captioning, who provide captions to a lot of tech conferences and science-focused academic events. I can highly recommend them. If they’re not a good fit for your event or your community, ask the people in your community for recommendations!

I can upload my live captions after-the-fact as video transcripts, right?

After your event, your captioner will likely provide you with a raw transcript. If you just upload that straight to, say, YouTube, that will already be much higher-quality than the robo-captions you’ll get from YouTube or another service. But they’ll still be pretty rough, in ways that are forgivable when you’re watching a live talk in real-time but can be frustrating if you’re watching a video after the fact. Assuming you’re interested in providing higher-quality transcripts, you’ll need to clean up those transcripts before uploading them to YouTube or another video hosting service.

Manually cleaning up the transcripts is a lot of work. I’ve personally seen many meetup volunteers excitedly sign up to clean up transcripts, only to burn out halfway through. This is real, time-consuming labor. As you’re budgeting for captions at your event, you may want to consider including funds to pay someone to clean up the transcripts, or flat-out commission a new transcription based on your recordings.

No matter what platform you’re uploading to, you conceptually need to provide not just a transcript but timestamp data that matches up sentences to when they should show up on-screen. While it’s possible to generate a timestamped subtitle file yourself by hand, the easiest way to do this is to upload your video to YouTube, paste your transcript into the text box, and let Google generate the timestamped .srt file for you. Even if you’re planning to host your video elsewhere, this is a case where (unlike for the raw captions themselves) machine learning can make your life easier.

…And that’s it!

At the end of the day, making your event more accessible by providing captions is relatively easy. 90% of the work is recognizing this is worth providing and finding the budget for it.

Hopefully this (longer than I intended!) article has both helped convince you that live human captions are valuable, and helped demystify the process of hiring and working with a live captioner!




Experimental games, interactive art, OSS tools. Games & spatial computing advocate at Microsoft Azure.