A Plea to YouTube: Help Us Help You

Craig Tataryn
Published in Crowdscriber
Dec 19, 2018


Hi, I’m Craig. Along with my co-founder Sean, I started Crowdscriber, a product that aims to make YouTube more accessible to more people and to help YouTube producers reach a broader audience with their content. Our approach is to provide a free, easy-to-use platform for subtitling YouTube content through crowdsourcing.

Enough people have asked us how our product differs from YouTube’s built-in “Community Captions” feature that we’ve written an article explaining it!

The Problem

Ok, let’s cut to the chase. YouTube, you have an amazing platform. It scales, it has an API, it’s free (mostly), and, most importantly for us, the API lets Crowdscriber manage a video’s subtitles (or “captions”, as YouTube calls them).

The downside? When Crowdscriber asks users for access to their YouTube channel so it can upload captions, YouTube lumps that privilege into a very broad, scary entitlement. Basically, we have to ask the user for the ability to “see, edit, and permanently delete your YouTube videos, ratings, comments and captions”. We only need the “see” and “captions” parts of this entitlement, but instead we have to ask for the whole kitchen sink and the keys to their precious content.

Nobody in their right mind would agree to that. Nor do we believe they should have to.

Our Workaround

In Crowdscriber, we’ve had to split out our entitlements so that each user chooses whether Crowdscriber may upload subtitles directly to their YouTube videos, or whether they only want to grant us the bare minimum: the ability to view the videos on their channel. If a user chooses only the latter, Crowdscriber can’t leverage the existing ASR (automatic speech recognition) captions for their videos. ASR captions are a super-duper time-saver when transcribing a video; they’re truly an invaluable tool.

We’ve had to split our entitlement grants into scary and not-so-scary ones

Having to explain to our users, “Hi, Crowdscriber needs to be able to do anything it wants to your YouTube channel, but really all we want to do is change your subtitles,” adds a big point of friction to our UX, and Crowdscriber’s main goal is to make transcription as frictionless as possible. What a bummer.
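For the technically curious, here is a rough Python sketch of what a split like ours can look like with Google’s OAuth 2.0 flow. The two scope URLs are real YouTube Data API v3 scopes, and as we read the API docs, the captions endpoints require `youtube.force-ssl`, the scope whose consent screen shows the “see, edit, and permanently delete” wording. The tier names, `build_auth_url`, and `requested_scopes` are hypothetical illustrations, not Crowdscriber’s actual code. The `include_granted_scopes=true` parameter asks Google for incremental authorization, so a user who starts on the view-only tier can later add the caption tier without losing the first grant.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Google's OAuth 2.0 authorization endpoint for web server apps.
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"

# Real YouTube Data API v3 scopes.
YOUTUBE_READONLY = "https://www.googleapis.com/auth/youtube.readonly"
# Needed for the captions endpoints; this is the "see, edit, and
# permanently delete..." entitlement users find so scary.
YOUTUBE_FORCE_SSL = "https://www.googleapis.com/auth/youtube.force-ssl"

# Hypothetical tiers: the user picks one, and can upgrade later.
TIERS = {
    "view_only": [YOUTUBE_READONLY],
    "upload_captions": [YOUTUBE_FORCE_SSL],
}


def build_auth_url(client_id: str, redirect_uri: str, tier: str) -> str:
    """Build a consent URL that requests only the scopes for one tier."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": " ".join(TIERS[tier]),
        # Incremental authorization: keep scopes granted earlier, so
        # upgrading to upload_captions doesn't drop the view_only grant.
        "include_granted_scopes": "true",
        "access_type": "offline",
    }
    return f"{AUTH_ENDPOINT}?{urlencode(params)}"


def requested_scopes(auth_url: str) -> list[str]:
    """Parse the requested scopes back out of a consent URL."""
    query = parse_qs(urlparse(auth_url).query)
    return query["scope"][0].split(" ")
```

In this sketch a user starts on `view_only`, and only if they opt into direct uploads do we send them to a second consent screen for `upload_captions`. The catch, of course, is that the second screen is still the scary one.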

The Ask

YouTube, we would really, really, really like your scopes to be a little more fine-grained. These catch-all entitlements reach much too far. The problem is that, outside of writing a Medium post and hoping beyond hope that someone from the API team sees it, we have no way to request features. I mean, there is an issue tracker, sure, but it appears to be a hot mess of spam:

I wonder if anyone looks at this thing 🤔

So please, if you’re out there and listening, contact me! Aside from this, we love you, YouTube, and we want to help make you more accessible while helping your content producers attract more viewers along the way.

Call me! 🤙😙

P.S. While I have you here, YouTube, there are a few more things we’d love to be able to leverage via the API:

ASR — Automatic Speech Recognition for All Videos

One thing YouTube got right with its own Community Captions feature was using the ASR caption timings as the basis for human-entered subtitles. One of the hardest, most laborious tasks for us humans is timing subtitles. Amara.org even went so far as to split this task off into a separate “mini-game” that you play after you’ve entered all of your captions: a sort of “Dance Dance Revolution”-style procedure of tapping “up” to start a caption and “down” to stop it. I, for one, really sucked at that game, and my caption timings on that platform turned out about as well.

We use ASR captions too, but only when content producers set up their own videos for transcription. We aren’t allowed to use the ASR captions of videos owned by someone else, which is exactly the situation when a fan adopts a video into Crowdscriber to subtitle it. Fans are forced to write their captions from scratch, including getting all the timings right 😢
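To show why the timings are the valuable part, here is a deliberately naive Python sketch of reusing ASR word timings for human-written captions. It assumes we have the ASR words with start and end times in seconds (which, again, we can’t get for videos the user doesn’t own) and maps each human-written word to the ASR word at the same relative position. This is not YouTube’s algorithm; `AsrWord` and `time_captions` are hypothetical names, and a real aligner would match the actual words (for example with an edit-distance alignment) rather than just their positions.

```python
from dataclasses import dataclass


@dataclass
class AsrWord:
    text: str
    start: float  # seconds
    end: float    # seconds


def time_captions(lines: list[str], asr: list[AsrWord]) -> list[tuple[float, float, str]]:
    """Assign (start, end) times to human caption lines from ASR word timings.

    Naive proportional alignment: the k-th human word is assumed to sit at
    the same relative position in the ASR word stream. The words themselves
    are never compared.
    """
    counts = [len(line.split()) for line in lines]
    total = sum(counts)
    if not asr or total == 0:
        return []
    scale = len(asr) / total  # ASR words per human word
    cues = []
    pos = 0  # index of the current human word
    for line, n in zip(lines, counts):
        first = min(int(pos * scale), len(asr) - 1)
        # Last ASR word covered by this line (at least the first one).
        last = min(max(int((pos + n) * scale) - 1, first), len(asr) - 1)
        cues.append((asr[first].start, asr[last].end, line))
        pos += n
    return cues
```

For example, four one-second ASR words and the human lines `["hello there", "how are"]` yield cues spanning 0 to 2 seconds and 2 to 4 seconds, with no “Dance Dance Revolution” required.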

Audio Waveform Information

Look ma! A waveform!

The Community Captions feature also shows the user a waveform of the audio of whoever is speaking in the video. We would kill to be able to use this information in Crowdscriber. You might ask yourself, “Why? Seems like more of a gimmick than anything, right?” And that’s true, in the sense that it’s superfluous to the act of transcription; to us, however, it would open up a whole realm of possibilities.

For one, it would let us “auto-sync” a transcription into discrete, timed subtitles. YouTube and a few other transcription companies in the space already use waveform data for that express purpose. YouTube can do it because it owns the data; the other companies can do it because they don’t mind skirting the YouTube ToS by downloading and analyzing the videos themselves.
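To make “auto-sync” a bit more concrete, here is a Python sketch of one common, simple approach, energy-based segmentation (not necessarily what YouTube or anyone else actually uses). The audio is cut into short frames, frames whose RMS energy exceeds a threshold count as speech, and sufficiently long runs of quiet frames become subtitle boundaries. It assumes mono samples as floats in [-1, 1]; the function name and the frame size, threshold, and minimum-gap values are hypothetical, and real tools use far more robust voice-activity detection.

```python
import math


def speech_segments(samples, sample_rate, frame_ms=20, threshold=0.02, min_gap_ms=300):
    """Return (start_sec, end_sec) spans where the audio is loud enough to be speech.

    Quiet gaps shorter than min_gap_ms are bridged, so brief pauses inside
    a sentence don't split a subtitle in two.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))

    # 1. Classify each frame as voiced/unvoiced by its RMS energy.
    voiced = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        voiced.append(rms >= threshold)

    # 2. Collect runs of voiced frames as [start_frame, end_frame) pairs.
    runs, start = [], None
    for idx, is_voiced in enumerate(voiced):
        if is_voiced and start is None:
            start = idx
        elif not is_voiced and start is not None:
            runs.append([start, idx])
            start = None
    if start is not None:
        runs.append([start, len(voiced)])

    # 3. Merge runs separated by a gap shorter than min_gap_ms.
    min_gap_frames = math.ceil(min_gap_ms / frame_ms)
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap_frames:
            merged[-1][1] = run[1]
        else:
            merged.append(run)

    # 4. Convert frame indices to seconds.
    return [(s * frame_len / sample_rate, e * frame_len / sample_rate) for s, e in merged]
```

Each returned span is a candidate subtitle slot; the transcribed text would then be poured into those slots in order.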

Another possible use for the waveform data is that we could send it along to a speech-to-text service to generate our own ASR captions. That would be a big benefit to our users who primarily adopt videos and transcribe them from scratch.

The end.


Founder of http://crowdscriber.com. Java, Scala, Ember, Elixir & ObjC dev. All-around nice guy.