Never Off the Record

Everyday Transcription — what if voice stops being ephemeral?

David Kerrigan
The Startup
Aug 30, 2019

Most recent emphasis on privacy in the Internet age has revolved around face recognition and protection of personal data. But revolutionary improvements in speech recognition mean we need to reconsider how we feel about this technology too.

While it’s been possible to record voice for decades, it’s only recently that it has become possible to transcribe it into text in real time at scale. Think about that — it’s now feasible to instantly convert all spoken word, live or recorded, into a permanent, searchable index of what has been said. Add voice recognition that can identify the speaker and things start to get even more complicated.
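To make the idea of a "permanent, searchable index" concrete, here is a minimal sketch in Python. It assumes a transcript has already been produced and split into speaker-labelled segments; the sample segments, the speaker labels and the helper names `build_index` and `search` are all hypothetical, invented for illustration, and are not how any particular product works.

```python
import re
from collections import defaultdict

# Hypothetical transcript: (speaker, seconds_from_start, text) segments.
SEGMENTS = [
    ("Speaker 1", 0, "Let's review the budget for next quarter."),
    ("Speaker 2", 7, "The budget looks fine, but the deadline worries me."),
    ("Speaker 1", 15, "We can move the deadline by a week."),
]


def build_index(segments):
    """Map each lowercased word to the positions of the segments containing it."""
    index = defaultdict(set)
    for position, (_speaker, _start, text) in enumerate(segments):
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(position)
    return index


def search(index, segments, word):
    """Return every (speaker, start, text) segment that contains the word."""
    return [segments[i] for i in sorted(index.get(word.lower(), ()))]


INDEX = build_index(SEGMENTS)
for speaker, start, text in search(INDEX, SEGMENTS, "deadline"):
    print(f"{speaker} at {start}s: {text}")
```

Even a toy like this shows why transcription changes things: once speech is text, finding who said a given word, and when, is a dictionary lookup.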

Google’s Live Transcribe app is a free download for (recent) Android phones and built-in on Pixel 3

Always Listening

There are microphones everywhere. Ubiquitous smartphones, omnipresent laptops and increasingly popular smart speakers mean you’re rarely more than a few feet from a microphone. In the last few weeks, I’ve had two personal experiences of how powerful, how invisible and how promising this kind of technology has become:

The first came the other day as I walked along: a car pulled up beside me playing music loudly. The next time I took my phone out of my pocket, the name of the song was displayed on the screen. Even in my pocket, it had heard and identified the song. The second, more profound example came when I recently gave a day-long presentation about Artificial Intelligence. At the end of the day, I could give the students not only a handout of my slides, but also a full transcript of the day for them to review. Every word I said was committed to a document not by a stenographer, but by Google’s Live Transcribe transcription app, for free. It captured everything I said and converted it to text in real time, both for display on a projector screen and for later review (in a document) by all students.

I see this as a fantastic advance and a truly beneficial use of technology. As one example, the hard of hearing can get an instant text copy of any speaker’s words without the cost and impracticality of a signing interpreter. The World Health Organization (WHO) estimates that there are 466 million people globally who are deaf or hard of hearing. Any technology that can unobtrusively and affordably help include them in more conversations is fantastic.

Everyday Transcription

There are lots of positive uses for transcription in everyday life — genuinely useful scenarios where note-taking could be eliminated, replaced with an exact transcript, annotated if desired, of what took place. I know plenty of students who would love to capture what their lecturers say without having to try frantically to take notes (the very act of note taking aids memory but inhibits attention) or spend hours afterwards transcribing recordings. Google is providing automated transcripts of some podcasts to make their content searchable — surely a first step to all the audio content in the world being as easily searchable as web sites are today.

Already you can find numerous startups exploiting these advances to create products to automate business meetings. Otter and Voicea are examples of virtual assistants which join meetings and essentially replace the old concept of a person taking minutes and recording actions/decisions. Although it’s not yet the norm in most meetings in my experience, concerns about workplace bullying or harassment are as likely as the desire for efficiency in capturing agreed actions to drive companies to consider mandatory recording of all meetings.

“Your Call May Be Recorded”

People’s behaviour tends to change when they are being recorded. Most people are far more cautious about what will appear in writing than about what is merely spoken. Recall the old cliché that a verbal contract isn’t worth the paper it’s written on!

In what situations will we be willing to cede perceived privacy for the certainty of the written word? While wiretapping has been a thing since the 1890s, most people expect conversations to be ephemeral. In many jurisdictions, recording a conversation is legitimate once you inform the person before you start. But what if every conversation was “on the record”? GDPR applies to voice recordings — so organisations recording voice need to get consent and have good cause for recording, as well as clear policies to manage recordings.

Artificial Listening

Real-time transcription is made possible by advances in speech-to-text technology, powered by deep learning, a branch of the field loosely described as artificial intelligence. The cloud service behind Google’s Live Transcribe app can automatically recognise which of 120 languages is being spoken and transcribe it with punctuation and proper noun recognition. It can also identify different speakers and attribute the text to each one. It can even add annotations for sound events, such as a dog barking.

You can try Google’s service for free here: https://cloud.google.com/speech-to-text/

In situations where actions or decisions depend on the output of the transcription, the question of accuracy arises. But with accuracy levels already as good as a human transcriber’s, and always improving, artificial listening services have already passed the quality threshold.

The written word, of course, is different from the spoken. What about disfluencies: should we record exactly what’s uttered, or clean things up to remove the “ums” and “ahs”? What about the emotion that’s lost on the page? Reading a speech is a very different experience from listening to it being delivered emotionally and eloquently. Clearly, much is lost in the transition from spoken word to text, but much is also gained in terms of searchability, shareability and accessibility.
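The choice between a verbatim record and a cleaned-up one can be illustrated with a short Python sketch. It is deliberately crude: the filler list `FILLERS` and the function `clean_transcript` are assumptions made up for this example, and real transcription services handle disfluencies in their own, more sophisticated ways.

```python
import re

# Assumed list of filler words, for illustration only.
FILLERS = ("um", "uh", "ah", "er", "erm")

# A filler word, plus any comma before it and any comma or full stop after it.
_FILLER_RE = re.compile(
    r",?\s*\b(?:" + "|".join(FILLERS) + r")\b[,.]?",
    flags=re.IGNORECASE,
)


def clean_transcript(verbatim):
    """Return the text with standalone filler words removed."""
    cleaned = _FILLER_RE.sub("", verbatim)
    # Collapse any doubled spaces left behind and trim the ends.
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    # Restore a capital letter if a sentence-opening filler was dropped.
    return cleaned[:1].upper() + cleaned[1:]


verbatim = "Um, so the, ah, deadline is, uh, Friday."
print(clean_transcript(verbatim))
```

Both versions are "what was said", but only one is what was uttered. Which of the two gets stored is an editorial decision, not just a technical one.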

In making transcriptions universally available, these new technologies may elevate the routine use of the written word where it wasn’t practical before. At the very least they will more commonly capture the spoken word, committing what we say to writing — and its associated scrutiny — forever.

Thoughts about technology and society. Author of five books: details at https://david-kerrigan.com