How to Build a Full-Stack Transcription App with Google Cloud, React, and Python

Sahar Mor
Published in Geek Culture
3 min read · Jul 8, 2021

Photo by the author

Research in speech recognition has made significant progress in the last several years, with examples such as Facebook’s wav2letter and the more recent HuBERT. Interest in and funding for NLP research are also at an all-time high, with breakthroughs such as OpenAI’s GPT-3 and Microsoft’s ZeRO-Infinity.

Yet building real-time transcription apps is cumbersome at best. Paid APIs such as Google Cloud Speech-to-Text and AWS Transcribe offer limited and obscure documentation on transcribing streamed audio, and the open-source alternatives mostly come down to Mozilla’s DeepSpeech.

In this piece, I’ll share the steps for building your first real-time transcription web app using Python, React, and the Google Speech API.

Note: if you are mainly looking for the Google Speech streaming Python code, you can find it here.

The realtime-transcription-playground repository
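
To give a feel for what streaming transcription involves before diving into the setup, here is a minimal sketch. It is not the code from the repository above. The helper names `chunk_audio` and `stream_transcripts` are hypothetical, and the audio is assumed to be raw 16-bit PCM (LINEAR16) at 16 kHz. It relies on the `google-cloud-speech` Python package and on Google Cloud credentials being available in your environment.

```python
from typing import Iterable, Iterator, Tuple


def chunk_audio(data: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield consecutive slices of raw audio, each at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def stream_transcripts(
    chunks: Iterable[bytes],
    sample_rate_hz: int = 16000,   # assumption: 16 kHz audio
    language_code: str = "en-US",  # assumption: US English
) -> Iterator[Tuple[str, bool]]:
    """Send audio chunks to Google Speech and yield (transcript, is_final) pairs."""
    # Imported lazily so chunk_audio works without the package installed.
    from google.cloud import speech  # pip install google-cloud-speech

    # The client picks up Application Default Credentials from the environment.
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hz,
        language_code=language_code,
    )
    streaming_config = speech.StreamingRecognitionConfig(
        config=config,
        interim_results=True,  # also return partial hypotheses while speaking
    )
    # One request per audio chunk; the config is passed separately below.
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk) for chunk in chunks
    )
    responses = client.streaming_recognize(config=streaming_config, requests=requests)
    for response in responses:
        for result in response.results:
            if result.alternatives:
                yield result.alternatives[0].transcript, result.is_final
```

In a web app the chunks would typically arrive from the browser over a socket rather than from a single bytes object. For a quick local check, you could read a raw PCM file into memory and loop over `stream_transcripts(chunk_audio(pcm_bytes))`, printing each transcript as it comes in.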

Setting up Google Cloud

  1. If you don’t have one already, create a Google Cloud account.
  2. Click on “Select a project” in the top navigation bar. Then click “New Project” and provide it with a name.

