Geek Culture

A new tech publication by Start it up (https://medium.com/swlh).

How to Build a Full-Stack Transcription App with Google Cloud, React, and Python

3 min read · Jul 8, 2021


Photo by the author

Research in speech recognition has made significant progress in the last several years, with examples such as Facebook’s wav2letter and the more recent HuBERT. Interest in and funding for NLP research are also at an all-time high, with breakthroughs such as OpenAI’s GPT-3 and Microsoft’s ZeRO-Infinity.

Yet building real-time transcription apps is cumbersome at best. Paid APIs such as Google Cloud and AWS Transcribe have limited and obscure documentation with regard to transcribing streamed audio, and the open-source alternatives mostly come down to Mozilla’s DeepSpeech.

In this piece, I’ll share the steps for building your first real-time transcription web app using Python, React, and the Google Speech API.

Note: for those who mainly look for the Google Speech streaming Python code — you can find it here.
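For orientation, here is a minimal sketch of what streaming recognition with the `google-cloud-speech` Python client can look like. It is not the code from the repository. The helper names `chunk_audio` and `transcribe_stream`, the 4096-byte chunk size, the 16 kHz sample rate, and the `en-US` language code are all assumptions chosen for illustration, and the sketch assumes the input is raw 16-bit mono PCM and that Google Cloud credentials are already configured in the environment.

```python
from typing import Iterable, Iterator


def chunk_audio(pcm: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield fixed-size slices of raw PCM audio, imitating a live stream.

    Hypothetical helper: a real app would receive chunks from a microphone
    or a websocket instead of slicing a buffer.
    """
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


def transcribe_stream(chunks: Iterable[bytes], sample_rate: int = 16000) -> Iterator[str]:
    """Send audio chunks to Google Speech-to-Text and yield transcripts.

    Hypothetical helper; sample rate and language code are assumptions.
    """
    # Imported lazily so chunk_audio stays usable without the package installed.
    from google.cloud import speech

    # The client picks up credentials from the environment
    # (for example GOOGLE_APPLICATION_CREDENTIALS).
    client = speech.SpeechClient()

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate,
        language_code="en-US",
    )
    streaming_config = speech.StreamingRecognitionConfig(
        config=config,
        interim_results=True,  # also emit partial hypotheses while audio arrives
    )

    # One request per audio chunk; the config is passed separately.
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk) for chunk in chunks
    )
    responses = client.streaming_recognize(config=streaming_config, requests=requests)

    for response in responses:
        for result in response.results:
            if result.alternatives:
                # The first alternative is the most likely hypothesis.
                yield result.alternatives[0].transcript
```

Under these assumptions, iterating over `transcribe_stream(chunk_audio(pcm_bytes))` would yield interim and final transcripts as the service returns them. Keeping the chunking separate from the API call makes it easy to replace the buffer slicing with a real audio source later.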

The realtime-transcription-playground repository

Setting up Google Cloud

  1. If you don’t have one already, create a Google Cloud account.
  2. Click on “Select a project” in the top navigation bar. Then click “New Project” and provide it with a name.


Written by Sahar Mor

Bringing the latest in AI to the masses through writings and GitHub repos | aitidbits.substack.com - generative AI weekly roundup in <2 min
