How to Build a Full-Stack Transcription App with Google Cloud, React, and Python
Research in speech recognition has made significant progress in the last several years, with examples such as Facebook’s wav2letter and, more recently, HuBERT. Interest in and funding for NLP research are also at an all-time high, with breakthroughs such as OpenAI’s GPT-3 and Microsoft’s ZeRO-Infinity.
Yet building real-time transcription apps is cumbersome at best. Paid APIs like Google Cloud Speech-to-Text and Amazon Transcribe have limited and obscure documentation on transcribing streamed audio, and the open-source alternatives mostly come down to Mozilla’s DeepSpeech.
In this piece, I’ll share the steps for building your first real-time transcription web app using Python, React, and the Google Speech API.
Note: if you are mainly looking for the Google Speech streaming Python code, you can find it here.
Setting up Google Cloud
- If you don’t have one already, create a Google Cloud account.
- Click on “Select a project” in the top navigation bar. Then click “New Project” and provide it with a name.