How to build a video transcription web app using generative AI
A weekend project that shows how to combine web development and generative AI tools
Generative AI is one of the hottest areas in data science, and in computer science in general. However, it is also one of the hardest to keep up with. The number of research papers that come out every week with a new concept, technique, or model, the number of tools that become available, and even the number of players in the field all make any personal attempt to stay on top of it overwhelming and intimidating. All this makes it difficult to know where and how to start. I believe that this, like any complex task, is better tackled little by little. In this post, I present a small and simple project that shows how to apply one of the most successful areas of generative AI, speech-to-text transcription, to a real-world problem using a web app.
1. Video transcription
To transcribe the speech in videos into text, we will use the OpenAI Whisper model. This generative model was released by OpenAI in 2022. It can transcribe speech in multiple languages (see diagram).
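As a rough idea of what calling Whisper from Python can look like, here is a minimal sketch. It assumes the open-source `openai-whisper` package (which also needs `ffmpeg` installed on the system) and the `base` model size. The helper name `transcribe_file`, the optional `model` argument, and the file-extension check are illustrative choices for this sketch, not necessarily how the app in this post is wired.

```python
from pathlib import Path

# Formats this sketch accepts; an assumption chosen to match the web app's inputs.
SUPPORTED_SUFFIXES = {".wav", ".mp3"}


def transcribe_file(audio_path, model=None, model_name="base"):
    """Return the transcription of an audio file as a plain string.

    `model` can be any object with a Whisper-style `transcribe(path)` method
    that returns a dict containing a "text" key. If it is omitted, a Whisper
    model is loaded on demand.
    """
    path = Path(audio_path)
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported audio format: {path.suffix!r}")

    if model is None:
        # Imported lazily so the helper can be exercised without the
        # (large) dependency installed: pip install openai-whisper
        import whisper

        model = whisper.load_model(model_name)

    result = model.transcribe(str(path))
    # Whisper returns a dict; the full transcription lives under "text".
    return result["text"].strip()
```

With the package installed, `transcribe_file("talk.mp3")` (where `talk.mp3` is a hypothetical file name) would load the model and return the text. In a web service you would probably load the model once at startup and pass it in through `model`, since loading it on every request is slow.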
Whisper expects a sound file in WAV or MP3 format as input and returns a string with the transcription text from the input file. In this web app, Whisper will be a service that will