How to build a video transcription web app using generative AI

Edgar · Published in about ai · 5 min read · Jun 17, 2024

A weekend project that shows how to combine web development and generative AI tools

Generative AI is one of the hottest areas in Data Science, and in Computer Science in general. However, it is also one of the most difficult to stay up to date with. The number of research papers that come out every week with a new concept, a new technique, or a new model, the number of tools that become available, even the number of players in the field, all make any personal attempt to stay on top of it overwhelming and intimidating, and make it difficult to know where and how to start. I believe that this, like any complex task, is better tackled little by little. In this post, I present a small and simple project that shows how to apply one of the most successful generative AI areas, speech-to-text transcription, to a real-world problem using a web app.

1. Video transcription

To transcribe speech in videos into text we will use OpenAI's whisper model, a generative model released in 2022 that can transcribe speech in multiple languages (see diagram).

OpenAI whisper diagram from https://github.com/openai/whisper

Whisper expects a sound file in wav or mp3 format as input and returns a string with the transcription of the speech in the input file. In this web app, whisper will run as a service that the back end calls to transcribe the audio extracted from the video.
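In fact, the model's Python API is small enough to try in a few lines; these are the same calls used in the transcription module later in this post (the audio file name here is just a placeholder):

import whisper

# Load one of the pretrained checkpoints ("base" is a good speed/accuracy trade-off)
model = whisper.load_model("base")

# Transcribe a local sound file ("sample.mp3" is a placeholder)
result = model.transcribe("sample.mp3")
print(result["text"])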

2. System architecture

Since we start with a video file, we first need to extract the audio from it. We also need an interface to interact with the user. For this we will build a web app with a page that receives the video file and displays the resulting text.

Figure 1. System architecture. A webpage (front end) is the interface through which the user provides a video file. The system (back end) must be able to handle the video provided by the user. A sound extraction module is in charge of extracting the audio from the provided video. Once that is done, the speech-to-text transcription model produces the text that the user interface displays to the end user.

The modules in figure 1 show how the system is organized: the user uploads a video file, the system saves it locally, the sound is extracted from the video, and then it is transcribed into text by the AI model. Finally, the transcribed text is displayed for the user to see via the web page.

Tech stack

These are the tools used to implement the modules described above:

  • Front-end (HTML/CSS/JavaScript): To create the user interface.
  • Back-end (Node.js/Express): To handle the video file upload and to orchestrate the sound extraction from the video and the handling of the output text (the npm wiring between the two sides is sketched below).
  • Speech-to-Text Service: the OpenAI whisper model.
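The back end reaches the Python side through an npm script, so the two halves need to be wired together in package.json. A minimal sketch, assuming the transcription module shown later is saved as transcribe.py (the file names here are my assumption, not necessarily those of the repo):

{
  "name": "video-transcription-app",
  "scripts": {
    "start": "node server.js",
    "transcribe": "python transcribe.py"
  },
  "dependencies": {
    "express": "^4.18.2",
    "multer": "^1.4.5-lts.1"
  }
}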

Web server

The back end runs on Node.js and is handled by a simple server.js:

const express = require('express');
const multer = require('multer');
const { exec } = require('child_process');

const app = express();
const upload = multer({ dest: 'uploads/' });

// Serve the front-end from the public/ directory
app.use(express.static('public'));

app.post('/upload', upload.single('video'), (req, res) => {
  const videoPath = req.file.path;
  // --silent keeps npm's own banner out of stdout,
  // so stdout contains only the transcription
  const command = `npm --silent run transcribe -- ${videoPath}`;

  exec(command, (error, stdout, stderr) => {
    if (error) {
      console.error(`Error: ${error.message}`);
      return res.status(500).send('An error occurred while processing the video.');
    }
    // whisper and moviepy write progress messages to stderr,
    // so log them but do not treat them as a failure
    if (stderr) {
      console.error(`Stderr: ${stderr}`);
    }
    const transcription = stdout.trim();
    res.json({ transcription });
  });
});

app.listen(3000, () => {
  console.log('Server started on http://localhost:3000');
});
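The server serves a static front-end from the public/ directory, which is not shown in this post. A minimal sketch of what such a public/index.html could look like, with element ids of my own choosing (note that the form field name 'video' must match upload.single('video') on the server):

<!DOCTYPE html>
<html>
<head>
  <title>Video transcription</title>
</head>
<body>
  <h1>Upload a video to transcribe</h1>
  <input type="file" id="video" accept="video/*" />
  <button id="upload">Upload</button>
  <p id="result"></p>

  <script>
    document.getElementById('upload').addEventListener('click', async () => {
      const file = document.getElementById('video').files[0];
      if (!file) return;

      // The field name 'video' must match upload.single('video') on the server
      const formData = new FormData();
      formData.append('video', file);

      document.getElementById('result').textContent = 'Transcribing...';
      const response = await fetch('/upload', { method: 'POST', body: formData });
      const { transcription } = await response.json();
      document.getElementById('result').textContent = transcription;
    });
  </script>
</body>
</html>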

Audio extraction and transcription modules

The speech-to-text transcription is carried out by the OpenAI whisper model. However, the audio needs to be extracted from the video before it can be processed by whisper.

import os
import sys

import whisper
from moviepy.editor import VideoFileClip


def extract_audio_from_video(video_path, audio_output_path):
    """
    Extract audio from a video file and save it to a new file.

    Args:
        video_path: str, path to the video file
        audio_output_path: str, path to save the audio file
    """
    # Progress messages go to stderr so that stdout carries only the transcription
    print(f"Extracting audio from {video_path}...", file=sys.stderr)

    # Load the video file and grab its audio track
    video_clip = VideoFileClip(video_path)
    audio_clip = video_clip.audio

    # Write the audio to a file; logger=None silences moviepy's progress bar
    audio_clip.write_audiofile(audio_output_path, logger=None)

    audio_clip.close()
    video_clip.close()
    print(f"Audio extracted and saved to {audio_output_path}", file=sys.stderr)


def transcribe(file_path):
    """Transcribe a sound file into text with the whisper 'base' model."""
    model = whisper.load_model("base")

    print("Transcribing...", file=sys.stderr)
    result = model.transcribe(file_path)
    return result["text"]


# Run the code with the video path as argument
if __name__ == "__main__":
    video_path = sys.argv[1]

    # Extract the audio next to the uploaded video, then transcribe it
    audio_path = video_path + ".mp3"
    extract_audio_from_video(video_path, audio_path)

    # The transcription is the only thing written to stdout,
    # so the Node.js server can capture it directly
    print(transcribe(audio_path))
    os.remove(audio_path)  # clean up the intermediate audio file

Putting things together

Once you have installed the requirements to run Node.js and whisper, you can run the app with npm start (or node server.js).
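Assuming the stack described above, the setup boils down to something like the following (whisper and moviepy additionally require ffmpeg to be installed on the system):

# Node.js dependencies
npm install express multer

# Python dependencies (whisper and moviepy also need ffmpeg on the PATH)
pip install openai-whisper moviepy

# Optional: test the transcription module on its own (file names as assumed above)
python transcribe.py sample.mp4

# Start the web server
npm start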

Figure 2. Web app demo. This shows the npm start command and node running the server.js file shown above. The web page shows instructions to upload the video file and the upload button that calls the function that saves the file to disk and runs the audio extraction.

And there you go, the transcription web app is up and running. After uploading a video file (e.g. mp4) and waiting for some time, you should see the transcribed text on the web page. The wait time varies with the size of the video and the machine you run this on. If you have a GPU, it can run significantly faster than on a CPU.

3. Summary

In summary, this post explains how to implement a web app that transcribes videos into text via a web page. The transcription is done by a generative AI model for speech-to-text (OpenAI's whisper). This simple but educational project can be your point of entry into developing web apps that help people solve real problems using generative AI.

4. Future work

There are different ways in which the work presented here could be expanded. They are listed in order of difficulty, in case you want to take the project further in one of these areas:

Output formatting

With this implementation, the output is written directly to the same pane where the file was uploaded. A good extension would be to show the transcription in a scrollable text box with copy and save-to-file buttons. The functionality to store transcriptions and videos in the system could also be a very useful extension.
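For the copy button, for instance, the browser's Clipboard API makes this nearly a one-liner (a sketch, assuming the #result element and a #copy button in the front-end sketch shown earlier):

// Copy the displayed transcription to the clipboard
// (assumes a #copy button and the #result element from the earlier sketch)
document.getElementById('copy').addEventListener('click', () => {
  const text = document.getElementById('result').textContent;
  navigator.clipboard.writeText(text);
});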

Authentication

This web app could include an authentication process so that only registered users can use the service. You will need a database for this.

Deployment

A very useful extension of this work would be to work out the deployment details to serve it in a cloud environment. You can get an account with one of the main cloud providers (AWS, Azure, GCP, etc.), or you can learn how to serve your website using more specialized tools like Vercel. Either way, it can be an excellent way to learn how to scale a real-world web app.

5. Code

You can access this work in this GitHub repo. Feel free to contribute to it by creating an issue or a pull request.

6. References

[1] Whisper GitHub repository: https://github.com/openai/whisper
