Convert an mp4 video file into a text summary using python

In this post, we will use FFmpeg and the Python speech_recognition module to convert an MP4 video file (e.g. one taken from YouTube) into a text summary.

Darren Willenberg

Published in

MLthinkbox

5 min read · Nov 8, 2022

Introduction

Lately I have been curious about how to extract text from video content such as YouTube and TikTok videos. Potential use cases are analysing video sentiment on a specific topic, creating subtitles or (as in my case) producing input for a text analytics application.

To achieve our goal we will follow three steps: 1) installing the FFmpeg software, 2) converting the data from MP4 to MP3 to WAV, and 3) applying the speech_recognition Python module to transcribe the spoken text. Let's get started!

Installing FFMPEG

A convenient way to convert an MP4 video file into a format suitable for processing is FFmpeg, a command-line tool whose commands can be executed directly from your Python script. We will get into this a bit later. If you do not already have FFmpeg installed you can follow the steps below; otherwise you can skip ahead to the code walk through.

FFMPEG installation STEP 1: Download latest build of ffmpeg-git-essentials.7z by clicking here.

Builds — CODEX FFMPEG @ gyan.dev

FFmpeg is a widely-used cross-platform multimedia framework which can process almost all common and many uncommon media…

www.gyan.dev

FFMPEG installation STEP 2: Create a folder called, for example, FFMPEG in your C:/ drive and extract the contents of the downloaded 7z file into the newly created folder.

FFMPEG installation STEP 3: Add the path of the new folder to the PATH variable in your Windows system environment variables. Make sure to use the path to the bin folder where the “ffmpeg” executable is located.

If you are using a Mac then look here on how to add environment variables.

FFMPEG installation STEP 4: You can confirm the installation by typing ffmpeg in the command prompt (cmd). If the PATH is set correctly, FFmpeg prints its version and build information.
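The same check can be run from Python. The sketch below is a minimal example, not part of the original post: `shutil.which` searches the PATH for the executable, and `ffmpeg -version` prints the build banner. The helper name `find_ffmpeg` is hypothetical.

```python
import shutil
import subprocess


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which(executable)


ffmpeg_path = find_ffmpeg()
if ffmpeg_path is None:
    print("ffmpeg was not found on PATH; re-check the bin folder added in STEP 3.")
else:
    # "-version" prints the build information and exits without converting anything.
    result = subprocess.run([ffmpeg_path, "-version"], capture_output=True, text=True)
    print(result.stdout.splitlines()[0] if result.stdout else "ffmpeg ran but printed nothing")
```

If the first branch is taken, the usual cause is that the PATH entry points at the extracted folder rather than at its bin subfolder, or that the terminal was opened before the variable was changed.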

Code walk through

Once FFMPEG is confirmed to be working we can get into transcribing text from video!

We will import the os, speech_recognition and ffmpeg modules. I also declare variables for the location of the project home directory, the video to be converted, and the location of ffmpeg.exe.

Issuing ffmpeg commands from Python via the os module was a bit troublesome and required a lot of attention to the ffmpeg installation and environment variables. FFmpeg commands typically start with the ffmpeg executable name, followed by the input and output files and their formats.
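The setup described above might look roughly like the sketch below. The folder and file names are placeholders assumed for illustration, not the paths used in the original project, and the ffmpeg.exe location assumes the archive's bin folder ended up directly under the FFMPEG folder from the installation steps. The intermediate MP3 and WAV file names are derived from the video name.

```python
from pathlib import Path

# Hypothetical locations: replace with your own project folder and video file.
PROJECT_DIR = Path.cwd()
VIDEO_PATH = PROJECT_DIR / "my_video.mp4"

# Assumed install location; adjust to wherever the bin folder actually is.
FFMPEG_EXE = Path("C:/FFMPEG/bin/ffmpeg.exe")

# Intermediate and final audio files sit next to the video, with a new extension.
MP3_PATH = VIDEO_PATH.with_suffix(".mp3")
WAV_PATH = VIDEO_PATH.with_suffix(".wav")

print(MP3_PATH.name, WAV_PATH.name)  # my_video.mp3 my_video.wav
```

Keeping the paths in one place means the later conversion and transcription steps only need these variables, whichever way the ffmpeg command is issued.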

ffmpeg -i <input_file.format> <output_file.format>

You can find the ffmpeg cheatsheet here. Once you are comfortable with ffmpeg commands you can insert your variables into a Python string. You can learn about Python string formatting here.

Python 3's f-Strings: An Improved String Formatting Syntax (Guide) - Real Python

As of Python 3.6, f-strings are a great new way to format strings. Not only are they more readable, more concise, and…

realpython.com
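As a minimal sketch of the two conversions (MP4 to MP3, then MP3 to WAV): the post issues the command through os with a formatted string, while this example swaps in `subprocess.run` with an argument list, which avoids quoting problems when paths contain spaces. The helper names `build_ffmpeg_command` and `convert` and the file names are hypothetical. The `-y` flag tells ffmpeg to overwrite an existing output file.

```python
import subprocess


def build_ffmpeg_command(input_file, output_file, ffmpeg="ffmpeg"):
    """Build `ffmpeg -y -i <input> <output>` as an argument list."""
    # -y overwrites the output file if it already exists; -i marks the input file.
    return [ffmpeg, "-y", "-i", str(input_file), str(output_file)]


def convert(input_file, output_file, ffmpeg="ffmpeg"):
    """Run the conversion and raise CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(input_file, output_file, ffmpeg), check=True)


# The f-string equivalent of the command-line form shown earlier:
input_file, output_file = "my_video.mp4", "my_video.mp3"
command_string = f"ffmpeg -i {input_file} {output_file}"
print(command_string)  # ffmpeg -i my_video.mp4 my_video.mp3
```

With a real video in place, calling `convert("my_video.mp4", "my_video.mp3")` followed by `convert("my_video.mp3", "my_video.wav")` would reproduce the MP4 to MP3 to WAV chain. ffmpeg picks the output format from the file extension.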

Earlier we imported speech_recognition as “sr”. We use this to create the recognizer object and to load the processed audio file.

Finally we declare the length of time over which we want to transcribe audio into text and open our audio file as source in a with statement, the pattern used by speech_recognition. More details on this here.
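The transcription step could be sketched as follows. This is an illustration under assumptions, not necessarily the code from the original post: it uses `recognize_google` (the free Google Web Speech API, which needs an internet connection) as the recognizer, and the helper name `transcribe_wav` is hypothetical.

```python
def transcribe_wav(wav_path, duration=None, offset=None, language="en-US"):
    """Transcribe (part of) a WAV file with the Google Web Speech API."""
    # Third-party package: pip install SpeechRecognition.
    # Imported inside the function so the helper can be defined before the package is installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(wav_path)) as source:
        # duration and offset are in seconds; None means "to the end" / "from the start".
        audio = recognizer.record(source, duration=duration, offset=offset)
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # Raised when the speech could not be understood.
        return ""
```

For example, `transcribe_wav("my_video.wav", duration=60)` would attempt to transcribe the first 60 seconds. For long recordings, one option is to call the helper repeatedly with increasing `offset` values and join the results.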

Based on my review, there are minor mistakes in the resulting text when speech is mumbled or if an unusual word is used. The output text is however highly interpretable and can be used for further analysis. If you are interested in comparing the input and output you can watch the original video here.

Conclusions

The speech_recognition module cannot read MP4 or MP3 files directly, so the audio must first be converted to a supported format such as WAV
Installing FFMPEG takes a bit of patience
The output text makes sense and can be used for further text analysis purposes.

If you find a better way of doing this, please let me know! The python code can be accessed here. Thanks!

Overview of python dependencies

os

The os module is part of Python's standard library, so no installation is needed (the separately published os-sys package on PyPI is unrelated). It provides a portable way of using operating-system-dependent functionality, such as running commands and working with file paths.

ffmpeg-python

Python bindings for FFmpeg — with complex filtering support

pypi.org

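The ffmpeg module is required in order to convert between different audio-visual formats such as MP4, MP3 and WAV. It is a set of Python bindings only, so the FFmpeg software itself must still be installed as described above.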

SpeechRecognition

Project links: The library reference documents every publicly accessible object in the library. This document is also…

pypi.org

The speech_recognition module takes the WAV file as input and returns the transcribed text.
