The adventures of converting VHS tapes to mp4 using EasyCAP and ffmpeg to create a Christmas present

It all started with a bright idea: let’s digitalize some old VHS tapes, and create a cute little montage for the family about the family as a Christmas present. And a mere 2.5 months later here I am, writing about how easy it is on Linux, using an EasyCAP device and ffmpeg.

TL;DR

Use this to read video from EasyCAP:

$ ffmpeg -f v4l2 -framerate <framerate> -i <input device> -t <duration of video> <output filename>

this for audio:

$ ffmpeg -f alsa -i <input device> -t <duration of video> <output filename>

to merge them:

$ ffmpeg -i <input video filename> -itsoffset <audio offset in seconds> -i <input audio filename> -codec copy <output filename>

to convert them to a format that can be played by most devices:

$ ffmpeg -i <input filename> -c:v h264 -c:a aac -pix_fmt yuv420p <output filename>

all the above together:

$ ffmpeg -f v4l2 -framerate <framerate> -thread_queue_size 512 -i <input device> -f alsa -thread_queue_size 512 -i <input device> -t <duration of video> -c:v h264 -c:a aac -pix_fmt yuv420p <output filename>

This is going to be a tutorial on the technical details of the procedure stated in the title, told as a personal story. If you want material to learn from quickly, you will be antagonized. If you are looking for a light read about the difficulties of creating a Christmas present, you will be displeased. You have been warned.

The Artifact

So we acquired the materials to be digitalized by smuggling the tapes out of grandma’s and grandpa’s house. That was the easiest part. Then I headed to the attic to scavenge an old VHS player and a DVD player/recorder. I thought it would be as easy as plugging the VHS player’s output into the DVD player’s input, plugging that into the TV, starting playback, starting the recording, and stopping both when the show is over. I managed to save one tape for the generations to come with ease. The quality wasn’t too good, but I did not expect much, to tell the truth. Then something went wrong.

I gave digitalizing the next tape about 4 or 5 tries before I gave up. For some reason the DVD player stopped recording at random points in time, leaving the target DVD unplayable on any device. I was thinking about writing off my losses and abandoning the project altogether when I was given this guy:

Mine is labeled VideoDVR, but I only needed to open the manual to see that I was given a Syntek STK1160 USB capture device. Obviously the CD the manual kept talking about was missing, but who needs a CD anyway these days? The driver is definitely available on the manufacturer’s site, right? Not even close.

Okay, so we have 2 laptops at home, one with Windows 10 and another with Arch Linux. “Windows has better driver support”, or so people say, so whenever I find any kind of arcane looking piece of hardware I first try to find some Windows driver to get it to work. First I found this site, but I could not set up my EasyCAP with the driver I found there. Then there were some other sites, but most of them made both Chrome and the antivirus go crazy, so I decided to give Linux a try.

It turned out to be a good idea.

So here I found everything I needed to know about EasyCAP. The first thing I found out was that the manual is a liar. When I checked what usb devices were found by the system I found this:

$ lsusb
Bus 001 Device 026: ID 1b71:3002 Fushicai USBTV007 Video Grabber [EasyCAP]

Different manufacturer, different chipset, different fingerprint: a different driver is needed. But hey, on linuxtv.org it says

Kernel 3.19-rc1 and above now include the usbtv kernel module that supports both video and audio.

Cool! My kernel is way above 4.0 so it should have the kernel module.

$ cat /proc/modules | grep usbtv
usbtv 20480 0 - Live 0xffffffffa0a11000

Yep, we’re good.
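For scripting the same check, a minimal sketch follows. module_loaded is a made-up helper name, and the sketch assumes the driver is built as a loadable module (a driver compiled directly into the kernel does not show up in /proc/modules).

```bash
#!/usr/bin/env bash
# module_loaded NAME [MODULES_FILE]
# Succeeds if NAME is listed as a loaded kernel module.
# MODULES_FILE defaults to /proc/modules; the second argument exists only
# so the helper can be tried against a saved copy of that file.
# Hypothetical helper, not a standard tool.
module_loaded() {
    local name=$1 file=${2:-/proc/modules}
    # Each line starts with the module name followed by a space.
    grep -q "^${name} " "$file"
}
```

It could be used as: module_loaded usbtv && echo "usbtv is loaded"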

Play it

All I needed to do was plug everything in to the right place, start the player, start VLC, go to Media > Open Capture Device, select /dev/video1 for video and hw:1,0 for audio, press play, and I could already see the contents of the tape on my laptop’s screen. Well now, VLC can also convert streams, so let’s try to convert the incoming stream. To do so, you only need to click on the down pointing arrow next to Play and select Convert.

This was the first time I was confronted with the fact that I am a complete Jon Snow when it comes to converting video streams. I mean I have created some cheesy montages, I have used ffmpeg to crop and split videos, I also know a couple of things about compression, but I never really understood this whole container and codec mumbo jumbo. The fact that you often receive video files that have incorrect extensions does not help either. MP4? M4a? MPEG? I want to burn it to a DVD in the end, where is that option?! I did some quick research but it merely furthered my frustration, so I decided to simply ignore my ignorance and try all the available formats on some small samples.

None of them came out well. Most of them had very bad quality and almost none of them had sound. I could have tried and recorded the video stream and then used eg. audacity to record the audio, but I wanted to try something else. Let’s see it with ffmpeg!

Enter the Matrix

$ ffmpeg -f v4l2 -framerate 25 -video_size 640x480 -i /dev/video1 -t 0:30 test.mkv

I put this command together by reading a lot of stack overflow and ffmpeg wiki, so probably it could be made better.

-f v4l2: Read the input using video4linux2

-framerate 25: We are in Europe, the PAL standard uses 25 FPS, so my tapes should use it too. Not sure if this option needs to be set at all.

-video_size 640x480: Surely they did not have any better resolution, so I wanted to stop ffmpeg from trying to encode it in eg. 720p. Not sure if this option is needed to be set at all or should be set to something else.

-t 0:30: the duration of the content to be recorded, so I do not have to stop the encoding by hand (VLC-ffmpeg 0:1).

test.mkv: Big movies usually use this file extension, I will too! (Jon Snow and the video formats you know)

It worked, and the end result was beautiful. I mean it looked the same as on the tape. Now let’s see the audio part:

$ ffmpeg -f alsa -i hw:2,0 -t 0:30 test.wav

I chose to save it as .wav because I knew it was an uncompressed audio format. I had no idea about the actual codec that was used by ffmpeg.

Wonderful. Let’s put it together!

$ ffmpeg -f v4l2 -framerate 25 -video_size 640x480 -i /dev/video1 -f alsa -i hw:2,0 -t 0:30 test.mkv

Was it any good? Parts of it were… kind of. I ended up with either muted audio and proper video or halted video and proper audio. Bummer.

Now which one to use? With VLC, I had okay results and at least I knew the codec that was used, but when the video was of acceptable quality, the audio was non-existent. With ffmpeg I also had to capture the audio and video separately, and I had no idea of the codecs used, but to be honest I knew nothing about the topic either. On the other hand I was given the ability to automatically stop recording at a set point. So being more comfortable, ffmpeg gained the upper hand.

After recording both the audio and the video streams I was able to merge them using ffmpeg. The streams were not in sync, but I could adjust their delay in mplayer and then use the -itsoffset option accordingly.

So if the audio was lagging behind the video by 4.1 secs:

$ ffmpeg -i first_christmas.mkv -itsoffset -4.1 -i first_christmas.wav -codec copy first_christmas_final.mkv

The end result was not perfect, but close enough. Usually the audio and the video streams did not have the same speed, but this only turned out to be a nuisance when the tape was more than 4 hours long. The speed could have been adjusted but as we were planning to create a montage, it was completely sufficient to adjust the audio of the chunks when it was necessary.
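For completeness, here is a rough sketch of how the speed could be adjusted with ffmpeg’s atempo audio filter. It is an assumption about how one might do it, not something that was run on these tapes. tempo_ratio and retime_cmd are hypothetical helper names; the second one only prints a command instead of running it. Because the audio is filtered it cannot be stream-copied, so only the video uses copy here.

```bash
#!/usr/bin/env bash
# tempo_ratio AUDIO_SECONDS VIDEO_SECONDS
# Prints the atempo factor that makes the audio last as long as the video.
# A factor below 1 slows the audio down (makes it longer), above 1 speeds it up.
# Hypothetical helper, not part of ffmpeg.
tempo_ratio() {
    awk -v a="$1" -v v="$2" 'BEGIN { printf "%.6f\n", a / v }'
}

# retime_cmd VIDEO AUDIO FACTOR OUTPUT
# Prints (does not run) an ffmpeg command that merges the two files while
# applying the factor to the audio. The video stream is copied untouched;
# the audio is re-encoded with the output container's default codec.
retime_cmd() {
    local video=$1 audio=$2 factor=$3 output=$4
    printf 'ffmpeg -i %s -i %s -filter:a atempo=%s -c:v copy %s\n' \
        "$video" "$audio" "$factor" "$output"
}
```

With audio that is 90 seconds long against 100 seconds of video, tempo_ratio 90 100 gives 0.900000, which would then be passed on as the FACTOR argument.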

Slice and Dice

Bash scripting imminent. You might want to jump to the next section if you have no idea what sudo rm -rf / does

There was another problem however. The created files were quite huge, eg. the 4-hour tape ended up using 12GB of hard drive space. When you want to move files around on an 8GB pendrive, relying on the speed of USB 2.0, and you plan to edit them using some sort of video editor software that loads the clips into memory, this is rather inconvenient to say the least.

So I decided to use my new friend to split the videos into more reasonable chunks. I tried using the one line solution first:

$ ffmpeg -i input.mp4 -c copy -map 0 -segment_time 10:00 -f segment output%03d.mkv

But I ended up with chunks that had the length of their index×10 minutes with 10 minutes of data in each. So eg. output003.mkv was 30 minutes long but had video data from 0–10 minutes, while output000.mkv was 0 minutes long but padded the stream by 10 minutes.
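In hindsight the one-liner may be salvageable. The segment muxer has a -reset_timestamps option that restarts the timestamps of every chunk at zero, which looks like a fix for exactly this symptom. What follows is an untested sketch under that assumption, not something that was verified on these tapes. segment_cmd is a hypothetical helper that only prints the command, so it can be inspected before running it.

```bash
#!/usr/bin/env bash
# segment_cmd INPUT SLICE_SECONDS OUTPUT_PATTERN
# Prints (does not run) a segment-muxer command with -reset_timestamps 1,
# so that each chunk starts counting from zero again.
# Hypothetical helper; whether this cures the padding on a given file
# still has to be checked by playing the resulting chunks.
segment_cmd() {
    local input=$1 seconds=$2 pattern=$3
    printf 'ffmpeg -i %s -c copy -map 0 -f segment -segment_time %s -reset_timestamps 1 %s\n' \
        "$input" "$seconds" "$pattern"
}

segment_cmd input.mkv 600 'output%03d.mkv'
```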

So I decided to write a little script for that:

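#!/usr/bin/env bash
# Split a video into LENGTH-second slices stored next to the input file.
LENGTH=600
input=$1
# Directory part of the path: drop everything after the last /
directory=$(echo "$input" | sed -r 's|/[^/]+$||')
# Container duration as HH:MM:SS.xx, taken from ffprobe's banner on stderr
duration=$(ffprobe "$input" 2>&1 | awk '/Duration/ {print $2}' | sed 's/,//')
# Duration in whole seconds
duration_s=$(echo "$duration" | awk -F':' '{print $1 * 60 * 60 + $2 * 60 + $3}' | sed -r 's/\..*//g')
# Number of full-length slices
slices=$((duration_s / LENGTH))
ss=0
for ((i = 0; i < slices; i++)); do
    # End of slice i; the original i*LENGTH made the first slice run from 0 to 0
    to=$(((i + 1) * LENGTH))
    ffmpeg -i "$input" -ss "$ss" -to "$to" -c copy "$directory/part$i.mkv"
    ss=$to
done
# Whatever is left after the last full slice
ffmpeg -i "$input" -ss "$ss" -c copy "$directory/part$i.mkv"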

Bash wizards might want to jump to the next section.

Let’s take it apart to make it easier to understand:

directory=$(echo $input | sed -r 's|/[^/]+$||')

This command removes the filename from the given path, by removing everything after the last /. The -r option of sed allows extended regex syntax. If you do not know regexes, try /[^/]+$ on regex101.com for an explanation. Make sure to select ‘python’ there, otherwise the slashes need to be escaped with \.

duration=$(ffprobe $input 2>&1 | awk '/Duration/ {print $2}' | sed 's/,//')

We check the length of the video using ffprobe. It outputs the same data as ffmpeg when it starts. Both of them print to stderr, so we need to redirect the output to stdout using 2>&1 so we can pipe it to awk. awk '/Duration/ {print $2}' looks for lines containing the word ‘Duration’ (case sensitive), then prints the 2nd column (2nd word) of each matching line. In our case the output of ffprobe looks like this:

Input #0, matroska,webm, from 'output.mkv':
Metadata:
ENCODER : Lavf57.56.100
Duration: 04:06:58.12, start: 0.000000, bitrate: 6857 kb/s
Stream #0:0: Video: h264 (High 4:2:2), yuv422p(progressive), 720x576, 25 fps, 25 tbr, 1k tbn, 50 tbc (default)
Metadata:
ENCODER : Lavc57.64.101 libx264
DURATION : 04:06:58.120000000
Stream #0:1: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s
Metadata:
DURATION : 04:00:04.533000000

As awk is case sensitive, the upper-case DURATION tags of the individual streams do not match, so we should have only 1 match for ‘Duration’.

After that we simply remove the dangling comma in the end using sed.

Next we need to convert our HH:mm:ss format to seconds and remove the milliseconds.
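The conversion step on its own, as a small sketch. to_seconds is a hypothetical helper name, and instead of the sed call used in the script it drops the fraction with printf's %d, which truncates.

```bash
#!/usr/bin/env bash
# to_seconds HH:MM:SS[.fraction]
# Prints the duration in whole seconds; the fractional part is truncated.
# Hypothetical helper; it swaps the script's trailing sed for printf "%d".
to_seconds() {
    echo "$1" | awk -F':' '{ printf "%d\n", $1 * 3600 + $2 * 60 + $3 }'
}

to_seconds 04:06:58.12   # prints 14818
```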

Finally we slice up the videos using

$ ffmpeg -i $input -ss $ss -to $to -c copy $directory/part$i.mkv

where -ss is the start time in seconds and -to is the end time in seconds. A constant -t 600 option for duration could have been used as well.

After the cycle we need to invoke ffmpeg one last time to have one remaining slice extracted that is not full 10 minutes long. There are more elegant ways to do it, but for my purposes the quick and dirty solution was more than enough.

This script would be invoked as

$ ./split.sh VHS/1990/first_christmas_final.mkv
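The slicing arithmetic can be checked without touching any video. The sketch below is one of the "more elegant ways" in the sense that it needs no extra step after the loop. slice_ranges is a hypothetical helper that only prints the start and end second of every slice; each pair could then be fed to -ss and -to.

```bash
#!/usr/bin/env bash
# slice_ranges DURATION_SECONDS SLICE_SECONDS
# Prints one "start end" pair per slice. The last slice is shorter
# when the duration is not a multiple of the slice length.
# Hypothetical helper for illustration only.
slice_ranges() {
    local duration=$1 length=$2 start=0 end
    while (( start < duration )); do
        end=$(( start + length ))
        # Clamp the final slice to the end of the video.
        (( end > duration )) && end=$duration
        echo "$start $end"
        start=$end
    done
}

slice_ranges 1500 600
# prints:
# 0 600
# 600 1200
# 1200 1500
```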

Deck the halls with matroskas

And so after a month of recording, we were almost done. Now all that was left was to create the montage. I had handled the technical details so far, and it was a present for my significant other’s family, so it only made sense that she would do the editing herself. We spent a part of the festive season separately with our families, so she was on her own with the files I gave her. On the night of the 24th I received a distress call:

No editor can read the files, or if any of them can, they can only use one video at a time.

No problem! ffmpeg can handle it easily… or can it? It turns out that we were given not only VHS tapes but old time DVDs and files recorded by phones as well. All in different formats and it turns out that concatenation is not fun and games even when videos are of the same kind.

That was when I started to look into containers and codecs. But I had a great problem: I could only test them on my machine. Neither mplayer, VLC nor ffplay had any problems playing any of those files, so how would I know I had ended up with the right format? This was the time when we had to admit defeat. It was already past midnight, we had to send huge amounts of data through the wire to each other, and we had no clue how and to what format we should convert all those videos so my SO could edit them, and she wanted to play the result to her grandparents the next day. It did not make sense to stay up all night while the light of hope was getting fainter and fainter.

But all was not lost! It can be a good present for a birthday, or a wedding anniversary too, not to mention it is a great opportunity to learn more about video formats, containers and codecs. (This ‘containers and codecs’ mantra starts to sound like something Winnie the Pooh would keep repeating, doesn’t it?)

Then on the night of the 25th we visited an auntie who wanted to see the raw material nevertheless. We had footage of her children too, so it was only fair to give her at least those parts.

‘I have my laptop here, do you have an HDMI cable?’

‘No but the TV can play from USB’

‘Cool I can copy the videos to my pendrive’

‘Yay! Turn off the lights, we’ll see you guys when you were little’ she said.

But the TV did not want to play them. ‘Unsupported video format’ it said. Never mind, ffmpeg to the rescue!

$ ffmpeg -i first_christmas_final.mkv first_christmas_final.mp4

As it turns out ffmpeg is smart enough to convert between containers inferred from the extensions. Now it starts reading it at least. But still no luck. ‘Unsupported video stream’. Let’s try VLC. Media > Convert / Save… Selecting Video for MPEG4 720p TV/device looked promising. Convert, wait a bit, write to pendrive, plug it in, aaaaaaand. Starts loading. Starts playing. ‘Unsupported audio stream’. So close! And yet so far away, we ended up watching the videos from my laptop, but finished soon enough too as these quick little trials took almost 2 hours.

It’s never too late

Two days later this whole thing was still bugging me. Now that I had found a way to test my formats (writing them to a pendrive and plugging it in to a TV) and had started to have an idea about what might have gone wrong, I decided to read into the topic.

So in case you were wondering:

The file extension at the end of the filename usually indicates the container. A container can contain multiple audio and video streams as well as subtitles. The 3 most common containers are:

  • flv, swf — Flash video: you might see this format still a lot even though it’s becoming extinct
  • mp4 — MP4, simple as that. Quite ubiquitous, recommended for web, your phone probably records to this container too. Quite restrictive regarding encodings. Only registered codec implementations can be used
  • mkv — Matroska, very versatile, designed to be future proof. Supports almost all audio and video formats and can contain multiple streams with different encoding. It is gaining in popularity but as seen from the TV disaster it is not supported everywhere yet

Knowing the container does not give away too much information on the video and audio encoding, so you cannot be sure if you can play the file or not, only inferring from its extension.

Video codecs:

This is where you can get lost very quickly and very easily. Codecs can be very versatile, eg. H.264, the codec that is mostly used for HD stuff, can do both lossy and lossless encoding. MPEG is both the name of a container and a codec family. That is crazy. (You can tell the difference between the MPEG container and codecs though: the codecs have numbers, eg. MPEG-2)

Audio codecs:

You might be familiar with these, as you know uncompressed (wav, aiff), lossless compressed (flac, alac) and lossy compressed audio formats (mp3, aac, wma). Be advised though that most of the time I tried using H.264 video with mp3 it did not work. However, the error might have been on my side, as I had no idea what I was doing.

Anyway. I had an idea about containers and audio codecs, and I was finally able to recognize a video codec if I saw one, so I tried looking at the files I had with ffprobe:

Input #0, matroska,webm, from 'output.mkv':
Metadata:
ENCODER : Lavf57.56.100
Duration: 04:06:58.12, start: 0.000000, bitrate: 6857 kb/s
Stream #0:0: Video: h264 (High 4:2:2), yuv422p(progressive), 720x576, 25 fps, 25 tbr, 1k tbn, 50 tbc (default)
Metadata:
ENCODER : Lavc57.64.101 libx264
DURATION : 04:06:58.120000000
Stream #0:1: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s
Metadata:
DURATION : 04:00:04.533000000

Okay, a lot of stuff was Greek to me here, but what I needed was this:

After Input #0 it says the name(s) of the container

After Stream #0:0 you can see info on the Video stream, the first piece of info being the codec used. In our case it is H.264

Stream #0:1: Info about the audio stream. The codec is erm… something.

And at the ones that were created by VLC without sound:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'output.mp4':
Metadata:
major_brand : isom
minor_version : 0
compatible_brands: mp41avc1
creation_time : 2016-12-25T16:27:51.000000Z
encoder : vlc 2.2.4 stream output
encoder-eng : vlc 2.2.4 stream output
Duration: 00:04:26.40, start: 0.000000, bitrate: 1847 kb/s
Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 27:32 DAR 3:2], 1845 kb/s, 23.90 fps, 29.97 tbr, 1000k tbn, 59.94 tbc (default)
Metadata:
creation_time : 2016-12-25T16:27:51.000000Z
handler_name : VideoHandler

Okay, so one of the problems is that the ones created by VLC do not have an audio stream at all. It was able to play the videos themselves without any problem but for some reason it decided to omit the sound while converting. Strange.
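Picking the codecs out of such a dump by eye gets old quickly, so here is a minimal sketch that does it for saved ffprobe text. stream_codecs is a hypothetical helper, and it assumes the "Stream #…: Video: codec" layout seen in the dumps in this post; other ffmpeg versions may print the lines differently.

```bash
#!/usr/bin/env bash
# stream_codecs
# Reads ffprobe/ffmpeg banner text on stdin and prints "<type> <codec>"
# for every video or audio stream line it finds.
# Hypothetical helper; relies on the text layout shown in this post.
stream_codecs() {
    sed -n -E 's/^ *Stream #[^ ]*: (Video|Audio): ([A-Za-z0-9_]+).*/\1 \2/p'
}

printf '%s\n' \
    "Stream #0:0: Video: h264 (High 4:2:2), yuv422p(progressive), 720x576" \
    "Stream #0:1: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s" \
    | stream_codecs
# prints:
# Video h264
# Audio pcm_s16le
```

On a real file this would be fed as: ffprobe output.mp4 2>&1 | stream_codecs. A file with no "Audio" line in the result has no audio stream.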

To figure out what might be the format I actually want, I took a look at a video that played on the TV (I recorded it with my phone earlier)

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'VID_20151122_133709.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: isommp42
creation_time : 2015-11-22T12:41:34.000000Z
location : +47.1214+018.8642/
location-eng : +47.1214+018.8642/
com.android.version: 6.0
Duration: 00:04:22.64, start: 0.000000, bitrate: 17095 kb/s
Stream #0:0(eng): Video: h264 (Baseline) (avc1 / 0x31637661), yuv420p, 1920x1080, 16998 kb/s, SAR 1:1 DAR 16:9, 29.82 fps, 29.83 tbr, 90k tbn, 180k tbc (default)
Metadata:
creation_time : 2015-11-22T12:41:34.000000Z
handler_name : VideoHandle
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 96 kb/s (default)
Metadata:
creation_time : 2015-11-22T12:41:34.000000Z
handler_name : SoundHandle

So H.264 for video, aac for audio, in an mp4 container.

$ ffmpeg -i first_christmas_final.mkv -c:v h264 -c:a aac output.mp4

Still won’t play. What else am I missing? I compared the ffprobe output of the new file and the one recorded by phone using an online diff tool to see the differences. There were a lot, so I will not paste the image here. Some metadata was present in one file and some in the other. Leaving aside values that are dynamic, such as the time, or meaningless to me, such as the bitrate, the only difference in the streams themselves was this: yuv422p from the tape and yuv420p from the phone. As it turns out YUV is a color encoding where Y is the luma or brightness of a given color, and U and V are the chrominance components. yuv422p and yuv420p differ in how much of the chrominance is kept: 4:2:2 halves the horizontal color resolution, while 4:2:0 halves the vertical one as well. So what was missing was simply converting the pixel format too.

$ ffmpeg -i first_christmas_final.mkv -c:v h264 -c:a aac -pix_fmt yuv420p output.mp4

Finally the end result plays on TV so it can be distributed among relatives, can be edited by all kinds of video editors and most of the process can be automated thanks to the fact that only ffmpeg was used. And luckily we have a year to create the montage too… hope it will be enough this time.
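As a starting point for that automation, a minimal sketch that reads the pixel format out of ffprobe text, so a script can tell which files still carry something other than yuv420p. video_pix_fmt is a hypothetical helper; it assumes the pixel format name starts with "yuv" and directly follows the codec description, as in the dumps in this post.

```bash
#!/usr/bin/env bash
# video_pix_fmt
# Reads ffprobe/ffmpeg banner text on stdin and prints the pixel format
# of the first video stream, e.g. yuv422p.
# Hypothetical helper; only recognizes formats whose name starts with "yuv".
video_pix_fmt() {
    sed -n -E 's/.*Video: [^,]+, (yuv[a-z0-9]+).*/\1/p' | head -n 1
}

echo "Stream #0:0: Video: h264 (High 4:2:2), yuv422p(progressive), 720x576, 25 fps" \
    | video_pix_fmt
# prints: yuv422p
```

On a real file: ffprobe first_christmas_final.mkv 2>&1 | video_pix_fmt. Anything other than yuv420p in the result is a hint that the -pix_fmt yuv420p conversion is still needed.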


Half a year later I still receive VHS tapes to digitalize, and I have gained some experience in the meantime. Since last time I have learnt to understand the warnings ffmpeg gave me, so I came up with the ultimate command that is able to read both the audio and video input, and convert the pixel format to yuv420p along the way:

$ ffmpeg -f v4l2 -framerate <framerate> -thread_queue_size 512 -i <input device> -f alsa -thread_queue_size 512 -i <input device> -t <duration of video> -c:v h264 -c:a aac -pix_fmt yuv420p <output filename>

eg.:

$ ffmpeg -f v4l2 -framerate 25 -thread_queue_size 512 -i /dev/video1 -f alsa -thread_queue_size 512 -i hw:2,0 -t 02:14:32 -c:v h264 -c:a aac -pix_fmt yuv420p output.mp4

With the previous commands the CPU load went through the roof, and I assumed that my processor was just not powerful enough to convert audio and video at the same time. I was wondering a bit how other video editing software managed to pull that trick off, but didn’t give it too much thought. Then I saw the warning “Thread message queue blocking; consider raising the thread_queue_size option (current value: 8)”, so I raised it and now I can actually use my computer while the tape is being captured.