VideoSync

Automatically Synchronizing Crowd-Sourced Concert Videos

allison deal
Mar 10, 2014

Over the summer, I built VideoSync, a web application that automatically synchronizes and plays crowd-sourced video clips. A user can provide YouTube links or upload videos of the same song, then play back the synchronized clips to recreate the concert from multiple perspectives. VideoSync uses HTML5 video and JavaScript on the front end and Python for the back-end signal processing algorithms. Here’s how it works.

Video Synchronization Process

The main challenge was determining where in time the videos overlap, since the start and end times of each clip vary. Synchronization is based on the peak frequencies in the audio tracks of the videos.

YouTube Link or File Upload: To start this process, the YouTube videos are downloaded as MP4 files with the youtube-dl command line program. User-uploaded videos are simply stored on the server.
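In Python, the download step might look something like this. This is a minimal sketch assuming youtube-dl is installed on the PATH; the function name and exact flags are illustrative, not necessarily VideoSync's invocation:

```python
import subprocess

def download_youtube_mp4(url, output_template):
    """Fetch a YouTube video as an MP4 using the youtube-dl CLI."""
    # -f mp4 requests an MP4 format; -o sets the output filename template.
    subprocess.check_call(["youtube-dl", "-f", "mp4", "-o", output_template, url])
```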

WAV File: The audio is stripped from each video file into a WAV file using the avconv audio/video converter; the WAV data is then read with the Python SciPy library.
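A sketch of the extraction and loading, assuming avconv is available; the mono downmix is my simplification, not necessarily what VideoSync does:

```python
import subprocess
from scipy.io import wavfile

def extract_audio(video_path, wav_path):
    """Strip the audio track out of a video file with avconv,
    then load the resulting WAV into a NumPy array with SciPy."""
    # -vn drops the video stream; -ac 1 downmixes to mono (an
    # assumption here, to keep the analysis one-dimensional).
    subprocess.check_call(["avconv", "-i", video_path, "-vn", "-ac", "1", wav_path])
    rate, samples = wavfile.read(wav_path)  # sample rate in Hz, PCM samples
    return rate, samples
```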

Fourier Transform of Audio Signal: To analyze the frequencies of the audio across time, the data must be converted from the time domain to the frequency domain, which is achieved with the Fourier transform. To preserve the time dimension, the audio is first split into bins of 1024 samples each; the Fourier transform is then applied to each bin using the NumPy library, yielding a frequency spectrum per bin.
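Sketched with NumPy, using non-overlapping bins for simplicity (the final algorithm experiments with bin overlap, as noted later):

```python
import numpy as np

BIN_SIZE = 1024  # samples per bin, as described above

def bin_spectra(samples):
    """Split the audio into consecutive 1024-sample bins and take the
    magnitude spectrum of each bin with NumPy's real FFT."""
    n_bins = len(samples) // BIN_SIZE
    bins = np.reshape(samples[:n_bins * BIN_SIZE], (n_bins, BIN_SIZE))
    # rfft keeps only the non-negative frequencies of the real signal
    return np.abs(np.fft.rfft(bins, axis=1))
```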

Peak Frequencies: With the data now represented in the frequency domain, the next step is to identify the frequency with the highest intensity in each bin, producing a peak frequency constellation. Plotted over time, each peak frequency appears as an X.
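In code, the constellation reduces to an argmax over each bin's spectrum. This is a simplified version, without the horizontal grid component mentioned later:

```python
import numpy as np

def peak_constellation(spectra):
    """Record, for each time bin, the index of its strongest frequency
    component; the resulting sequence is the constellation."""
    return np.argmax(spectra, axis=1)
```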

Frequency Constellation Alignment: Finally, the time offset is determined by aligning the frequency constellations of the two audio files. This can be visualized as two transparencies overlapping and sliding horizontally until the peak frequencies line up in both graphs. The distance one transparency must slide past the start of the other is the time offset between the two videos.
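The sliding-transparency idea maps naturally to a brute-force search over bin shifts. The sketch below is an illustrative stand-in, not VideoSync's tuned implementation; `best_offset` and its simple match-counting score are my own simplification:

```python
import numpy as np

def best_offset(peaks_a, peaks_b, rate, bin_size=1024):
    """Slide constellation B along constellation A, count coinciding
    peaks at each shift, and keep the shift with the most matches."""
    best_score, best_shift = -1, 0
    for shift in range(-len(peaks_b) + 1, len(peaks_a)):
        lo = max(0, shift)                            # overlap start in A
        hi = min(len(peaks_a), shift + len(peaks_b))  # overlap end in A
        if hi <= lo:
            continue
        matches = np.count_nonzero(
            peaks_a[lo:hi] == peaks_b[lo - shift:hi - shift])
        if matches > best_score:
            best_score, best_shift = matches, shift
    # A positive result means clip B starts that many seconds into clip A.
    return best_shift * bin_size / rate
```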

Video Playback: With this time offset information for the video pair, coordinated playback of the videos is triggered in the front-end of the application with JavaScript/jQuery.

Alternative Approaches

The process described above was not my first approach to developing the synchronization algorithm. Other techniques I explored include dynamic time warping and direct spectrogram comparison. The final algorithm was fine-tuned for speed and accuracy through experimentation with different Fourier transform bin sizes, bin overlap, and the addition of a horizontal grid component to the frequency constellation mapping.

Future Improvements

I plan to eventually revisit this project and add the ability to synchronize more than two videos, as well as further refine the synchronization algorithm by applying additional audio filters and incorporating video location data into the algorithm.

VideoSync source code is available on GitHub at https://github.com/allisonnicoledeal/VideoSync
