Google Summer of Code: Weeks 3 and 4, Excitement modelling — Analysing density of cuts and change in average motion
These weeks were spent experimenting with various algorithms and methods for cut and motion detection. This blog post covers the process followed and the challenges faced while working on the project.
The theoretical details of the project can be found here — “Multimodal approach to measuring excitement in video”.
The match sample used for testing was “UCL — 08 Mar 2017 — FC Barcelona 6–1 Paris Saint-Germain — First Half — HD” (~56 mins).
Analysing density of cuts as an excitement modelling parameter
To detect cuts, the first attempt was to measure the average change in HSV values between frames. Each frame was split into its H, S and V channels, the average change was calculated per channel, and the three values were summed. This method detected cuts accurately but was quite slow: a twenty-second clip took ~14 seconds to process, and an hour-long video couldn’t be fully processed within an hour before my laptop gave up and crashed. Even after reducing the frame rate and the resolution of the video, the method remained too inefficient for everyday use. [Ipython notebook]
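The idea above can be sketched as follows. This is a minimal illustration, not the notebook’s actual code: the function names and the threshold value are assumptions, and the frames are assumed to have already been converted to HSV (e.g. with `cv2.cvtColor`).

```python
import numpy as np

def hsv_change_score(prev_hsv, curr_hsv):
    """Sum over H, S and V of the mean absolute per-channel change."""
    diff = np.abs(curr_hsv.astype(np.float32) - prev_hsv.astype(np.float32))
    return float(sum(diff[..., c].mean() for c in range(3)))

def detect_cuts(hsv_frames, threshold=30.0):
    """Indexes of frames whose HSV change from the previous frame
    exceeds `threshold` (an illustrative value)."""
    return [i for i in range(1, len(hsv_frames))
            if hsv_change_score(hsv_frames[i - 1], hsv_frames[i]) > threshold]
```

The per-frame conversion to HSV and the three separate channel means are what make this approach slow at full resolution.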
Detecting changes in luminosity, rather than full HSV, was a more efficient method that already existed. It proved effective: an hour-long video (fps = 10, resolution = 240×320) took around 15 minutes to process. [Ipython notebook]
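Comparing a single luminosity channel instead of three HSV channels roughly thirds the per-frame work. A minimal sketch, assuming BGR frames as read by OpenCV and the standard BT.601 grayscale weighting (the function names are illustrative):

```python
import numpy as np

# BT.601 weights, the same weighting OpenCV uses for BGR -> grayscale.
LUMA_WEIGHTS = np.array([0.114, 0.587, 0.299])  # B, G, R

def luminosity(frame_bgr):
    """Per-pixel luminosity of a BGR frame (one channel instead of three)."""
    return frame_bgr.astype(np.float32) @ LUMA_WEIGHTS

def luma_change(prev_bgr, curr_bgr):
    """Mean absolute luminosity change between two consecutive frames."""
    return float(np.abs(luminosity(curr_bgr) - luminosity(prev_bgr)).mean())
```

Frames where `luma_change` exceeds a threshold are recorded as cuts, exactly as in the HSV variant.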
Then the cut density, c(k), was calculated following the referenced paper as c(k) = e^((p(k) − n(k)) / δ), where p(k) and n(k) are the frame indexes of the two closest cuts respectively to the left (previous cut) and right (next cut) of frame k, and δ is a scaling constant. The list of cuts was prepared from the frames where the luminosity change exceeded the threshold.
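A minimal sketch of this computation, assuming the exponential form c(k) = e^((p(k) − n(k))/δ) from the referenced paper; the function name, the δ value and the treatment of the video boundaries as virtual cuts are illustrative choices:

```python
import numpy as np

def cut_density(num_frames, cuts, delta=100.0):
    """c(k) = exp((p(k) - n(k)) / delta) for every frame k, where p(k)
    and n(k) are the previous and next cut around k. The first and last
    frames are treated as virtual cuts so every frame lies in a shot."""
    bounds = sorted(set([0] + list(cuts) + [num_frames]))
    density = np.empty(num_frames)
    for i in range(len(bounds) - 1):
        p, n = bounds[i], bounds[i + 1]
        density[p:n] = np.exp((p - n) / delta)  # constant within one shot
    return density
```

Because c(k) is constant within each shot, the raw output is the rough step curve described below.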
When plotted, the density graph was a rough step curve. Convolving it with a Kaiser window smoothed it out.
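The smoothing step can be sketched with NumPy’s built-in Kaiser window; the window length and β below are illustrative values, not necessarily those used in the notebook:

```python
import numpy as np

def smooth(curve, window_len=51, beta=14.0):
    """Smooth a 1-D curve by convolving with a normalised Kaiser window."""
    win = np.kaiser(window_len, beta)
    win /= win.sum()  # normalise so flat regions keep their level
    # Reflect-pad the edges so the output has the same length as the input.
    pad = window_len // 2
    padded = np.pad(curve, pad, mode="reflect")
    return np.convolve(padded, win, mode="valid")
```

Larger β concentrates the window’s energy in its main lobe, trading sharper peaks for less smoothing.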
The peaks of the smooth_cut_density curve were detected by finding points with an increase before them AND a decrease after them (local maxima). Nearby peaks were then filtered out, keeping only the higher ones.
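The peak-picking described above can be sketched as follows; the `min_distance` value and the greedy keep-the-highest filtering are illustrative assumptions:

```python
import numpy as np

def find_peaks(curve, min_distance=250):
    """Local maxima of `curve`, keeping only the highest peak within
    any `min_distance`-frame neighbourhood."""
    curve = np.asarray(curve, dtype=float)
    # A peak has an increase before it AND a decrease after it.
    peaks = [i for i in range(1, len(curve) - 1)
             if curve[i - 1] < curve[i] >= curve[i + 1]]
    # Filter nearby peaks: visit peaks from highest to lowest and
    # drop any that fall too close to an already-kept peak.
    peaks.sort(key=lambda i: curve[i], reverse=True)
    kept = []
    for i in peaks:
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)
```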
The following curve was obtained (annotated with the actual events occurring at the corresponding frame indexes):
Motion Detection Algorithm
The algorithm for detecting the average change in motion was developed in week 4. The notebook can be found here.
The first idea was to generate motion vectors and calculate their average change between consecutive frames. But for a practical application, the requirement of GPUs and heavy processing power rendered the method useless.
A simpler and more elegant solution for detecting motion was to calculate the change in each pixel across video frames.
The absolute difference between consecutive frames was calculated and summed, giving the total change between frames. This change was plotted against the frame indexes. The rough curve thus obtained was smoothed with a Kaiser window, peaks were found by detecting local maxima (points with an increase before and a decrease after), and nearby peaks were filtered out.
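The frame-differencing step can be sketched in a few lines; the function name is illustrative, and `frames` is assumed to be a sequence of equally-sized NumPy arrays (grayscale or colour):

```python
import numpy as np

def motion_curve(frames):
    """Total absolute pixel change between each pair of consecutive frames."""
    frames = [f.astype(np.float32) for f in frames]
    return np.array([np.abs(b - a).sum() for a, b in zip(frames, frames[1:])])
```

The resulting curve is then smoothed and peak-picked exactly as the cut-density curve was.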
The following curve was obtained (annotated with the actual events occurring at the corresponding frame indexes):
Did it work?
Partially. After concatenating each series of detected segments into a final video, it was found that the peaks were not exactly aligned with the actual occurrence of the expected highlight events. This matches the results of the referenced research paper, “Multimodal approach to measuring excitement in video”, which says:
“When compared with the content description of characteristic segments of this excerpt, one can see that at places of exciting events (goals, chances), local maxima of all three curves can be found, as opposed to less exciting segments. One can also realize that these local maxima are not necessarily aligned. For instance, in the case of a score, the following scenario is possible: the audience first cheers the action (sound energy peak), then there are zooms to running players (motion activity peak) and, finally, there are zooms to the bench and to spectators (cut density peak).”
Both algorithms marked the important (exciting) events with peaks in their respective curves, and these peaks were found in close vicinity of the actual events. Thus, detecting cuts and average motion worked well enough for further use in extracting highlights.
What’s Next?
The immediate next step is to obtain a similar result by analysing the audio component of the video. These results will then be used to extract the highlights of the video. I will also try to analyse subtitles, marking them positive or negative, to help detect exciting events. This part will be combined with the Python bindings being developed to extract subtitles and analyse exciting events.