This time I worked on detecting slow-motion replays, mainly in soccer videos. I referred to this paper. Let me guide you through it.

First up, it’s important to know that there are two types of slow motion. One is made by repeating frames and inserting synthesized frames into normal-speed footage, and the other comes from cameras that record at a high frame rate. Apart from how they differ from normal-speed clips, the two also differ from each other. Let’s call the first one SMStd and the second SMHigh.

The thing with slomos (slow-motion replays) in sports is that they are always preceded and followed by a gradual transition effect, usually an animation of the sponsor’s or studio’s logo. This makes things easier, since only the clips between gradual transitions need to be checked.
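Once the gradual transitions have been detected (detecting them is a separate problem not covered here), picking the candidate clips is just a matter of taking the frames between each pair of consecutive transitions. A minimal sketch, assuming the transitions are already available as hypothetical (first_frame, last_frame) pairs sorted by time:

```python
from typing import List, Tuple


def candidate_segments(transitions: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the (start, end) frame ranges lying between consecutive gradual transitions.

    `transitions` is a hypothetical input: one (first_frame, last_frame) pair per
    detected gradual transition (e.g. a logo animation), sorted by time.
    """
    segments = []
    for (_, prev_end), (next_start, _) in zip(transitions, transitions[1:]):
        # The clip starts right after one transition ends
        # and stops right before the next one begins.
        start, end = prev_end + 1, next_start - 1
        if start <= end:  # skip back-to-back transitions with nothing in between
            segments.append((start, end))
    return segments
```

For example, transitions at frames 0 to 9 and 100 to 115 give one candidate clip covering frames 10 to 99.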

Once the candidate shots between gradual transitions have been found, the HSV frame difference DF is calculated between each pair of consecutive frames.

Here X_t(i, j) is the H, S or V value of pixel (i, j) in frame t, and M and N are the frame height and width.
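The frame-difference equation itself is not reproduced here, so the sketch below assumes a common form: the mean absolute difference over all M × N pixels, summed over the H, S and V channels. Treat this as an assumption rather than the paper’s exact definition; it also ignores the fact that hue wraps around. `frame_differences` is a hypothetical helper, and the frames are assumed to be converted to HSV already:

```python
import numpy as np


def frame_differences(frames: np.ndarray) -> np.ndarray:
    """Assumed HSV frame difference between consecutive frames.

    frames: array of shape (T, M, N, 3) holding the H, S and V value of every pixel.
    Returns DF of length T, with DF[0] = 0 as a placeholder (no previous frame) and,
    for t >= 1, DF[t] = (1 / (M * N)) * sum over i, j of |X_t(i, j) - X_{t-1}(i, j)|,
    summed over the three channels.
    """
    x = frames.astype(np.float64)  # cast first so uint8 subtraction cannot wrap around
    m, n = x.shape[1], x.shape[2]
    diff = np.abs(x[1:] - x[:-1]).sum(axis=(1, 2, 3)) / (m * n)
    return np.concatenate(([0.0], diff))
```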

Using this, repeated and inserted frames can be found. For example, if DF[n] is below a threshold, frame n is a repeated frame. Inserted frames, on the other hand, are a combination of actual frames, so their DF pattern looks like this:

Frames p and q+1 are actual frames captured by the standard camera; every frame in between is inserted.
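Repeated frames follow directly from the threshold rule. For inserted frames, one possible reading of the pattern: if the inserted frames are linear blends of frames p and q+1, every pixel changes by the same amount from one inserted frame to the next, so DF stays (nearly) constant across the run. The sketch below turns that into a simplified heuristic; it is not the paper’s exact criterion, and `rep_thresh` and `rel_tol` are hypothetical parameters that would need tuning:

```python
import numpy as np


def mark_repeated_and_inserted(df, rep_thresh: float, rel_tol: float = 0.1) -> np.ndarray:
    """Boolean mask of frames flagged as repeated or inserted (simplified heuristic).

    df: frame differences with df[0] as a placeholder for the first frame.
    rep_thresh, rel_tol: hypothetical tuning parameters.
    """
    df = np.asarray(df, dtype=np.float64)
    repeated = df < rep_thresh   # near-identical to the previous frame
    repeated[0] = False          # df[0] has no previous frame
    inserted = np.zeros_like(repeated)
    for n in range(1, len(df) - 1):
        if repeated[n]:
            continue
        # Assumption: under linear blending, DF is almost constant inside an
        # inserted run, so DF[n] should be close to both of its neighbours.
        ref = df[n]
        if abs(df[n - 1] - ref) <= rel_tol * ref and abs(df[n + 1] - ref) <= rel_tol * ref:
            inserted[n] = True
    return repeated | inserted
```

Keep in mind that a steady camera pan at normal speed can also produce a flat DF, so this heuristic on its own will produce false positives.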

Thus, based on criteria involving DF, repeated and inserted frames are identified, and a clip is marked as SMStd only if every 30 consecutive frames contain more than 10 repeated or inserted frames.
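The 30-frame rule can be checked with a sliding window. The sketch below assumes that “every 30 consecutive frames” means every overlapping window (non-overlapping blocks would be the other reading) and takes a per-frame mask of repeated/inserted flags as input. Treating clips shorter than one window as non-SMStd is also an assumption:

```python
import numpy as np


def is_sm_std(flags, window: int = 30, min_count: int = 10) -> bool:
    """True if every `window` consecutive frames contain more than `min_count`
    frames flagged as repeated or inserted."""
    flags = np.asarray(flags, dtype=np.int64)
    if len(flags) < window:
        return False  # assumption: too short to contain a full window
    # Window sums via a cumulative sum: counts[k] = flags[k:k + window].sum()
    csum = np.concatenate(([0], np.cumsum(flags)))
    counts = csum[window:] - csum[:-window]
    return bool(np.all(counts > min_count))
```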

Then comes the turn of SMHigh. The authors introduce a lot of notation to help determine whether a clip is SMHigh.

Yep, important.

The test for SMHigh is run on the candidate shots that failed the SMStd criteria (the non-SMStds). The main idea is that since high-speed cameras are used, there are few or no repeated and inserted frames, so their frame differences are typically greater than those of SMStd. Thus, the average frame difference over all SMStds is calculated and compared with the average frame difference of each non-SMStd. The non-SMStds with a higher average frame difference are kept for further processing.
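A sketch of this first SMHigh filter. It is not clear whether the SMStd reference is the mean over all SMStd frames or the mean of per-clip means; the code assumes the former. Clip IDs are hypothetical, and each clip is represented by its list of frame differences:

```python
from typing import Dict, List, Sequence

import numpy as np


def high_motion_candidates(
    smstd_dfs: Sequence[Sequence[float]],
    non_smstd_dfs: Dict[str, Sequence[float]],
) -> List[str]:
    """Keep the non-SMStd clips whose average frame difference exceeds the SMStd average."""
    # Reference: mean DF over every frame of every SMStd clip (assumption).
    ref = float(np.mean(np.concatenate([np.asarray(d, dtype=float) for d in smstd_dfs])))
    return [clip_id for clip_id, d in non_smstd_dfs.items() if float(np.mean(d)) > ref]
```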

The paper also points out that a close-up shot can be mislabelled as SMHigh. To rectify this, the dominant color ratio (the percentage of pixels in the dominant color) of each non-SMStd is compared with that of the SMStds. For almost all SMStds the dominant color is the green of the pitch, but that’s usually not the case for a close-up shot. Also, since SMHighs show the same event as SMStds, they should have a similar dominant color.
So the selected non-SMStds whose dominant color is similar to that of the SMStds are labelled SMHigh, which essentially discards close-up shots.
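The dominant-color check could look roughly like the sketch below. It assumes HSV frames in OpenCV’s 8-bit convention (hue in 0 to 179), a hand-picked “pitch green” hue range and saturation floor, and a tolerance `tol`; all of these values are hypothetical and would need tuning on real footage:

```python
import numpy as np

# Hypothetical "pitch green" range in OpenCV's 8-bit HSV convention (H in 0..179).
GREEN_HUE = (35, 85)
MIN_SAT = 60


def green_ratio(frames_hsv: np.ndarray) -> float:
    """Fraction of pixels, over all frames of a clip, that fall in the green range.

    frames_hsv: array of shape (T, M, N, 3) with H, S, V channels.
    """
    h = frames_hsv[..., 0]
    s = frames_hsv[..., 1]
    mask = (h >= GREEN_HUE[0]) & (h <= GREEN_HUE[1]) & (s >= MIN_SAT)
    return float(mask.mean())


def keep_similar_color(candidates, smstd_ratios, tol: float = 0.15):
    """candidates: dict of clip_id -> green ratio; smstd_ratios: green ratios of SMStd clips.

    Keeps the candidates whose ratio is within `tol` of the mean SMStd ratio,
    which drops close-ups where the pitch barely shows.
    """
    ref = float(np.mean(smstd_ratios))
    return [cid for cid, r in candidates.items() if abs(r - ref) <= tol]
```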

Once all the non-SMStds have been processed, the shots neighbouring the SMStds go through the same processing, because most of the time an SMHigh follows an SMStd.
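Picking the neighbouring shots to re-check is straightforward once every shot has a label. A small sketch with hypothetical label strings; shots already labelled SMStd or SMHigh are skipped, and both the preceding and the following shot are considered:

```python
from typing import List, Set


def neighbours_to_recheck(labels: List[str]) -> List[int]:
    """labels[i] is the label assigned to shot i so far ("SMStd", "SMHigh" or "other";
    hypothetical names). Returns the indices of unlabelled shots adjacent to an SMStd,
    which are then run through the SMHigh tests again."""
    picked: Set[int] = set()
    for i, label in enumerate(labels):
        if label != "SMStd":
            continue
        for j in (i - 1, i + 1):
            if 0 <= j < len(labels) and labels[j] not in ("SMStd", "SMHigh"):
                picked.add(j)
    return sorted(picked)
```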

And that’s how you get slomos in soccer!

Or so I thought! Even though the authors report great results (100% recall), I haven’t been able to reproduce them. I tried combining the method with other cues, such as motion estimation (since there is very little motion in slomos) and histogram comparison (since the surroundings don’t change much), but to no avail. I will keep looking though!

Until next time.
