Taking Screenshots of Conference Video Slides Using Python

JL Diaz
JL Diaz
Nov 6 · 3 min read

I use to take notes while watching conference videos in YouTube, and sometimes, as part of the notes, I want to include a screen capture of some slides.

When I’m on Mac, I use the system capture (Command+Shift+5) which has the nice feature of “remembering” the region of the last capture. This is convenient, because once I selected the region in which the YouTube video appears, I can take several frames of the same video without repositioning the selection. In Windows, I use ShareX (open source) which has an option for capture “last region”, providing the same convenience.

The problem is that I don’t want the whole frame, like this one:

Screenshot from YouTube

I want only the part of the slide. And, as in the above example, some conferences use a video format in which the slides are slightly distorted due to a perspective effect.

So I thought “this is a perfect task for Python”. So I wrote a script which takes the image in the clipboard, and uses PIL to crop a quadrilateral given by its four corners, and “un-distorts” it to make it rectangular.

The result for the above image is:

After processing

The drawback is that you have to manually locate the coordinates of the four corners of the slide, but once you have them, the same script can be run repeatdly for different screenshots of the same conference (and even for other videos which use the same format).

To help this manual process, when the script is run without arguments, it generates a slides.png file which simply contains the original youtube frame, scaled to a width of 1024 pixels (to normalize the coordinate system among different videos). Using any image editing tool, I find the (x,y) coordinates of each corner (in clockwise order, starting at upper left), and I add them to a dictionary in the script like this:

conferences = {
"none: None,
"pydata2015w": [(283,88), (1013, 73), (1013, 500), (283, 485)],
]

Then, running again the script with the argument pydata2015w, it will use the info from this dict to crop, undistort and save (in slide.png) the selected region.

This is the script (tested under OSX and Windows):

And this is one more example:

Screenshot from YouTube

In this case, the coordinates are [(0,0), (702,48), (702,418), (0,447)] and the result of the script is

After processing

Admitedly, this could be improved by adding the feature of auto-detecting the corners of the slide, but this would probably require OpenCV and it is not an easy task, in general, because sometimes the background is noisy or patterned, instead of a solid color, as in previous examples.

So I stick by now to the manual procedure. After all, once the coordinates are obtained for a given conference, it usually can be used for all its videos.

JL Diaz

Written by

JL Diaz

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade