Understanding Aaron Swartz’s KeepGrabbing.py

Shane Lee
5 min read · Aug 11, 2020

Aaron Swartz was a programmer and political activist who infamously downloaded an estimated 4.8 million articles from the JSTOR database of academic articles. This led to his prosecution by the United States government.

The documentary “The Internet’s Own Boy”, about Aaron Swartz, references a script called KeepGrabbing.py. This is the script Aaron used to bulk download PDFs from the JSTOR database from within the MIT network.

When I first saw the documentary, I was fascinated by the idea that a simple computer program could cause such a furore. Naturally, I wanted to know what was in the program.

It turns out this program is just a simple Python 2 script that is only 21 lines long (17 if you don’t include blank lines!).

Here is the excerpt from the documentary in which the script is briefly discussed.

I managed to find the script online, but I didn’t immediately understand it. I got the general gist, but there were a few things I had not seen before, like the one-line class Aaron defines, which is just an exception that appears to do nothing. There’s also a URL that has been redacted by the courts, which makes the program even harder to understand.
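If the one-line exception class is new to you too, here is a minimal sketch of the idiom. This is not the code from KeepGrabbing.py: the names `OutOfWork`, `next_item` and `drain` are hypothetical, made up for illustration, and the sketch is written to run under Python 3 rather than Python 2. The idea is that the class body is empty because the class only needs to exist as a distinct type that one piece of code can raise and another can catch.

```python
# Hypothetical name: a one-line exception class with an empty body.
# It adds no behaviour; it only gives us a distinct type to raise and catch.
class OutOfWork(Exception): pass


def next_item(queue):
    """Return the next item from the list, or signal that nothing is left."""
    if not queue:
        raise OutOfWork()
    return queue.pop(0)


def drain(queue):
    """Keep taking items until the custom exception tells us to stop."""
    processed = []
    while True:
        try:
            processed.append(next_item(queue))
        except OutOfWork:
            # Only our own signal ends the loop; any other error still propagates.
            break
    return processed
```

Calling `drain(["a", "b"])` collects both items and then stops cleanly when `OutOfWork` is raised. Catching a dedicated exception type, rather than a bare `except`, means genuine bugs are not silently swallowed.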
