Running Long Scripts

It all comes down to ctrl-z and caffeinate

Ryan Knightly
The Floating Point

--

One of my first programming projects was a web scraper that collects information about a list of more than 2,400 stocks. For each stock, I wanted metrics from each of about 8 different websites, so the script had to load and scrape about 19,200 pages.

Even after parallelizing the web requests, running the script still takes about 3 hours to complete.

That’s a long time.

And during that time, something usually went wrong. Whether my computer went to sleep, I accidentally killed the program, or I closed my laptop because I forgot about the script, the program exited before completion about 75% of the time. And when it exited, all of the data was lost and the script had to be run again from the beginning. And so:

Problem: long scripts get killed before completion.

Possible solutions:

  1. Save the data as the script runs so that even if the script is killed it can be resumed (requires writing code to save and reload the data; too much work, and it has to be rewritten for each script that uses this method)
  2. Run the script on a server that doesn’t get shut off every 30 minutes like my laptop does (not worth it in my case)
  3. Force my computer to stay awake until the script finishes running (easy and reusable)
  4. Use the built-in pause and resume functionality from the OS for when I need to shut my computer (easy and reusable)
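To give a sense of what solution 1 involves, here is a minimal Python sketch of checkpointing. It assumes each item (say, a stock ticker) can be processed independently and that its result is JSON-serializable; run_with_checkpoint, work, and the checkpoint path are hypothetical names, not code from the original scraper.

```python
import json
import os
from typing import Any, Callable, Dict, Iterable


def run_with_checkpoint(
    items: Iterable[str],
    work: Callable[[str], Any],
    path: str,
) -> Dict[str, Any]:
    """Process each item once, saving all results to `path` after every item.

    If the script is killed and run again, items already stored in the
    checkpoint file are skipped, so the run resumes where it left off.
    """
    results: Dict[str, Any] = {}
    if os.path.exists(path):
        with open(path) as f:
            results = json.load(f)

    for item in items:
        if item in results:
            continue  # finished in an earlier run
        results[item] = work(item)

        # Write to a temporary file, then atomically swap it into place,
        # so a kill in the middle of a write can't corrupt the checkpoint.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(results, f)
        os.replace(tmp_path, path)

    return results
```

A rerun such as run_with_checkpoint(tickers, scrape_stock, "checkpoint.json"), where scrape_stock is a hypothetical per-stock function, would only process the tickers that are not yet in the file. Rewriting the whole file after every item favors simplicity over speed, but for a few thousand small results it should be cheap next to the web requests themselves.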

I don’t know about you, but easy and reusable are some of the most beautiful words I know of, so I went with solutions 3 and 4.

Preventing System Sleep

Part of the solution is making sure that my computer stays awake until the script is done executing.

An obvious way to do this is to change the system settings so the computer only sleeps after a few hours of inactivity, or never sleeps automatically at all.

Ubuntu Power Settings

That isn’t the best solution, though, because you then have to change the setting back afterward or live with a computer that never sleeps.

I wanted a better solution, and I came across caffeinate, a command-line tool that keeps the system from sleeping while a given command runs.

Caffeinate manual page

Now I can just run caffeinate python main.py and the system won’t sleep until the script is done.
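On macOS, where caffeinate ships with the system, a script can also keep the machine awake by itself: the -w flag tells caffeinate to hold its assertion until the process with a given PID exits, and -i prevents idle sleep (the assertion caffeinate creates by default when no flags are given). The Python sketch below assumes macOS; keep_awake is a hypothetical helper name, and on other platforms it simply does nothing.

```python
import os
import shutil
import subprocess
import sys
from typing import Optional


def keep_awake() -> Optional[subprocess.Popen]:
    """Keep the machine awake for as long as the current Python process runs.

    Assumes macOS's caffeinate: -i prevents idle sleep, and -w <pid> releases
    the assertion once that process exits. Returns None (and does nothing)
    on other platforms or if caffeinate isn't on the PATH.
    """
    if sys.platform != "darwin":
        return None
    exe = shutil.which("caffeinate")
    if exe is None:
        return None
    # caffeinate runs alongside this script and exits when this PID exits.
    return subprocess.Popen([exe, "-i", "-w", str(os.getpid())])
```

Calling keep_awake() near the top of main.py should then behave much like launching the script with caffeinate python main.py, without having to remember the prefix.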

I also found a Mac app that does roughly the same thing. It’s called Amphetamine, and it lets you choose how long to keep your Mac awake.

Amphetamine menu options

To sum it up: to prevent system sleep, use caffeinate, and if you have a Mac, try Amphetamine.

Pausing and Resuming Script Execution

If you need to pause the script for a while and resume it later, you can also use the job control built into Linux shells to stop the process and start it again.

To stop a running process, press ctrl-z in the terminal it is running in.

Stopping a program with ctrl-z

Now the process is stopped and can be resumed whenever you are ready to continue running the script. To view the stopped jobs, type jobs:

Viewing the stopped jobs

From there, you can resume a job with fg (for foreground), giving it the job number shown by jobs, like so:

Resuming the script

And then the script is back up and running.
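Under the hood, this is signal-based job control: ctrl-z makes the terminal send SIGTSTP to the foreground job, which stops it, and fg sends SIGCONT so it picks up where it left off. The minimal Python sketch below reproduces that stop-and-continue cycle on a child process, assuming a Unix-like OS. It sends SIGSTOP instead of SIGTSTP because SIGSTOP cannot be caught or ignored, so it stops the child reliably even outside an interactive shell; pause_and_resume_demo is a hypothetical name.

```python
import os
import signal
import subprocess
import sys
from typing import Tuple


def pause_and_resume_demo(seconds: float = 1.0) -> Tuple[bool, int]:
    """Stop a child process, confirm it is stopped, then let it finish.

    Returns (was_stopped, exit_code).
    """
    # Stand-in for a long script: a child Python process that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    )

    # Roughly what ctrl-z does (the terminal sends SIGTSTP; SIGSTOP is
    # used here because it cannot be caught or ignored).
    os.kill(child.pid, signal.SIGSTOP)

    # WUNTRACED makes waitpid report a stopped child, not only an exited one.
    _, status = os.waitpid(child.pid, os.WUNTRACED)
    was_stopped = os.WIFSTOPPED(status)

    # Roughly what fg does: SIGCONT lets the process continue where it left off.
    os.kill(child.pid, signal.SIGCONT)

    # The child finishes its sleep and exits normally.
    exit_code = child.wait()
    return was_stopped, exit_code


if __name__ == "__main__":
    print(pause_and_resume_demo())  # expected: (True, 0)
```

The same two signals can be sent from another terminal with kill -STOP <pid> and kill -CONT <pid>, which helps when the script was not started from the shell you are currently using.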

Takeaways

  1. Keep the system awake with caffeinate or the Amphetamine app
  2. Pause a script with ctrl-z, then resume it with fg and the job number shown by jobs

Running long scripts without having them killed halfway through should now be easier. For more important applications, however, you may want to revisit the first two proposed solutions, since they are likely more reliable.
