Running Long Scripts
It all comes down to ctrl-z and caffeinate
One of my first programming projects was a web scraper that collects various information about a list of more than 2,400 stocks. For each stock, I wanted metrics from each of about 8 different websites, so I had to load and scrape about 19,200 pages of information.
Even after parallelizing the web requests, running the script still takes about 3 hours to complete.
That’s a long time.
And during that time, something always goes wrong. Whether my computer falls asleep, I accidentally kill the program, or I shut my computer because I forgot the script was running, the program exits before completion about 75% of the time. And when it exits, all of the data is lost and the script has to be run again from the beginning. And so:
Problem: long scripts get killed before completion.
Possible solutions:
- Save the data as the script runs so that even if the script is killed, it can be resumed (this requires writing code to save and reload the data, which is too much work and has to be rewritten for each script that uses it)
- Run the script on a server that doesn’t randomly get turned off every 30 minutes like my laptop (not worth it in my case)
- Force my computer to stay awake until the script finishes running (easy and reusable)
- Use the built-in pause and resume functionality from the OS for when I need to shut my computer (easy and reusable)
I don’t know about you, but easy and reusable are some of the most beautiful words I know of, so I went with the third and fourth options.
Preventing System Sleep
Part of the solution is making sure that my computer stays awake until the script is done executing.
An obvious way to do this is to just change the system settings to only sleep after a few hours or to just never automatically go to sleep.
That isn’t the best solution though, because then you would have to change the setting back afterward or just have to deal with a computer that never sleeps.
I wanted a better solution, and I came across caffeinate, a command-line tool built into macOS that keeps the system awake while a given command runs.
Now I can just run caffeinate python main.py, and the system won’t sleep until the script is done.
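If you'd rather not remember to prefix the command every time, the script can start caffeinate on itself. This is a minimal sketch. It assumes macOS's caffeinate and two of its flags: -i, which prevents idle sleep, and -w, which holds the assertion until the given process ID exits. keep_awake is a hypothetical helper name and is not part of any library. On a system without caffeinate (for example Linux), it does nothing and returns None.

```python
import os
import shutil
import subprocess
from typing import Optional


def keep_awake() -> Optional[subprocess.Popen]:
    """Hypothetical helper: keep a Mac from idle-sleeping until this script exits.

    Returns the background caffeinate process, or None if caffeinate
    isn't available, in which case nothing is done.
    """
    exe = shutil.which("caffeinate")
    if exe is None:
        return None
    # -i: prevent idle sleep; -w PID: release the assertion once PID exits.
    return subprocess.Popen([exe, "-i", "-w", str(os.getpid())])
```

Calling keep_awake() once near the top of main.py should have roughly the same effect as running caffeinate python main.py. Because -w ties the assertion to the script's own process ID, caffeinate exits by itself when the script finishes or is killed.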
I also found a Mac app that does about the same thing. It is called Amphetamine, and it lets you choose how long to keep your Mac awake.
To sum it up: to prevent system sleep on a Mac, use caffeinate from the command line or try the Amphetamine app.
Pausing and Resuming Script Execution
If you need to pause the script for a while and resume it later, you can use your shell’s built-in job control. Shells like bash and zsh provide it on both Linux and macOS.
To stop a process during its execution, just press ctrl-z.
Now the process is stopped and can be resumed whenever you are ready to continue running the script. To view the stopped jobs, simply type jobs. From there, you can restart a job using fg (for foreground) followed by the job number that jobs printed, for example fg %1 for job 1.
And then the script is back up and running.
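Under the hood, this works through signals. Pressing ctrl-z makes the terminal send SIGTSTP to the foreground job, and fg resumes the job by sending it SIGCONT. The sketch below reproduces that from Python on Linux or macOS, under two assumptions. First, it sends SIGSTOP, the uncatchable counterpart of SIGTSTP, so the demo doesn't depend on terminal or process-group details. Second, it uses the sleep command as a stand-in for a long script. pause_and_resume_demo is a hypothetical name.

```python
import os
import signal
import subprocess


def pause_and_resume_demo() -> tuple:
    """Stop and continue a child process, mimicking ctrl-z and fg (POSIX only)."""
    # Stand-in for a long-running script: a child that just sleeps.
    child = subprocess.Popen(["sleep", "30"])
    try:
        # Pause: ctrl-z would send SIGTSTP; SIGSTOP always stops the process.
        os.kill(child.pid, signal.SIGSTOP)
        _, status = os.waitpid(child.pid, os.WUNTRACED)  # wait until it reports "stopped"
        stopped = os.WIFSTOPPED(status)

        # Resume: this is the signal fg sends to the job.
        os.kill(child.pid, signal.SIGCONT)
        _, status = os.waitpid(child.pid, os.WCONTINUED)  # wait until it reports "continued"
        continued = os.WIFCONTINUED(status)
        return stopped, continued
    finally:
        # Clean up the stand-in process so it doesn't linger.
        child.terminate()
        child.wait()
```

The function returns a pair of booleans: whether the child was observed in the stopped state, and whether it was then observed resuming.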
Takeaways
- Keep the system awake with caffeinate or the Amphetamine app
- Pause a script with ctrl-z and resume it using fg and the job number given by running jobs
Running long scripts without them getting killed halfway should now be easier. For more important applications, however, you may want to revisit the first two proposed solutions, as they would probably be more reliable.
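If you do revisit the first option, saving progress doesn't have to take much code. Here is a minimal sketch. It assumes each stock's results fit in a JSON-serializable value. The file name progress.json, the functions load_progress, save_progress and scrape_all, and the scrape_one callback are all hypothetical, not the original scraper's code.

```python
import json
import os

CHECKPOINT = "progress.json"  # hypothetical checkpoint file name


def load_progress(path: str = CHECKPOINT) -> dict:
    """Return results saved by a previous run, or an empty dict."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def save_progress(results: dict, path: str = CHECKPOINT) -> None:
    """Write results to a temp file, then swap it in place of the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(results, f)
    # os.replace renames in one step, so a kill mid-save leaves the old checkpoint intact.
    os.replace(tmp, path)


def scrape_all(tickers, scrape_one, path: str = CHECKPOINT) -> dict:
    """Scrape each ticker once, skipping any finished in an earlier run."""
    results = load_progress(path)
    for ticker in tickers:
        if ticker in results:
            continue  # already scraped before the script was killed
        results[ticker] = scrape_one(ticker)
        save_progress(results, path)  # checkpoint after every stock
    return results
```

Rewriting the whole file after every stock is the simplest design, and for a few thousand entries it should be cheap. If the script is killed, rerunning scrape_all with the same list picks up where it left off.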