systemd is a pretty great way to schedule cron-like tasks on Linux. In the past, I’ve used everything from cron to full-blown distributed task scheduling systems. For my latest project (kickdb.com), I had been using a pretty hacky combo of tmux, while, and sleep on a tiny Digital Ocean server, which eventually became cumbersome. I needed to find a better alternative.
My requirements for a task scheduler were:
- Run some quick tasks every 15 minutes
- Run some long-running tasks every 12 hours (sometimes these tasks take over 30 hours)
- Don’t run a task if the same type of task is already running. For example, don’t start a new instance of the adidas.com scraper if an old instance is still running.
After reading up on systemd, I found that it met all my requirements and had some pretty awesome benefits:
- Automatic log rotation
- Random delays: add RandomizedDelaySec to a timer
- Memory and CPU limits: add MemoryLimit to a slice
- Easy to debug: systemctl lets you view logs, exit codes, and timer history
I’m still a novice at using systemd, so please keep in mind that the configuration and commands below may be incorrect.
My systemd setup
KickDB.com scrapes around 60 sneaker stores. My first attempt at using systemd involved setting up a recent import service and a full import service for each of the stores. Unfortunately, whenever I installed and enabled these 120 services, the server would run out of memory and CPU and need to be power cycled. I spent a few days trying out different configurations, but at the time I couldn’t avoid having too many node processes starting at once. So, instead I grouped the sites by platform and now run only 10 services.
To set up my cron-like system, I used these systemd features:
- Services describe the work to do. I have five recent import services and five full import services. The only real difference between each service is the command run in ExecStart.
- Timers describe when to run the work. I have five recent import timers and five full import timers. The recent scrapers run every 15 minutes, while the full imports run every 12 to 24 hours.
- Slices describe resource limits. I have one slice that limits the resources used by all my services as a whole.
ExecStart uses the full path to node.
WorkingDirectory runs the script from the folder that holds my code.
Description=Runs shopify recent scraper
ExecStart=/usr/local/bin/node /app/shoes-scraper/src/scraper --recent --platform shopify
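The Description and ExecStart lines above are only part of a unit file. A fuller service file might look something like the sketch below. The file name, the WorkingDirectory value, and the Slice= assignment are assumptions pieced together from the surrounding text, not the actual configuration.

```ini
# /etc/systemd/system/shopify-recent.service (assumed name and location)
[Unit]
Description=Runs shopify recent scraper

[Service]
# Hypothetical path: the post only says this matches the code folder
WorkingDirectory=/app/shoes-scraper
# Full path to node, followed by the scraper script and its flags
ExecStart=/usr/local/bin/node /app/shoes-scraper/src/scraper --recent --platform shopify
# Assumed: put the service in the shared slice so its limits apply
Slice=shoes-scraper.slice
```

There is no [Install] section in this sketch because the service is meant to be started by its timer rather than enabled on its own.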
This timer fires every 15 minutes, plus a random delay of up to 15 minutes.
Description=Run shopify-recent every 15-30 minutes
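For illustration, a timer with that behavior could be written roughly as follows. This is a sketch under assumptions: the file name and the choice of OnBootSec/OnUnitActiveSec are mine, and the real timer may well use a calendar expression such as OnCalendar=*:0/15 instead.

```ini
# /etc/systemd/system/shopify-recent.timer (assumed name and location)
[Unit]
Description=Run shopify-recent every 15-30 minutes

[Timer]
# First run 15 minutes after boot (assumption)
OnBootSec=15min
# After that, 15 minutes after the service was last activated
OnUnitActiveSec=15min
# Add a random delay of up to 15 minutes to each run
RandomizedDelaySec=15min
# Optional: by default a timer activates the service with the same base name
Unit=shopify-recent.service

[Install]
# Lets "systemctl enable" hook the timer into the standard timers target
WantedBy=timers.target
```

If the service is still active when the timer elapses, systemd leaves it running instead of starting a second copy, which should cover the "don’t run a task if it’s already running" requirement.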
Because all my scrapers depend on shoes-scraper.slice, I can make sure my scrapers collectively never use more than 80% of the CPU or more than 2.7G of RAM on my server. Doing something like this with cron would be pretty tough.
Description=Limited resources Slice
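A slice with limits like those might look roughly like this. Treat it as a sketch: the file name and the exact values are assumptions derived from the 80% CPU and 2.7G figures in the text.

```ini
# /etc/systemd/system/shoes-scraper.slice (assumed name and location)
[Unit]
Description=Limited resources Slice

[Slice]
# 80% of one CPU's time, shared by every unit in the slice
CPUQuota=80%
# Memory cap for the slice as a whole (approximately 2.7G)
MemoryLimit=2700M
```

Two things to keep in mind: CPUQuota is measured against a single CPU, so a multi-core server would need a proportionally larger percentage to mean 80% of the whole machine, and MemoryLimit is the older cgroup v1 setting, with MemoryMax as its counterpart on systems using the unified cgroup hierarchy.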
Installing the services
To install and start these services I run:
systemctl stop shopify-recent shopify-full ...
systemctl daemon-reload
systemctl enable shopify-recent.timer shopify-full.timer ...
systemctl start shopify-recent shopify-full ...
If something went wrong when setting up a service, you can usually find out what by running systemctl status.
I don’t like running these commands every time I make a change, so I wrote a script to generate and install my systemd configuration.
systemctl start SERVICE
systemctl stop SERVICE
systemctl status SERVICE
systemctl list-timers # view the status of the timers
journalctl # view the full systemd logs in less
journalctl -u SERVICE # view the logs for a specific service
journalctl -f # tail the logs
journalctl -f -u SERVICE # tail the logs for a specific service
In conclusion, systemd can be a nicer alternative to cron. It can take a lot more time to configure than cron, but it makes other things easier, like debugging errors, setting CPU and memory limits, and randomizing schedules.
I shared this blog on Reddit and got some pretty great feedback. Here’s some I’d like to highlight: