Using systemd as a better cron

luqmaan
Mar 7, 2018

systemd is a pretty great way to schedule cron-like tasks on Linux. In the past, I’ve used everything from cron to full-blown distributed task scheduling systems. For my latest project (kickdb.com), I had been using a pretty hacky combo of tmux, while, and sleep on a tiny DigitalOcean server, which eventually became cumbersome. I needed a better alternative.

My requirements for a task scheduler were:

  • Run some quick tasks every 15 minutes
  • Run some long-running tasks every 12 hours (sometimes these tasks take over 30 hours)
  • Don’t run a task if the same type of task is already running. For example, don’t start a new instance of the adidas.com scraper if an old instance is still running.

After reading up on systemd, I found that it met all my requirements and had some pretty awesome benefits:

  • Automatic log rotation
  • Random delays: add RandomizedDelaySec to a timer
  • Memory and CPU limits: add CPUQuota or MemoryLimit to a slice
  • Easy to debug: systemctl lets you view logs, exit codes, and timer history

I’m still a novice at using systemd, so please keep in mind that the configuration and commands below may be incorrect.

My systemd setup

KickDB.com scrapes around 60 sneaker stores. My first attempt at using systemd involved setting up a recent import service and a full import service for each of the stores. Unfortunately, whenever I installed and enabled these 120 services, the server would run out of memory and CPU and need to be power cycled. I spent a few days trying out different configurations, but at the time I couldn’t avoid having too many node processes starting at once. So, instead I grouped the sites by platform and now run only 10 services.

To set up my cron-like system, I used these systemd features:

  • Services describe the work to do. I have five recent import services and five full import services. The only real difference between each service is the command run in ExecStart.
  • Timers describe when to run the work. I have five recent import timers and five full import timers. The recent scrapers run every 15 minutes, while the full imports run every 12 to 24 hours.
  • Slices describe resource limits. I have one slice that limits the resources used by all my services as a whole.

shopify-recent.service

Notice that ExecStart uses the full path to node. WorkingDirectory launches the script in the same folder as my code.

# /etc/systemd/system/shopify-recent.service
[Unit]
Description=Runs shopify recent scraper
Wants=shopify-recent.timer
[Service]
ExecStart=/usr/local/bin/node /app/shoes-scraper/src/scraper --recent --platform shopify
WorkingDirectory=/app/shoes-scraper
Slice=shoes-scraper.slice
[Install]
WantedBy=multi-user.target

shopify-recent.timer

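This timer fires 15 minutes after the service last finished (OnUnitInactiveSec), plus a random delay of up to 15 minutes (RandomizedDelaySec). Because the countdown only starts once the previous run has ended, a new run never overlaps an old one.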

# /etc/systemd/system/shopify-recent.timer
[Unit]
Description=Run shopify-recent every 15-30 minutes
Requires=shopify-recent.service
[Timer]
Unit=shopify-recent.service
OnUnitInactiveSec=15m
RandomizedDelaySec=15m
AccuracySec=1s
[Install]
WantedBy=timers.target

shoes-scraper.slice

Because all my scrapers run in shoes-scraper.slice, I can make sure they collectively never use more than 80% of the CPU or more than 2.7G of RAM on my server. Doing something like this with cron would be pretty tough.

# /etc/systemd/system/shoes-scraper.slice
[Unit]
Description=Limited resources Slice
DefaultDependencies=no
Before=slices.target
[Slice]
CPUQuota=80%
MemoryLimit=2.7G

Installing the services

To install and start these services I run:

systemctl stop shopify-recent shopify-full ...
systemctl daemon-reload
systemctl enable shopify-recent.timer shopify-full.timer ...
systemctl start shopify-recent shopify-full ...

If something went wrong when setting up a service, you can usually find out what by running systemctl status SERVICE.

I don’t like running these commands every time I make a change, so I wrote a script to generate and install my systemd configuration.
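My actual script isn't reproduced here, but a minimal sketch of the idea could look like the following. The function names (render_service, render_timer, write_units) are made up for illustration, and the unit contents simply mirror the shopify-recent files above. The command and interval are passed in by the caller, so nothing about the other scrapers is assumed.

```shell
#!/bin/sh
# Sketch of a unit generator. Function names are hypothetical; the unit
# contents mirror the shopify-recent.service and .timer files shown earlier.

# Print a service unit. $1 = unit name (e.g. shopify-recent), $2 = full command.
render_service() {
  name=$1
  cmd=$2
  cat <<EOF
[Unit]
Description=Runs $name scraper
Wants=$name.timer
[Service]
ExecStart=$cmd
WorkingDirectory=/app/shoes-scraper
Slice=shoes-scraper.slice
[Install]
WantedBy=multi-user.target
EOF
}

# Print the matching timer unit. $1 = unit name, $2 = interval (e.g. 15m).
# The same interval is reused as the random delay, as in the timer above.
render_timer() {
  name=$1
  interval=$2
  cat <<EOF
[Unit]
Description=Run $name every $interval plus a random delay
Requires=$name.service
[Timer]
Unit=$name.service
OnUnitInactiveSec=$interval
RandomizedDelaySec=$interval
AccuracySec=1s
[Install]
WantedBy=timers.target
EOF
}

# Write both files for one job. $4 = target directory,
# defaulting to /etc/systemd/system.
write_units() {
  name=$1
  interval=$2
  cmd=$3
  dir=${4:-/etc/systemd/system}
  render_service "$name" "$cmd" > "$dir/$name.service"
  render_timer "$name" "$interval" > "$dir/$name.timer"
}
```

Calling write_units shopify-recent 15m "/usr/local/bin/node /app/shoes-scraper/src/scraper --recent --platform shopify" once per job, followed by the systemctl commands above, would cover the install step. Passing a scratch directory as the fourth argument lets you inspect the generated files before touching /etc.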

Handy commands

systemctl start SERVICE
systemctl stop SERVICE
systemctl status SERVICE
systemctl list-timers # view the status of the timers
journalctl # view the full systemd logs in less
journalctl -u SERVICE # view the logs for a specific service
journalctl -f # tail the logs
journalctl -f -u SERVICE # tail the logs for a specific service

Conclusion

In conclusion, systemd can be a nicer alternative to cron. It can take a lot more time to configure than cron, but it makes other things easier, like debugging errors, setting CPU and memory limits, and randomizing schedules.

I shared this post on Reddit and got some pretty great feedback. Here are two points I’d like to highlight:

  • Myth: Email alerts are easier with cron. Actually, you can use OnFailure= with a systemd service to call the mail command.
  • You can use flock with cron to prevent running multiple instances of the same task.
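I haven't tested either suggestion in my own setup, so treat this as a rough sketch of both. The template unit name notify-email@.service and the recipient root are hypothetical. The mail and flock commands are the standard mailx and util-linux tools and need to be installed.

```shell
#!/bin/sh
# Sketch only: notify-email@.service and the "root" recipient are hypothetical.

# 1. Print a template unit that mails the status of a failed unit.
#    A service would reference it from its [Unit] section with:
#      OnFailure=notify-email@%n.service
#    systemd then fills in %i with the failed unit's name.
render_notify_unit() {
  cat <<'EOF'
[Unit]
Description=Email the status of %i after a failure
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'systemctl status --no-pager --full "%i" | mail -s "%i failed" root'
EOF
}

# 2. Run a command only if nobody else holds the lock file.
#    flock -n returns non-zero right away instead of waiting for the lock.
#    In a crontab the same idea could look like:
#      */15 * * * * flock -n /tmp/shopify-recent.lock COMMAND
run_exclusive() {
  lock=$1
  shift
  flock -n "$lock" "$@"
}
```

Saving the output of render_notify_unit as /etc/systemd/system/notify-email@.service and adding the OnFailure= line to a scraper service would be enough to try the first idea. The flock wrapper gives cron the same "don't start a second instance" behaviour that the timer setup above provides.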

Horrible Hacks

Some javascript and horrible stuff

Thanks to Dustin Moore

https://github.com/luqmaan https://twitter.com/luqmonster
