Automating Tasks with Cron Jobs: A Beginner-Friendly Guide with Examples and Limitations

Huzaifa Zahoor
8 min read · Apr 2, 2023


Today, we’re going to talk about something called Cron Jobs. It might sound a little technical, but don’t worry, I'll explain it in a way that’s easy to understand.

Have you ever wished that you could automate certain tasks on your computer? Maybe you want to schedule a backup of your important files every week or run a script that updates your social media profiles at a specific time each day. This is where cron jobs come in handy.

So, what exactly is a Cron Job?

A cron job is a task that’s scheduled to run automatically at a specific time or interval. It’s a feature that’s built into most operating systems, including Linux and macOS. Cron jobs are used to automate repetitive tasks, such as backups, updates, and maintenance.

How do Cron Jobs work?

Cron jobs are created using a special syntax that tells the operating system when and how often to run the task. The syntax uses five fields, which represent the minute, hour, day of the month, month, and day of the week. For example, if you wanted to run a task every day at 3 AM, you would use the following syntax:

0 3 * * *

The first field represents the minute (in this case, 0), the second field represents the hour (3), and the remaining fields are set to * to indicate that the task should run every day, every month, and every day of the week.
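To make the five fields concrete, here's a small, self-contained Python sketch (not part of cron itself, just an illustration) that checks whether a given datetime matches a simplified cron expression supporting `*`, plain numbers, ranges like `9-16`, and steps like `*/10` (numeric fields only):

```python
from datetime import datetime

def field_matches(field, value):
    """Check one cron field against a value: supports '*', 'n', 'a-b', and '*/n'."""
    if field == '*':
        return True
    if field.startswith('*/'):              # step values, e.g. */10
        return value % int(field[2:]) == 0
    if '-' in field:                        # ranges, e.g. 9-16
        start, end = map(int, field.split('-'))
        return start <= value <= end
    return value == int(field)

def cron_matches(expr, dt):
    """Return True if dt matches a five-field cron expression."""
    minute, hour, dom, month, dow = expr.split()
    return (field_matches(minute, dt.minute)
            and field_matches(hour, dt.hour)
            and field_matches(dom, dt.day)
            and field_matches(month, dt.month)
            and field_matches(dow, (dt.weekday() + 1) % 7))  # cron: 0 = Sunday

print(cron_matches('0 3 * * *', datetime(2023, 4, 2, 3, 0)))  # True
print(cron_matches('0 3 * * *', datetime(2023, 4, 2, 4, 0)))  # False
```

Note that real cron has extra rules this sketch skips, such as treating day-of-month and day-of-week as an OR when both are restricted, and accepting names like `Mon`; it's meant only to show how the five fields line up.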

Crontab.guru — The cron schedule expression editor

If you’re new to cron jobs and want to practice writing and testing your own cron job expressions, crontab.guru is a great tool to get started with. This free online service allows you to experiment with different scheduling patterns and see the corresponding output in real-time. You can also use crontab.guru to verify that your cron job syntax is correct and ensure that your scheduled tasks will run as expected. With its user-friendly interface and helpful tips and examples, crontab.guru makes it easy to master the art of cron job scheduling.

Examples of Cron Job

Run a job every minute:

* * * * * <command>

This cron job will run the specified command every minute of every hour of every day of the week.


Run a job every hour:

0 * * * * <command>

This cron job will run the specified command at the beginning of every hour of every day of the week.


Run a job every day at midnight:

0 0 * * * <command>

This cron job will run the specified command at midnight (00:00) every day of the week.


Run a job every Monday at 9am:

0 9 * * 1 <command>

This cron job will run the specified command at 9 a.m. every Monday.


Run a job every month on the 15th at 3pm:

0 15 15 * * <command>

This cron job will run the specified command at 3 p.m. on the 15th day of every month.


Run a job every 10 minutes during working hours, Monday to Friday:

*/10 9-16 * * Mon-Fri <command>

This cron job will run the specified command every 10 minutes, Monday to Friday. Note that the hour range 9-16 covers the whole of the 4 p.m. hour, so the first run each day is at 9:00 a.m. and the last is at 4:50 p.m.


How can cron jobs help Python developers?

We’ll walk through how to connect cron jobs with Python by automating the task of adding new historical data for AAPL from yfinance into an SQLite database. Here’s how you can do it:

Install the yfinance module

First, you’ll need to install the yfinance module if you haven’t already (sqlite3 ships with Python’s standard library, so there’s nothing extra to install for it). You can do this by running the following command in your terminal:

pip install yfinance

Create a new SQLite database

Next, you’ll need to create a new SQLite database to store the historical data. You can do this using the following code:

import sqlite3

# Connect to the database
conn = sqlite3.connect('aapl.db')

# Create a new table to store the data
conn.execute('''
CREATE TABLE IF NOT EXISTS aapl_history (
    Date TEXT PRIMARY KEY,
    Open REAL,
    High REAL,
    Low REAL,
    Close REAL,
    Adj_Close REAL,
    Volume INTEGER
)
''')

# Commit the changes and close the connection
conn.commit()
conn.close()

This will create a new SQLite database called aapl.db and create a table called aapl_history to store the historical data.
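One detail worth noting: because Date is the table's PRIMARY KEY, inserting the same date twice raises an error, which matters for a daily cron job that might occasionally run twice. Here's a quick, self-contained illustration using a throwaway in-memory database (the two-column table is simplified for brevity):

```python
import sqlite3

# Throwaway in-memory database, just for illustration
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE aapl_history (Date TEXT PRIMARY KEY, Close REAL)')

conn.execute('INSERT INTO aapl_history VALUES (?, ?)', ('2023-04-02', 165.0))
try:
    # Same date again: the PRIMARY KEY rejects it
    conn.execute('INSERT INTO aapl_history VALUES (?, ?)', ('2023-04-02', 166.0))
except sqlite3.IntegrityError as e:
    print('Duplicate date rejected:', e)

# INSERT OR REPLACE overwrites the existing row instead, so a re-run is harmless
conn.execute('INSERT OR REPLACE INTO aapl_history VALUES (?, ?)', ('2023-04-02', 166.0))
rows = conn.execute('SELECT Close FROM aapl_history').fetchall()
print(rows)  # [(166.0,)]
conn.close()
```

Using INSERT OR REPLACE (or INSERT ... ON CONFLICT) in the real script is a simple way to make a scheduled job safe to run more than once.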

Get historical data from yfinance and insert into the database

Now, you can use yfinance to get the historical data for AAPL and insert it into the database. You can do this using the following code:

import yfinance as yf
import sqlite3

# Connect to the database
conn = sqlite3.connect('aapl.db')

# Get the historical data for AAPL
# (auto_adjust=False keeps the 'Adj Close' column in the result)
aapl = yf.Ticker('AAPL')
history = aapl.history(period='1d', auto_adjust=False)

# Insert the data into the database
# (INSERT OR REPLACE keeps the job safe to re-run on the same date)
cursor = conn.cursor()
cursor.execute('''
INSERT OR REPLACE INTO aapl_history (Date, Open, High, Low, Close, Adj_Close, Volume)
VALUES (?, ?, ?, ?, ?, ?, ?)
''', (
    str(history.index[0].date()),
    float(history['Open'].iloc[0]),
    float(history['High'].iloc[0]),
    float(history['Low'].iloc[0]),
    float(history['Close'].iloc[0]),
    float(history['Adj Close'].iloc[0]),
    int(history['Volume'].iloc[0])
))

# Commit the changes and close the connection
conn.commit()
conn.close()

This will get the historical data for AAPL for the current day and insert it into the aapl_history table in the database.

Automate the job using a Cron Job

Finally, you can automate this task by creating a cron job that runs the Python script at a specific time each day. To do this, you’ll need to open your terminal and enter the following command:

crontab -e

This will open the crontab editor. You can then add the following line to the file:

0 16 * * * /usr/bin/python3 /path/to/your/script.py

This will run the Python script every day at 4:00 PM (cron uses the system’s local time zone, so make sure that matches the schedule you intend). You’ll need to replace “/path/to/your/script.py” with the actual path to your Python script.


And that’s it! Now you have an automated job that will add new historical data for AAPL from yfinance into an SQLite database every day when the stock market closes.

Cron Jobs with Airflow & Celery

Cron jobs can be integrated with Airflow and Celery to take advantage of the advanced features and scalability provided by these tools.

In Airflow, cron jobs can be easily converted into DAGs, allowing users to define complex workflows with multiple tasks and dependencies. This can provide greater flexibility and control over scheduling and task execution while also enabling the use of advanced features like task retries, task dependencies, and dynamic task generation based on external data.
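As a rough sketch of what that conversion looks like (assuming a recent Airflow 2.x; the DAG and task names are illustrative, and `fetch_aapl` stands in for the yfinance logic from earlier), the daily 4 PM job above might become:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def fetch_aapl():
    """Placeholder for the yfinance download-and-insert logic shown earlier."""

with DAG(
    dag_id='aapl_daily_history',
    schedule='0 16 * * *',        # the same cron expression used above
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    PythonOperator(
        task_id='fetch_aapl',
        python_callable=fetch_aapl,
        retries=2,                # per-task retries, something plain cron lacks
    )
```

Airflow accepts cron expressions directly in the schedule, so migrating an existing cron job usually starts by copying its expression verbatim and then layering on retries and dependencies.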

Similarly, Celery can be used in conjunction with a scheduler like Cron to distribute work across multiple worker nodes. This can help to increase the throughput of your workflows and ensure that tasks are completed in a timely and efficient manner.

Limitations of Cron Jobs

While cron jobs are a powerful tool for automating tasks, they do have some limitations. Here are a few potential limitations of cron jobs to keep in mind:

Limited scheduling options:

Cron jobs are limited to scheduling tasks based on specific time intervals or dates. For example, you can schedule a cron job to run every day at 3:00 AM to back up your website’s database. However, if you need to run a task based on a more complex schedule, such as on the last weekday of every month, cron jobs may not be sufficient.
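A common workaround for schedules cron can't express directly is to run the job daily and let the script itself decide whether to proceed. For the "last weekday of the month" case, a self-contained sketch using only the standard library:

```python
import calendar
from datetime import date

def is_last_weekday_of_month(d):
    """True if d is the final Monday-to-Friday day of its month."""
    if d.weekday() >= 5:                    # Saturday/Sunday never qualify
        return False
    last_day = calendar.monthrange(d.year, d.month)[1]
    remaining = range(d.day + 1, last_day + 1)
    # Qualifies only if every later day this month falls on a weekend
    return all(date(d.year, d.month, day).weekday() >= 5 for day in remaining)

# Schedule the cron job daily (e.g. '0 3 * * *') and exit early on other days:
if is_last_weekday_of_month(date.today()):
    pass  # run the real task here

print(is_last_weekday_of_month(date(2023, 4, 28)))  # True: Friday, 29th-30th are weekend
print(is_last_weekday_of_month(date(2023, 4, 30)))  # False: Sunday
```

The trade-off is that the scheduling logic moves out of the crontab and into the script, which is exactly the kind of complexity tools like Airflow handle natively.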

Lack of task dependencies:

Cron Jobs do not provide built-in support for task dependencies, which can make it difficult to manage complex workflows with multiple interdependent tasks. For example, if you need to run a task to extract data from a database before running a task to process that data, you would need to manage these dependencies manually outside of the cron job.
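With plain cron, the usual workaround is a wrapper script that runs the steps in order and stops if an earlier step fails. A minimal Python sketch (the function bodies are placeholders; only the control flow matters):

```python
def extract_data():
    """Step 1: pull raw data (placeholder for a real extract step)."""
    return [3, 1, 2]

def process_data(raw):
    """Step 2: must only run if extraction succeeded (placeholder)."""
    return sorted(raw)

def run_pipeline():
    try:
        raw = extract_data()
    except Exception as e:
        print(f'Extract failed, skipping processing: {e}')
        return None
    return process_data(raw)

print(run_pipeline())  # [1, 2, 3]
```

This works for a two-step chain, but as the number of interdependent tasks grows, hand-rolled wrappers become hard to maintain, which is where DAG-based tools like Airflow come in.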

Difficulty managing task outputs:

Cron jobs do not provide a built-in mechanism for managing task outputs or storing results. This can make it difficult to track the progress of tasks or to debug issues that arise during task execution. For example, if you are running a cron job to process log files, you may need to manually inspect the output to identify any errors or issues that arise.

Limited error handling:

Cron jobs provide only basic error handling, which may not be sufficient for more complex workflows. If a task fails, it may be difficult to determine the cause of the failure or to recover from it. For example, if a cron job fails to upload a file to a remote server, it may not provide enough information to troubleshoot the issue.

Limited scalability:

Cron jobs are typically run on a single server, which can limit their scalability. As the number of tasks or the complexity of workflows increases, it may become necessary to distribute tasks across multiple servers or to use more advanced workflow management tools like Airflow or Celery. For example, if you need to process large amounts of data on a regular basis, you may need to distribute the workload across multiple servers to keep up with the demand.

Incompatibility with Windows:

Cron jobs are a Linux/Unix-based tool and do not natively work on Windows operating systems. Windows does have its own task scheduling tool called Task Scheduler, but it operates differently from cron jobs and may require additional setup and configuration to achieve similar functionality. For example, if you need to schedule a task to run on a Windows server, you may need to use Task Scheduler or a third-party tool to accomplish this task instead of relying on cron jobs.

While cron jobs have some limitations, they are still a powerful tool for automating tasks and can be used effectively in many different contexts. By understanding these limitations and using cron jobs appropriately, you can build reliable, efficient workflows that make you more productive.

Conclusion

In conclusion, cron jobs are an essential tool for automating tasks in a variety of settings. By scheduling tasks to run automatically on a regular basis, you can save time and increase efficiency. Moreover, integrating Cron Jobs with Python, Airflow, and Celery can further enhance their capabilities and provide more advanced workflow management options. However, it’s also important to be aware of the limitations of cron jobs, including their limited scheduling options, lack of task dependencies, difficulty managing task outputs, limited error handling, and scalability constraints. By keeping these limitations in mind and exploring more advanced workflow management tools as needed, you can make the most of cron jobs and achieve maximum efficiency in your work processes.



Huzaifa Zahoor

Huzaifa Zahoor is a Python developer and data engineer with over 4 years of experience in building web applications for the stock market.