Automating Jupyter Notebooks with Papermill and Crontab

No Conversion to Python Scripts is Needed

Yixin Cao
Data Tyro
6 min read · Oct 15, 2023


Photo by Bench Accounting on Unsplash

Overview

This article explains how to schedule Jupyter Notebooks to run automatically at specified times on macOS or Linux using Papermill and crontab. The notebook itself is automated, so there is no need to convert it to a Python script.

Papermill is a Python package that lets you parameterize and execute Jupyter Notebooks. Crontab schedules scripts or commands to run at specific times. By combining Papermill, crontab, and a simple bash script, you can re-execute Notebooks automatically on a defined schedule.

Prerequisites

To follow along with this article, you’ll need:

  • Python 3 and Jupyter Notebook installed on your machine (macOS or Linux)
  • Papermill Python package installed (pip install papermill)
  • Basic familiarity with the command line and cron jobs

The following examples were validated on macOS Ventura 13.3.1 with Python 3.8.8 but should work on newer versions as well.

Step 1 — Create a Notebook to Parameterize

First, we’ll create a simple Jupyter Notebook to demonstrate passing parameters from the shell script. This Notebook will print a specified date and its corresponding timestamp converted to UTC time.

The Notebook will take in parameters for year, month, and day, which will be used to construct the date string to print. The next step will show how parameters can be passed in and used by the Notebook.

Example notebook with 3 cells:

# Cell 1: imports
from datetime import datetime, timezone

# Cell 2: input date (tag this cell as "parameters")
year = 2023
month = 10
day = 15

# Cell 3: use the parameters to do something
date_str = f"{year:d}-{month:02d}-{day:02d}"
print(f'date: {date_str}')
date = datetime.strptime(date_str, "%Y-%m-%d")
date_utc = date.replace(tzinfo=timezone.utc)
ts = int(date_utc.timestamp())
print(f'timestamp: {ts}')

When run with the sample values, the output would be:

date: 2023-10-15
timestamp: 1697328000
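The same conversion can be sketched without the intermediate date string, assuming the three parameters are plain integers. The function name utc_midnight_timestamp is a hypothetical helper used only for illustration; it is not part of the notebook above.

```python
from datetime import datetime, timezone


def utc_midnight_timestamp(year: int, month: int, day: int) -> int:
    """Return the Unix timestamp of midnight UTC on the given date."""
    # Constructing the datetime with tzinfo=timezone.utc avoids formatting a
    # string, parsing it back, and replacing the timezone afterwards.
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


print(utc_midnight_timestamp(2023, 10, 15))  # 1697328000
```

Both versions should agree for any valid date; the string-based version in the notebook has the side benefit of producing the formatted date for printing.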

One important thing to note is that we need to specify the parameters in a dedicated cell and tag it as “parameters”. This tells papermill to interpret that cell as containing parameters it can pass values into.

Tagging a cell that specifies parameters takes 3 steps:
  1. Select the cell to be tagged
  2. Click the settings button in the sidebar (make sure the sidebar is enabled in View menu)
  3. Add a “parameters” tag

This “parameters” tagged cell allows passing values when executing the notebook, without needing to import papermill.
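If you prefer not to use the sidebar, the tag can also be added by editing the notebook file, since a notebook is a JSON document in which each cell stores its tags under metadata. The following is a minimal sketch under that assumption (nbformat 4 layout); tag_cell_as_parameters and the sample notebook dictionary are hypothetical and exist only for illustration.

```python
import json


def tag_cell_as_parameters(notebook: dict, cell_index: int) -> dict:
    """Add the "parameters" tag to one cell of a notebook dictionary."""
    cell = notebook["cells"][cell_index]
    # Tags live in cell["metadata"]["tags"]; create the list if it is missing.
    tags = cell.setdefault("metadata", {}).setdefault("tags", [])
    if "parameters" not in tags:
        tags.append("parameters")
    return notebook


# Hypothetical two-cell notebook standing in for a real .ipynb file.
nb = {
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": "from datetime import datetime, timezone"},
        {"cell_type": "code", "metadata": {}, "source": "year = 2023\nmonth = 10\nday = 15"},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

tag_cell_as_parameters(nb, 1)
print(json.dumps(nb["cells"][1]["metadata"]))  # {"tags": ["parameters"]}
```

To apply this to a real file, you would load it with json.load, call the helper, and write it back with json.dump. Keep a backup of the notebook before editing it this way.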

Save this Notebook as print_date_ts.ipynb. We will execute this Notebook by passing in the year, month, and day parameters from the shell script.

Step 2 — Execute the Notebook with Parameters

Now we can execute the parameterized print_date_ts.ipynb notebook directly from the command line:

papermill print_date_ts.ipynb output.ipynb -p year 2023 -p month 10 -p day 16

  • output.ipynb is the output notebook that Papermill creates. You can change its path and name as you like, but the output argument is required.
  • Each -p option passes one parameter as a name followed by a value (for example, -p year 2023). Parameters can also be supplied as a YAML string (-y) or from a YAML file (-f).

Since we tagged a cell as “parameters”, Papermill will parse those variables and pass the values provided on the command line.

This command will:

  • Run print_date_ts.ipynb, passing the date parameters
  • Create a new output.ipynb containing the execution results
  • Parse the “parameters” cell and assign the passed values
  • Print the constructed date string “2023-10-16” and the corresponding timestamp based on the parameters

There is also a new cell created and tagged as “injected-parameters” in output.ipynb:

Open output.ipynb to check the execution results.
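The injected cell can also be located without opening the notebook in a browser. This is a minimal sketch, assuming the output notebook is standard nbformat 4 JSON; cells_with_tag is a hypothetical helper, and the sample dictionary is a stand-in whose cell contents are illustrative rather than copied from a real Papermill run.

```python
def cells_with_tag(notebook: dict, tag: str) -> list:
    """Return every cell whose metadata tags include the given tag."""
    return [
        cell
        for cell in notebook.get("cells", [])
        if tag in cell.get("metadata", {}).get("tags", [])
    ]


# Hypothetical excerpt of an executed notebook, for illustration only.
executed = {
    "cells": [
        {"metadata": {"tags": ["parameters"]}, "source": "year = 2023\nmonth = 10\nday = 15"},
        {"metadata": {"tags": ["injected-parameters"]}, "source": "year = 2023\nmonth = 10\nday = 16"},
        {"metadata": {}, "source": "print('done')"},
    ]
}

for cell in cells_with_tag(executed, "injected-parameters"):
    print(cell["source"])
```

For a real run, you would load output.ipynb with json.load and pass the resulting dictionary to the helper.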

This demonstrates executing the Notebook and passing parameters all from the shell without the need for a script.
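If you later want to drive the same command from Python, one option is to build the argument list programmatically and hand it to subprocess. The sketch below only constructs the command; build_papermill_command is a hypothetical helper, and it assumes every parameter is passed with the -p name value form shown above.

```python
import shlex


def build_papermill_command(input_nb, output_nb, parameters, executable="papermill"):
    """Build the papermill argument list for the given notebooks and parameters."""
    cmd = [executable, input_nb, output_nb]
    for name, value in parameters.items():
        # Each parameter becomes "-p <name> <value>", as on the command line.
        cmd += ["-p", name, str(value)]
    return cmd


cmd = build_papermill_command(
    "print_date_ts.ipynb",
    "output.ipynb",
    {"year": 2023, "month": 10, "day": 16},
)
print(shlex.join(cmd))
# papermill print_date_ts.ipynb output.ipynb -p year 2023 -p month 10 -p day 16
```

Running it would then be a matter of subprocess.run(cmd, check=True). Papermill also provides a Python function, papermill.execute_notebook, which accepts a parameters dictionary directly, if you would rather not go through the shell at all.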

Step 3 — Create a Bash Script to Encapsulate Execution

While we can execute the Notebook directly from shell, we can also create a bash script to encapsulate the execution command.

Create a bash file and open it:

touch execute_test
vim execute_test

Add a shebang line (#!/bin/bash) and the command from Step 2 to the bash script:

#!/bin/bash

papermill print_date_ts.ipynb output.ipynb -p year 2023 -p month 10 -p day 17

Save the file and quit the editor.

Then run the script:

bash execute_test

We get an updated output.ipynb in which the input date is “2023-10-17”.

This puts the execution command into a reusable script for convenience.

One can also replace the first shebang line with:

#!/usr/bin/env bash

papermill print_date_ts.ipynb output.ipynb -p year 2023 -p month 10 -p day 17

This will use whatever bash executable appears first in the user's $PATH rather than hardcoding /bin/bash.
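The scripts above hardcode the date, so every scheduled run would print the same values. If the run date is what you actually want, the parameters could be computed at run time instead. The following is a minimal sketch of that idea; date_parameter_flags is a hypothetical helper, and it assumes the notebook's parameter names are year, month, and day as defined in Step 1.

```python
from datetime import date


def date_parameter_flags(run_date: date) -> str:
    """Return the "-p" flags that pass run_date to the notebook."""
    values = {"year": run_date.year, "month": run_date.month, "day": run_date.day}
    return " ".join(f"-p {name} {value}" for name, value in values.items())


# A fixed date is used here so the result is reproducible.
print(date_parameter_flags(date(2023, 10, 17)))
# -p year 2023 -p month 10 -p day 17
```

Calling date_parameter_flags(date.today()) would produce the flags for the day on which the job runs.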

Step 4 — Schedule the Bash Script with Crontab

Next, we can use Crontab to schedule this bash script to run at specific times.

crontab -l # list all tasks
crontab -r # remove all tasks
crontab -e # edit tasks

To edit the crontab schedule, run:

crontab -e

Then add a new line like:

# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').
#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h dom mon dow command
0 12 * * * /path/to/execute_test

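This runs the execute_test script every day at 12 PM. Make sure the path to the bash script is correct. Because cron invokes the file directly rather than through bash, the script must also be executable (chmod +x /path/to/execute_test).

Or we can add the cron job from the command line. Note that this command replaces the entire crontab, so any existing tasks are removed: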
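To make the five time fields concrete, here is a small sketch that splits a crontab line into named parts. It is only an illustration of the field order (m h dom mon dow command) described in the comments above; split_cron_line is a hypothetical helper and does not validate the values.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_line(line: str) -> dict:
    """Split a crontab line into its five time fields and the command."""
    # Split on whitespace at most five times so the command keeps its spaces.
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five time fields followed by a command")
    entry = dict(zip(CRON_FIELDS, parts[:5]))
    entry["command"] = parts[5]
    return entry


entry = split_cron_line("0 12 * * * /path/to/execute_test")
for name, value in entry.items():
    print(f"{name}: {value}")
```

For the line above, minute is 0 and hour is 12, while the remaining three fields are '*', which is why the job runs at 12:00 every day.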

bash -c "crontab -r; echo '0 12 * * * /path/to/execute_test' | crontab -"

And check that it was added:

crontab -l
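If you already have other cron jobs, you may prefer to append the new line rather than overwrite the whole crontab. The sketch below shows only the text manipulation, assuming you feed it the output of crontab -l; add_cron_line is a hypothetical helper, not a crontab feature.

```python
def add_cron_line(existing: str, new_line: str) -> str:
    """Return crontab text with new_line appended unless it is already present."""
    lines = existing.splitlines()
    wanted = new_line.strip()
    # Skip the append when an identical entry exists, so repeated runs are safe.
    if wanted not in (line.strip() for line in lines):
        lines.append(wanted)
    # crontab expects the file to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical existing crontab containing one unrelated job.
current = "30 2 * * * /path/to/other_job\n"
updated = add_cron_line(current, "0 12 * * * /path/to/execute_test")
print(updated, end="")
```

In practice, the existing text could be read with subprocess.run(["crontab", "-l"], capture_output=True, text=True).stdout, and the result installed with subprocess.run(["crontab", "-"], input=updated, text=True, check=True).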

Some other crontab tips:

  • Use full paths for scripts, papermill, and notebook
  • Check permissions and logs for errors

When setting up the bash script that executes papermill, it is important to use the full system paths for both papermill and the notebook file. For example, instead of just specifying papermill notebook.ipynb output.ipynb, use the full path like /usr/local/bin/papermill /Users/name/notebooks/notebook.ipynb output.ipynb. This ensures that crontab can find the necessary programs and files when running the script non-interactively.

Here is an example bash script that uses full paths for papermill and the notebook:

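#!/usr/bin/env bash

# Placeholders: replace with the absolute paths on your machine
dir=path_to_notebook
papermill=path_to_papermill

# Run from the notebook directory; stop if it does not exist
cd "$dir" || exit 1

"$papermill" print_date_ts.ipynb output.ipynb -p year 2023 -p month 10 -p day 18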

You can use the which command to print the full path:

which papermill
/usr/local/bin/papermill

Cron jobs run with a minimal environment and a much shorter PATH than your interactive shell, so using full paths avoids issues like “command not found” errors. Specifying full paths for papermill, the input notebook, and the output notebook in the bash script helps the scheduled job run smoothly.
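The lookup that which performs can also be done from Python, which is handy if you want to check the path before writing it into the script. This is a minimal sketch using the standard library; resolve_executable is a hypothetical helper, and the result depends on the PATH of the shell you run it from, not on cron's PATH.

```python
import shutil


def resolve_executable(name: str) -> str:
    """Return the path of an executable found on PATH, or raise if missing."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name!r} was not found on PATH")
    return path


try:
    print(resolve_executable("papermill"))
except FileNotFoundError as exc:
    # Printed when papermill is not installed in the current environment.
    print(exc)
```

If papermill lives inside a virtual environment or a conda environment, the path printed here will point into that environment, and that is the path the bash script should use.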

With the script scheduled, the notebook will execute automatically per the crontab time specification.

Step 5 — Verify Automated Execution

Once the crontab schedule is set up, we can verify the notebook is automatically executing at the specified times:

  • Check that the crontab entry is listed with crontab -l
  • Look for output notebooks created at the scheduled times
  • Review logs from crontab runs for any errors
  • Confirm the output data matches expected results
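One way to check that the output notebook was produced by a recent run is to look at its modification time. The sketch below assumes the job overwrites a single output file, as in the scripts above; updated_within is a hypothetical helper, and the temporary file stands in for output.ipynb.

```python
import os
import tempfile
import time


def updated_within(path: str, max_age_seconds: float) -> bool:
    """Return True if path exists and was modified within max_age_seconds."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file counts as "not updated".
        return False
    return age <= max_age_seconds


# Demonstration with a throwaway file instead of a real output notebook.
with tempfile.NamedTemporaryFile(suffix=".ipynb") as tmp:
    print(updated_within(tmp.name, 3600))
print(updated_within("/no/such/dir/output.ipynb", 3600))
```

For a daily job, calling updated_within with the output path and a window slightly longer than 24 hours would flag a missed run.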

Troubleshooting issues:

  • Check paths and permissions if execution fails
  • Add logging to the script for more visibility
  • Use crontab -e to edit the schedule if needed
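When a run fails, the executed notebook usually records the exception in the outputs of the failing cell. The following sketch scans a notebook dictionary for such outputs, assuming the standard nbformat 4 layout in which an error output has output_type "error" along with ename and evalue fields. find_errors and the sample data are hypothetical.

```python
def find_errors(notebook: dict) -> list:
    """Return (cell_index, error_name, error_value) for every error output."""
    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        # Markdown cells have no "outputs" key, hence the default.
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename"), output.get("evalue")))
    return errors


# Hypothetical executed notebook in which the second cell raised an exception.
executed = {
    "cells": [
        {"cell_type": "code", "outputs": [{"output_type": "stream", "text": "date: 2023-10-15\n"}]},
        {"cell_type": "code", "outputs": [{"output_type": "error", "ename": "ValueError", "evalue": "bad date"}]},
        {"cell_type": "markdown", "source": "notes"},
    ]
}

print(find_errors(executed))  # [(1, 'ValueError', 'bad date')]
```

An empty list means no cell recorded an error. For a real check, load the output notebook with json.load and pass the result to the helper.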

With a valid schedule and script, the notebook should run like clockwork per the crontab configuration!

Summary

This covers the key steps to automate Jupyter Notebook execution on macOS or Linux using Papermill and crontab without converting to Python scripts:

  • Parameterize a notebook by tagging cells
  • Directly execute the notebook with Papermill
  • Create a bash script that specifies full paths
  • Schedule the script using crontab
  • Verify automated runs by checking outputs

Automating notebooks provides reproducibility and saves time over manual execution. With just a bit of setup, you can configure notebooks to run automatically on a defined schedule.
