How to share Python Code between Jupyter Notebooks

Flipflo Dev
10 min read · Jun 29, 2024


Jupyter Notebooks have become an essential tool for data scientists, machine learning engineers, and any developer who wants to work interactively with Python code. It’s like having a permanent debugging session: you keep access to all your variables and outputs without rerunning your script from the beginning and repeating expensive startup steps such as heavy data loading. However, working only with notebooks becomes limiting as the project grows and code needs to be shared between multiple notebooks.

In this blog post, we will walk through how to share code between different notebooks to create a scalable project structure. I will demonstrate a minimal setup with a very simple greeting function that we want to share between two notebooks. Let’s get started!

Project Setup

Version Control

I like to set up a new project by first initializing version control. Start by creating a folder for your project with a README.md and a .gitignore file. A good starting point for the gitignore is the official python.gitignore hosted on GitHub. For the README I usually start with the title of the repository and a short project description, then add to it as the project grows. Then we can initialize the local repository, create our first commit, and define the main branch.

git init
git add -A
git commit -m "initial commit"
git branch -M main

I will host this repository on GitHub and therefore initialize the remote repository there. Make sure to create an empty repository by not selecting any of the options to create a README, license, or gitignore file, as we already did this locally.

NOTE: We could have also initialized the repository on GitHub by selecting the create README option and choosing Python for the gitignore. However, to keep this tutorial Git-hosting agnostic, I show you how to set up the project locally; you can then set the remote to any Git provider you want. In that case, you can skip this section about setting up GitHub.

You can now copy the remote repository url from the Quick Setup box.

Now we can add the remote repository as the origin of our local repository and push the first commit to the remote main branch.

git remote add origin [YOUR_REPOSITORY_URL]
git push -u origin main

Virtual Environment

To isolate our project and the dependencies we might install, it is best to use a virtual environment. My go-to environment manager for Python projects is Miniconda. There are generally two places to create your virtual environment. You can either create a global environment that is accessible by other projects as well by specifying a name for the environment, or you can create a local environment for this project by specifying a path inside the project with the --prefix flag. I prefer the second approach, as it doesn’t clutter my global conda environments with one entry per project.

conda create --prefix ./env python=3.12
conda activate ./env

This will create an env folder at the project root, containing all the packages we install. Note that this folder is already excluded by the default Python gitignore; we don’t want all the package files to be tracked. What we can do later is create an environment/requirements file that describes our environment setup, so it can be reproduced on other machines that clone the repository.
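For example, with conda we can capture the environment in a file and recreate it elsewhere (a minimal sketch; the --from-history flag only exports the packages you explicitly requested):

conda env export --from-history > environment.yml
conda env create --prefix ./env --file environment.yml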

One thing I want to address here is that the conda environment is displayed as a full path before our command prompt.

This is a bit annoying and can be fixed by editing the env_prompt property in the .condarc file. Run the following command to set the property to only display the name of the environment instead of the full absolute path.

conda config --set env_prompt '({name})'

This will add the following line to the ~/.condarc file.

env_prompt: ({name})

Now you can restart your shell and reactivate your environment. You should now only see the env prefix without the full absolute path.

Visual Studio Code

I am going to work with Visual Studio Code as my code editor and show you a setup step to select the previously created Conda environment as the default Python interpreter for this project. Open the Command Palette with Ctrl+Shift+P, select Python: Select Interpreter and then look for the locally created environment.

This makes sure that the selected environment is activated in newly opened terminals, and it adds IntelliSense for Python code that uses packages installed in that environment.

First Notebook

Now we can finally get started with some code! Create a directory notebooks at the project root and add a first Jupyter Notebook file notebook_1.ipynb. I will create a simple function that prints a greeting with a string as a parameter.

def say_hello(source: str) -> None:
    print(f"Hello, world from {source}!")

Then in a new notebook cell, we can call the function passing the string “First notebook” as a greeting source.

say_hello(source="First notebook")

Now in VS Code we can click on the Run All button to run the two cells in the notebook.

You will then be prompted to select a kernel if you haven’t selected one yet. We will have to do this once for every notebook we create. Select the previously created Conda environment from the list by selecting Python Environments… — env.

You will then also be prompted to install the ipykernel package, which is required to run notebooks using the environment. Click on Install to install the package into the virtual environment.

Now you should see the output of the second cell where the function is called. Also note on the top right you can see the currently selected kernel, which is our conda environment. If you ever want to change the runtime you can click there and select a different Python interpreter.

Second Notebook

Now let’s say you want to create a second notebook that sends a different greeting. You could just copy the say_hello function over to the second notebook. In this simple example that might not look too bad, but as your project grows, more and more notebooks are created, and when you want to change something in the function you have to remember every place you copied it to. A better approach is to create a Python package where this shared code lives, so we can import it in our notebooks.

To achieve this, create a src folder at the project root and add a new Python file, let’s call it greeting.py. Copy the say_hello function to this newly created file.
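At this point, the project layout looks roughly like this:

├── env/
├── notebooks/
│   └── notebook_1.ipynb
├── src/
│   └── greeting.py
├── .gitignore
└── README.md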

Now we want to import this function in our notebook. If we try this naively, e.g. with from greeting, from src.greeting, or even a relative import from ..src.greeting, you will see that this does not work out of the box, even though for the last two we get IntelliSense in VS Code.
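Concretely, each naive attempt fails with an import error; this sketch shows roughly what you would see:

from greeting import say_hello        # ModuleNotFoundError: No module named 'greeting'
from src.greeting import say_hello    # ModuleNotFoundError: No module named 'src'
from ..src.greeting import say_hello  # ImportError: attempted relative import with no known parent package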

This is because the notebook is executed in the notebooks directory, while the Python file lives in the src folder, at the relative path ../src/greeting.py. So the last approach would actually be the correct “path”; however, relative imports only work inside a package, so they are not the right tool here.

Another way would be to add the src folder to the system path. This works, but it is quite ugly: if you later separate your notebooks into different sub-folders, you have to manually adjust the path manipulation in every notebook you move to a different directory.
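For reference, that workaround typically looks something like this (a sketch; don’t copy it):

import sys
from pathlib import Path

# Make ../src importable by mutating the interpreter's module search path
sys.path.append(str(Path.cwd().parent / "src"))

from greeting import say_hello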

Don’t do this…

A much better method is to create a local package from the src files and install it with pip in editable mode. That way, changes to the Python files are instantly reflected in the notebook code, and you don’t have to reinstall the package every time you change something in the shared code.

The modern way to set up a pip-installable package is to create a pyproject.toml file at the project root. I will use the hatchling build backend for this project. The minimal required attributes are a name for the project and a version. Finally, we have to specify the packages in our project. This corresponds to the src folder we created earlier and has to match that folder name exactly.

NOTE: the name of the project can be different from the name of our package, and a single project can contain multiple packages. The package name is the name you use to import the code in your scripts/notebooks, while the project name is the name used to install the pip package.

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "notebook-template"
version = "1.0.0"

[tool.hatch.build.targets.wheel]
packages = ["src"]

Now we can install the package in editable mode using pip. This allows us to import the src package independently of the notebook path, while referencing the original source files instead of copying them into the site-packages of our environment at install time.

pip install -e .

After restarting your notebook kernel to reload the packages, you should be able to import the say_hello function.
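With the editable install in place, the import works independently of the notebook’s location. The import cell in the first notebook now becomes (src being the package name we declared in pyproject.toml):

from src.greeting import say_hello

say_hello(source="First notebook")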

Now we can finally create a second notebook notebook_2.ipynb and import the function there as well, sending a different greeting.
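A minimal sketch of the second notebook, with “Second notebook” as an example greeting source:

from src.greeting import say_hello

say_hello(source="Second notebook")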

Autoreload

Now let’s take advantage of the shared greeting code and modify the message in the greeting.py file.

def say_hello(source: str) -> None:
    print(f"Hello World! Sent from {source}!")

Now let’s run the notebook again without restarting the kernel and… wait. What? The output didn’t change, it still shows the old message! Even when we execute the cell with the import again, as long as we don’t restart the kernel, our code changes are not reflected in the notebook!

This can be really dangerous, as you always have to remember to restart your kernel when you change the shared code. Furthermore, it negates the advantage of the notebook kernel keeping your session alive while you work on the shared code.

To alleviate this, we can use the autoreload extension for notebooks. Add a cell before the import of the shared code that loads the extension and configures it to reload all modules before executing any cell.

%load_ext autoreload
%autoreload 2

You have to restart the kernel one more time, because we imported the greeting module before enabling the autoreload extension. After this, however, you won’t have to restart the kernel anymore when changing anything in the shared source code.

Adding these two lines to every new notebook is not really nice either. In VS Code we can instead configure them to run automatically as startup commands before a Jupyter notebook is executed, by setting the jupyter.runStartupCommands setting in the settings.json.

"jupyter.runStartupCommands": [
"%load_ext autoreload",
"%autoreload 2"
]

This setting can only be specified globally in your user settings.json. Open the user settings JSON with the Command Palette (Ctrl+Shift+P) and select Open User Settings (JSON).

{
    ...
    "jupyter.runStartupCommands": [
        "%load_ext autoreload",
        "%autoreload 2"
    ]
}

Now you can remove the cell with the autoreload configuration and your changes to the source code will still be reflected immediately in your notebook code.

Conclusion

In this post I demonstrated how you can set up a project with Jupyter notebooks that share common Python code. This lets us extract the common logic of multiple notebooks without duplicating code, enabling cleaner projects. I hope this article helps you improve your project setup when working with Jupyter Notebooks. Check out the project on my GitHub below for the full source code. Happy coding!
