Jupyter + IDE: how to make it work
This post describes a workflow that separates writing code in an IDE from using that code for analysis in a Jupyter notebook, and makes working with notebooks faster and cleaner along the way.
Introduction
If you work with data in Python, you probably know Jupyter notebooks. The format just makes sense for simultaneously writing code and exploring data... or does it?!
Most of the criticism of relying on notebooks is covered in this excellent talk: I don’t like notebooks, by Joel Grus. To summarize, notebooks are not a great environment for a large project: git versioning doesn’t really work, tests are not supported, and ultimately notebooks become a mess of code scattered across cells. Of course, all of that falls flat against a simple argument:
“But.. notebooks are so convenient” (me a year ago probably)
When working with data, you almost never know whether your approach works just because your code runs. You always do analysis during and after developing a model, track the results of different hypotheses, record observations, etc. The notebook format really does align with that process. But that doesn’t mean it can’t be improved.
Pros & Cons
Let’s summarize exactly why we love and hate Jupyter notebooks and IDEs (Integrated Development Environments).
Finally, let’s see how to marry the best of both worlds in one simple workflow.
Workflow setup
- (Optional) Use pyenv to isolate the Python version for each project. Installation guide.
Virtual environments are a different topic, but I think they’re also super helpful.
- Editable pip install
We use pip to point Python to a ‘package’ that will be our code. It’s not a real package though, just a link to our code. This lets us import our code as if it were a Python package, while still being able to edit it: the ‘package’ always reflects the current state of the files.
cd YOUR_PROJECT_ROOT
mkdir lib
Define the package by creating a setup.py file in your project root:
from setuptools import setup

setup(
    name='lib',
    version='0.1.0',
    packages=['lib'],
)
Install the package with pip’s editable flag:
pip install -e .
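A quick way to check that the editable install really points back at the project folder is to ask Python where it would import the package from. This is a minimal sketch using the standard-library importlib; the helper name package_location is made up for this illustration, and it assumes the package is called lib as above:

```python
import importlib.util


def package_location(name):
    """Return the file Python would load for `name`, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# After `pip install -e .`, this should print a path inside YOUR_PROJECT_ROOT/lib
# rather than a copy inside site-packages. It prints None if lib cannot be found.
print(package_location("lib"))
```

If the printed path is inside your project, edits to the files there are what Python will import.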
- Package structure
We still need our code to have a package structure. Every folder in the lib directory, including lib itself, needs to have an empty __init__.py file.
|-- setup.py
|-- lib
|   |-- __init__.py
|   |-- tools
|       |-- __init__.py
|       |-- tool.py
As an example, in lib/tools/tool.py define a function:
def yo():
    print("Hello World")
You can now import it in a python shell with:
>>> from lib.tools.tool import yo
>>> yo()
Hello World
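A forgotten __init__.py is an easy mistake to make once lib grows a few subfolders. Here is a small sketch of a check for that; the function name folders_missing_init is hypothetical, and the demo runs on a throwaway copy of the layout above (with one __init__.py deliberately left out) so it doesn’t touch a real project:

```python
import tempfile
from pathlib import Path


def folders_missing_init(package_dir):
    """Return folders under package_dir (itself included) that lack an __init__.py."""
    package_dir = Path(package_dir)
    folders = [package_dir] + [p for p in package_dir.rglob("*") if p.is_dir()]
    return sorted(
        f for f in folders
        if f.name != "__pycache__" and not (f / "__init__.py").exists()
    )


# Demo: recreate the layout in a temp folder, but "forget" lib/tools/__init__.py.
root = Path(tempfile.mkdtemp())
(root / "lib" / "tools").mkdir(parents=True)
(root / "lib" / "__init__.py").touch()
(root / "lib" / "tools" / "tool.py").write_text('def yo():\n    print("Hello World")\n')

missing = folders_missing_init(root / "lib")
print([p.relative_to(root).as_posix() for p in missing])  # prints: ['lib/tools']
```

On a real project you would call it as folders_missing_init("lib") from the project root and expect an empty list.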
- Autoreload
Jupyter can pick up changes to a package dynamically with the autoreload extension.
In your Jupyter notebook, load the extension once:
%load_ext autoreload
%autoreload 2
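To see why this matters, it helps to look at what happens without it. The sketch below is plain Python, not the extension itself: it uses importlib.reload from the standard library to do by hand roughly what autoreload does before each cell runs. The module name autoreload_demo_tool and the temp folder are stand-ins I made up for lib/tools/tool.py:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip bytecode caching so every reload recompiles from the source file.
sys.dont_write_bytecode = True

# A throwaway module standing in for lib/tools/tool.py (hypothetical name).
workdir = Path(tempfile.mkdtemp())
source = workdir / "autoreload_demo_tool.py"
source.write_text('def yo():\n    return "Hello World"\n')
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

tool = importlib.import_module("autoreload_demo_tool")
before = tool.yo()

# Simulate saving an edit in the IDE.
source.write_text('def yo():\n    return "Hello again, World"\n')

# Without a reload, the running interpreter still uses the old code.
stale = tool.yo()

# Reload by hand; autoreload takes care of this step for you.
importlib.reload(tool)
after = tool.yo()

print(before, stale, after, sep=" | ")
# prints: Hello World | Hello World | Hello again, World
```

One difference worth knowing: a manual importlib.reload does not update names you imported with from ... import ..., whereas autoreload also tries to patch those, which is why the import does not have to be rerun in the notebook.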
Result
Now when we edit code in lib using our favourite IDE, we will have:
- Python automatically putting the edits into the lib package,
- the Jupyter notebook automatically propagating the changes whenever we use that package.
The changes propagate from the IDE to the notebook as soon as you save the file you’re working on! Note that I did not need to rerun the function import to see the changes.
How I actually use this workflow:
It allows me to keep my notebooks to the point, documenting the results I want to keep, while keeping good quality code in a proper repository where it can be reused later, without needing to copy-paste chunks of code from notebook to notebook.
It also allows me to use a powerful IDE to develop quality code, so I waste less time on documenting code, fixing syntax errors, etc.