Three Handy Jupyter Notebook Snippets

Dustin Michels
5 min read · Jul 18, 2019

Here are three Python snippets that I use in almost all my Jupyter Notebook projects.

1) Import code from outside notebooks directory

The problem 🤔

I generally set up my Python projects like this:

myproject/
|-- README
|-- requirements.txt
|-- myproject/
|   |-- __init__.py
|   |-- main.py
|-- notebooks/
|   |-- example.ipynb

Source code goes in a directory with the project name, and notebooks go in their own, top-level directory. I most often use notebooks to document or experiment with code in the main project directory, and I think it’s neater to store the notebooks separately from the code itself.

If you try importing something from myproject into a notebook, either directly or with relative import syntax, the import will fail.

Direct import fails: ModuleNotFoundError
Relative import fails: “attempted relative import beyond top-level package.”

The snippet 🎉

# Add directory above current directory to path
import sys; sys.path.insert(0, '..')
  • sys.path is a list of directories where Python will look for modules to import.
  • This statement inserts the directory above your current working directory at the front of that list, so Python searches it first.

Now, just import the module as if you were in the parent directory.

import sys; sys.path.insert(0, '..')
from myproject.main import hello
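One caveat with '..' is that it is a relative path, so it depends on the working directory the notebook kernel was started in (typically the notebooks folder). Here is a slightly more defensive sketch of the same idea. The helper name add_parent_to_path is my own invention for illustration, not part of any library; it resolves the parent directory to an absolute path and skips the insert if that path is already listed, so re-running the cell does not pile up duplicates.

```python
import sys
from pathlib import Path


def add_parent_to_path():
    """Make the directory above the current working directory importable.

    Hypothetical helper for illustration; assumes the notebook's working
    directory is the notebooks/ folder shown in the layout above.
    """
    parent = str(Path.cwd().resolve().parent)
    # Only insert if the path is not already listed, so repeated runs are harmless
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent


parent_dir = add_parent_to_path()
```

After running this cell, from myproject.main import hello should work the same way as with the one-liner.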

2) Print lengths with commas

The problem 🤔

When working in a notebook, I frequently check the lengths of objects — a Pandas DataFrame I just loaded from file, a list I just filtered down, etc.

Sticking len(thing) statements all over the place is a great way to quickly gauge how much data you’re dealing with, compare objects to each other, or compare objects before/after some kind of transformation.

BUT, if you’re dealing with large numbers this is much less useful, since it’s hard to make sense of big numbers at a glance. Thus, in projects that deal with large objects, I’ve taken to defining a showlen function near the top of the notebook to print lengths with commas. Then I just use showlen anywhere I would have used len.

The snippet 🎉

# define function to print object length with commas
showlen = lambda data: print(f"{len(data):,}")
  • Using a lambda instead of a regular def is not necessary here; it just makes the function real compact and portable.
  • The expression f"{len(data):,}" is an f-string, or formatted string literal, new in Python 3.6. Variable names or simple expressions go between curly braces. You can place string formatting specifications after a colon; here, the comma tells Python to use a comma as the thousands separator.
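If you would rather have a regular def, or want the formatted string back instead of printing it, something like the following sketch works. The name format_len is just a placeholder I picked for this example.

```python
def format_len(data):
    """Return the length of data as a string with thousands separators."""
    return f"{len(data):,}"


def showlen(data):
    """Print the length of data with thousands separators."""
    print(format_len(data))


showlen(range(1234567))  # prints 1,234,567
```

Splitting the formatting from the printing is optional; it just makes the result easy to reuse inside other strings.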

3) Create quick progress bars with tqdm

The problem 🤔

For me, the great joy of working in a Jupyter Notebook is the ability to rapidly write code, run it, see the results, then revise your work, in a highly interactive process with instant feedback.

Until you try running something slow, and the whole process grinds to a halt:

The sad star of delayed satisfaction

Some things take time, but waiting is particularly frustrating if you don’t know how long a cell will take to finish evaluating.

Sticking a print(i) in that loop is a good start, but if you’re doing many iterations you’ll print out a whole lotta i’s, overwhelming yourself and your notebook!

I experimented with printing then clearing the output, but eventually discovered the Python package tqdm, which makes it so easy to create progress bars that I now use it in almost all my projects.

The snippet 🎉

The most basic usage is to just import tqdm, then wrap it around any iterable for an insta progress bar.

from tqdm import tqdm

for i in tqdm(range(10000)):
    ...  # loop body goes here

Looks like this:

Insta progress bar with tqdm

The output is very customizable, but the default display (in left-to-right order) includes:

  1. Percent completed
  2. Visual progress bar
  3. # iterations completed / total # iterations
  4. [time elapsed < estimated time remaining]
  5. Average number of seconds to complete one iteration. (If the iterations were faster, it would show iterations per second instead.)

The bar works in notebooks as well as on the command line, but there is also an ipywidgets version of the progress bar specifically for the notebook environment.

To use it, just install ipywidgets and import tqdm from tqdm.auto instead of from tqdm. (This will auto-detect your environment and use the widget if run in a notebook but plain text if run on the command line.)
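In practice the switch is a one-line change to the import. Here is a minimal sketch; the loop body and the "Summing" label are made up for the example, and desc is the tqdm argument that puts a label in front of the bar.

```python
from tqdm.auto import tqdm  # widget bar in a notebook, text bar elsewhere

total = 0
for i in tqdm(range(1000), desc="Summing"):
    total += i  # stand-in for real work
```

The rest of your code stays the same, since the object imported from tqdm.auto is used just like the regular tqdm.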

Widget style tqdm progress bar
All done!

A few more things to note:

  • When iterating over a range, you can import and use trange(*args) instead of tqdm(range(*args)). This is more convenient, and features some optimizations.
  • Beyond basic for loops, you can also use it in list comprehensions, e.g., [x**2 for x in tqdm(my_lst)]. You can even use it with multiprocessing.Pool.imap.
  • If you’re iterating over a generator, tqdm won’t know the length and the progress bar will be more limited. But if you know or can calculate the total length, you can supply that as an argument, e.g., [x**2 for x in tqdm(my_gen, total=100000)]
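To make the first and last of these notes concrete, here is a small sketch. The countdown generator is a toy I made up for the example; any generator whose length you know ahead of time would do.

```python
from tqdm import tqdm, trange

# trange(n) is shorthand for tqdm(range(n))
squares = [x**2 for x in trange(5)]


def countdown(n):
    """Toy generator: yields n, n-1, ..., 1. It has no len(), so tqdm cannot size the bar."""
    while n > 0:
        yield n
        n -= 1


# Passing total= gives tqdm the length it cannot work out from the generator
values = [v for v in tqdm(countdown(3), total=3)]
```

Without total, the second bar would still count iterations, but it could not show a percentage or a time estimate.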

Okay, there you go! Three real simple snippets that I’ve gotten a lot of mileage out of. Happy scripting!
