5 Jupyter Notebook Tricks I Wish I Knew Earlier
Jupyter Notebook, previously known as IPython Notebook, is one of the most popular IDEs for data science projects. You can keep code, visualisations, notes, images, and comments together in one place, which improves readability and communication. Below are some tricks I found useful after working on a number of data science and analysis projects, and wish I’d known earlier.
1. Notebook width adjustment
When you open a notebook, it doesn’t use the full width by default; it only takes up around 50% of the screen. This is restrictive when you want to see a long line of code at once or enlarge inline visualisations.
If you’d rather not change the default configuration, running the following code in a cell adjusts the width of only the notebook you’re currently working on.
from IPython.display import display, HTML
# Widen the notebook container to 80% of the browser window
display(HTML("<style>.container { width:80% !important; }</style>"))
Adjusting the width within the notebook instead of changing the default configuration is more flexible and portable. It also lowers the risk of breaking the configuration, and with it the whole environment, if you’re not familiar with it.
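If you switch widths often, the same snippet can be wrapped in a small helper. This is only a sketch: the name notebook_width_css is made up for illustration, and it assumes the page uses the .container class targeted by the code above.

```python
def notebook_width_css(percent=80):
    """Build the <style> snippet that sets the notebook container width.

    `notebook_width_css` is a hypothetical helper name, not part of IPython.
    """
    if not 10 <= percent <= 100:
        raise ValueError("percent should be between 10 and 100")
    return f"<style>.container {{ width:{percent}% !important; }}</style>"


# In a notebook cell you would then run:
#   from IPython.display import display, HTML
#   display(HTML(notebook_width_css(95)))
print(notebook_width_css(95))
```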
2. Running shell commands
By adding an exclamation mark (!) you can run shell commands in Jupyter Notebook cells. Simple examples are !pwd to check the current working directory and !ls to list the files in it.
This trick is extremely useful when it comes to package installations. Say you’re working on a notebook, you’re importing some packages required for your work, and you get a ‘No Module’ error. Normally you would open a terminal to install the packages. With this trick, you can easily install packages without leaving the notebook.
That said, it may not be as simple as !pip install <package name>, because that pip may belong to a different Python than the one running the notebook kernel. A better practice is the following:
import sys
!{sys.executable} -m pip install <package name>
Or, if you use conda:
import sys
!conda install --yes --prefix {sys.prefix} <package name>
This way you can prevent packages from being installed in the wrong directories.
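The same idea can be written in plain Python, which is handy if you want the install step inside a function or script. A minimal sketch, assuming pip is available for the current interpreter; the helper names pip_install_command and pip_install are mine, not part of any library.

```python
import subprocess
import sys


def pip_install_command(package):
    """Return the pip command bound to the interpreter running this code."""
    # sys.executable is the Python binary behind the current kernel,
    # so "-m pip" installs into that same environment.
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))


# Show the command without actually installing anything.
print(pip_install_command("<package name>"))
```

Recent IPython versions also provide a %pip magic that serves the same purpose.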
3. I’ve installed the package but can’t import it in the notebook!
Following on from the installation topic, this is probably one of the most common issues a Jupyter Notebook user runs into. Many find that they can import a package from the terminal but not in the notebook. The root cause is a mismatch between the shell environment where the package was installed and the Python environment the Jupyter kernel runs in. To solve it, make sure the kernel uses the same Python environment, so the notebook searches for modules in the right directories.
If you run python -m pip install in the terminal, the package is usually installed at /usr/local/lib/python{version}/site-packages, or path/to/the/virtualenv/lib/python{version}/site-packages if you’re using a virtual environment. You can check this by running pip list -v. However, your notebook is probably not looking for packages there. Run the following to find out which directories the notebook is searching.
import sys
sys.path
This returns a list of all the directories the interpreter searches when importing modules. You can either add a --target argument to pip install, like
python3 -m pip install <package_name> --target {directory in sys.path}
(run it in a terminal, or follow the previous trick to install from the notebook), which is preferable, or add the directory containing the packages to sys.path.
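To see the mismatch for yourself, it helps to print which interpreter the kernel is running and where a given module would be loaded from. A small diagnostic sketch; where_is is a hypothetical helper name, and it is meant for top-level module names only.

```python
import importlib.util
import sys


def where_is(module_name):
    """Report which interpreter is running and where a module would load from."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        # origin is the file the module would be imported from, if found
        "module_location": spec.origin if spec is not None else None,
    }


# "json" ships with Python, so it should always be found.
for key, value in where_is("json").items():
    print(f"{key}: {value}")
```

Run it once in the notebook and once in the terminal's Python. If the interpreter paths differ, the two are using different environments.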
This notebook gives a thorough explanation of why installing packages from a Jupyter notebook is so messy.
4. IPython magic functions
Magic functions are enhancements that IPython adds on top of Python syntax. There are 2 kinds: line magics, which are prefixed by a single % and operate on a single line of input, and cell magics, which are prefixed by a double %% and operate on multiple lines of input. They are designed to solve some of the most common data analysis problems. Below are the 3 I find most useful.
%store
This function is handy when you need to pass variables between notebooks. For example, say your analysis involves 3 datasets. Each needs to be cleaned, transformed, and perhaps explored to some extent, and then you have to join them together to build a machine learning model or simply to explore how the variables interact. For the sake of tidiness and clearer communication, you don’t want to put all of that in one notebook. Run the following to store the variables, processed dataframes in this case.
%store <var1> <var2> <var3> ......
In other notebooks, run %store -r to restore all stored variables, or %store -r <var1> <var2> ...... to restore only the ones you need. Other uses of this function:
%store : Show a list of all stored variables and their current values
%store -d <var1> : Remove variable from the storage
%store -z : Remove all variables from the storage
Careful! %store <var> replaces any stored variable of the same name, and %store -r overwrites variables in the current notebook that share a name with a stored one.
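For intuition only, the store-and-restore hand-off and the overwrite warning can be mimicked with the standard pickle module. This is a rough analogy, not how you call %store: the storage directory and the store/restore helper names are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical storage location; IPython keeps its own store elsewhere.
STORE_DIR = Path(tempfile.mkdtemp())


def store(name, value):
    """Persist one variable under its name, replacing any earlier value."""
    (STORE_DIR / f"{name}.pkl").write_bytes(pickle.dumps(value))


def restore(namespace):
    """Load every stored variable into `namespace`, overwriting same-named keys."""
    for path in STORE_DIR.glob("*.pkl"):
        namespace[path.stem] = pickle.loads(path.read_bytes())


# "Notebook A" stores a cleaned dataset (a plain list stands in for a dataframe).
store("sales", [120, 95, 143])

# "Notebook B" already has a variable called sales; restoring replaces it.
notebook_b = {"sales": "unfinished draft"}
restore(notebook_b)
print(notebook_b["sales"])
```

The last line prints the stored list, not the draft value, which is exactly the situation the warning above describes.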
%%writefile
Writes the contents of the current cell to a file, overwriting the file if it already exists. I like to use this to move custom functions into a separate .py file and then import them from there. This keeps my notebook cleaner and more focused, instead of making readers wade through loads of unrelated code (the main focus of my notebooks is usually the analysis result). You can run the function in your notebook first to make sure it works.
%%writefile <filename.py>
def customised_function():
    ...Your function...
    ...Your function...
    return <output>
Then, in another cell:
from filename import customised_function
It also makes the function easier to reuse. If you want to append code from another cell, add -a or --append after %%writefile to prevent overwriting.
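Outside a notebook, the same write, append, and import round trip can be sketched in plain Python. The module name my_helpers_demo, the temporary directory, and the function bodies are all made up for the demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module name and a throwaway directory for the demo.
module_dir = Path(tempfile.mkdtemp())
module_path = module_dir / "my_helpers_demo.py"

# Equivalent of %%writefile: create or overwrite the file.
module_path.write_text(
    "def customised_function(x):\n"
    "    return x * 2\n"
)

# Equivalent of %%writefile -a: append instead of overwriting.
with module_path.open("a") as handle:
    handle.write(
        "\n\ndef another_function(x):\n"
        "    return x + 1\n"
    )

# Make the directory importable, then import as usual.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
my_helpers_demo = importlib.import_module("my_helpers_demo")

print(my_helpers_demo.customised_function(21))
print(my_helpers_demo.another_function(41))
```

In a notebook the file usually sits next to the notebook itself, so the sys.path step is not needed there.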
%pycat
As you can probably guess, this function is similar to cat in Linux, which lets you view the contents of a text file. It’s a lifesaver if you forget how you wrote a function or what a file does as often as I do. Simply run %pycat filename to inspect the file. It assumes you’re opening a Python file and shows it with syntax highlighting. It can also open a URL: %pycat https://example.com/script.py
A comprehensive list of magic functions can be found here.
5. Show more rows/columns
This is actually not a Jupyter Notebook trick but a pandas trick. When you display a dataframe, pandas only shows part of the table if the number of rows or columns exceeds a certain value, and it won’t add scrolling so you can see the rest. By running the following you can configure how many rows and columns are shown.
import pandas as pd
pd.set_option('display.max_rows', 500, 'display.max_columns', None)
set_option lets you set the value of specific options. You name the option you’d like to set, followed by its value. Setting an option to None removes that limit; with the call above, all columns are shown and tables of up to 500 rows are shown in full. You can find the full list of configurable options here.
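If you only want the wider view for a single cell, pandas also offers option_context, which applies options temporarily and restores the previous values afterwards. A short sketch with a made-up dataframe; the column names and sizes are arbitrary.

```python
import pandas as pd

# Small made-up dataframe, wide enough to be truncated by default.
df = pd.DataFrame({f"col_{i}": range(3) for i in range(30)})

default_max_columns = pd.get_option("display.max_columns")

# Inside the block every column is printed; None means "no limit".
with pd.option_context("display.max_rows", 500, "display.max_columns", None):
    inside_max_columns = pd.get_option("display.max_columns")
    print(df)

# Outside the block the previous setting is back in place.
restored_max_columns = pd.get_option("display.max_columns")
print(default_max_columns == restored_max_columns)
```

This avoids changing the display settings for the rest of the notebook when you only need one large table.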
Jupyter Notebook is a brilliant IDE for data analysis and data science projects, for communication, and for presentation. There are so many additions that can streamline our workflow and make our lives easier. Learning the ins and outs of Jupyter is a long journey, and I still can’t see the finish line. I’m still thrilled whenever I find a new tip or shortcut that makes my working experience better.
Comment and share your favorite Jupyter Notebook trick!