How I use Jupyter Notebooks
An assortment of settings that made my Jupyter Notebooks experience better. Feel free to send me some more.
Jump to Input Environment Tweaks to get to the meat.
A year ago, I never thought I’d be studying in the building that stands opposite this pretty tree, writing a post on Jupyter Notebooks, the bread and butter of Data Science endeavors. On 4th March 2016, I got a positive decision from the Information School at the University of Washington, flew 7000 miles, and here I am, in the spring of 2017, enrolled in a course titled “Data Science II: Machine Learning and Econometrics” taught by Prof. Ott Toomet. The course is described as:
Provides theoretical and practical introduction to modern techniques for the analysis of large-scale, heterogeneous data. Covers key concepts in inferential statistics, supervised and unsupervised machine learning, and network analysis. Students will learn functional, procedural, and statistical programming techniques for working with real-world data.
Exciting indeed! So I’ve been churning through half of Stack Overflow, and I thought I’d put down a few things that people might find interesting while working with Jupyter Notebooks. I was also motivated by this comment. So here goes.
The post is divided into three main sections:
- Input environment tweaks
- Output environment tweaks
- Other ways to use Jupyter
Note: In the sections below, replace the *nix home directory (~) with C:\Users\yourusername if you’re on Windows. If the directories don’t exist, please create them. I have tested these settings on a Mac running macOS Sierra and with an Anaconda setup on Windows.
Input environment tweaks
I found it much easier to install and set up the environment with the conda package manager. A good starting point is to install Miniconda for Python 3.6 and then add the packages you need. You might need to put ~/anaconda/bin before your system bin folder in PATH so that conda and the other Anaconda binaries take priority over your system binaries.
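To check whether the PATH ordering described above is actually in effect, a small script can help. This is a minimal sketch using only the standard library; the helper names and the /usr/bin system folder are assumptions for illustration, so adjust the directories to your own install.

```python
import os
import shutil


def path_position(directory, path_value=None):
    """Return the index of `directory` in a PATH-style string, or None if absent."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    for index, entry in enumerate(path_value.split(os.pathsep)):
        if entry and os.path.normpath(os.path.expanduser(entry)) == target:
            return index
    return None


def report(conda_bin="~/anaconda/bin", system_bin="/usr/bin"):
    """Print where both folders sit in PATH and what a few commands resolve to."""
    # A lower position means the folder is searched earlier.
    print("conda bin position:", path_position(conda_bin))
    print("system bin position:", path_position(system_bin))
    for name in ("conda", "python", "jupyter"):
        # shutil.which returns None when the command is not on PATH.
        print(name, "->", shutil.which(name))
```

Calling report() prints the two positions plus the resolved executables; if the Anaconda folder has the lower position, its binaries win.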
Adding Jupyter Notebook extensions
There are various user-contributed extensions for Jupyter Notebooks that add a lot of functionality. You just need to run conda install -c conda-forge jupyter_contrib_nbextensions, and you should see an Nbextensions tab after a restart. From there you can read each extension’s help file and enable or disable extensions from inside Jupyter.
For example, I wanted to add custom keyboard shortcuts and Keyboard shortcut editor provides a really neat interface to customize things. There are ways to do this by editing the configuration, but this seemed to be much faster to me.
There are various other extensions such as “Gist-it” to make Gists from code blocks, “Move selected cells” to help you reorder cells, “Hide Input” to hide individual code cells you don’t want your users to see etc. Knock yourself out. :)
Increasing Cell Width
I’d prefer if Jupyter cells took the entire screen real estate because I can write and read long code lines more easily. Spot the difference below?
I’m loving the real estate upgrade! If you want to go crazy, check out Jupyter Themes, but I don’t want to go crazy at this time.
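One common way to get full-width cells, offered here as an assumption rather than as the exact tweak used for the screenshots, is to override the width of the page's .container element with CSS. The sketch below does this per notebook from a code cell; it targets the classic Notebook interface, and the names CELL_WIDTH_CSS and widen_cells are made up for illustration.

```python
# CSS rule that stretches the classic Notebook page container to full width.
CELL_WIDTH_CSS = "<style>.container { width: 100% !important; }</style>"


def widen_cells():
    """Inject the CSS into the current notebook page."""
    # Imported here because IPython is only needed when running inside a notebook.
    from IPython.display import HTML, display

    display(HTML(CELL_WIDTH_CSS))
```

Running widen_cells() in a cell applies the rule to that notebook only. Putting the same CSS rule (without the style tags) in ~/.jupyter/custom/custom.css should make it stick for every notebook.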
Enabling Soft Wrap
I think it’s better to have the code wrap rather than having to scroll back-and-forth horizontally within cells. Matthias Bussonnier’s comment (via Dan) fixed this for me. My configuration files are at the end in case you want to double check your settings.
Enabling Line Numbers
After enabling soft wrap, you’d want your cells to have line numbers because you want to know where your lines end. I know there are keyboard shortcuts for this but you can make it stick across sessions. Thanks, Nat Dunn, for pointing out how to do that here. The instructions are pretty straightforward. Check it out.
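Both soft wrap and line numbers come down to CodeMirror options stored in the Notebook's frontend config. The sketch below merges them into notebook.json without touching other keys. It assumes the classic Notebook reads ~/.jupyter/nbconfig/notebook.json and honours a CodeCell / cm_config entry; the function name is hypothetical.

```python
import json
from pathlib import Path

# Assumed location of the classic Notebook's frontend config.
DEFAULT_CONFIG = Path.home() / ".jupyter" / "nbconfig" / "notebook.json"

# CodeMirror options for soft wrap and persistent line numbers.
EDITOR_OPTIONS = {"lineWrapping": True, "lineNumbers": True}


def enable_editor_options(config_path=DEFAULT_CONFIG, options=EDITOR_OPTIONS):
    """Merge editor options for code cells into notebook.json and return the config."""
    path = Path(config_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Keep whatever is already configured and only add or overwrite our keys.
    cm_config = config.setdefault("CodeCell", {}).setdefault("cm_config", {})
    cm_config.update(options)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling enable_editor_options() with no arguments writes to the default location; reload the notebook page afterwards to pick up the change.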
Output Environment Tweaks
Unhiding All Output
As shown in this Notebook (renders on desktop), Jupyter displays only the last expression in a cell that stands on its own line. mbh86 shows how to fix that in his answer. To make the fix persist across sessions, add c.InteractiveShell.ast_node_interactivity = "all" to ipython_config.py.
Enabling Inline Matplotlib
If Jupyter doesn’t show your plots and shows something like <matplotlib.axes._subplots.AxesSubplot at 0x128509810> instead, you probably haven’t enabled inline plotting. Adding %matplotlib inline at the top of your Notebook fixes this for the current session. Kyle Kelley provides a comprehensive answer on how to make this stick across sessions: add c.InteractiveShellApp.matplotlib = "inline" to ipython_config.py.
Viewing DataFrames in a GUI
I was trying to figure out if I can get better output than df.tail() and possibly look at DataFrames in a GUI tool.
dfgui built by Fabian Keller worked best for me on OSX. See his answer on Stack Overflow and the dfgui repository for installation instructions.
On OSX, I had to do brew install wxpython before cloning and installing this tool. Windows install instructions are also on GitHub; follow them and you should be good to go. Thanks to the author for accepting my pull request. :)
After you’re set, plug this into Jupyter to see your DataFrames in a neat little GUI.
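A minimal sketch of what that can look like, assuming dfgui exposes a show() function as described in its README and that wxPython is installed. The wrapper name show_in_gui is made up for illustration.

```python
def show_in_gui(df):
    """Open a pandas DataFrame in the dfgui viewer window."""
    # Imported lazily: dfgui is assumed to be installed from its repository.
    import dfgui

    dfgui.show(df)


# In a notebook cell, with a DataFrame named df already loaded:
# show_in_gui(df)
```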
If you really want to view it in Jupyter, be sure to check Andy Hayden’s answer, which shows how to max out the number of columns that pandas displays. I found this to be really useful. Consider editing ipy_user_conf.py to make the setting stick and include your custom config in your results for reproducibility. ;)
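For the in-notebook route, the column limit is a pandas display option. Here is a small sketch, assuming pandas is installed; the helper name and the optional row limit are illustrative additions, not something taken from the linked answer.

```python
def show_all_columns(max_rows=None):
    """Lift pandas' column display limit; optionally set a row limit too."""
    # Imported lazily so the helper can be defined even where pandas is absent.
    import pandas as pd

    pd.set_option("display.max_columns", None)  # None means no column limit
    if max_rows is not None:
        pd.set_option("display.max_rows", max_rows)
```

Call show_all_columns() once near the top of a notebook; every DataFrame displayed after that shows all of its columns.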
Finally, the following tips are borrowed from here. Be sure to check out the original article as well for more amazing tips.
Suppressing Matplotlib Text
Once you have this setting in place, you might want to suppress the <matplotlib.axes._subplots.AxesSubplot at 0x128509810> messages in your Notebook.
To suppress those messages, end your plot command with a semicolon (#16).
Seaborn is built on top of Matplotlib and makes it easier to build more attractive plots. Just by importing Seaborn, your matplotlib plots are made ‘prettier’ without any code modification. (#4)
Nothing to lose! So, just do conda install seaborn, and then import seaborn as sns; sns.set() in your Notebook. Calling sns.set() applies Seaborn’s default theme; since Seaborn 0.8 the import alone no longer restyles plots, so the explicit call is needed.
Improving Plot Rendering — Retina Screens Only!
Who would’ve known that adding c.InlineBackend.figure_format = "retina" to ipython_config.py would render plots at double resolution? You should see a visible difference on a retina screen. (#15)
These are on MacOS Sierra 10.12.4. YMMV!
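The three c. settings mentioned in this post can all live in the same ipython_config.py. The sketch below appends whichever of them are missing and leaves everything else in the file alone. The helper name is hypothetical, and the header line is an assumption based on what IPython's generated config files start with.

```python
from pathlib import Path

# First line of a freshly generated IPython config file (assumed).
HEADER = "c = get_config()"

# The settings discussed in this post.
SETTINGS = [
    'c.InteractiveShell.ast_node_interactivity = "all"',
    'c.InteractiveShellApp.matplotlib = "inline"',
    'c.InlineBackend.figure_format = "retina"',
]


def ensure_settings(config_path, settings=SETTINGS):
    """Append any missing settings to the config file; return the lines added."""
    path = Path(config_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    text = path.read_text() if path.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    # Only a brand-new or empty file gets the header line.
    wanted = list(settings) if text.strip() else [HEADER] + list(settings)
    missing = [line for line in wanted if line not in present]
    if missing:
        separator = "" if not text or text.endswith("\n") else "\n"
        path.write_text(text + separator + "\n".join(missing) + "\n")
    return missing
```

The default profile's config usually sits at ~/.ipython/profile_default/ipython_config.py (an assumption about your setup), so ensure_settings(Path.home() / ".ipython" / "profile_default" / "ipython_config.py") would be the typical call. Restart the kernel afterwards.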
Other ways to use Jupyter
Finally, I just want to mention some Jupyter alternatives that look really promising and would be worth trying out in the future. This is mostly derived from this Hacker News discussion.
— hydrogen. simply whoa! I have to try this out.
— Pycharm with a Jupyter setup? I just read you can preview DataFrames inside Pycharm, but for now, I prefer the lightweight
— atom-notebook. As the description says, “Jupyter Notebook, but inside Atom.”
I have linked the relevant Stack Overflow answers in the individual points themselves; other helpful links that I found are listed below:
- Pandas — comparison to SQL — http://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html
- Matplotlib tutorial — http://www.labri.fr/perso/nrougier/teaching/matplotlib/
- Jupyter Gitter channel — https://gitter.im/jupyter/notebook
- 28 Jupyter Notebook tips, tricks and shortcuts — https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/
If there are any corrections, suggestions, comments or stale links, send me a note on Twitter and I’ll look into it. Have some more tips? Leave a response below or tweet at me and I’ll update the post.
I’ll be using pandas and other Python data science tools extensively over the rest of the quarter and I might write about my experiences with them. I look forward to learning more and mastering and applying concepts within this amazing ecosystem.
5/7/2017 — Added Windows Anaconda instructions.