Coding standards for your Jupyter notebooks

Atma Mani
Sep 15, 2018 · 5 min read

Jupyter notebooks have become incredibly popular amongst data scientists and general users of Python & R. While the Jupyter framework is liberal and lets you be creative, you, your team and your readers will benefit if you define a structure and follow it. Drawing on my experience in developer evangelism and on authoring public-facing notebooks for the last 3 years, below is my take on recommended patterns for writing data science samples using Jupyter Notebooks.

Use headings and markdown lavishly

Start your notebook with Heading level 1 and give it a title. Follow it with a narrative of what the notebook aims to do, where the data is sourced from and what the user can expect by the end of it.

Notebook above is a good example of a title and a narrative about the notebook

Break down your notebook into smaller parts and use Heading levels 2, 3, 4… for hierarchy of topics and sub-topics. A notebook should ideally have just one Heading level 1, under which multiple levels 2, 3… are nested.

An example of how headings render in notebooks

Insert a Table of Contents after your executive summary, so the reader can glance at your work without having to scroll a lot. You can auto-insert a ToC (and keep it up to date) with Jupyter notebook extensions.

Insert a table of contents for lengthy notebooks
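If you would rather not rely on an extension, a ToC can be approximated from the notebook file itself, since an .ipynb file is plain JSON. What follows is a minimal sketch and not how the extensions work: `build_toc` and `sample_notebook` are hypothetical names made up for illustration, and the anchor format (spaces replaced with hyphens) is an assumption about how your Jupyter front end links headings.

```python
import re


def build_toc(notebook):
    """Return markdown ToC lines for headings found in markdown cells."""
    toc_lines = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # A cell's source is stored either as one string or as a list of lines
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            match = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
            if match is None:
                continue
            level = len(match.group(1))
            title = match.group(2)
            # Assumption: heading anchors replace spaces with hyphens
            anchor = title.replace(" ", "-")
            indent = "  " * (level - 1)
            toc_lines.append(f"{indent}- [{title}](#{anchor})")
    return toc_lines


# Hypothetical in-memory notebook; a real one would come from json.load()
sample_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# House prices\n", "Narrative goes here.\n"]},
        {"cell_type": "code", "source": "import pandas as pd"},
        {"cell_type": "markdown", "source": "## Get data\n### Download"},
    ]
}

toc = build_toc(sample_notebook)
print("\n".join(toc))
```

Paste the printed lines into a markdown cell below your summary. Note that this sketch does not skip `#` lines inside fenced code in markdown cells.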

Embed images. Use different typography (bold, italic, code) to highlight pieces of text.

Master markdown to enrich your notebooks with rich typography and multimedia

Use LaTeX for equations

Here is a cheat sheet with common LaTeX symbols. Insert them inline between single dollar signs: $…$. Insert multi-line equations between double dollar signs: $$…$$.

An example of how LaTeX equations get rendered in notebooks
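As a small illustration, a markdown cell could contain the text below. The formulas (a sample mean, a linear model and its RMSE) are arbitrary examples picked for this sketch and are not taken from any particular notebook.

```latex
The sample mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

$$
\begin{aligned}
\hat{y} &= \beta_0 + \beta_1 x \\
\text{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
\end{aligned}
$$
```

The first line renders inline with the sentence; the block between the double dollar signs renders as a centered, two-line equation.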

Break longer segments of code into multiple cells

Try to keep your code cells as short as possible. Break them up by adding markdown cells in between with explanatory text. A cell with a single line of code is too short; a cell with over 15 lines of code is too long.
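If you want to check this rule of thumb across a whole notebook, a few lines of Python will do. This is only a sketch: `flag_cell_lengths` and `sample_notebook` are hypothetical names, and the thresholds simply restate the numbers above.

```python
def flag_cell_lengths(notebook, max_lines=15):
    """Return (cell_index, line_count, verdict) for code cells outside the suggested range."""
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # A cell's source is stored either as one string or as a list of lines
        text = "".join(source) if isinstance(source, list) else source
        # Count only non-blank lines so spacing does not inflate the total
        line_count = sum(1 for line in text.splitlines() if line.strip())
        if line_count <= 1:
            findings.append((index, line_count, "too short"))
        elif line_count > max_lines:
            findings.append((index, line_count, "too long"))
    return findings


# Hypothetical in-memory notebook; a real one would come from json.load()
sample_notebook = {
    "cells": [
        {"cell_type": "code", "source": "x = 1"},
        {"cell_type": "markdown", "source": "## Some explanation"},
        {"cell_type": "code", "source": "\n".join(f"step_{i} = {i}" for i in range(16))},
        {"cell_type": "code", "source": ["a = 1\n", "b = 2\n", "c = a + b\n"]},
    ]
}

findings = flag_cell_lengths(sample_notebook)
for index, line_count, verdict in findings:
    print(f"Cell {index}: {line_count} line(s), {verdict}")
```

Treat the output as a nudge, not a verdict; some one-line cells (a final `df.head()`, say) are perfectly fine.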

Plot profusely

Matplotlib is great, but check out higher-level plotting options like Seaborn and `pandas.DataFrame.plot()` before you settle for plain matplotlib. Use `plt.tight_layout()` to adjust the spacing of your plots so that titles and axis labels don't overlap or get clipped.

Use subplots when you want to show a grid of plots. And finally, ensure your plots have a legend, a title, axis labels and discernible symbols.
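Here is a minimal sketch of these plotting tips put together. The data is made up for illustration, and the `Agg` backend line is an assumption so that the snippet also runs outside a notebook; drop it when you work in Jupyter.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; remove this line inside a notebook
import matplotlib.pyplot as plt

# Made-up data for illustration
x = list(range(10))
squares = [value ** 2 for value in x]
cubes = [value ** 3 for value in x]

# A 1 x 2 grid of plots
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

axes[0].plot(x, squares, marker="o", label="x squared")
axes[0].set_title("Quadratic growth")
axes[0].set_xlabel("x")
axes[0].set_ylabel("value")
axes[0].legend()

axes[1].plot(x, cubes, marker="s", color="tab:orange", label="x cubed")
axes[1].set_title("Cubic growth")
axes[1].set_xlabel("x")
axes[1].set_ylabel("value")
axes[1].legend()

# Adjust padding so titles and axis labels do not overlap
plt.tight_layout()
```

Each subplot gets its own title, axis labels, legend and a distinct marker, so a reader can tell the series apart even in grayscale.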

Coding standards for your Python snippets

Name your variables and functions in snake_case (like `snake_case_your_variables`) instead of camelCase (like `camelCasingYourVariables`). That is, you separate words with underscores instead of capital letters. The exception is class names, where you use CamelCase and start with a capital letter. Writing Python quite a bit? Invest some time to look at https://pep8.org/. Your code reviewers and readers will love you.
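A short sketch of these naming conventions follows. The class, function and variable names are made up for illustration.

```python
class HousePriceModel:  # class name: CamelCase, starts with a capital letter
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate  # attribute: snake_case


def compute_mean_price(house_prices):  # function name: snake_case
    total_price = sum(house_prices)  # variable name: snake_case
    return total_price / len(house_prices)


price_model = HousePriceModel(learning_rate=0.01)
mean_price = compute_mean_price([100_000, 250_000, 175_000])
print(mean_price)
```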

Do all imports at the top of the notebook. This way the reader knows what libraries are used and can ensure their environment is ready.

Name variables such that they don’t clobber built-ins. For instance, call your map objects `map1`, `map2` instead of `map`, which would hide the built-in `map()` function. Don’t name your variables `dict` or `list`, which would hide the built-in data structures of the same name.
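To see why this matters, here is a small sketch of what goes wrong. The function names are hypothetical, and the shadowing is kept inside a function so that it does not leak into the rest of your session.

```python
def shadowing_demo():
    # Bad: this local name hides the built-in list() for the rest of the function
    list = [3, 1, 2]
    try:
        return list("abc")  # fails, because `list` is now a list object
    except TypeError as error:
        return f"TypeError: {error}"


def safe_demo():
    # Better: a descriptive name leaves the built-in untouched
    price_list = [3, 1, 2]
    return list("abc"), price_list


shadow_result = shadowing_demo()
safe_result = safe_demo()
print(shadow_result)
print(safe_result)
```

In a notebook the damage lasts longer than in a script: a top-level `list = [...]` stays in effect for every cell you run afterwards, until you `del list` or restart the kernel.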

Round numbers for display purposes. You can quickly round your DataFrames during display by calling the `DataFrame.round()` method. For instance, `usa_house_scaled.describe().round(3)` will display the numeric columns of your DataFrame rounded to 3 decimal places.
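As a quick sketch, with a made-up DataFrame standing in for `usa_house_scaled`:

```python
import pandas as pd

# Made-up data for illustration; stands in for a real DataFrame
house_df = pd.DataFrame(
    {
        "price": [101.23456, 250.98765, 175.5],
        "rooms": [2, 4, 3],
    }
)

# round() returns a new DataFrame; the original data keeps its full precision
summary = house_df.describe().round(3)
print(summary)
```

Because `round()` returns a copy, you can round for display and still compute on the unrounded values.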

Be explicit about uncommon libraries that you use in the notebook

It is generally good practice to import all your dependencies at the beginning of your script. However, in the notebook medium, you might prefer to import them as and when necessary, in order to explain your work better. This is especially true if you import a lot of dependencies at the function level. If you use a library that is not shipped with the base Anaconda distribution, then the user has to run install steps and relaunch the notebook. Hence, state such libraries explicitly at the beginning of the notebook.

Explicitly state uncommon libraries that you use
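Besides stating the libraries in a markdown cell, you could add a small check as the first code cell. This is a sketch under assumptions: `UNCOMMON_LIBRARIES` and `find_missing_libraries` are hypothetical names, and `some_uncommon_library_xyz` is a deliberately non-existent placeholder that you would replace with your real dependencies and their install commands.

```python
import importlib.util

# Hypothetical mapping of import name -> install hint; replace with your own
UNCOMMON_LIBRARIES = {
    "json": "ships with Python, nothing to install",
    "some_uncommon_library_xyz": "pip install some-uncommon-library-xyz",
}


def find_missing_libraries(libraries):
    """Return the subset of libraries that cannot be found in this environment."""
    missing = {}
    for import_name, install_hint in libraries.items():
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(import_name) is None:
            missing[import_name] = install_hint
    return missing


missing = find_missing_libraries(UNCOMMON_LIBRARIES)
for import_name, install_hint in missing.items():
    print(f"Missing '{import_name}'. Install it with: {install_hint}")
```

This way the reader finds out about a missing dependency up front, not halfway through the notebook.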

Example structure of your data science notebook

By and large, structure your notebook as you would a paper for a scientific journal.

Heading 1: Title: Cover the narrative / abstract. Include a ToC.

Heading 2: Get data: Import libraries, search for and get required data sets.

Heading 2: Exploratory data analysis: Use maps and charts lavishly to show different aspects of the data.

Heading 2: Feature engineering: Use pandas and other libraries to prepare your data for training. After each significant transformation, show previews of your data by printing the first 3 or 5 records of your DataFrame.

Heading 2: Analysis: Perform analysis, build and train models.

Heading 3: Evaluation: Evaluate the model. Bring out whether assumptions are met using both charts and metrics. Run predictions and evaluate the results using both charts and metrics. Use more than one metric for evaluation.

Heading 2: Act on the analysis: Persist the results, either by writing to disk or publishing them to the web. Elucidate with maps and charts as applicable. If you built a prediction model, publish it as a web tool (REST API). If you built an explanatory notebook, publish it as an article / report.

Heading 2: Conclusion: Summarize your work. Start from your problem statement, then cover the approach you followed and the results you obtained.
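If you start many notebooks this way, the outline above can be turned into an empty skeleton. The following is a rough sketch: `SECTIONS` and `build_skeleton` are hypothetical names, the hint text is a shortened version of the outline, and the dictionary layout is my reading of the version 4 notebook format (the file name in the comment is also just an example).

```python
import json

# (heading level, title, hint) for each section of the outline above
SECTIONS = [
    (1, "Title", "Narrative / abstract and a ToC."),
    (2, "Get data", "Import libraries, search for and get required data sets."),
    (2, "Exploratory data analysis", "Maps and charts that show aspects of the data."),
    (2, "Feature engineering", "Prepare data for training; preview after each step."),
    (2, "Analysis", "Perform analysis, build and train models."),
    (3, "Evaluation", "Evaluate with charts and more than one metric."),
    (2, "Act on the analysis", "Persist or publish the results."),
    (2, "Conclusion", "Problem statement, approach and results."),
]


def build_skeleton(sections):
    """Return a notebook dict with one markdown cell per section."""
    cells = []
    for level, title, hint in sections:
        heading = "#" * level
        cells.append(
            {
                "cell_type": "markdown",
                "metadata": {},
                "source": f"{heading} {title}\n\n{hint}",
            }
        )
    # Assumption: format 4.4, which does not require per-cell ids
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


skeleton = build_skeleton(SECTIONS)
notebook_json = json.dumps(skeleton, indent=1)
# To save it: open("skeleton.ipynb", "w").write(notebook_json)
print(f"{len(skeleton['cells'])} cells, {len(notebook_json)} characters of JSON")
```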

Atma Mani is the lead product engineer for ArcGIS API for Python at Esri. He enjoys applying advanced analytics to solve spatial problems.
