Jupyter Notebook Tutorial: The Definitive Guide

Karlijn Willems
Published in DataCamp
Nov 16, 2016 · 11 min read

Originally published at https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook

Data science is about learning by doing. One of the ways you can learn how to do data science is by building your own portfolio: working on your own pet project, doing a quick data exploration task, participating in a data challenge, reporting on your research or the advancements you have made in learning data science, creating an Extract, Transform, and Load (ETL) flow of data, …

This way, you exercise the practical skills you will need when you work as a data scientist.

As a web application in which you can create and share documents that contain live code, equations, visualizations, and text, the Jupyter Notebook is one of the ideal tools to help you gain the data science skills you need.

This tutorial will cover the following topics:

  • What a Jupyter Notebook is and how the project originated,
  • How to install the Jupyter Notebook,
  • How to use notebooks, kernels, and magic commands,
  • How to share your notebooks with others, and
  • Tips, best practices, and example notebooks to learn from.

The Jupyter Notebook: an interactive data science environment

What Is A Jupyter Notebook?

In this case, “notebook” or “notebook documents” denote documents that contain both code and rich text elements, such as figures, links, equations, … Because of the mix of code and text elements, these documents are the ideal place to bring together an analysis description and its results, and they can also be executed to perform the data analysis in real time.

These documents are produced by the Jupyter Notebook App.

We’ll talk about this in a bit.

For now, you should just know that “Jupyter” is a loose acronym meaning Julia, Python, and R. These programming languages were the first target languages of the Jupyter application, but nowadays, the notebook technology also supports many other languages.

And there you have it: the Jupyter Notebook.

As you just saw, the main components of the whole environment are, on the one hand, the notebooks themselves and the application, and, on the other hand, the notebook kernel and the notebook dashboard.

Let’s look at these components in more detail.

What Is The Jupyter Notebook App?

As a server-client application, the Jupyter Notebook App allows you to edit and run your notebooks via a web browser. The application can be executed on a PC without Internet access or it can be installed on a remote server, where you can access it through the Internet.

Its two main components are the kernels and a dashboard.

A kernel is a program that runs and introspects the user’s code. The Jupyter Notebook App has a kernel for Python code, but there are also kernels available for other programming languages.

The dashboard of the application not only shows you the notebook documents that you have made and can reopen, but it can also be used to manage the kernels: you can see which ones are running and shut them down if necessary.

The History of IPython and Jupyter Notebooks

To fully understand what the Jupyter Notebook is and what functionality it has to offer you need to know how it originated.

Let’s back up briefly to the late 1980s. Guido van Rossum begins to work on Python at the National Research Institute for Mathematics and Computer Science in the Netherlands.

Wait, maybe that’s too far.

Let’s go to late 2001, twenty years later. Fernando Pérez starts developing IPython.

In 2005, both Robert Kern and Fernando Pérez attempted building a notebook system. Unfortunately, the prototype never became fully usable.

Fast forward two years: the IPython team had kept on working, and in 2007, they made another attempt at implementing a notebook-type system. By October 2010, there was a prototype of a web notebook, and in the summer of 2011, this prototype was incorporated into IPython and released with version 0.12 on December 21, 2011. In subsequent years, the team received awards, such as the Award for the Advancement of Free Software, which Fernando Pérez received on March 23, 2013, and the Jolt Productivity Award, as well as funding from the Alfred P. Sloan Foundation, among others.

Lastly, in 2014, Project Jupyter started as a spin-off project from IPython. IPython is now the name of the Python backend, which is also known as the kernel. Recently, the next generation of Jupyter Notebooks has been introduced to the community. It’s called JupyterLab. Read more about it here.

After all this, you might wonder where this idea of notebooks originated or how the idea came to its creators. Go here to find out more.

How To Install Jupyter Notebook

Running Jupyter Notebooks With The Anaconda Python Distribution

One of the requirements here is Python, either Python 3.3 or greater or Python 2.7. The general recommendation is that you use the Anaconda distribution to install both Python and the notebook application.

The advantage of Anaconda is that you have access to over 720 packages that can easily be installed with Anaconda’s conda, a package, dependency, and environment manager. You can download and follow the instructions for the installation of Anaconda here.
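As a quick sanity check after installing Anaconda (a sketch, not an exact recipe: Jupyter normally ships with Anaconda, so the install step is usually a no-op), you can run the following from a terminal:

conda --version
# usually already included with Anaconda; installs or updates it otherwise
conda install jupyter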

Is something not clear? You can always read up on the Jupyter installation instructions here.

Running Jupyter Notebook The Pythonic Way: Pip

If you don’t want to install Anaconda, you just have to make sure that you have the latest version of pip. If you have installed Python, you will normally already have it.

What you do need to do is upgrade pip, and once pip is up to date, you can get started on installing Jupyter.

Go to the original article for the commands to install Jupyter via pip.
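For reference, the pip route generally boils down to something like this (use pip instead of pip3 if that is how your Python 3 installation is set up; these are a sketch, not the article’s exact commands):

pip3 install --upgrade pip
pip3 install jupyter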

Running Jupyter Notebooks in Docker Containers

Docker is an excellent platform to run software in containers. These containers are self-contained and isolated processes.

This sounds a bit like a virtual machine, right?

Not really. Go here to read an explanation on why they are different, complete with a fantastic house metaphor.


You can easily get started with Docker: turn to the original article to get started with Jupyter on Docker.
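A minimal sketch, assuming you use one of the community-maintained Jupyter Docker Stacks images (jupyter/minimal-notebook is just one example of several):

# starts a notebook server inside a container and exposes it on port 8888
docker run -it --rm -p 8888:8888 jupyter/minimal-notebook

After the container starts, copy the tokenized URL printed in the container logs into your browser to open the notebook dashboard.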

How To Use Jupyter Notebooks

Now that you know what you’ll be working with and you have installed it, it’s time to get started for real!

Getting Started With Jupyter Notebooks

Run the following command to open up the application:

jupyter notebook

Then you’ll see the application open in your web browser at the following address: http://localhost:8888.
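If that port is already in use, you can, for instance, pick another one:

jupyter notebook --port 9999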

For a complete overview of all the components of the Jupyter Notebook, complete with gifs, go to the original article.

If you want to start on your notebook, go back to the main menu and click the “Python 3” option in the “Notebook” category.

You will immediately see the notebook name, a menu bar, a toolbar and an empty code cell.

You can immediately start with importing the necessary libraries for your code. This is one of the best practices that we will discuss in more detail later on.
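For example, a typical first cell could look like this (which libraries you actually need depends on your project; NumPy, pandas, and Matplotlib are just common examples):

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt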

Afterwards, you can add, remove, or edit the cells according to your needs. And don’t forget to insert explanatory text or titles and subtitles to clarify your code! That’s what makes a notebook a notebook in the end.

For more tips, go here.

Are you not sure what a whole notebook looks like? Hop over to the last section to discover the best ones out there!

Toggling Between Python 2 and 3 in Jupyter Notebooks

Up until now, working with notebooks has been quite straightforward.

But what if you don’t just want to use Python 3 or 2? What if you want to change between the two?

Luckily, the kernels can solve this problem for you! You can easily create a new conda environment to use different notebook kernels.

Then you restart the application and the two kernels should be available to you. Very important: don’t forget to (de)activate the kernel you (don’t) need. Go to the original article to see how this works and how you can manually register your kernels.
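A rough sketch of how this usually looks with conda (the environment names py27 and py36 are just examples; depending on your setup you may also need to register a kernel manually, as shown in the comment):

conda create -n py27 python=2.7 ipykernel
conda create -n py36 python=3.6 ipykernel
# if a kernel does not show up after restarting the application, register it explicitly:
# python -m ipykernel install --user --name py27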

Running R in Your Jupyter Notebook

As the explanation of the kernels in the first section already suggested, you can also run other languages besides Python in your notebook!

If you want to use R with Jupyter Notebooks but without running it inside a Docker container, you can run the following command to install the R essentials in your current environment. These “essentials” include the packages dplyr, shiny, ggplot2, tidyr, caret and nnet. If you don't want to install the essentials in your current environment, you can use the following command to create a new environment just for the R essentials.
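The conda commands in question look roughly like this (my-r-env is just an example name; the r-essentials bundle comes from the r channel):

# install the R essentials into your current environment
conda install -c r r-essentials
# or: create a separate environment just for the R essentials
conda create -n my-r-env -c r r-essentials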

Next, open up the notebook application to start working with R with the usual command.

If you want to know about the commands to execute or extra tips to run R successfully in your Jupyter Notebook, go here.

If you now want to install additional R packages to elaborate your data science project, you can either build a Conda R package or you can install the package from inside of R via install.packages or devtools::install_github (to install from GitHub). You just have to make sure to add the new package to the correct R library used by Jupyter.

Note that you can also install the IRKernel, a kernel for R, to work with R in your notebook. You can follow the installation instructions here.
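As a sketch, that installation is typically done from within an R session (depending on the IRkernel version, you may need to install from GitHub with devtools instead of CRAN):

install.packages('IRkernel')   # or: devtools::install_github('IRkernel/IRkernel')
IRkernel::installspec()        # registers the kernel so Jupyter can find it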

Note that you also have kernels to run languages such as Julia, SAS, … in your notebook. Go here for a complete list of the kernels that are available. This list also contains links to the respective pages that have installation instructions to get you started.

Making your Jupyter Notebook Magical With Magic Commands

Making Your Jupyter Notebook Magical

If you want to get the most out of your notebooks, you should consider learning about the so-called “magic commands”. Also, consider adding even more interactivity to your notebook so that it becomes an interactive dashboard for others!

The Notebook’s Built-In Commands

There are some predefined ‘magic functions’ that will make your work a lot more interactive.

To see which magic commands you have available in your interpreter, you can simply run the following:

%lsmagic

And you’ll see a whole bunch of them appearing. You’ll probably see some magic commands that you’ll grasp right away, such as %save, %clear or %debug, but others will be less straightforward.

If you’re looking for more information on the magic commands or on functions, you can always use ?.

Note that there is a difference between using % and %%: the former is a line magic that applies to a single line, while the latter is a cell magic that applies to the whole cell. To know more about this and other useful magic commands that you can use, go here.
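A quick illustration with %timeit (the statements themselves are arbitrary examples): the first cell uses the line magic to time a single statement, while the second uses the cell magic, which must be the first line of its cell, to time everything in that cell.

%timeit sum(range(1000))

%%timeit
numbers = range(1000)
sum(numbers)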

You can also use magics to mix languages in your notebook without setting up extra kernels: there is rmagic to run R code, sql magic for RDBMS (Relational Database Management System) access, and cythonmagic for interactive work with Cython, … But there is so much more here!
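For example, assuming the rpy2 package and a working R installation are available, the R magic lets you hand a cell over to R from a Python notebook: load the extension in one cell, and start another cell with %%R.

%load_ext rpy2.ipython

%%R
summary(cars)   # this cell is executed by R, not Python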

Interactive Notebooks As Dashboards: Widgets

The magic commands already do a lot to make your workflow with notebooks agreeable, but you can also take additional steps to make your notebook an interactive place for others by adding widgets to it!

The example shown in the original article was taken from a wonderful tutorial on building interactive dashboards in Jupyter, which you can find on this page.
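Not the dashboard from that tutorial, but a minimal sketch of the idea with ipywidgets (assuming the package is installed in your environment):

from ipywidgets import interact

def double(x):
    # any function works; interact builds a slider for the numeric argument
    return x * 2

# renders a slider in the notebook; moving it re-runs double with the new value
interact(double, x=10)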

Share Your Jupyter Notebooks

In practice, you might want to share your notebooks with colleagues or friends to show them what you have been up to or as a data science portfolio for future employers. However, the notebook documents are JSON documents that contain text, source code, rich media output, and metadata. Each segment of the document is stored in a cell.

Ideally, you don’t want to go around and share JSON files.

That’s why you want to find and use other ways to share your notebook documents with others.

When you create a notebook, you will see a button in the menu bar that says “File”. When you click it, you’ll see that Jupyter gives you the option to download your notebook as HTML, PDF, Markdown, reStructuredText, a Python script, or a Notebook file.

You can use the nbconvert command to convert your notebook document file to another static format, such as HTML, PDF, LaTeX, Markdown, reStructuredText, … But don't forget to install nbconvert first if you don't have it yet!

Then, you can run something like the following command to convert your notebooks:

jupyter nbconvert --to html Untitled4.ipynb

With nbconvert, you can also execute an entire notebook non-interactively, saving it in place or converting it to a variety of other formats. The fact that you can do this makes notebooks a powerful tool for ETL and for reporting. For reporting, you just make sure to schedule a run of the notebook every so many days, weeks, or months; for an ETL pipeline, you can make use of the magic commands in your notebook in combination with some type of scheduling.
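For instance, executing a notebook from top to bottom and overwriting it with the computed output might look like this (the file name is just a placeholder):

jupyter nbconvert --to notebook --execute --inplace weekly_report.ipynb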

Besides these options, you could also consider the following options.

Jupyter Notebooks in Practice

This all is very interesting when you’re working alone on a data science project. But most times, you’re not alone. You might have some friends look at your code or you’ll need your colleagues to contribute to your notebook.

How should you actually use these notebooks in practice when you’re working in a team?

The following tips will help you to effectively and efficiently use notebooks on your data science project.

Tips To Effectively and Efficiently Use Your Jupyter Notebooks

Using these notebooks doesn’t mean that you don’t need to follow the coding practices that you would usually apply.

You probably already know the drill, but these principles include the following:

  • Try to provide comments and documentation to your code. They might be a great help to others!
  • Also consider a consistent naming scheme, code grouping, limiting your line length, …
  • Don’t be afraid to refactor when or if necessary

In addition to these general best practices for programming, you could also consider the following tips to make your notebooks the best source for other users to learn:

  • Don’t forget to name your notebook documents!
  • Try to keep the cells of your notebook simple: don’t exceed the width of your cell and make sure that you don’t put too many related functions in one cell.
  • If possible, import your packages in the first code cell of your notebook, and
  • [More tips here]

Jupyter Notebooks for Data Science Teams: Best Practices

Jonathan Whitmore describes in his article some practices for using notebooks for data science and specifically addresses the fact that working with notebooks on data science problems in a team can prove to be quite a challenge.

That is why Jonathan suggests some best practices:

  • Use two types of notebooks for a data science project, namely a lab notebook and a deliverable notebook. The difference between the two (besides the obvious one that you can infer from their names) is that individuals control the lab notebook, while the deliverable notebook is controlled by the whole data science team,
  • Use some type of version control (Git, GitHub, …). Don’t forget to also commit the HTML file if your version control system lacks rendering capabilities, and
  • Use explicit rules on the naming of your documents.

Learn From The Best Notebooks

This section is meant to give you a short list with some of the best notebooks that are out there so that you can get started on learning from these examples.

You will find that many people regularly compose lists with interesting notebooks. Don’t miss this gallery of interesting IPython notebooks or this KD Nuggets article.

Originally published at www.datacamp.com.
