My First Pull Request for the Open Source PixieDust Project

Setting up Python test environments, collaborating on GitHub, and sharing my changes with the world


During a decade in academia I have benefitted from a host of open source software projects while researching climate and environmental data. But I had never contributed to any of the open source packages that make my life so much easier. Until now! I used a recent PixieDust sprint in our developer advocacy team to change that, and set myself the goal of learning how to fix at least one bug in the project during that week.

A while ago I opened issue #507 in the main PixieDust repository after noticing that dates were not displayed correctly in a line chart. Here I document how I went about making my first contribution to the project.

You won’t be starting with a blank page. There’s plenty of existing code to work on. We just have to set things up. (Photo by Dariusz Sankowski on Unsplash)

PixieDust

PixieDust is an open source add-on library for Jupyter notebooks that improves the user experience of working with data. All PixieDust code is on GitHub. This means I can fork the code, make changes, and then install my own local version to use with my notebooks. But it is much better to make sure that my changes end up in the master repository on GitHub, so that others can use my improvements as well. You can do this with a pull request, in which you ask for your code to be merged into the master repository.

When you are new to GitHub, a good starting point is this tutorial that explains how to create a repository, work with branches, and commit changes through pull requests.

Set up a local test environment

Before running any code it is a good idea to set up a local test environment that is separate from your other projects. If you break anything, all your other projects stay safe.

To test PixieDust you need a Python environment. If you are already using PixieDust locally, Python is already installed. If not, the easiest way to get it is to install Anaconda, which includes the most common packages.

Part of this installation is the conda command, with which you can easily create new environments. Below I created a new environment with Python 3.6 that I want to use for testing my code.

And then activate this environment with:
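The commands themselves did not survive in this copy of the post. With a standard conda install they are typically `conda create --name py36-dev python=3.6` followed by `conda activate py36-dev` (or `source activate py36-dev` on older conda releases); treat these as an assumption rather than the exact lines used here. Once the environment is active, a small check like the sketch below can confirm that Python is really running inside it. The helper names are made up for illustration.

```python
import os
import sys

# Assumed values taken from the text: the environment is called py36-dev
# and should run Python 3.6.
EXPECTED_ENV = "py36-dev"
EXPECTED_VERSION = (3, 6)


def describe_environment():
    """Return the facts that show which Python this process is running in."""
    return {
        "executable": sys.executable,
        "version": tuple(sys.version_info[:2]),
        # conda sets CONDA_DEFAULT_ENV when an environment is activated;
        # the value is None when no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


def matches_expected(info, env=EXPECTED_ENV, version=EXPECTED_VERSION):
    """True only if both the environment name and the Python version match."""
    return info["conda_env"] == env and info["version"] == version


info = describe_environment()
print(info)
print("Running in the test environment:", matches_expected(info))
```

If the last line prints `False`, the environment was probably not activated in the terminal that started Python.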

Jupyter notebooks use kernels to run code. To create a new kernel with this new py36-dev environment you have two options. Use conda or python directly:
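As a rough sketch of the "python directly" route, the snippet below builds the `ipykernel install` command for the interpreter that is currently running, so the kernel points at the activated environment. The display name is an assumption, and the snippet only prints the command; `install_kernel` has to be called explicitly to register anything.

```python
import shlex
import subprocess
import sys


def kernel_install_command(env_name, display_name):
    """Build the ipykernel command that registers this interpreter as a kernel."""
    return [
        sys.executable, "-m", "ipykernel", "install",
        "--user",                        # install for the current user only
        "--name", env_name,              # internal kernel name
        "--display-name", display_name,  # name shown in the notebook menu
    ]


def install_kernel(env_name, display_name):
    """Run the command; raises CalledProcessError if ipykernel reports a failure."""
    subprocess.run(kernel_install_command(env_name, display_name), check=True)


# Only print the command here, so nothing is installed by accident.
command = kernel_install_command("py36-dev", "Python 3.6 (py36-dev)")
print(" ".join(shlex.quote(part) for part in command))
```

Because the command starts with `sys.executable`, it must be run from the py36-dev environment for the new kernel to use that environment.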

Or use the PixieDust installer if you want to create a new Spark kernel:

Then go through the steps, answering questions such as where to install Spark and Scala and what name you want to use for the new kernel. Make sure you have activated the new py36-dev environment before creating the new kernels!

I have done my testing with a kernel installed with PixieDust and the py36-dev environment activated. But to be really thorough, it is best to test your code with different versions of Spark and Python. You can do this by installing multiple kernels and then switching between them in your notebooks.

Get your own local PixieDust

To move the code from GitHub to your local machine you can use GitHub Desktop or the command line. I find it easiest to use the desktop version, so this is what I will describe here.

If you want to learn more about Git on the command line, check out these docs or get Lorna Mitchell’s book on Git!

Make sure you have a GitHub account, then start by forking PixieDust by clicking the Fork button at the top right of the repository on GitHub. This copies all the code into your own repository, in which you can make all your code changes. To copy the code to your local machine go to GitHub Desktop, sign in, and look for your own pixiedust fork under the + button → Clone in the upper left corner. Choose a location, and all the code will be downloaded.

To make sure your code stays up to date with the original code you should click the Update from master button regularly. Also make regular commits while you are working on the code and sync them to GitHub, so you will always have a backup in case there is a problem with your local machine.

Run your local PixieDust

The local version of PixieDust can now be installed from the notebook. Open a terminal and activate the new Python test environment and then start the notebook environment:

Create a new notebook with the new kernel and run the following from a cell if you want to install PixieDust from your local machine:

Or the below if you want to install from GitHub (change the url to the location of your repository):
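Both install cells are missing from this copy of the post, so what follows is only a sketch of what such cells could do, not the original commands. It builds `pip` commands for an editable install from a local checkout and for an install straight from a GitHub fork. The path `/path/to/pixiedust` and the user name `your-github-user` are placeholders, and the functions only return the command; run it with `subprocess.run(command, check=True)` from a notebook cell.

```python
import sys


def pip_install_local(checkout_path):
    """pip command for an editable install of a local checkout."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", "-e", checkout_path]


def pip_install_from_github(user, repo="pixiedust", branch="master"):
    """pip command for installing directly from a GitHub repository."""
    url = "git+https://github.com/{}/{}.git@{}".format(user, repo, branch)
    return [sys.executable, "-m", "pip", "install", "--upgrade", url]


# Placeholders: replace with your own checkout location and GitHub user name.
print(pip_install_local("/path/to/pixiedust"))
print(pip_install_from_github("your-github-user"))
```

Using `sys.executable -m pip` rather than a bare `pip` keeps the install inside the environment that the notebook kernel is running in.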

Now you can make changes to the code and test them by running the install cell again, restarting the kernel, and then importing PixieDust.
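After restarting the kernel it can be worth confirming that the notebook really picks up your local copy and not an older install. A minimal sketch using only the standard library (the helper name is made up for illustration):

```python
import importlib.util


def module_location(name):
    """Return the file a module would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Prints a path inside your local checkout if the local install is the one in use,
# or None if PixieDust is not installed for this kernel.
print(module_location("pixiedust"))
```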

Make code changes to PixieDust

The first step is to move the issue that you are working on (#507 in my case) from the New Issues or Backlog list to the In Progress list.

Issue #507 is now in progress.

To be able to see the above Boards tab, install the ZenHub plugin.

I tested the issue again, but this time it actually worked! Another update must have solved it. But I dug around a little more and noticed that the Bokeh chart renderer was still not displaying the date correctly. Time to have a look at the code.

The code for each of the renderers is organized in separate folders, with a file for each chart type. This narrowed it down to just one file to look at for this issue. The code used to define the line chart is:

But it should be extended with a special case that checks whether the data type is a date. In that case the above line needs to be:
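The original and corrected lines are not preserved in this copy, so the snippet below is not the PixieDust code. It is a sketch of the general idea under some assumptions: the x values are Python `date` or `datetime` objects, and the chart is created with Bokeh's `figure`, whose `x_axis_type` argument accepts `"datetime"`. The function names are hypothetical.

```python
import datetime


def x_axis_type_for(values):
    """Pick a Bokeh axis type: 'datetime' if every value is a date, else 'linear'."""
    values = list(values)
    # datetime.datetime is a subclass of datetime.date, so this covers both.
    if values and all(isinstance(v, datetime.date) for v in values):
        return "datetime"
    return "linear"


def make_line_chart(x, y, x_label, y_label):
    """Draw a line chart whose x-axis matches the type of the x values."""
    # Imported here so that x_axis_type_for works even without Bokeh installed.
    from bokeh.plotting import figure

    p = figure(
        x_axis_label=x_label,
        y_axis_label=y_label,
        x_axis_type=x_axis_type_for(x),
    )
    p.line(x, y)
    return p


print(x_axis_type_for([datetime.date(2018, 1, 1), datetime.date(2018, 1, 2)]))
print(x_axis_type_for([1, 2, 3]))
```

A real renderer would more likely check the column's data type in the DataFrame than inspect individual values; NumPy `datetime64` values, for example, are not caught by this check.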

Debug PixieDust

One nice feature that makes it easy to debug your code is something like this:

Which makes it possible to add print statements by running the following logging command in a notebook cell:
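Neither snippet survived in this copy. In PixieDust the notebook command is most likely the `%pixiedustLog` cell magic (for example `%pixiedustLog -l debug`), but treat that as an assumption. The sketch below shows the same pattern with Python's standard `logging` module instead of PixieDust's own logger: debug calls stay in the code and only produce output once the level is lowered. The class and logger names are made up.

```python
import io
import logging

# Collect log output in memory so it can be printed on demand.
log_stream = io.StringIO()
logger = logging.getLogger("renderer_sketch")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(log_stream))
logger.propagate = False          # keep messages out of the root logger
logger.setLevel(logging.WARNING)  # debug messages are hidden by default


class LineChartSketch:
    """Stand-in for a renderer class with a debug helper."""

    def debug(self, message):
        logger.debug(message)

    def render(self, x_dtype):
        self.debug("x-axis dtype: {}".format(x_dtype))
        return "chart"


chart = LineChartSketch()
chart.render("datetime64[ns]")   # nothing is recorded at WARNING level
logger.setLevel(logging.DEBUG)   # comparable to switching debug logging on in a cell
chart.render("datetime64[ns]")   # now the debug message is recorded
print(log_stream.getvalue())
```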

After some more testing with different data I was satisfied that I had fixed the issue and was ready to open a pull request to get it merged into the master repository.

You can open a pull request from GitHub Desktop by clicking the button at the top and filling in some details of what you have done. It will then show up in the main repository, where it can be reviewed and, hopefully, merged.

Everyone can help!

This was a relatively simple issue to fix, but it gave me the opportunity to go through the whole process of setting up my environment, cloning the code, debugging, and doing a pull request.

I hope that my experience has shown that contributing to open source software is not scary or difficult at all. Have a look at the issues for PixieDust and maybe pick one to solve. Let me know how you get on, and I am also happy to help!
