FlightPredict II: The Sequel

Predict flight delays (now with PixieDust)

//va
Center for Open Source Data and AI Technologies
6 min read · Dec 19, 2016


A couple months ago, David Taieb put together a tutorial on how to Predict Flight Delays with Apache Spark MLLib, FlightStats, and Weather Data. For the sequel, we sprinkle some PixieDust onto his original solution and the result is pure magic.

PixieDust is an open source Python helper library that extends the usability of notebooks. Using PixieDust’s visualization and apps features, we provide a customized, interactive, and more pleasing experience than you’ll find in a regular notebook.

Pre-flight checklist

Before you follow the steps in this post, run through the Predict Flight Delays with Apache Spark MLLib, FlightStats, and Weather Data tutorial. At a minimum, you must complete the following steps from that tutorial:

✓ Set up a FlightStats account (REQUIRED! In the first tutorial, you could skip this step, but you need these credentials to run this notebook.)

✓ Provision the Weather Company Data service

✓ Obtain or build the training and test data sets

Once you’ve done that, you can tackle this tutorial, a run-through of my Flight Predict with PixieDust notebook. You can run the notebook from the IBM Data Science Experience (DSX) or from a local Jupyter Notebook environment (with Spark 1.6.x and Python 2.x).

Cleared for take-off

While you can run the application from any Jupyter Notebook environment, I used IBM’s Data Science Experience. The first step is to get the Flight Predict with PixieDust notebook into DSX:

Note: For best results, use the latest version of either Mozilla Firefox or Google Chrome.

  1. Sign in to DSX.
  2. Select an existing project, or create a new one: on the upper right of the screen, click the + (plus) sign and choose Create project.

Add a new notebook (From URL) within the project

  1. Click add notebooks
  2. Click From URL
  3. Enter notebook name
  4. Enter the notebook URL: https://raw.githubusercontent.com/ibm-cds-labs/simple-data-pipe-connector-flightstats/master/notebook/Flight%20Predict%20with%20Pixiedust.ipynb
  5. Select the Spark Service
  6. Click Create Notebook

If prompted, select a kernel for the notebook. The notebook should successfully import.
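If you are working in a local Jupyter environment instead of DSX, you can fetch the same notebook file yourself. The sketch below is only an illustration: it rebuilds the URL from step 4 (the `%20` sequences are simply URL-encoded spaces in the file name) and defines a small download helper with Python's standard library. The helper name `download_notebook` and the local file name are assumptions made for this example, and the download only works if the file is still hosted at that address.

```python
try:
    # Python 3
    from urllib.parse import quote
    from urllib.request import urlretrieve
except ImportError:
    # Python 2, which this tutorial targets
    from urllib import quote, urlretrieve

BASE_URL = ("https://raw.githubusercontent.com/ibm-cds-labs/"
            "simple-data-pipe-connector-flightstats/master/notebook/")
NOTEBOOK_NAME = "Flight Predict with Pixiedust.ipynb"


def notebook_url():
    """Return the raw GitHub URL, percent-encoding the spaces in the file name."""
    return BASE_URL + quote(NOTEBOOK_NAME)


def download_notebook(target="flight_predict_pixiedust.ipynb"):
    """Save the notebook locally (hypothetical helper; needs network access)."""
    urlretrieve(notebook_url(), target)
    return target


# Show the URL without downloading anything.
print(notebook_url())
```

Call `download_notebook()` from the directory where you start Jupyter, then open the saved file from the notebook file browser.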

Fly through the notebook

Run through each cell of the notebook in order.

When you use a notebook in DSX, you can run a cell only by selecting it, then going to the toolbar and clicking on the Run Cell (▸) button. If you don’t see the Jupyter toolbar showing that run button and other notebook controls, you’re not in edit mode. Go to the dark blue toolbar above the notebook and click the edit (pencil) icon.

The steps below walk through the code cells you’ll run:

  1. Install PixieDust and its flightpredict plugin.
    Run the first 2 cells, which install and update pixiedust and the pixiedust-flightpredict plugin.
  2. Restart the kernel.
    From the menu, choose Kernel > Restart.
  3. Run the following cell to import the python package and launch the configuration dashboard:
    import pixiedust_flightpredict
    pixiedust_flightpredict.configure()
    The dashboard checks the current status of the app and guides you through setup.
  4. Add credentials and update any entries marked incorrect or missing (x icon).
    On the top right of the dashboard list, click the Edit Configuration button. Enter the credentials you got while completing the first tutorial.
    To save, click the Save Configuration button. The dashboard updates to show completed data.
  5. To create a cell with code to load the training data, click on Generate Cell code to load trainingData.
    The new cell appears under the dashboard.
  6. Go to the newly created cell and run the cell. The cell output is a PixieDust visualization of training data which you can view in various formats and also download or save into Cloudant or Object Store.
  7. Re-run the Configuration Dashboard cell you ran in Step 3. It updates to show that you’ve loaded the training data.
  8. Complete configuration.
    Continue through the dashboard, clicking each Generate Cell code to load button then running the new cell that appears below the dashboard. Repeat for each remaining incomplete task, except for custom handler, which is optional. (You can use the custom handler cell to provide new classification and features. For example, you may want to include a day of departure feature.)
  9. To confirm that you completed all steps, you can run the dashboard cell again. All entries should show None under Action required (except the custom handler, which is optional).
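If you run the notebook locally and want to see what the install cells amount to, here is a rough sketch. The package names `pixiedust` and `pixiedust-flightpredict` come from the notebook; the exact pip flags the notebook uses may differ, so treat `--user --upgrade` as an assumption. The helper names (`pip_upgrade_command`, `install_all`, `launch_dashboard`) are made up for this example.

```python
import subprocess
import sys

# Package names taken from the tutorial.
PACKAGES = ["pixiedust", "pixiedust-flightpredict"]


def pip_upgrade_command(package):
    """Build a pip command that installs or upgrades one package.

    Assumption: --user suits a shared notebook environment; drop it
    if you are working inside a virtualenv.
    """
    return [sys.executable, "-m", "pip", "install",
            "--user", "--upgrade", package]


def install_all(packages=PACKAGES):
    """Run the install commands one after another (needs network access)."""
    for package in packages:
        subprocess.check_call(pip_upgrade_command(package))


def launch_dashboard():
    """Open the configuration dashboard; call only after a kernel restart."""
    import pixiedust_flightpredict
    pixiedust_flightpredict.configure()


# Print the commands without running them.
for package in PACKAGES:
    print(" ".join(pip_upgrade_command(package)))
```

The import sits inside `launch_dashboard` on purpose: the plugin only becomes importable after the install has finished and the kernel has restarted.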

Train and evaluate the models

Like the first flight tracker tutorial that you ran through, this notebook creates and runs four models (Logistic Regression, Naive Bayes, Decision Tree, and Random Forest) — this time using PixieDust to display data and the model evaluations.

  1. Now that your data’s loaded, go to the Train multiple classification models section and run each of the four code cells.
  2. Run the display(testData) cell to evaluate the models.

The pixiedust-flightpredict plugin generates a custom airplane dropdown menu that lets you:

  • Measure accuracy via an accuracy table and confusion matrices, which you read about in the first tutorial. Again, you can use this tool to judge performance and decide whether you need more training data or different classes.
  • See a histogram showing the probability distribution.
  • Visualize Features (results) in a scatterplot.
The airplane menu is a custom PixieDust plugin created for this notebook.
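As a reminder of what the accuracy table and confusion matrices summarize, here is a minimal pure-Python sketch. It is not the plugin's code: the class labels and the tiny set of actual and predicted values are invented for illustration, and the notebook itself computes these measures with Spark.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual class."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / float(len(actual))


# Made-up labels and outcomes, for illustration only.
labels = ["on_time", "short_delay", "long_delay"]
actual = ["on_time", "on_time", "short_delay",
          "long_delay", "on_time", "short_delay"]
predicted = ["on_time", "short_delay", "short_delay",
             "on_time", "on_time", "short_delay"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print("%-12s %s" % (label, row))
print("accuracy: %.2f" % accuracy(actual, predicted))
```

Off-diagonal counts show which classes a model confuses. In this toy data, the one `long_delay` flight was predicted as `on_time`, the kind of miss that might suggest adding training data or redefining the classes.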

PixieDust provides an API that makes it easy for anyone to contribute a new visualization plugin, like that nifty plane menu. You too can extend PixieDust with custom features that serve your needs. Stay tuned for tutorials and docs explaining how to code your own plugin.

Run the models

The predictive models are now in place, and it’s time to launch the flight delay prediction application. In the Run the predictive model application section, run the cell. (You can change the initial airport code, LAS, to another city, if you want. You’ll also be able to do so in the app that launches.)

Enter the flight information and click Continue.

You’ll see delay predictions from the models, the weather forecast for each airport, and a flight path map:

From here, you can click Start Over to enter new flight information or Go to Notebook to return to the notebook.

What you can make out of it

Run the last code cell in the notebook, which displays a map with an aggregated view of all the flights that the app has searched:

  • Click on an airport to see all outgoing flights
  • Click on a flight path to get a listing of the flights and number of passengers who searched the specific flight
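The aggregation behind a map like this can be sketched in a few lines of plain Python. This is an assumption about the general idea, not the notebook's implementation: the search records, field names, and flight numbers below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical search log: one record per flight a user looked up.
searches = [
    {"origin": "LAS", "destination": "BOS", "flight": "XX100"},
    {"origin": "LAS", "destination": "SFO", "flight": "XX200"},
    {"origin": "LAS", "destination": "BOS", "flight": "XX100"},
    {"origin": "BOS", "destination": "LAS", "flight": "XX101"},
]


def outgoing_by_airport(records):
    """Map each origin airport to the set of destinations searched from it."""
    outgoing = defaultdict(set)
    for record in records:
        outgoing[record["origin"]].add(record["destination"])
    return dict(outgoing)


def searches_by_flight(records):
    """Count how many times each (origin, destination, flight) was searched."""
    return Counter(
        (r["origin"], r["destination"], r["flight"]) for r in records
    )


for airport, destinations in sorted(outgoing_by_airport(searches).items()):
    print(airport, "->", sorted(destinations))

for path, count in sorted(searches_by_flight(searches).items()):
    print(path, count)
```

Clicking an airport corresponds to the first grouping, and clicking a flight path corresponds to the per-flight counts.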

You can return to the notebook and continue to play with the data. See what you can uncover or improve upon within the flight delay predictions.

You are now free to move about the cabin

Predicting flight delays based on weather using machine learning started out as a way of showcasing the flexibility of a notebook. With the inclusion of PixieDust, visualizing the data is now even easier: you can load, manipulate, and present the data all within the notebook. To take it all the way, you could build a user interface and make this a full-fledged application.

PixieDust is an open source project looking to improve the notebook experience. You’ll find lots of guidance in its GitHub repo wiki. All are invited to contribute, and pull requests are welcome! We can have a parade and serve hot hors d’oeuvres…
