JupyterLab: Next Generation Data Exploration

Joao Sousa
Published in Feedzai Techblog
4 min read · Dec 19, 2019

It rhymes, therefore it’s true. For those of you who know Jupyter, you can picture JupyterLab as “Jupyter on steroids.” For those of you who don’t, you can picture it as an interactive Python development environment, where instead of working within the terminal window, you work with what’s called a notebook.

An example of Feedzai’s Data Exploration notebook

A notebook is a document made up of cells that mix code snippets with documentation sections. It offers an interactive alternative to plain old Python scripts. Notebooks can be seen as next-level scripts, where the user can run a script (the notebook) in multiple stages (the cells).

JupyterLab’s features can suit a lot of procedural scenarios, and at Feedzai we use it extensively on our data science platform.

JupyterLab in a Data Science world

Data Scientists’ routines usually involve:

  • Data understanding
  • Feature engineering
  • Model training/evaluation

There might be additional steps here, but I want to highlight the first step of the process: data understanding. To understand a given dataset, a data scientist or data engineer needs access to a strong exploration tool with robust visualization features. Some examples:

Listing showing pandas visualization features in JupyterLab
  • To check if a given field is useful as a feature, I may want the missing values percentage. Fields that rarely have values are mostly useless.
Listing showing pandas fluent API
  • In numerical fields, it is often useful to have some baselines, such as average and median, to check the distribution of that given field. For instance, the average transaction amount of a given shop is useful to detect “high spendings.”
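
The two checks above can be sketched with pandas as follows. This is a minimal illustration, not Feedzai's actual notebook: the dataset, the column names (`shop`, `amount`, `coupon_code`) and the "three times the shop median" threshold are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical transactions; column names and values are illustrative only.
transactions = pd.DataFrame(
    {
        "shop": ["A", "A", "A", "B", "B", "B"],
        "amount": [10.0, 12.0, 200.0, 5.0, 7.0, 6.0],
        "coupon_code": [None, None, "X1", None, None, None],
    }
)

# Percentage of missing values per field: isna() marks missing cells,
# and the mean of a boolean column is the fraction of True values.
missing_pct = transactions.isna().mean() * 100

# Baselines (average and median) of the transaction amount per shop.
baselines = transactions.groupby("shop")["amount"].agg(["mean", "median"])

# Flag "high spending" as amounts above 3x the shop's median.
# The factor 3 is an arbitrary assumption chosen for this sketch.
shop_median = transactions.groupby("shop")["amount"].transform("median")
high_spending = transactions[transactions["amount"] > 3 * shop_median]

print(missing_pct.round(2))
print(baselines)
print(high_spending)
```

In a notebook, each of the three results would typically live in its own cell, so the rendered tables can be inspected one at a time.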

Having these single values available is only part of the solution for a data scientist who wants to know more about the data they are working with. To have a more complete view of the data, we usually need plots that enhance the insights provided by these simple metrics.
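
As a rough sketch of how a plot can complement those single values, the snippet below draws a histogram of amounts with the average and median overlaid. The data is made up, and pandas plotting on top of matplotlib is only one of several possible choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical, right-skewed transaction amounts (illustrative values only).
amounts = pd.Series(
    [5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 18, 25, 40, 90, 200],
    name="amount",
    dtype="float",
)

fig, ax = plt.subplots(figsize=(6, 3))

# Histogram of the distribution, drawn through the pandas plotting API.
amounts.plot.hist(bins=20, ax=ax, title="Transaction amounts")

# Overlay the two baselines so their gap is visible at a glance.
ax.axvline(amounts.mean(), color="red", linestyle="--", label="mean")
ax.axvline(amounts.median(), color="green", linestyle=":", label="median")
ax.set_xlabel("amount")
ax.legend()
```

When this runs in a notebook cell, the figure is rendered inline below the cell. A mean that sits far to the right of the median, as in this made-up data, hints at a few unusually large amounts that the single values alone would hide.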

JupyterLab Extensions

JupyterLab with Altair’s calendar plot

JupyterLab lets users extend its default behavior through JupyterLab extensions. Thanks to this feature, JupyterLab not only provides integrations with matplotlib, GitHub, and pandas, but it also allows users to build their own extensions. One of our first extensions was a custom Feedzai Theme!

Reporting

JupyterLab’s capabilities open many doors in terms of data manipulation. Additionally, there is much focus on data visualization tools that can seamlessly integrate with JupyterLab (e.g., matplotlib, seaborn, Altair). However, visualization by itself is a transient activity, with data becoming less useful as it gets old. As a solution for this limitation, JupyterLab comes with a set of built-in reporting features. Through the UI, users can export notebooks in multiple formats. The same can be done from the terminal using the nbconvert command. All of this allows users to persist the visual output of a given notebook as a document (e.g., PDF).
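
The same export is also available from Python through the nbconvert library. The sketch below builds a tiny notebook in memory and converts it to HTML. The notebook contents and the output file name `report.html` are placeholders for this example, and the code assumes the `nbformat` and `nbconvert` packages are installed.

```python
from nbconvert import HTMLExporter
from nbformat.v4 import new_code_cell, new_markdown_cell, new_notebook

# Build a small in-memory notebook (placeholder content for illustration).
nb = new_notebook(
    cells=[
        new_markdown_cell("# Daily report"),
        new_code_cell("print('hello')"),
    ]
)

# Convert the notebook node to a standalone HTML document.
# Note: this does not execute the cells, so the code cell has no output here.
exporter = HTMLExporter()
body, resources = exporter.from_notebook_node(nb)

# Persist the rendered document; the file name is an arbitrary choice.
with open("report.html", "w", encoding="utf-8") as fh:
    fh.write(body)
```

On the command line, the equivalent for an existing file is `jupyter nbconvert --to html notebook.ipynb`, where `notebook.ipynb` stands for your own notebook. Adding `--execute` runs the cells before converting.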

Let your creativity flow

Given that notebooks can be used for time-bounded data, we can create (and even automate) a system of periodic reports, which is an awesome way of analyzing data evolution. At Feedzai, we have built notebooks that can create multiple plots based on specific datasets. We then compile these plots into an HTML report, which can be generated either manually or automatically. These reports allow users to see visualizations over short timeframes, along with the ability to compare data evolution over long timeframes. This comes without much cost in terms of performance (we only generate each report once) and memory (reports are persisted mostly as HTML and JavaScript code).
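
One way to picture such a report pipeline is the sketch below. It is not Feedzai's implementation: the helper names (`plot_to_img_tag`, `build_report`), the columns (`date`, `amount`) and the choice to embed plots as base64 PNG images are all assumptions made to keep the example self-contained.

```python
import base64
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, suitable for a scheduled script
import matplotlib.pyplot as plt
import pandas as pd


def plot_to_img_tag(series, title):
    """Render a series as a line plot and return it as an inline <img> tag."""
    fig, ax = plt.subplots(figsize=(6, 3))
    series.plot(ax=ax, title=title)
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)  # free the figure once it has been serialized
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")
    return f'<img alt="{title}" src="data:image/png;base64,{encoded}">'


def build_report(df, period_label):
    """Compile a few plots for one period into a single HTML string."""
    daily = df.groupby("date")["amount"]
    sections = [
        plot_to_img_tag(daily.sum(), "Total amount per day"),
        plot_to_img_tag(daily.count(), "Transactions per day"),
    ]
    body = "\n".join(sections)
    return f"<html><body><h1>Report for {period_label}</h1>\n{body}\n</body></html>"


# Hypothetical data covering one week (two transactions per day).
df = pd.DataFrame(
    {
        "date": pd.date_range("2019-12-01", periods=7, freq="D").repeat(2),
        "amount": [10.0, 20.0] * 7,
    }
)
report = build_report(df, "2019-12-01 to 2019-12-07")
```

Writing `report` to a file once per period (for example from a scheduled job) gives one static document per timeframe, which can later be opened side by side to compare how the data evolved.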

Conclusion

As an exploration tool, JupyterLab aims to sit in the sweet spot between terminal and GUI tools, and it should be used as such. There are many exciting points that I would like to cover, and I hopefully will in further posts. At Feedzai we’ve only uncovered the tip of the iceberg. Each day we discover new problems to tackle through JupyterLab.

Do you feel that you have interesting ideas for JupyterLab? Leave a comment, or even better, join us in this journey of taking JupyterLab to a whole new level!
