A GUI for Pandas / Bamboolib

Ahmedabdullah
Red Buffer
Published in
6 min readJan 3, 2022
Photo by kazuend on Unsplash

If you have landed on this article while scrolling the internet learning more about pandas (one of the core libraries of python), my assumption is that you are someone who is already working in the domain of data and often come across scenarios where you have to use pandas either for your EDA or for performing some sort of analysis. The bottom line is, we all use pandas one way or the other routinely, and why not? After all, it makes life a LOT easier. What if I told you that there is a GUI version for pandas that can make life even better!

Bamboolib is transforming the way we analyze experimental data, allowing everyone on our team to quickly wrangle highly complex datasets in a reproducible manner. Along the way it also teaches best practices to novice coders and avoids wasted time digging through online forums. This is the missing link that we’ve been waiting for to move our analytics workflows out of spreadsheets without sacrificing speed or flexibility.

Getting Started

In this article, we will explore bamboolib and get a deeper insight into its performance.

Disclaimer: a lot can be accomplished via bamboolib but I’ll only be covering some of its salient features to make you understand what this library is in a nutshell.

In order to get started with bamboolib, all you need is a good old-fashioned PIP command to install it. Open terminal or Jupyter notebook and type:


pip install bamboolib

Importing the library and getting started with it

In order to use bamboolib import the library as:

import bamboolib as bam

and you’re done! Now load the data as you normally would with pandas and once the data is loaded (assuming you’ve loaded the CSV in a variable called data) all you have to do in order to get started with bamboo lib is as simple as typing:

 bam.show(data)

For purpose of a demo, I have done the process on the publicly available data of properties provided by zameen.com. The initial show command will reveal something like this:

As you can see, this is just the initial data frame shown. Here we have two options; either we can create a plot feature in bamboolib which will help us to create various plots over our data frame OR we can choose to go with Explore DataFrame for EDA.

Create Plot

The create plot feature is one of the most amazing features of bamboolib. It allows you to create one of many plots with just a few clicks. Let’s see how easily this can be done. As soon as we click on create plot we get multiple options for graphs. These graphs are not only limited to conventional 2D graphs but also allow us to use 3D graphs. Below are a few of the options for graphs:

Let’s say we want to create a graph to see the comparison between prices of property between different areas of Islamabad.

It is as simple as selecting “Figure Type” which I have selected as a bar graph, selecting column names for the x and y-axis which I have selected as area and price respectively and that’s pretty much it. Bamboolib does all the work for you.

It even allows you to choose different themes for your plot, scaling factor, log scaling, color themes, order of columns, and opacity, etc.

Impressed yet? Bamboolib even lets you copy the code for your plots by a simple click on the copy code button using which you can even create these beautiful visuals outside your Jupyter notebook.

Wait, there’s more. Here comes bamboolib once again, since the majority of its graphs utilize Plotly, all the graphs created are interactive and you can even create a hover show of data on the graphs. It also allows you to directly save the graph as a png with a single click.

EDA

Now let’s just say that you are not interested in the visuals and are more interested in knowing what kind of data and features that you actually have. No worries, bamboolib got you. This is what you get with a single click on “Explore DataFrame”. Your entire data is explained as you’ve never seen before.

It tells you all the data types along with missing values, percentage of missing values so you can decide for yourself to see if you need to impute the data or drop the column or do whatever you see fit with the data. It also lets you know the unique values in each of the features of your data.

Want to know about just a single feature? Start by checking out the property types in the data. Simply click on the column named property_type or select from the drop-down on top. You’ll get the following results:

Not only does it tell you the unique and missing values but also helps you find the distribution of data along with the percentage of presence of the unique values in that specific column. This also helps you see the count and cumulative percentages. Still not impressed?

Let’s dive deeper into the selected column, if you click on “Bivariate plots” it’ll show you just what you need to know regarding the distribution of your data. The box plots visually explain the distribution of all the values of your column.

You can even set some predictors here to see how well a column helps to predict another one as a target. In my case, this might not be a good example because of a lot of categorical values but still, I’ll show you the screenshot. Notice that it also indicates the MAE:

And this doesn’t stop here, we can actually go a step further to see which features help us to predict our current feature and how much they contribute to this:

We can see the details of each to verify for ourselves as I have for the area category.

Correlation Matrix

Last but not the least, bamboolib also enables you to see a correlation between the features using the correlation matrix it generates with a single click:

Conclusion

I might have missed a lot of things but I have covered enough for you to get intrigued and explore on your own. If you find interesting use of the library do write an article and share it with me.

Libraries like bamboolib save a lot of time and make our work a lot easier but still in order to utilize such powerful tools make sure that you have enough baseline knowledge to understand what each graph and stats mean. These tools are brilliant but only when you have got your basics covered and concepts right. For someone who has a strong grip on EDA, you’ll love this.

Thanks for reading.

--

--