Introduction to PandasGUI — for easier and interactive visualization with Python
Pandas is the most widely used Python data analysis library. It is nearly impossible to learn data analysis or data science without using the Pandas library. We use it for data manipulation, preprocessing, exploration, statistical analysis, visualization and so on. There are so many things we can do with Pandas.
Visualization is obviously one of the most essential parts in data analysis. It is important to graphically display any pattern or information that you obtained from a given dataset. There are many data visualization libraries and tools you can employ for data visualization such as Matplotlib, Seaborn and Plotly.
I only mentioned three Python data visualization libraries, but in fact there are many more. For now, let’s just focus on these three. Matplotlib is one of the oldest Python data visualization libraries, yet it is still among the most widely used. Seaborn is basically built on top of Matplotlib, but it was designed for more aesthetically pleasing and modern visualization. Plotly is a more recent library, and its main strength is interactive visualization.
I bet most of the beginners of data analysis and data visualization start with Matplotlib and Seaborn, since these libraries together essentially allow us to graph anything we need. However, one thing these are lacking is the interactive part. In order to make your visualization more intuitive and informative, you need to make it interactive! That’s when Plotly comes in as it provides a lot of convenient interactive graphing tools.
However, it is never easy to get used to a new tool, and it always takes some time to learn how to use it. For those who agree, there is a very intuitive and easy graphical user interface called PandasGUI. This GUI lets you create interactive graphs based on Plotly without having to write lines of code!
The introduction was a bit too long, but anyway, this article will introduce some basics of PandasGUI for easier interactive data visualization.
PandasGUI is a Python package and you can simply install it using the pip package manager.
pip install pandasgui
Once the installation is done, we will import the library. There is essentially a single function that we will be using here, and it is called show(), so let’s import this function.
from pandasgui import show
There are a couple of ways you can use the show() function.
- Simply running show() will open up the application, where you can import datasets yourself.
- Passing a dataframe to the function will open up the application with the rows and columns of the dataset already loaded.
This probably doesn’t make much sense to you for now. Let’s just dive in.
Run the line of code below.
show()
Then you will see something that looks like this below.
This is just the PandasGUI application with no data loaded, and you can import any dataset of your choice.
However, if you already have a dataframe that you’ve been working on, which is usually the case, you can simply pass the dataframe to the show() function.
For an example dataset, let’s use the breast cancer dataset provided by Scikit-Learn.
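import pandas as pd
from sklearn.datasets import load_breast_cancer
# Load the dataset and build a dataframe from its feature matrix
breast = load_breast_cancer()
breast_df = pd.DataFrame(breast.data, columns=breast.feature_names)
# Add the class labels (0 = malignant, 1 = benign)
breast_df['target'] = breast.target
breast_df.head()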
The first five rows of the breast cancer dataset can be seen above. There are a total of 31 columns in the dataframe we created: 30 feature columns plus the ‘target’ column.
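As a quick sanity check on that column count, here is a minimal sketch that builds the same kind of table in a single call. It assumes scikit-learn 0.23 or later, where load_breast_cancer accepts as_frame=True; the variable names frame and class_counts are only for illustration.

```python
from sklearn.datasets import load_breast_cancer

# as_frame=True returns pandas objects; .frame holds the 30 feature
# columns plus a 'target' column in one dataframe.
frame = load_breast_cancer(as_frame=True).frame

n_rows, n_cols = frame.shape
print(n_rows, n_cols)  # 569 rows, 31 columns

# Number of samples per class (0 = malignant, 1 = benign)
class_counts = frame['target'].value_counts().to_dict()
print(class_counts)
```

Either way of building the dataframe should work with show(); the explicit version above just makes each step visible.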
Now let’s pass the dataframe to the almighty show().
show(breast_df)
Running the command above will return something that looks like this:
Pretty impressive, right? It contains all the rows and columns information of the dataframe that you passed.
Let’s explore some of the functions of PandasGUI.
Clicking the ‘Statistics’ button at the top of the interface will show you the summary statistics for the columns of your dataset.
As can be seen above, the type of each column, count, number of unique values, mean, standard deviation, min and max values are summarized in a single table.
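If you want a similar table without opening the GUI, plain pandas can get you most of the way. The sketch below is not how PandasGUI computes its table; summarize is a hypothetical helper name, and the tiny dataframe holds made-up values just to show the output shape.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: type, count, unique values and basic stats."""
    numeric = df.select_dtypes("number")
    return pd.DataFrame({
        "type": df.dtypes.astype(str),
        "count": df.count(),        # non-missing values per column
        "n_unique": df.nunique(),   # distinct values per column
        "mean": numeric.mean(),
        "std": numeric.std(),
        "min": numeric.min(),
        "max": numeric.max(),
    })


# Made-up toy values, only for illustration
toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2.0, 2.0, 4.0, 4.0]})
summary = summarize(toy)
print(summary)
```

Calling summarize(breast_df) on the dataframe from this article should give one row per column, much like the ‘Statistics’ tab.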
Now click the ‘Grapher’ button right next to the ‘Statistics’ button.
You will see more than ten different types of visualization methods such as scatter graph, line graph, bar graph and so on.
Let’s try one. The columns ‘mean area’ and ‘mean concavity’ represent the mean area and mean concavity of the cell nuclei in each sample. To figure out the relationship between these two variables through visualization, it may be a good idea to draw a scatter plot of them and maybe add the OLS regression line.
Usually, we need to write lines of code to do so. With PandasGUI it becomes much simpler. You only need to drag the columns to the appropriate places. Let’s have ‘mean area’ on the x-axis and ‘mean concavity’ on the y-axis. Doing so will return something that looks like this:
A scatter plot has been generated! And it is interactive — you can place a cursor on the scatter points of the graph and it will return the values.
Now let’s say that we want to look at two different cases, where target = 1 (benign) and target = 0 (malignant). We can do this by adding the ‘target’ column to the ‘color’ section. Let’s see what this returns.
This is quite an interesting result, since target = 1 tends to show low mean concavity and low mean area values, while target = 0 shows the opposite.
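The pattern in the plot can also be checked numerically. As a rough sketch, assuming scikit-learn 0.23 or later for as_frame=True, we can compare the per-class averages of the two plotted columns; frame and group_means are illustrative names.

```python
from sklearn.datasets import load_breast_cancer

frame = load_breast_cancer(as_frame=True).frame

# Average of the two plotted columns within each class
# (0 = malignant, 1 = benign)
group_means = frame.groupby('target')[['mean area', 'mean concavity']].mean()
print(group_means)
```

The row for target = 0 should show the larger averages in both columns, which matches what the colored scatter plot suggests.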
We can obtain the same result by having the variable ‘target’ on ‘facet_row’ or ‘facet_col’, which separates the graph depending on the discrete values. It is easier to understand by running it.
As can be seen, target = 0 and target = 1 cases have been separated and they are represented in two separate graphs. In addition, setting the trendline as ols allowed us to observe the OLS regression lines on the graphs.
Another great thing about PandasGUI is that it not only gives you the graphical results, but also returns the code for the graphs! Clicking the ‘Code Export’ button at the bottom returns the code for the graph so that you can modify it as you wish.
In this example, we only looked at visualization with scatter plots, but there are obviously many other visualization tools available with PandasGUI.
Although it is outside the scope of this article, PandasGUI also provides convenient tools for data reshaping. It allows you to pivot, melt, merge and concat datasets, so I recommend trying these yourself to get used to them!
PandasGUI can be used very effectively when you are looking for simple yet powerful graphs for data visualization. You can create aesthetically pleasing and interactive graphs with only a single line of code, using the show() function.
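If those four reshaping terms are new to you, here is a small pandas sketch of what each operation does. It uses pandas directly rather than the PandasGUI interface, and the tiny tables hold made-up values purely for illustration.

```python
import pandas as pd

# Made-up toy values, only for illustration
wide = pd.DataFrame({
    'id': [1, 2],
    'mean area': [1000.0, 400.0],
    'mean concavity': [0.30, 0.03],
})
labels = pd.DataFrame({'id': [1, 2], 'target': [0, 1]})

# melt: wide to long, one row per (id, feature) pair
long = wide.melt(id_vars='id', var_name='feature', value_name='value')

# pivot: long back to wide, one column per feature
back = long.pivot(index='id', columns='feature', values='value')

# merge: join two tables on a shared key column
merged = wide.merge(labels, on='id')

# concat: stack tables on top of each other
stacked = pd.concat([wide, wide], ignore_index=True)

print(long)
print(merged)
```

Seeing the results as plain tables first may make the corresponding options in the GUI easier to follow.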
Hope this article helped you understand the basics of PandasGUI, and for those who never had a chance to use PandasGUI, please try it out yourself! Thanks for reading:)