Access RStudio’s ggplot2 in Python

And significantly upgrade your Python data visualizations.

Aditi Mahabal
The Startup
3 min read · May 18, 2020

What is ggplot2?

ggplot2 is a powerful data visualization system with a wide range of functionality that lets beginners and experts alike learn, practice, and create unique and advanced data visualizations. Its power stems from its design: ggplot2 operates in layers, so users can add layers as they add complexity to their visualizations (or keep both the style and the code simple).

ggplot2’s data scientist-friendly design is reflected in its syntax. The basic code for creating a simple graph is as follows:

ggplot(dataframe, aes(x = xvariable, y = yvariable, optional fill/size/shape = variables)) + geom_graph(optional args) + labs() + theme() + ...

To create a basic graph, only the first two layers (or segments) of this code are required: the first layer calls ggplot() with the data frame and maps the necessary variables to aesthetics, and the second declares which graph type to use (e.g., geom_bar() makes a bar graph). With just these two layers, a full graph is created. Each additional layer is optional and only adds detail to the visualization: labs(), theme(), and many more layers control specific elements of the graph's design, such as axis labels, data coloring and sizing, and more. For a data scientist, this capacity to design clean, detailed graphs by simply adding and removing layers is very helpful, especially with an audience in mind.

The issue of Python

Unfortunately, the original ggplot2 package was written exclusively for R (to every Python user’s dismay). The lack of strong visualization tools in Python makes data analysis challenging; matplotlib, pandas, and seaborn are limited in their capacity for data visualization.

To illustrate this disparity, I graphed two scatterplots using the “mpg” dataset included in the ggplot2 package, coding one in Python using seaborn and pandas, and one in R using solely ggplot2. The result is a pronounced difference in the two plots’ quality and capacity to communicate findings.

Mapping with Pandas and Seaborn in Python
Mapping with ggplot2 in RStudio

It is clear that ggplot2 is the cleaner, more viewer-friendly tool for creating visualizations. For a data scientist, it is important that the intended audience can easily and quickly extract insight from a visualization, something that ggplot2 graphs provide. However, it is also important that data analysis code is accessible to other scientists and developers and is applicable across platforms. Coding solely in R is difficult in this sense: Python is the stronger general-purpose system, and a fast-growing means of sharing and communicating information. So, how do we combine ggplot2’s visualization functionality with the benefits of coding entirely in Python?

Accessing ggplot2 in Python

ggplot2’s closest match in Python is a package known as plotnine, which uses ggplot2-like syntax and graphics to create the same quality of visualizations.

Installing plotnine

Once plotnine is installed (for example, with pip install plotnine), the simplest way to load it is the following import:

from plotnine import ggplot

Beyond this line of code, there are additional steps that must be taken in Python to enable all of ggplot2’s functionality.

  • Each graph type (bar, scatter, density, etc.) must be imported alongside ggplot, together with aes for the aesthetic mapping (i.e., if you are creating a scatterplot, then “from plotnine import ggplot, aes, geom_point” must be called).
  • Each additional layer/element of your graph must also be imported in the same way; for example, visualizing a density graph with axes labels and multiple dimensions (facets) entails the call “from plotnine import ggplot, aes, geom_density, labs, facet_wrap”.

Though these additional steps are tedious, the ability to use ggplot2 in Python is beyond worth the effort. Make no mistake: ggplot2 is extremely useful to any data scientist, and the ability to incorporate ggplot2 into the wide applicability of Python programming is immensely valuable.


University of Virginia undergraduate. I write about data science and cool projects I’ve done! linkedin.com/in/aditimahabal/