An Alliance: Python and R (Seaborn and ggplot2)
Working with data usually means visualizing it to get the message across to the target audience. Python and R both have libraries that help with data visualization. Two popular ones are seaborn, which is built on the Matplotlib library for Python, and ggplot2 for R.
Loading seaborn in Python
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
Loading ggplot2 in R
library(ggplot2)
In my article, I wrote about how the default settings of some of these libraries don't always produce clear visualizations, but many parameters can be tweaked with additional function calls. In Python, these calls are written as separate statements, one per line, and they all act on the same plot. In R, the calls must be joined with a + sign for ggplot2 to treat them as one plot specification; otherwise only the first expression is used.
Some of the parameters that need additional calls are:
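The Python half of this can be sketched as follows. This is only a minimal illustration, assuming a tiny hand-made data frame (the `toy` table and its values are invented for the sketch, not taken from the tips dataset). Each customization is its own statement, and they are tied together only by acting on the same Axes object.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, invented for this sketch
toy = pd.DataFrame({"day": ["Thur", "Fri", "Sat"], "total_bill": [17.7, 17.2, 20.4]})

fig, ax = plt.subplots()
sns.barplot(data=toy, x="day", y="total_bill", ax=ax)  # first statement: draw the bars

# Each call below is a separate statement; nothing joins it to the one above.
# They all modify the same Axes object (ax), which is what links them.
ax.set_title("Toy example")
ax.set_xlabel("day")
ax.set_ylabel("bill ($)")
```

In ggplot2, the equivalent additions would be layers chained onto the ggplot() call with +.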
1. Style: The style is the background theme of the plot and it determines the appearance of the plot. In seaborn, the style is set using the sns.set_style() command, while in R the background theme is set by choosing theme_classic(), theme_void(), theme_bw(), etc.
2. Size: Adjusting the size of the plot helps to capture information that could otherwise be lost, especially in plots with many categorical levels. Setting the width and the height of the plot helps to put everything in perspective. In ggplot2, the size is set automatically, while in seaborn it can be set with plt.subplots(figsize=(width, height)).
3. Orientation: This sets the plot orientation to be either vertical or horizontal. In seaborn, the orientation is set using orient='h' or orient='v'. In R, it can be set using coord_flip(). When the orientation is set to horizontal, the categorical x-axis becomes the y-axis and the numerical y-axis becomes the x-axis.
4. Color: The color of the plot can be changed to follow a specific palette or set to a particular color. When the fill or hue is mapped to a variable, the colors can also be set manually instead of using a palette.
5. Axis style: The axis style adjusts the axis label and the axis ticks.
6. Legend: The legend is used to show the filled categorical variables. In most plots, the legend looks better when it is placed outside the plot grid.
7. Plot title: In some cases, adding the title of the plot helps a reader understand the context of the plot.
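The seven parameters above can be sketched on the seaborn side as follows. This is a minimal sketch under stated assumptions: the `toy` data frame, its values, and the two color names are made up for illustration, and `sns.move_legend` assumes seaborn 0.11.2 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, invented for this sketch
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri"],
    "sex": ["Male", "Female", "Male", "Female"],
    "total_bill": [18.7, 16.7, 19.9, 14.1],
})

sns.set_style("white")                    # 1. style
fig, ax = plt.subplots(figsize=(8, 4))    # 2. size: (width, height) in inches
sns.barplot(
    data=toy,
    x="total_bill", y="day",              # 3. horizontal: numbers on x, categories on y
    orient="h",
    hue="sex",
    palette={"Male": "steelblue", "Female": "salmon"},  # 4. one manual color per level
    ax=ax,
)
ax.set_xlabel("bill ($)", fontsize=12)    # 5. axis labels and ticks
ax.set_ylabel("day", fontsize=12)
ax.tick_params(axis="both", labelsize=12)
# 6. move the automatically created legend outside the plot area
sns.move_legend(ax, "upper left", bbox_to_anchor=(1.05, 1))
ax.set_title("Toy bills by day", fontsize=14)  # 7. title
```

Passing a dictionary to palette is one way to fix the color of each level by hand; a named palette such as 'Set1' would go in the same argument.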
Creating plots with the libraries
Let’s look at a side-by-side comparison of the libraries to see how the plots look using the tips dataset.
Python
df = sns.load_dataset("tips")
df.head()
R
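library(reshape2)  # provides the tips dataset
library(dplyr)
df <- as_tibble(tips)  # tbl_df() is deprecated in favor of as_tibble()
head(df)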
Barchart
seaborn code
sns.set_style("whitegrid")
bar,ax = plt.subplots(figsize=(8,6))
ax = sns.barplot(x='day', y='total_bill', hue= 'sex', data=df, ci=None, palette='Set1',orient='v')
ax.set_title("Bill spent on each day", fontsize=20)
ax.set_xlabel ("day", fontsize=15)
ax.set_ylabel ("bill ($)", fontsize=15)
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=15)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., fontsize=15)
ax.grid(False)
ggplot2 code
ggplot(df, aes(x = day, y = total_bill, fill = sex)) +
  # fill is the mapped aesthetic, so the palette goes on the fill scale
  scale_fill_brewer(palette = 2) +
  # plot the mean bill per group, as seaborn's barplot does by default
  geom_bar(stat = "summary", fun = "mean", position = position_dodge(.9), width = 0.9) +
  theme_classic() + ggtitle("Bill spent on each day") +
  xlab(NULL) + ylab("total_bill($)") +
  theme(title = element_text(size = 15, face = "bold"), axis.text.x = element_text(size = 15, face = "bold"), axis.text.y = element_text(size = 15, face = "bold"), axis.title.y = element_text(size = 15, face = "bold"))
Comparing the graphs, the seaborn graph is enclosed in a box, while the ggplot2 graph only has the x-axis and y-axis lines. The categories on the x-axis are arranged by day of the week in seaborn, while ggplot2 arranges them in alphabetical order.
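The ordering difference comes from how each copy of the dataset stores the day column, and either library can be told to use a different order. As a minimal sketch (the `toy` data frame and its values are invented for illustration), seaborn's `order` argument can reproduce the alphabetical arrangement that ggplot2 shows:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, invented for this sketch
toy = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun"],
    "total_bill": [17.7, 17.2, 20.4, 21.4],
})

# Sort the category labels alphabetically, as ggplot2 does for a plain factor
alphabetical = sorted(toy["day"].unique())

fig, ax = plt.subplots()
sns.barplot(data=toy, x="day", y="total_bill", order=alphabetical, ax=ax)

fig.canvas.draw()  # make sure the tick labels are populated before reading them
labels = [tick.get_text() for tick in ax.get_xticklabels()]
print(labels)
```

Going the other way, the days in R can be put in weekday order by setting the factor levels of day before plotting.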
Boxplot
seaborn
sns.set_style("whitegrid")
bar,ax = plt.subplots(figsize=(8,6))
ax = sns.boxplot(x="day", y="total_bill", data=df, palette='Set1')
ax.set_title("Bill spent on each day", fontsize=20)
ax.set_xlabel ("day", fontsize=15)
ax.set_ylabel ("bill ($)", fontsize=15)
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=15)
ax.grid(False)
ggplot2
ggplot(df, aes(day, total_bill, fill=day))+
stat_boxplot(aes(day, total_bill), geom='errorbar', linetype=1, width=0.5)+ggtitle("Bill spent on each day")+
geom_boxplot( aes(day, total_bill),outlier.shape=1)+theme_classic()+xlab(NULL)+ylab("total_bill($)")+theme(legend.text = element_blank(), legend.title = element_blank(),
legend.key = element_blank(), legend.position = "none", title=element_text(size=15, face='bold'), axis.text.x = element_text(size = 15, face = "bold"), axis.text.y = element_text(size = 15, face = "bold"), axis.title.y = element_text(size=15, face='bold'))
Kernel density plot
seaborn
bar,ax = plt.subplots(figsize=(8,6))
ax=sns.kdeplot(data=df, x="total_bill", hue='sex', palette='Set1')
ax.set_ylabel ('', fontsize=20)
ax.set_xlabel ('Total bill ($)', fontsize=20)
ax.tick_params(axis='x', labelsize=20)
ax.tick_params(axis='y', labelsize=20)
ax.grid(False)
ggplot2
ggplot(df, aes(total_bill, col=sex)) +
geom_density()+theme_classic()+xlab('total bill($)')+ylab(NULL)+
theme(title=element_text(size=15, face='bold'), axis.text.x = element_text(size = 15, face = "bold"),
axis.text.y = element_text(size = 15, face = "bold"), axis.title.y = element_text(size=15, face='bold'))
The y-axis ticks of the seaborn KDE plot are shown to three decimal places, while the y-axis ticks of the ggplot2 plot are shown to two decimal places.
Violin plot
seaborn
sns.set_style("whitegrid")
g=sns.catplot(x="sex", y="total_bill",
hue="smoker", col="time",
data=df, kind="violin",
height=7, aspect=.8);
ggplot2
ggplot(df, aes(x = sex, y= total_bill, fill=smoker))+facet_grid(~time)+
geom_violin(alpha= 0.9) +
theme_classic()+theme(title=element_text(size=15, face='bold'), axis.text.x = element_text(size = 15, face = "bold"),
axis.text.y = element_text(size = 15, face = "bold"), axis.title.y = element_text(size=15, face='bold'))
Scatter plot
seaborn
sns.set_style("whitegrid")
bar,ax = plt.subplots(figsize=(8,8))
sns.scatterplot(data=df, x="total_bill", y="tip", hue='sex', palette='Set1')
ax.set_title("Bills and Tips", fontsize=20)
ax.set_xlabel ("total bill($)", fontsize=15)
ax.set_ylabel ("tip($)", fontsize=15)
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=15)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., fontsize=15)
ax.grid(False)
ggplot2
ggplot(df, aes(x = total_bill, y = tip)) +
geom_point(aes(color = factor(sex)))+theme_classic()+ggtitle("Bills and Tips")+ylab('tip($)')+xlab('total bill($)')+
theme(title=element_text(size=15, face='bold'), axis.text.x = element_text(size = 15, face = "bold"),
axis.text.y = element_text(size = 15, face = "bold"), axis.title.y = element_text(size=15, face='bold'))
Corrplot
seaborn
#compute correlation
corr_matrix = df.corr(numeric_only=True)  # skip the categorical columns (needed on pandas 2.0+)
corr_matrix
sns.heatmap(corr_matrix, annot=True)
plt.show()
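R (corrplot)
library(corrplot)  # corrplot() comes from the corrplot package, not ggplot2
# df1 is assumed to hold only the numeric columns of df
df1 <- df[sapply(df, is.numeric)]
#calculate pairwise correlations
res <- cor(df1, method = "pearson")
corrplot(res, method="color")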
The seaborn heatmap scales its colors to the range of correlation values present in the matrix, while the corrplot color scale runs from -1 to +1.
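The seaborn color scale can be pinned to the same range by passing `vmin` and `vmax` to `sns.heatmap`. A minimal sketch, assuming a small invented data frame (`toy` and its values are hypothetical) and a diverging colormap chosen only for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical numeric data, invented for this sketch
toy = pd.DataFrame({
    "total_bill": [12.0, 18.5, 25.0, 31.0, 40.5],
    "tip": [2.0, 2.5, 4.0, 4.5, 6.0],
    "size": [2, 2, 3, 4, 4],
})
corr_matrix = toy.corr()

fig, ax = plt.subplots()
# vmin/vmax fix the color scale to the full correlation range of -1 to +1
sns.heatmap(corr_matrix, annot=True, vmin=-1, vmax=1, cmap="RdBu_r", ax=ax)
```

With the limits fixed, the same color means the same correlation value from one heatmap to the next.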
There are many similarities as well as differences between the plots made with these libraries. In general, ggplot2 graphics look visually sharper than those of seaborn. Both libraries have a lot to offer, and the choice comes down to personal style and preference.