Understanding Marketing Analytics in Python. [Part 6] Using PairGrid, with Example and Code
Understanding Pair Plots and Pair Grids
This is part 6 of the series on Marketing Analytics. Have a look at the series introduction, with details of each part, here.
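A pair plot, also known as a scatterplot matrix, draws a scatterplot for every pair of variables in a dataset, usually with each variable's distribution on the diagonal. In pandas the function is called scatter_matrix (covered in the last story, part 5 of this series) and is drawn with matplotlib; in seaborn it is called pairplot. PairGrid is the seaborn class behind pairplot. It comes with seaborn's styling, which makes plots prettier, and it is more flexible, because you choose which plotting function is drawn in each part of the grid.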
The seaborn documentation has a very good explanation of the various options available in the class.
Now, going back to our example: creating a plot using PairGrid() is a multistep process.
- First, we create a PairGrid() object, passing it our dataframe (here we look at just age, distance_to_store, and store_spend, along with email for setting the colors)
- Next, we specify the size of each panel using the height argument and set the hue of plotted elements to reflect values of the email column.
- We then add a few arguments that customize how different email values will be represented by setting a color palette and passing a list of markers (shapes) to be plotted in the hue_kws argument. When specifying markers, there must be a marker defined for each category. In this case we have two categories, so we pass two markers: ‘o’ which indicates a circle and ‘s’ which indicates a square.
- Finally, we need to map a plotting function onto the grid. PairGrid() has several methods that apply the same plotting function:
a. for all panels (map(func)),
b. for the panels along the diagonal (map_diag(func)),
c. for the panels off the diagonal (map_offdiag(func)),
d. for the upper triangle (map_upper(func)), and
e. for the lower triangle (map_lower(func)).
The func argument is a plotting function, such as plt.hist or plt.scatter. Any additional arguments are passed to that plotting function.
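import matplotlib.pyplot as plt
import seaborn as sns

# cust_df is the customer DataFrame loaded in the earlier parts of this series
g = sns.PairGrid(cust_df[['age', 'distance_to_store', 'store_spend', 'email']],
                 height=2.5, hue='email', palette='Set2',
                 hue_kws={"marker": ['o', 's']})  # one marker per email category
_ = g.map_offdiag(plt.scatter, s=20, alpha=0.5)   # scatterplots off the diagonal
_ = g.map_diag(plt.hist, bins=20)                 # histograms on the diagonal
_ = g.add_legend()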
In each scatterplot panel, the values from customers without an email on file are green circles, whereas those from customers with email are orange squares. The histograms are color-coded as well, although they are somewhat hard to interpret because there are many fewer customers without email than with it. Below is a closer look at one of the histograms.
Several different common plots can be generated in a single line using pairplot(). Use PairGrid when you need more flexibility.
Ref : https://seaborn.pydata.org/generated/seaborn.PairGrid.html
Although scatterplots provide a lot of visual information, when there are more than a few variables it can be helpful to summarize the relationship between each pair with a single number. One measure of the relationship between two variables is the covariance. We shall look at correlation coefficients in part 7 of this series, along with how to gain more certainty about relationships between variables beyond visualization.
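To show what covariance measures, here is a small worked example with five hypothetical customers. The sample covariance is the sum of the products of each observation's deviations from the two means, divided by n - 1; pandas' `Series.cov` and `DataFrame.cov` use the same n - 1 denominator, so the manual and library values should agree.

```python
import numpy as np
import pandas as pd

# Hypothetical values for five customers (not from the series' dataset)
demo_df = pd.DataFrame({
    "distance_to_store": [2.0, 5.0, 8.0, 12.0, 20.0],
    "store_spend": [60.0, 45.0, 40.0, 25.0, 10.0],
})

x = demo_df["distance_to_store"]
y = demo_df["store_spend"]

# Sample covariance by hand: products of deviations from the means, divided by n - 1
manual_cov = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

# pandas computes the same quantity
pandas_cov = x.cov(y)
print(manual_cov, pandas_cov)

# Covariance matrix for all numeric columns (variances on the diagonal)
print(demo_df.cov())
```

Here both values come out to -131.75. The negative sign says spend tends to fall as distance rises, but because covariance carries the units of both variables, its magnitude is hard to judge on its own.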