Understanding Marketing Analytics in Python. [Part 6] Using PairGrid. With example and code

Understanding Pair Plots and Pair Grids

Kamna Sinha
Data At The Core !
3 min read · Sep 22, 2023


This is part 6 of the series on Marketing Analytics. Have a look at the introduction to the entire series, with details of each part, here.

A pair plot shows every pairwise relationship in a dataset as a grid of panels. In pandas it is produced by the scatter_matrix function, which is built on matplotlib [last story, part 5 of this series]; in the seaborn library the equivalent function is pairplot. PairGrid is the seaborn class that pairplot is built on. It comes with some styling that makes the plots prettier, and it is more customizable, i.e. you can play around with the plots.

A very good explanation of the various options in the library can be found here.

Going back to our example, creating a plot using PairGrid() is a multistep process.

  1. First, we create a PairGrid() object, passing it our dataframe (here we look at just age, distance_to_store, and store_spend, along with email for setting the colors)
  2. Next, we specify the size of each panel using the height argument and set the hue of plotted elements to reflect values of the email column.
  3. We then add a few arguments that customize how different email values will be represented by setting a color palette and passing a list of markers (shapes) to be plotted in the hue_kws argument. When specifying markers, there must be a marker defined for each category. In this case we have two categories, so we pass two markers: ‘o’ which indicates a circle and ‘s’ which indicates a square.
  4. We also need to define a map. PairGrid() has several methods that apply a plotting function to a chosen set of panels:

a. for all panels (map(func)),

b. for the panels along the diagonal (map_diag(func)),

c. for the panels off the diagonal (map_offdiag(func)),

d. for the upper triangle (map_upper(func)), and

e. for the lower triangle (map_lower(func)).

The func argument is a plotting function, such as plt.hist or plt.scatter. Any additional arguments are passed to that plotting function.

import matplotlib.pyplot as plt
import seaborn as sns

# Build the grid from the columns of interest; color and marker vary with email
g = sns.PairGrid(cust_df[['age', 'distance_to_store', 'store_spend', 'email']],
                 height=2.5, hue='email', palette='Set2',
                 hue_kws={"marker": ['o', 's']})
_ = g.map_offdiag(plt.scatter, s=20, alpha=0.5)  # scatterplots off the diagonal
_ = g.map_diag(plt.hist, bins=20)                # histograms on the diagonal
_ = g.add_legend()

A scatterplot matrix for the customer dataset produced using PairGrid()

In each scatterplot panel, the values from customers without an email on file are green circles, whereas those from customers with email are orange squares. The histograms are color-coded as well, although they are somewhat hard to interpret because far fewer customers lack an email address. Below is a closer look at one of the histograms.

We can now see the color coding in the histogram
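
One possible way to get such a closer look is to draw a single histogram per group on shared bins. This is only a sketch under assumptions: demo_df is a hypothetical dataframe of random values, and the 'yes'/'no' labels in its email column are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: most customers have an email on file, a minority do not
rng = np.random.default_rng(1)
n = 500
demo_df = pd.DataFrame({
    "age": rng.normal(35, 5, n),
    "email": rng.choice(["yes", "no"], size=n, p=[0.8, 0.2]),
})

fig, ax = plt.subplots(figsize=(6, 4))
# 21 edges give 20 bins shared by both groups, so the bars line up
bins = np.linspace(demo_df["age"].min(), demo_df["age"].max(), 21)
for label, group in demo_df.groupby("email"):
    ax.hist(group["age"], bins=bins, alpha=0.5, label=f"email: {label}")
ax.set_xlabel("age")
ax.set_ylabel("count")
_ = ax.legend()
```

Drawing the groups semi-transparently on one larger axis makes the smaller group visible even where it overlaps the larger one.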

Several different common plots can be generated in a single line using pairplot(). Use PairGrid when you need more flexibility.

Ref : https://seaborn.pydata.org/generated/seaborn.PairGrid.html

Although scatterplots provide a lot of visual information, when there are more than a few variables it can be helpful to assess the relationship between each pair with a single number. One measure of the relationship between two variables is the covariance. We shall see more on correlation coefficients in part 7 of this series, and how to quantify relationships between variables with more certainty than visualization alone provides.
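
As a small preview of the covariance mentioned above, the sketch below computes it for two columns, once with pandas and once by hand. The five data points are invented for illustration and are not taken from the customer dataset.

```python
import pandas as pd

# Hypothetical toy data: spend tends to fall as distance grows
demo_df = pd.DataFrame({
    "distance_to_store": [1.0, 2.0, 3.0, 4.0, 5.0],
    "store_spend": [50.0, 40.0, 45.0, 20.0, 10.0],
})

# pandas returns the full covariance matrix; pick out the pair of interest
cov_matrix = demo_df.cov()
cov_xy = cov_matrix.loc["distance_to_store", "store_spend"]

# The same number by hand: sum of products of deviations, divided by n - 1
x = demo_df["distance_to_store"]
y = demo_df["store_spend"]
manual_cov = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

print(cov_xy, manual_cov)  # both are -25.0
```

The negative sign says the two variables move in opposite directions, but the size of the number depends on the units of both columns, which is why a scaled measure such as a correlation coefficient is usually easier to interpret.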
