What is a SPLOM chart? Making scatterplot matrices in Python

The scatterplot matrix, abbreviated SPLOM, is a relatively uncommon graphical tool that uses multiple scatterplots to show the correlation (if any) between each pair of variables in a data set.

These scatterplots are then organized into a matrix, making it easy to look at all the potential correlations in one place.
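As a numeric companion to the visual matrix, each panel can be summarized with a single correlation coefficient. The sketch below assumes Pearson's r as the measure and runs on a tiny invented table; the column names `a`, `b`, `c` and the helper names are placeholders, not anything from the datasets in this post.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {(row_name, col_name): r} for every cell of the SPLOM grid."""
    return {
        (row, col): pearson(columns[row], columns[col])
        for row, col in product(columns, repeat=2)
    }


# Invented placeholder values, only to show the shape of the result.
toy = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # rises with "a"
    "c": [4.0, 3.0, 2.0, 1.0],  # falls as "a" rises
}
matrix = correlation_matrix(toy)
```

With three variables the result has nine cells, one per panel of a 3 x 3 SPLOM. A single coefficient only captures linear relationships, which is one reason to look at the scatterplots themselves.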

SPLOMs, invented by John Hartigan in 1975, allow data aficionados to quickly spot any interesting correlations between parameters in the data set.

In this post, we’ll go over how to make SPLOMs in Plotly with Python. For extra insights, check out our SPLOM tutorial in Python and R.
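Here is a minimal sketch of what a SPLOM can look like in code, assuming Plotly's `splom` trace type and a small invented table. The code linked under each chart may well be organized differently. The figure is written as a plain dictionary in Plotly's figure format so that its structure is easy to see; `build_splom_figure` and `show` are hypothetical helper names.

```python
def build_splom_figure(columns, title):
    """Build a Plotly figure dict containing one splom trace.

    `columns` maps a variable name to its list of values; every variable
    becomes one row and one column of the matrix.
    """
    trace = {
        "type": "splom",
        "dimensions": [
            {"label": name, "values": values}
            for name, values in columns.items()
        ],
        "diagonal": {"visible": False},  # skip variable-vs-itself panels
        "marker": {"size": 5},
    }
    return {"data": [trace], "layout": {"title": {"text": title}}}


def show(figure):
    """Render the dict; requires the third-party plotly package."""
    import plotly.io as pio

    pio.show(figure)


# Invented placeholder measurements, not taken from any dataset in this post.
figure = build_splom_figure(
    {
        "glucose": [85, 168, 122, 99],
        "blood_pressure": [66, 74, 70, 84],
        "bmi": [26.6, 38.0, 27.6, 35.4],
    },
    title="Toy scatterplot matrix",
)
```

Calling `show(figure)` should display the chart when Plotly is installed. Building the dictionary itself needs nothing beyond the standard library.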

Scatterplot Matrix for Diabetes Dataset

This dataset, originally from the National Institute of Diabetes and Digestive and Kidney Diseases, can be used to predict whether or not a patient has diabetes based on diagnostic measurements.

The dataset includes several medical predictor variables, such as glucose, blood pressure, and BMI.

In this case, a SPLOM is an effective way of examining any correlations between the variables.

  • High blood pressure combined with high glucose correlates well with diabetes.
  • A BMI above 30, especially in patients over the age of 40, appears to be an indicator of diabetes.
  • Having high glucose while being pregnant typically appears to be associated with diabetes.
Source | Data | Python code

Scatterplot Matrix for Heart Disease Dataset

This database contains 76 medical attributes; the goal is to identify the presence of heart disease in a patient.

Published experiments refer to a subset of the attributes, in which researchers concentrated on attempting to distinguish heart disease positive (values 1, 2, 3, 4) from heart disease negative (value 0) patients.

The dataset was included in the University of Toronto Deep Health Hackathon in 2017.

  • Having a lower maximum heart rate and a higher cholesterol value seems to be a heart disease indicator.
  • A larger ST depression as determined by an electrocardiogram appears to be a good indicator of heart disease.
  • The finding of a reversible defect during a thallium stress test appears to be a reasonable indicator for heart disease, particularly in older patients and those with a higher resting blood pressure.
Source | Data | Python code

Scatterplot Matrix for Predicting Rain

Precipitation is typically considered to be the most impactful weather parameter.

While sophisticated computer models are used to forecast rain (or the lack thereof) on a daily basis, a meteorologist might glean useful climatological information from a SPLOM.

In this plot, we examine the relationship between various weather parameters in an attempt to understand when rainfall may be more likely.

Blue dots indicate days on which rain fell and brown dots indicate days on which no rain fell.
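One way to get this kind of two-color encoding is to give the trace one color per point. The sketch below assumes a boolean "rained" flag for each day and Plotly's `splom` trace format; the `rain_colors` helper, the color names, and the sample values are all made up for illustration.

```python
RAIN_COLOR = "blue"
DRY_COLOR = "brown"


def rain_colors(rained_flags):
    """Map each day's rained / did-not-rain flag to a marker color."""
    return [RAIN_COLOR if rained else DRY_COLOR for rained in rained_flags]


# Invented placeholder days, not real Auckland observations.
days = {
    "humidity": [82, 55, 90, 60],
    "sun_hours": [2.1, 9.4, 0.5, 8.0],
    "rained": [True, False, True, False],
}

trace = {
    "type": "splom",
    "dimensions": [
        {"label": "Humidity (%)", "values": days["humidity"]},
        {"label": "Sun hours", "values": days["sun_hours"]},
    ],
    # One color per point, in the same order as the values above.
    "marker": {"color": rain_colors(days["rained"])},
}
```

Because the color list is aligned with the rows, every panel of the matrix colors a given day the same way.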

The city used in this example is Auckland, New Zealand.

  • Fewer sun hours and higher humidity are typical of days that have rain.
  • When the minimum temperature is 5.0°C (41°F) or lower, rain is unlikely.
  • When the maximum temperature is 23.0°C (73°F) or higher, rain is unlikely.
  • Two particular wind directions, in conjunction with higher wind speeds, are more associated with rain than others: 220–300° on the compass (southwest to northwest) and 010–070° (north to northeast).
Source | Data | Python code

Scatterplot Matrix for Identifying Iris Flower Species

Through the analysis of flower traits, a scatterplot matrix can help to distinguish between different Iris flower species.

This dataset records sepal length, sepal width, petal length, and petal width for each flower, along with its species.

Light blue indicates Iris setosa, pink Iris versicolor, and purple Iris virginica.
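When the categories should also appear in a legend, an alternative to a per-point color list is one trace per species. This is a sketch under assumptions: the color names stand in for the light blue, pink, and purple described above, `species_traces` is a hypothetical helper, and only a few illustrative rows are used rather than the full dataset.

```python
from collections import defaultdict

FEATURES = ["sepal length", "sepal width", "petal length", "petal width"]
# Assumed color names standing in for the colors described in the text.
SPECIES_COLORS = {
    "setosa": "lightblue",
    "versicolor": "pink",
    "virginica": "purple",
}


def species_traces(rows):
    """Build one splom trace per species from (features, species) rows."""
    grouped = defaultdict(list)
    for features, species in rows:
        grouped[species].append(features)

    traces = []
    for species, members in grouped.items():
        columns = list(zip(*members))  # transpose rows into feature columns
        traces.append({
            "type": "splom",
            "name": species,  # legend entry
            "dimensions": [
                {"label": label, "values": list(values)}
                for label, values in zip(FEATURES, columns)
            ],
            "marker": {"color": SPECIES_COLORS[species]},
        })
    return traces


# Illustrative rows only; a real chart would load the full dataset.
rows = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
]
traces = species_traces(rows)
```

The resulting list can be used as the `data` entry of a Plotly figure dictionary, giving each species its own color and legend item.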

  • Iris setosa has a large sepal width but small petal width.
  • Iris virginica has a large sepal length and petal length.
  • Iris versicolor sits between setosa and virginica much of the time.
Source | Data | Python code