Crash Course in Data — Drawing Insights in Parallel: Understanding Complex Multivariate Data with Parallel Coordinates

Cibaca Khandelwal
Published in AI Skunks
3 min read · Mar 28, 2023
  • A parallel coordinates plot is used to analyze multivariate numerical data. It lets you compare samples or observations across several numerical variables at once.
  • Each feature/variable is represented by a separate axis. All the axes are equally spaced and parallel to each other. Each axis can have a different scale and unit of measurement.
  • Each sample/observation is drawn as a line that runs across the plot, crossing every axis at that sample's value for the corresponding variable.
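
To make the idea of "one axis per variable, one line per observation" concrete, here is a minimal sketch using only the Python standard library. It assumes each axis is min-max scaled to its own column's range, which is one common convention and not necessarily what a given plotting library does internally. The function name `to_polylines` and the toy data are hypothetical, chosen only for illustration.

```python
# Minimal sketch: convert rows of numbers into polyline vertices,
# the way a parallel coordinates plot lays them out conceptually.
# Assumption: every axis is min-max scaled independently to 0..1.

def to_polylines(rows, columns):
    """Return one list of (axis_index, height) vertices per row."""
    lows = {c: min(r[c] for r in rows) for c in columns}
    highs = {c: max(r[c] for r in rows) for c in columns}

    polylines = []
    for r in rows:
        vertices = []
        for i, c in enumerate(columns):
            span = highs[c] - lows[c]
            # A constant column has no spread; place it mid-axis.
            height = 0.5 if span == 0 else (r[c] - lows[c]) / span
            vertices.append((i, height))
        polylines.append(vertices)
    return polylines


# Hypothetical toy data: two variables with very different units and ranges.
samples = [
    {"length_cm": 5.0, "weight_g": 100.0},
    {"length_cm": 6.0, "weight_g": 300.0},
    {"length_cm": 7.0, "weight_g": 200.0},
]
lines = to_polylines(samples, ["length_cm", "weight_g"])
for line in lines:
    print(line)
```

Because each axis is scaled on its own, a length in centimeters and a weight in grams can sit side by side without one dominating the other.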

Parallel coordinates plots are a useful tool for visualizing high-dimensional data. Here are some common use cases:

1. Multivariate analysis:

A parallel coordinates plot can be used to visualize and analyze multivariate data. It puts multiple variables on the same plot so we can observe the patterns and relationships between them.

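# Notebook cell: install plotly (outside a notebook, run "pip install plotly" in a shell)
!pip install plotly

import plotly.express as px

# Load the iris sample dataset bundled with plotly
df_iris = px.data.iris()

# One axis per measurement; each line is one flower, colored by species
fig = px.parallel_coordinates(
    df_iris,
    color="species_id",
    dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"],
    color_continuous_scale=px.colors.diverging.Tealrose,
    color_continuous_midpoint=2,
)
fig.show()

# Preview the first rows of the data
df_iris.head()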

2. Classification:

A parallel coordinates plot can support classification by showing how classes differ. Coloring each line by its class reveals which axes separate the classes cleanly and which ones overlap, and those patterns can help us distinguish them.


# No dimensions are given, so every numeric column becomes an axis (including species_id)
fig = px.parallel_coordinates(
    df_iris,
    color="species_id",
    color_continuous_scale=px.colors.diverging.Tealrose,
    color_continuous_midpoint=2,
)
fig.show()
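
The plots in this article color lines by the numeric `species_id` column, not by the text label. If a dataset only has string class labels, a numeric id column can be derived first. The sketch below shows one way to do that with plain Python. The helper name `encode_labels` and the sample list are hypothetical, and numbering from 1 in order of first appearance is just an assumption for illustration.

```python
# Minimal sketch: map string class labels to integer ids so that
# they can drive a continuous color scale.

def encode_labels(labels):
    """Return (ids, mapping); ids start at 1 in order of first appearance."""
    mapping = {}
    ids = []
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping) + 1
        ids.append(mapping[label])
    return ids, mapping


# Hypothetical labels, not read from the iris data.
species = ["setosa", "setosa", "versicolor", "virginica", "versicolor"]
species_ids, mapping = encode_labels(species)
print(species_ids)
print(mapping)
```

With three classes encoded as 1, 2, 3, a color midpoint of 2 places the middle class at the center of a diverging color scale.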

3. Outlier detection:

A parallel coordinates plot can be used to identify outliers in the data. Outliers are data points whose values differ significantly from the rest of the data. They tend to stand out in a parallel coordinates plot because their lines do not follow the same path as the other data points.


fig = px.parallel_coordinates(
    df_iris,
    color="species_id",
    dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"],
)
fig.show()
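
A line that looks unusual in the plot can be cross-checked numerically. The sketch below applies the common 1.5 × IQR rule to a single axis, using only the standard library. This rule is a swapped-in heuristic for illustration, not something the plot itself computes. The function name `iqr_outliers` and the sample values are hypothetical.

```python
from statistics import quantiles

# Minimal sketch: flag values on one axis that fall outside
# [Q1 - k*IQR, Q3 + k*IQR]. The choice k=1.5 is a convention, not a law.

def iqr_outliers(values, k=1.5):
    """Return the indices of values outside the IQR fences."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical measurements for one axis; the last value is clearly off.
widths = [3.0, 3.1, 2.9, 3.2, 3.0, 2.8, 3.1, 9.0]
flagged = iqr_outliers(widths)
print(flagged)  # [7]
```

Running a check like this per axis gives a list of candidate rows to look for in the plot. A row can still be unusual as a combination of values without being extreme on any single axis, which is where the visual inspection helps.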

4. Feature selection:

A parallel coordinates plot can help select the most important features for a predictive model. By plotting the features on the same plot and observing their patterns and relationships with the target variable, we can identify the features that are most relevant for the prediction task.

fig = px.parallel_coordinates(
    df_iris,
    color="species_id",  # color by the target so each feature's relationship to it is visible
    dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"],
    color_continuous_scale=px.colors.diverging.Tealrose,
    color_continuous_midpoint=2,
)
fig.show()
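
What the eye picks up on an axis where the colors separate cleanly can be approximated with a simple score. The sketch below ranks features by the variance of the class means divided by the average within-class variance, a simplified and unweighted Fisher-style ratio. It is an illustrative assumption, not a method from this article, and the function name `separation_score` and the toy values are hypothetical.

```python
from statistics import mean, pvariance

# Minimal sketch: score how well one feature separates the classes.
# Higher means the class means are far apart relative to the spread
# inside each class.

def separation_score(values, labels):
    """Variance of class means divided by mean within-class variance."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    between = pvariance([mean(g) for g in groups.values()])
    within = mean([pvariance(g) for g in groups.values()])
    return float("inf") if within == 0 else between / within


# Hypothetical toy data: two classes, two features.
labels = ["a", "a", "a", "b", "b", "b"]
features = {
    "petal_length": [1.0, 1.2, 1.4, 4.0, 4.2, 4.4],  # classes far apart
    "sepal_width": [3.0, 3.4, 2.6, 3.1, 2.7, 3.3],   # classes overlap
}
ranked = sorted(
    features,
    key=lambda name: separation_score(features[name], labels),
    reverse=True,
)
print(ranked)
```

A feature that ranks high here should also show well-separated color bands on its axis in the plot. A low-ranking one will show the colors mixed together.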

5. Data exploration:

A parallel coordinates plot can be used to explore the data and gain insight into its structure and patterns. It lets us view many dimensions in a single picture and spot relationships or trends that may not be apparent in lower-dimensional visualizations such as a single scatter plot.


fig = px.parallel_coordinates(
    df_iris,
    dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"],
    color="species_id",
    color_continuous_scale=px.colors.diverging.Tealrose,
    color_continuous_midpoint=2,
)
fig.show()
