Multi-dimension plots in Python — From 3D to 6D.
Introduction
Visualization is one of the best ways to build intuition about data, and the ability to view several dimensions at once makes that easier. In this tutorial we will draw plots of up to six dimensions.
Plotly's Python library is an open-source module for rich, interactive visualizations, and it offers many customization options beyond the standard matplotlib and seaborn modules. It can be installed directly with pip install plotly. We will use Plotly to draw all the plots.
Let’s import data
For visualization, we will use the simple Automobile data set from UCI, which contains 26 features for 205 cars (26 columns x 205 rows). We will use the following six of those features to visualize six dimensions: price, curb-weight, horsepower, city-mpg, engine-size and num-of-doors.
Import CSV data using pandas.
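As a rough sketch of this step (not the author's actual code, which lives in the repository), loading the data could look like the following. It assumes the CSV has a header row using the column names mentioned in this post and marks missing values with "?"; the function name load_automobile_data is made up for illustration.

```python
import pandas as pd

# The six columns used in this post (names as they appear in the text).
FEATURES = ["price", "curb-weight", "horsepower",
            "city-mpg", "engine-size", "num-of-doors"]


def load_automobile_data(source):
    """Read the automobile CSV and keep only the six plotted features.

    Assumption: the file has a header row with the column names above
    and uses "?" for missing values.
    """
    data = pd.read_csv(source, na_values="?")
    # Drop rows where any of the six features is missing, since a point
    # with a missing coordinate, color or size cannot be drawn properly.
    return data[FEATURES].dropna().reset_index(drop=True)
```

Calling load_automobile_data("Automobile_data.csv") would then return a DataFrame with just these six columns; the file name here is a placeholder, so substitute the path to your own copy.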
Now that we have our data ready, let’s start with two dimensions.
2-D Scatter Plot
A scatter plot is the simplest and most common plot. Of the six features, price and curb-weight are used here as y and x respectively.
Unlike Matplotlib, the process in Plotly is a little different. We first build a ‘layout’, combine it with the data in a ‘figure’, and then pass the figure to the offline.plot function, which saves the output as an HTML file in the current working directory. Here’s a screenshot of the HTML plot. You can find the interactive HTML plots in the GitHub repository linked at the bottom.
3-D Scatter plot:
We can add a third feature, horsepower, on the z axis to get a 3D plot. Plotly provides the Scatter3d class for plotting interactive 3D plots.
Instead of embedding the code for each plot in this post, I’ve added all of it to the repository linked at the bottom. Do check it out.
Adding 4th Dimension:
We know we cannot visualize higher dimensions directly, but here’s the trick: we can fake extra depth by varying marker properties such as color, size and shape.
Here, along with the earlier three features, we will use the city mileage feature, city-mpg, as the fourth dimension. It is shown through marker color, set with the marker.color property of Scatter3d. Here lighter blue represents lower mileage.
Observations: The 4D plot makes it fairly evident that the higher the price, horsepower and curb weight, the lower the mileage.
Adding 5th Dimension:
Marker size can be used to visualize the fifth dimension. Here we will use the engine-size feature to vary the marker size through the marker.size property of Scatter3d.
Observations: Engine-size variation can be clearly seen against the other four features here. The higher the price, the larger the engine; likewise, the lower the mileage, the larger the engine.
Adding 6th Dimension:
Marker shape can be used to visualize categorical values. Plotly provides only a handful of shapes for 3D scatter plots (eight, such as diamond, circle and square), so at most that many distinct values can be encoded as shape.
We have the num-of-doors feature, which holds the number of doors (2 and 4). These values can be converted into shape strings, square for 4 doors and circle for 2 doors, which are then passed to the marker.symbol property of Scatter3d.
Observations: In this 6D plot, lower-priced cars seem to have 4 doors (squares, going by the mapping above). Looking more closely will yield further insights into the data.
Can we add more dimensions?
Certainly we can! Markers have more properties, such as opacity and gradients, that can be used. But every added dimension makes the individual marker points harder to read.
Source Code:
The Python code and interactive plots for all figures are hosted on GitHub here.
Thanks for reading! Suggestions are welcome.