Parallel Coordinate plots to visualize safety margins
Parallel coordinate plots (PCPs) are an alternative to scatter plots in which the X and Y axes are placed parallel to one another instead of at a 90-degree angle. They are mostly used to show the evolution of a variable over time. The use of PCPs for judging correlation was studied by researchers in the Netherlands (see http://wwwvis.informatik.uni-stuttgart.de/plain/vdl/vdl_upload/280_3_JournalArticle-ScatterVSPCP.pdf), but their conclusions were pretty negative. The graphs look messy. However, I have found that by applying a correlation or trend line equation to the scale of the reference axis, one can get a clear message across. Let me explain.
A point A at coordinates (x,y) is represented as a point in a scatter plot and as a line in a parallel coordinate plot, as shown in the figure below.
Nice PCPs can be built using D3.js (e.g. http://mbostock.github.io/d3/talk/20111116/iris-parallel.html).
Increasing the dimensions
As one can see, by moving from a scatter plot to a parallel coordinate plot, we move from a zero-dimensional mark (a point) to a one-dimensional one (a line).
We now have a new feature we can use to encode information: the slope.
The slope is controlled by the reference system chosen for the X and Y axes. Each parallel coordinate plot is defined by a reference function F(x) = y, which maps every x on the X axis to the y placed directly opposite it on the Y axis.
The 3 figures below all correctly represent the point/line A(10,10). But depending on the transformation function chosen for the PCP, the line can be horizontal, ascending or descending. This may sound odd, but we can use that fact to our advantage.
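To make this concrete, here is a minimal Python sketch of how the same point can tilt differently. It assumes a linear reference function F(x) = a*x + b with a > 0, so that both axes increase upwards. The function name line_direction and the three example functions are mine, chosen only for illustration.

```python
def line_direction(x, y, a, b):
    """Direction of the PCP line for point (x, y) under F(x) = a*x + b.

    Assumption: a > 0, so both parallel axes increase upwards.
    """
    if a <= 0:
        raise ValueError("this sketch assumes an increasing reference function")
    left_height = x               # where x sits on the X axis
    right_height = (y - b) / a    # the x whose tick faces y on the Y axis
    if right_height > left_height:
        return "ascending"
    if right_height < left_height:
        return "descending"
    return "horizontal"


# The same point A(10, 10) under three hypothetical reference functions.
for a, b in [(1, 0), (0.5, 0), (2, 0)]:
    print(f"F(x) = {a}*x + {b}: {line_direction(10, 10, a, b)}")
```

The point never changes; only the scale of the Y axis relative to the X axis does, and that alone decides whether the line goes up, down or stays level.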
Let’s again consider our two variables X and Y. Plotting them on a scatter plot can give you anything going from a cloud to a line.
In most spreadsheet programs (e.g. Excel, Numbers) you can add a trend line directly onto a scatter plot with a simple click. The trend line is calculated through linear regression and has an equation of the form y = a*x+b. It’s the line which best fits your data.
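If you would rather compute the trend line yourself, the ordinary least-squares formulas are short. The sketch below is one plain Python way to get a and b. The function name fit_trend_line and the sample data are made up for the example, and I am assuming the spreadsheet trend line is a standard least-squares fit.

```python
def fit_trend_line(xs, ys):
    """Return (a, b) of the least-squares line y = a*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical, no slope can be fitted")
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical sample that lies exactly on y = 2*x + 1.
a, b = fit_trend_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"y = {a}*x + {b}")
```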
You can see the trend line as being the “average” of the two variables X and Y. Any point A(x,y) of your sample is either above or below the average, meaning the trend line.
Very often, either X or Y has a sense of “good” or “bad”, meaning for example that being above average is “good” and being below is “bad”. The farther from the average the point is, the better or worse it is.
It is very difficult, by looking at a scatter plot with a trend line, to see which point is the “best” (greatest distance above average) and which is the “worst” (greatest distance below average). Choosing a PCP instead of a scatter plot, using the trend line as transformation function, makes all this much clearer, as the slope displays the distance from average. The points which are on the trend line or very close to it (average, neither good nor bad) are horizontal lines in the PCP. The ones above the average go up, the ones below go down. Down/bad and up/good are natural associations. Those lines can be coloured differently too, as shown in the image at the top of this article.
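The ranking that the PCP makes visible can also be written down in a few lines. In this sketch I take "distance from average" to mean the vertical distance y - (a*x + b) between a point and the trend line, which is an assumption on my side. The function name, the point names and the numbers are invented for illustration.

```python
def rank_by_distance(points, a, b):
    """Return (name, distance, direction) tuples, best point first.

    points: iterable of (name, x, y); trend line: y = a*x + b.
    Assumption: "distance" is the vertical gap to the trend line.
    """
    ranked = []
    for name, x, y in points:
        distance = y - (a * x + b)
        if distance > 0:
            direction = "up"
        elif distance < 0:
            direction = "down"
        else:
            direction = "horizontal"
        ranked.append((name, distance, direction))
    # Largest positive distance first, most negative last.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical points around the trend line y = 2*x + 1.
sample = [("P", 1, 5), ("Q", 2, 5), ("R", 3, 4)]
for name, distance, direction in rank_by_distance(sample, 2, 1):
    print(name, distance, direction)
```

The first entry is the “best” point and the last one the “worst”, which is the same reading you get from the steepest rising and steepest falling lines in the plot.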
So for example if your trend line gives you y(x) = 7*x-9.33, the value x=0 needs to be aligned with y(0)=-9.33 and x=5 with y(5)=25.67.
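The same alignment can be computed for every tick on the X axis. The sketch below uses the example equation y(x) = 7*x-9.33 from the text; the function name aligned_ticks and the choice of ticks 0 to 5 are only illustrative.

```python
def aligned_ticks(a, b, x_ticks):
    """Pair each X-axis tick with the Y value that must face it."""
    return [(x, round(a * x + b, 2)) for x in x_ticks]


# Trend line from the text: y(x) = 7*x - 9.33, ticks chosen as 0..5.
pairs = aligned_ticks(7, -9.33, range(0, 6))
for x, y in pairs:
    print(f"x = {x}  faces  y = {y}")
```

With the axes labelled this way, any point lying exactly on the trend line is drawn as a horizontal line.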
Switching the X and Y axes is also possible, depending on whether being above average is “good” or “bad”, as this will change the slope direction.
To conclude, I fell in love with parallel coordinate charts and now use them to display aviation safety markings for States, using traffic and audit results as X and Y axes. The graphs are a huge success and are easy to interpret, which is the most important thing. No, actually, they are beautiful, that’s the most important thing!