Go With the Flow — A Sankey Story

The Sankey diagram is named after the Irish engineer Matthew Henry Phineas Riall Sankey, who in 1898 used this style to show the energy efficiency of a steam engine.

This diagram is typically used to show a flow through a process or a system. The thickness of each chord or arrow represents the amount of resource transferred, while its length has no significance. This style is well suited to highlighting the dominant elements in a flow and is usually easy to read.

Sankey with multiple layers

Because of the way the diagram is built, the only numerical data required is the amount of resource carried by each chord. These amounts are often thought of as the “inputs” and “outputs” of the system. The layout and the size of the bases depend on how much “flow” they receive.
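To make the data requirement concrete, here is a minimal sketch in Python. It assumes each chord is stored as a (source, target, amount) tuple and that a base is drawn as tall as the larger of its total inflow and total outflow. The node names, the numbers and the `node_sizes` helper are all made up for illustration; real charting libraries may size bases differently.

```python
from collections import defaultdict

# Hypothetical flows: (source, target, amount). Names and numbers are invented.
links = [
    ("Coal", "Electricity", 40),
    ("Gas", "Electricity", 25),
    ("Gas", "Heating", 15),
    ("Electricity", "Homes", 35),
    ("Electricity", "Industry", 30),
    ("Heating", "Homes", 15),
]

def node_sizes(links):
    """Return the size of each base, assumed to be max(total inflow, total outflow)."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, amount in links:
        outflow[source] += amount
        inflow[target] += amount
    nodes = set(inflow) | set(outflow)
    # .get avoids adding empty entries for nodes that only send or only receive.
    return {n: max(inflow.get(n, 0.0), outflow.get(n, 0.0)) for n in nodes}

sizes = node_sizes(links)
for name in sorted(sizes, key=sizes.get, reverse=True):
    print(f"{name}: {sizes[name]:g}")
```

Nothing but the chord amounts goes in, yet the sketch already tells us which bases dominate: with these invented numbers, Electricity is the largest base because it receives the most flow.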

Common applications with multiple layers include energy systems analysis, cash or resource flow, path analytics, activity tracking, call center activity, healthcare timelines and communication analysis.

Another use for the Sankey diagram is mapping the relations between two domains, that is, an A-to-B connection. It especially shines when comparing two big groups.

Sankey with two layers

When you think about it, you could use Sankey for many other applications too. For example, we found that school fact books look far better in this style: the complete breakdown of a student body's demographics can be displayed on one Sankey diagram.

Four pivots in one diagram

Above is a Sankey with 4 layers, showing the 3 correlations between adjacent layers. If you were to visualize this using traditional charts, you would have to put together 4 pie charts or 2 bar charts. Even then, you couldn’t see how the groups relate to one another.

Looks fancy, but unfortunately donuts don’t tell us much

Looking at the Sankey diagram again, we can observe that even though campus housing is split about evenly between the two groups, male students prefer off-campus housing more than female students do. Another thing the donuts can’t show is that there are no Hispanic senior students in this school.

As the example above shows, Sankey takes considerably less space while still providing great insight, and this is a major aspect that separates it from conventional graphs. By now, you should have an idea of the merits and flaws of this diagram.

The number of groups in each layer determines the number of chords, and having too many of them can make the diagram cluttered. Some diagrams can be hard to read at first, but filtering and interactive features often help refine the visualization and let you focus on what you want.
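A rough sketch of why the chord count grows so quickly, and of one possible way to filter: between two adjacent layers there can be up to one chord per pair of groups, so the worst case is the product of the two group counts. The group counts, the sample links, the 5% threshold and both helper functions below are assumptions for illustration, not a description of any particular tool.

```python
def max_chords(groups_per_layer):
    """Upper bound on chords: every group linked to every group in the next layer."""
    return sum(a * b for a, b in zip(groups_per_layer, groups_per_layer[1:]))

def filter_links(links, min_share):
    """Keep only links that carry at least min_share of the total flow."""
    total = sum(amount for _, _, amount in links)
    return [link for link in links if link[2] / total >= min_share]

# Hypothetical diagram with 4 layers holding 2, 3, 4 and 6 groups.
worst_case = max_chords([2, 3, 4, 6])  # 2*3 + 3*4 + 4*6

# Hypothetical links between two layers: (source, target, amount).
links = [("A", "X", 50), ("A", "Y", 3), ("B", "X", 45), ("B", "Y", 2)]
major_links = filter_links(links, min_share=0.05)

print(worst_case)
print(major_links)
```

Dropping the small chords hides real data, so a threshold like this works best as an interactive control that the reader can relax, rather than as a fixed cut.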

When used with multiple layers, a Sankey diagram shows the detailed relationship between each pair of adjacent layers. However, this layout also reveals another disadvantage of Sankey: you can only compare two adjacent layers at a time. The rest of the flow is still drawn, but the amount transferred from a group in the first layer to a group in the last layer is not. This becomes an issue with 3 or more layers. To overcome it, you can reorder the layers so that the ones you want to compare sit next to each other.
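The adjacent-layers limitation can be shown with a small sketch. It assumes the diagram is built by counting records between each pair of neighboring layers; the two tiny "schools", the field names and the `layer_links` helper are hypothetical. The two datasets differ in how gender relates to class year, yet they produce exactly the same chords until the layers are reordered.

```python
from collections import Counter

def layer_links(records, layers):
    """Count records flowing between each pair of adjacent layers."""
    links = Counter()
    for rec in records:
        for left, right in zip(layers, layers[1:]):
            links[(rec[left], rec[right])] += 1
    return links

# Two made-up datasets that differ only in how gender pairs with class year.
school_x = [
    {"gender": "Female", "housing": "On-campus", "year": "Senior"},
    {"gender": "Male", "housing": "On-campus", "year": "Freshman"},
]
school_y = [
    {"gender": "Female", "housing": "On-campus", "year": "Freshman"},
    {"gender": "Male", "housing": "On-campus", "year": "Senior"},
]

# With housing in the middle, both schools yield identical chords.
order = ["gender", "housing", "year"]
same_picture = layer_links(school_x, order) == layer_links(school_y, order)

# Putting gender next to year exposes the difference.
reordered = ["housing", "gender", "year"]
x_links = layer_links(school_x, reordered)
y_links = layer_links(school_y, reordered)

print(same_picture)
print(x_links[("Female", "Senior")], y_links[("Female", "Senior")])
```

Note that the sketch keys chords by group label only, which assumes no two layers share a label; with real data you would likely include the layer name in the key.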

Here is the budget analysis for the City of San Francisco for the year 2014. The layer on the left represents the city's income sources, and the layer on the right represents the sinks, that is, where the money is spent. In this example, almost all sources feed all expenditure targets, but we can still see the major contributors at a glance. For example, the operating fund (mostly taxes) provides the most for every target, but welfare and development attracts almost half the funds from continuing projects.

All in all, be it healthcare, engineering, finance or education, the Sankey diagram can find use in a vast range of applications. What separates it from other visualizations is its ability to show correlations between several groups at a time, with the major contributors to each group clear as daylight. It saves you space and has a catchy look. And since it’s quite easy to read, your audience doesn’t have to be data experts.