Unlocking Insights with Sankey Diagrams in Tableau

Sayali Phutane
6 min read · Feb 21, 2024


Sankey diagrams are a powerful tool for understanding flows and relationships within a dataset. With Tableau, we can harness the full potential of Sankey diagrams to uncover valuable insights and tell compelling data stories. Here is a brief tutorial on creating a Sankey diagram in Tableau.

What is a Sankey diagram?

A Sankey diagram is a type of diagram that visualizes the flow of data. Each entity that data flows from or to is called a node. The node where a flow originates is the source node (e.g. West, on the left-hand side of the chart in this tutorial), and the node where it ends is the target node (e.g. Home Office, on the right-hand side). Source and target nodes are often drawn as rectangles or circles. The flow itself is drawn as a curved path called a link, and the width of each link is proportional to the quantity it carries. These diagrams are particularly effective for illustrating the distribution of materials, costs, or energy.
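To make the node and link vocabulary concrete outside Tableau, here is a minimal Python sketch. It sums a quantity for every (source, target) pair, so each pair becomes one link whose width would be proportional to its total. The records and numbers are made up for illustration and are not taken from Superstore.

```python
from collections import defaultdict

# Hypothetical records: (source node, target node, quantity flowing between them)
records = [
    ("West", "Consumer", 120.0),
    ("West", "Home Office", 30.0),
    ("East", "Consumer", 80.0),
    ("West", "Consumer", 50.0),
    ("East", "Corporate", 40.0),
]

def build_links(rows):
    """Sum the quantity for every (source, target) pair; each pair is one link."""
    links = defaultdict(float)
    for source, target, value in rows:
        links[(source, target)] += value
    return dict(links)

def node_totals(links):
    """Total outgoing flow per source node and total incoming flow per target node."""
    out_flow, in_flow = defaultdict(float), defaultdict(float)
    for (source, target), value in links.items():
        out_flow[source] += value
        in_flow[target] += value
    return dict(out_flow), dict(in_flow)

links = build_links(records)
sources, targets = node_totals(links)
print(links[("West", "Consumer")])            # 170.0
print(sources["West"], targets["Consumer"])   # 200.0 250.0
```

The (West, Consumer) link collects two records, so it would be drawn wider than any other link in this toy example.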

History

Sankey diagrams are named after Irish captain Matthew Henry Phineas Riall Sankey, who used this type of diagram in 1898 in a classic figure showing the energy efficiency of a steam engine. One of the most famous Sankey diagrams is Charles Minard’s map of Napoleon’s Russian campaign of 1812. Sankey diagrams are often used in physics to represent energy inputs, useful outputs, and wasted outputs.

By M. H. Sankey — Minutes of Proceedings of The Institution of Civil Engineers. Vol. CXXXIV, Session 1897–98. Part IV, Public Domain, https://commons.wikimedia.org/w/index.php?curid=2734254

Creating a Sankey Diagram in Tableau

The dataset used for this Sankey diagram is the well-known Sample Superstore dataset. The diagram shows sales for each region and segment: nodes on the left represent the “Region” field and nodes on the right represent the “Segment” field. To start, open the Superstore Excel file in Tableau, then drag the “Orders” table onto the canvas on the right, as shown below.

Figure 1 : Open Dataset
  1. Create a union of the Orders table with itself. This is required because we need one copy of the data to populate the “Region” nodes on the left side and another copy to populate the “Segment” nodes on the right side of the chart.
  2. Next, create a few calculated fields. This visualization relies on data densification (padding). Start with a calculated field named to_Pad, as follows:
Figure 2: Calculated Field to_Pad

Here to_Pad takes the value 1 for rows from the first copy of the Orders table and 49 for rows from the second copy. The “Region” nodes are populated from the first copy and the “Segment” nodes from the second. The number 49 is somewhat arbitrary: it sets how many points are available along each flow, and enough points are needed to draw the curves smoothly.
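Because Figure 2 is an image, its exact formula is not reproduced in the text. One common way to write such a field, assumed here, is IF [Table Name] = "Orders" THEN 1 ELSE 49 END. That form relies on the “Table Name” column Tableau adds to a union. The Python sketch below mimics what the self-union and to_Pad do to the rows. The rows and the name of the second copy (“Orders1”) are hypothetical.

```python
# Hypothetical order rows; only the fields the Sankey needs are kept.
orders = [
    {"Region": "West", "Segment": "Consumer", "Sales": 120.0},
    {"Region": "East", "Segment": "Corporate", "Sales": 40.0},
]

def union_with_itself(rows, name="Orders", copy_name="Orders1"):
    """Stack two copies of the table and tag each row with the table it came from,
    similar to the 'Table Name' column a Tableau union adds (copy name assumed)."""
    first = [{**row, "Table Name": name} for row in rows]
    second = [{**row, "Table Name": copy_name} for row in rows]
    return first + second

def to_pad(row, first_table="Orders"):
    """1 for rows from the first copy, 49 for rows from the second copy."""
    return 1 if row["Table Name"] == first_table else 49

unioned = union_with_itself(orders)
print([to_pad(row) for row in unioned])  # [1, 1, 49, 49]
```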

3. Next, create a bin for the to_Pad calculated field, as shown below. This bin is the “Padding” field used in the following steps.

Figure 3 : Bin for to_Pad
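The bin matters because to_Pad only ever holds 1 or 49. Binning it defines every bin from 1 to 49 (a bin size of 1 is assumed here, since Figure 3 is an image). Tableau can then fill in the empty bins when table calculations are computed along them, which is the data densification mentioned in step 2. This Python sketch lists the bins such a range produces; densified_bins is a hypothetical helper.

```python
def densified_bins(observed, bin_size=1):
    """Return every bin start between the smallest and largest observed value,
    including bins that contain no rows (the ones densification fills in)."""
    low = min(observed) // bin_size * bin_size
    high = max(observed) // bin_size * bin_size
    return list(range(low, high + 1, bin_size))

padding = densified_bins([1, 49])
print(len(padding), padding[0], padding[-1])  # 49 1 49
```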

4. Create a calculated field named “Index” as follows. This calculation acts as an offset that spaces all of the marks evenly across the view.

Figure 4 : Calculated Field Index
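The exact Index formula is shown in Figure 4. A form often seen in Tableau Sankey tutorials, and assumed here, is (INDEX() - 25) / 4, where INDEX() counts the 49 padded positions. It spreads the positions evenly around 0, a range that suits the S-shaped sigmoid used later. Here is a Python sketch of that mapping:

```python
def index_offset(position, center=25, scale=4):
    """Map a 1-based position along Padding (1..49) to an evenly spaced value
    centered on 0, mimicking an assumed calculation (INDEX() - 25) / 4."""
    return (position - center) / scale

t_values = [index_offset(p) for p in range(1, 50)]
print(t_values[0], t_values[24], t_values[-1])  # -6.0 0.0 6.0
```

Consecutive positions are always 0.25 apart, which is what keeps the marks evenly spaced.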

5. Next, create two calculated fields, “Rank 1” and “Rank 2”, as follows. These fields calculate each flow’s percentage of total Sales across the entire dataset. One positions the Sales flows at the nodes on one end of the chart, and the other positions them at the opposite end.

Figure 5 : Calculated Field Rank 1
Figure 6: Calculated Field Rank 2
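Figures 5 and 6 hold the actual formulas. A common pattern, assumed in this sketch, is RUNNING_SUM(SUM([Sales])) / TOTAL(SUM([Sales])). Under this pattern, one rank orders the running sum by Region then Segment, and the other orders it by Segment then Region. Each result is the cumulative share of Sales at which a flow sits on its end of the chart. The link totals below are hypothetical.

```python
def running_share(links, order_key):
    """Running sum of each link's value divided by the grand total, visiting links
    in the given order. Mimics an assumed RUNNING_SUM(SUM([Sales])) / TOTAL(SUM([Sales]))."""
    total = sum(links.values())
    shares, running = {}, 0.0
    for key in sorted(links, key=order_key):
        running += links[key]
        shares[key] = running / total
    return shares

# Hypothetical (Region, Segment) -> Sales totals
links = {("East", "Consumer"): 30.0, ("East", "Corporate"): 10.0,
         ("West", "Consumer"): 40.0, ("West", "Corporate"): 20.0}

rank1 = running_share(links, order_key=lambda k: (k[0], k[1]))  # left end: grouped by Region
rank2 = running_share(links, order_key=lambda k: (k[1], k[0]))  # right end: grouped by Segment
print(rank1[("West", "Consumer")], rank2[("West", "Consumer")])  # 0.8 0.7
```

The same flow lands at a different height on each side, and that difference is what the curve has to bridge.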

6. The next step uses a sigmoid function to help draw the curves between the nodes. A sigmoid function is a mathematical function that produces an S-shaped curve. In Tableau, the “Sigmoid” calculated field looks like this:

Figure 7 : Calculated Field Sigmoid
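The standard logistic sigmoid is 1 / (1 + e^(-t)), and Tableau’s EXP function can express it. This sketch assumes the Sigmoid field in Figure 7 follows that form and shows its behavior in Python:

```python
import math

def sigmoid(t):
    """Logistic function: rises in an S shape from near 0 to near 1 and equals 0.5 at t = 0."""
    return 1.0 / (1.0 + math.exp(-t))

print(sigmoid(0.0))             # 0.5
print(round(sigmoid(-6.0), 4))  # 0.0025
print(round(sigmoid(6.0), 4))   # 0.9975
```

Over an input range of roughly -6 to 6 the function covers almost the whole 0 to 1 interval. It changes slowly near both ends and quickly in the middle, which gives each link its flat start, steep middle, and flat finish.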

7. The calculated field “Curve” generates the actual curves of the visualization.

Figure 8 : The calculated Field Curve
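A common way to build such a Curve field, assumed here because Figure 8 is an image, is Rank 1 + (Rank 2 - Rank 1) * Sigmoid. At the left end the sigmoid is close to 0, so the line starts near Rank 1. At the right end it is close to 1, so the line finishes near Rank 2. The Python sketch generates the 49 points of one link under that assumption, using an assumed (position - 25) / 4 offset for the x-axis. curve_points is a hypothetical helper.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def curve_points(rank1, rank2, positions=49, center=25, scale=4):
    """Return (x, y) points for one link: x is the assumed offset (position - 25) / 4,
    and y moves from rank1 toward rank2 along an S-shaped path."""
    points = []
    for position in range(1, positions + 1):
        x = (position - center) / scale
        y = rank1 + (rank2 - rank1) * sigmoid(x)
        points.append((x, y))
    return points

pts = curve_points(0.75, 0.25)
print(pts[24])  # (0.0, 0.5)
```

Connecting these points in Padding order turns them into one continuous S-shaped line.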

8. With the calculated fields in place, let’s start building the Sankey diagram. On the sheet, drag the calculated field “Index” to Columns and the calculated field “Curve” to Rows, as shown below.

Figure 9 : Sankey diagram : Step 1

9. To make the chart appear, first drag the fields “Region”, “Segment”, and “Padding” onto Detail on the Marks card, then edit the table calculation for “Curve” as shown below.

Figure 10 : Sankey diagram : Step 2
Figure 11 : Sankey diagram : Step 3
Figure 12 : Sankey diagram : Step 5

10. Right-click “Index” on Columns, choose “Compute Using”, and select “Padding”, as shown below.

Figure 13 : Sankey diagram : Step 6

At this stage, the Sankey diagram starts to take shape. A few more formatting steps complete it.

11. On the Marks card, change the mark type to “Line”.

Figure 14 : Sankey diagram : Step 7

12. Drag “Padding” onto Path on the Marks card.

Figure 15 : Sankey diagram : Step 8

13. To define the width of the curves, create a calculated field named “CurveSizing”.

Figure 16 : Calculated Field CurveSizing
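Figure 16 is an image. One plausible form for CurveSizing, assumed here, is RUNNING_AVG(SUM([Sales])) computed along Padding. Only the two end positions of each link carry real Sales. A running average that skips the empty positions in between gives every densified point the same value, so the line keeps a constant width. Here is a Python sketch of that effect with a hypothetical Sales value; skipping missing values is itself an assumption about how the calculation treats densified marks.

```python
def running_avg_ignoring_gaps(values):
    """Running average that skips missing (None) entries (assumed behavior), so
    densified points with no data inherit the average of the real values seen so far."""
    result, total, count = [], 0.0, 0
    for value in values:
        if value is not None:
            total += value
            count += 1
        result.append(total / count if count else None)
    return result

# One hypothetical link: Sales of 170 at the real end positions 1 and 49;
# positions 2 to 48 are densified and carry no data.
link_sales = 170.0
sizes = [link_sales] + [None] * 47 + [link_sales]
widths = running_avg_ignoring_gaps(sizes)
print(len(widths), set(widths))  # 49 {170.0}
```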

14. Drag the “CurveSizing” field onto Size, then choose “Compute Using” and select “Padding”, as shown below.

Figure 17 : Sankey Diagram : Step 9

15. As a last step, move the “Region” field from Detail to Color. With some basic formatting, the sheet produces the finished Sankey diagram, as shown below.

Figure 19 : Sankey Diagram :Final Step

P.S. To show the nodes in the final presentation of the Sankey diagram, separate bar charts are created for “Region” on the left side and “Segment” on the right side.
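The node bars can be thought of as stacked shares of total Sales. This Python sketch computes where each node’s bar would start and end on a 0 to 1 axis. The region totals are hypothetical, and node_extents is an illustrative helper, not part of the Tableau workbook.

```python
def node_extents(totals, order):
    """Stack node totals as shares of the grand total, in the given order.
    Returns {node: (start, end)}, i.e. where each node's bar would sit."""
    grand_total = sum(totals.values())
    extents, start = {}, 0.0
    for node in order:
        end = start + totals[node] / grand_total
        extents[node] = (start, end)
        start = end
    return extents

region_totals = {"East": 25.0, "West": 75.0}  # hypothetical
extents = node_extents(region_totals, ["East", "West"])
print(extents)  # {'East': (0.0, 0.25), 'West': (0.25, 1.0)}
```

Stacking the nodes in the same order used for the running-sum ranks keeps each bar aligned with the band of links entering or leaving it.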

Limitations of the Sankey Diagram

Sankey diagrams depict weighted networks. They work best when we want to show a many-to-many mapping between two domains. However, these diagrams may be less useful:

  1. When datasets are large: many nodes and flows make the diagram overly complex and hard to comprehend, and the clutter can make it unreadable.
  2. When flows of similar value need to be compared: links of nearly the same width are difficult to tell apart.

Final thoughts

Sankey diagrams offer a unique and powerful way to visualize complex data flows and relationships. With Tableau’s intuitive features, we can create and customize them with ease. I am sure this blog will encourage you to visualize your data in a new way!
