ONDA: Plotly Dash solution for interactive organisational knowledge network discovery

syrom
25 min read · Jan 15, 2022


Here it is, my first Medium article. So be kind — I’m still learning.

Use case & motivation

First of all: what’s this article about — and why ?

Most folks living a corporate professional life will be familiar with this situation: you acquire a lot of implicit organisational knowledge over time. But it takes time to discover who’s REALLY doing what — or what the reason is that person x consistently returns more reliable answers than person y to your questions.

This becomes painfully obvious when, for example, onboarding new team members. Of course all newcomers should be given due time to adapt. But wouldn’t it be useful to shorten what “due” means by moving more of the organisational knowledge from the implicit to the explicit domain?

That’s the use case I wanted to address. There may already be professional tools out there that solve this (…and hints in the comments are welcome!). But in my experience, organisational support for newcomers is often limited to being handed a classical org chart representing the hierarchy: “That’s you, there, in this box. Those are your colleagues, this is your boss and this is your boss’s boss”. End of story, and start of the time-consuming process of acquiring the necessary implicit knowledge.

From my perspective, the two most important dimensions of the organisational “knowledge graph” or “knowledge network”, which are often relegated to the implicit domain but are comparatively easy to model, are:

  • skills / competences
  • projects

Disclaimer: this choice of additional categories, besides the hierarchy, is purely pragmatic. The technical solution explained hereafter is agnostic to both the kind of dimensions and their number, just in case you see things differently.

There are many very good articles out there explaining graph fundamentals or how to build interactive dashboards with Plotly Dash (see links later in the article).

The USP of this article is to wrap the subjects “graph visualisation” and “interactive dashboard” in one ready-to-run technical solution for a concrete use case: an online dashboard that lets you intuitively explore the knowledge network of an organisation, as represented thru hierarchy, projects and competences.

Graphical representation: the tool selection

So the question was how to best join the two dimensions “project” and “competence” with the “boiler plate org chart” to help provide a more holistic picture of the organisation and shorten the need for implicit learning.

The result should look something like this:

The use case: go from orgchart to organizational knowledge graph

When thinking about visualising different relationships, you end up quickly with graphs, as the earlier use of the word “knowledge graph” already suggested. A hierarchy is only a specific sub-form of a graph. And nothing prevents you from adding additional information or layers of information to a graph, as e.g. “Skill” and “Project” relationships in the simplified example.

So if I wanted a solution for the problem described, looking into graphs and their visualization seemed a good choice. After a bit of research, the choice fell finally on Plotly Dash. There are more packages and solutions out there for graphs and graph visualization. But none other matched all criteria that were important to me given the use case:

Plotly Dash advantage

There were solutions that were more visually appealing, but most dropped out on the double criterion of being able to easily publish the result online (e.g. on the company intranet) and to interactively change and explore this “knowledge graph”.

Two honorable mentions, though: (1) Gephi as THE open source graph software. Definitely worth looking into, but according to my research too cumbersome for making results accessible online. (2) PowerBI from Microsoft, which has interesting high-level plugins to visualize graphs but dropped out due to another criterion not expressly mentioned: it is NOT open source.

The strong point of Plotly Dash for me is the combination of visually appealing results that can be published online and be dynamically manipulated. Precisely what you need if you aim to motivate an intuitive and dynamic discovery journey thru the company’s knowledge graph.

This being said, there is also a major downside to the Plotly Dash approach: as of now, there is no high-level support for graph visualization in Plotly. By high-level support, I mean the possibility to feed a data frame as an argument to the visualization package, with the package taking care (more or less) of the rest.

But Plotly has a rather low level “scatter trace”-object which allows you to build visualisations from dots and lines. This object is, no surprise, e.g. the basis for line charts and scatter plots. So you need to build the graph representation from the bottom up, feeding the nodes of the graph as dots and the edges as line information. Not very user friendly… but doable. Plus: (halfway) understanding how this works gives you a good understanding of the inner workings of the entire Plotly package.
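To make the "dots and lines" idea concrete, here is a minimal sketch in plain Python. The node positions and edges are made up for illustration, and the helper names `node_coordinates` and `edge_coordinates` are hypothetical. The sketch only prepares the coordinate lists; the closing comment indicates how they would typically be handed to a scatter trace.

```python
# Hypothetical mini-graph: positions and edges are invented for this sketch.
positions = {"name_1": (0.0, 0.0), "name_2": (1.0, 0.5), "name_6": (0.5, 1.0)}
edges = [("name_1", "name_6"), ("name_2", "name_1")]


def node_coordinates(positions):
    """Return x and y lists with one entry per node (the 'dots')."""
    xs = [x for x, _ in positions.values()]
    ys = [y for _, y in positions.values()]
    return xs, ys


def edge_coordinates(positions, edges):
    """Return x and y lists describing all edges as line segments.

    Each edge contributes its two end points followed by None. The None
    entry leaves a gap, so consecutive segments are not joined together.
    """
    xs, ys = [], []
    for source, target in edges:
        x0, y0 = positions[source]
        x1, y1 = positions[target]
        xs += [x0, x1, None]
        ys += [y0, y1, None]
    return xs, ys


node_x, node_y = node_coordinates(positions)
edge_x, edge_y = edge_coordinates(positions, edges)

# With Plotly installed, these lists could feed two scatter traces, e.g.
# go.Scatter(x=node_x, y=node_y, mode="markers") for the nodes and
# go.Scatter(x=edge_x, y=edge_y, mode="lines") for the edges.
print(edge_x)
```

Putting all edges into one trace is the compact variant. Using one trace per edge instead makes it possible to give every edge its own line width.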

If you should happen to know any other solution for the use case described, please use the comments section: feedback is highly appreciated, as I did not find too many obvious solutions during my research.

Technical execution

As usual: good ideas are nothing without execution. So now for the technical part. This will take you thru the underlying data logic, required for the Python code to work, and thru the code itself with the following steps:

  • Data structure, logic and sources
  • Required packages
  • Function definition: creating the graph object
  • — The arguments passed to the function
  • — The business logic: filtering what the dashboard plot shows
  • — Build the graph
  • — Build the Plotly scatter trace object from nodes and edges
  • — Build the actual plot / figure based on the Scatter Trace object
  • The main program
  • — Define the dashboard layout
  • — Define the interactivity: callback functions
  • — Run the dashboard

The result is a ready-to-run dashboard for which the code is also available on GitHub.

Before going into the details a special thanks to:

These two articles contributed a lot to take the use case from an idea to an actual project (and also inspired the code).

Data structure, logic and sources

The nodes:

As explained in the motivation for this project, I opted for 3 different organisational layers to be visualized:

  • ‘H’ — Hierarchy
  • ‘P’ — Projects
  • ‘C’ — Competences / Skills

The obvious starting point: any form of HR database or orgchart that describes the company hierarchy and thus delivers the “H-nodes”, the backbone of the graph. Thus, the H-nodes represent individual employees in the company hierarchy.

The P-nodes represent projects that different individuals from the hierarchy can be related to. This info would probably need to be collected also from HR or in the field.

The C-nodes represent the competences / skills of individual employees that we want to track in the graph. This info can come from HR, can be collected in the field or can be collected e.g. thru a self-assessment.

But how to actually model this data collection? All nodes have not only a unique identifier (the name) but also a description. There is an important distinction here: the description of an H-node names the entity the individual belongs to (e.g. the department name), whereas P- and C-nodes simply describe themselves (aka: they repeat the node name). This distinction matters because the later code uses the node description to filter the elements shown in the graph visualization, according to the node descriptions selected in the interactive dashboard elements.

So a simplified example for the nodes info would look like this:

TYPE | NODE     | NODE_DESC
-----|----------|----------
H    | name_1   | dptmt_1a
H    | name_2   | dptmt_1a
H    | name_3   | dptmt_1a
H    | name_4   | dptmt_1b
H    | name_5   | dptmt_1b
H    | name_6   | dptmt_1
P    | project1 | project1
P    | project2 | project2
C    | skill_1  | skill_1
C    | skill_2  | skill_2
C    | skill_3  | skill_3
C    | skill_4  | skill_4

The edges:

The edges table records whether and how the nodes are related. This information should ideally be collected together with the node information from the same sources: literally, it comes down to “collecting and connecting the dots”, with the dots being the nodes and the edges being the connections.

So e.g. if edges go from employee to employee, this means that there is a direct hierarchical relationship between them. If an edge goes from an H-node (employee) to a P-Node (project), it means that this employee works on this project. Likewise, if an edge goes from an H-node (employee) to a C-node (competence), it means that the employee has this particular competence.

But edges can also lead from P-node to P-node or C-node to C-node: this can be used to build a project or competence hierarchy. A good example would be a 2-level hierarchy for computer language competences, with employees being connected to nodes e.g. for Python, Java, Julia or Rust…. and these competence nodes being connected to a new C-node “Computer languages”.

As edges have a clear starting and end point, these are often denoted as source and target of the edge. Overall, a simplified edge table would look like this:

TYPE | SOURCE  | TARGET
-----|---------|---------
H    | name_1  | name_6
H    | name_2  | name_1
H    | name_3  | name_1
H    | name_4  | name_6
H    | name_5  | name_4
H    | name_6  | name_7
P    | name_2  | project1
P    | name_4  | project1
P    | name_6  | project1
P    | name_6  | project2
C    | skill_1 | skill_3
C    | skill_2 | skill_3
C    | name_1  | skill_1
C    | name_2  | skill_1
C    | name_4  | skill_2
C    | name_5  | skill_2
C    | name_6  | skill_4

This means e.g.: All employees report to name_6, who is obviously the boss of dptmt_1, either directly or indirectly. name_2, name_4 and name_6 participate in project1, but only name_6 in project2. The skills skill_1 and skill_2 are related, as both sit under a common “upper skill” skill_3. Two employees from dptmt_1a (name_1 and name_2) have skill_1, while the two employees from dptmt_1b (name_4 and name_5) have skill_2.
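The "directly or indirectly" reading of the hierarchy edges can be checked with a few lines of plain Python. This is only a sketch: the helper `reports_to` is hypothetical and assumes, as in the simplified table, that every employee has at most one direct superior.

```python
# Hierarchy edges (TYPE 'H') from the simplified table: SOURCE reports to TARGET.
h_edges = [
    ("name_1", "name_6"), ("name_2", "name_1"), ("name_3", "name_1"),
    ("name_4", "name_6"), ("name_5", "name_4"), ("name_6", "name_7"),
]


def reports_to(person, boss, edges):
    """Return True if `person` reports to `boss`, directly or indirectly."""
    manager_of = dict(edges)  # assumption: at most one superior per person
    seen = set()              # guards against accidental cycles in the data
    current = manager_of.get(person)
    while current is not None and current not in seen:
        if current == boss:
            return True
        seen.add(current)
        current = manager_of.get(current)
    return False


employees = ["name_1", "name_2", "name_3", "name_4", "name_5"]
team = [p for p in employees if reports_to(p, "name_6", h_edges)]
print(team)
```

Walking up the chain like this is also a cheap way to spot data errors, for instance an employee whose chain never reaches the expected department head.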

The attributes

The nodes and edge tables can hold additional information, describing further attributes of the nodes and edges. It will depend on the use case if it makes sense to collect more attributes and thus more information about the nature of nodes and edges.

To give some concrete examples:

  • Nodes: interesting attributes for H-nodes could be information like
  • — Seniority: in years
  • — Employment status: full-time, half-time, seasonal…
  • — others…..
  • Edges: quantitative attributes can help to further enhance the understanding of the given relationship, e.g.:
  • — Ranking 1–10 for skills, quantifying the competence level for the specific skill
  • — Ranking 1–10 for projects: assessing the involvement of the individual in the project or the importance of the individual for the project.

Attributes can provide valuable input to make the graphical representation clearer. E.g. a higher quantified skill level can be used to draw a thicker line, making different skills levels easily identifiable.
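As a small illustration of that idea, the sketch below maps a 1–10 rating onto a line width. The function name `line_width` and the width range of 1 to 6 pixels are assumptions made for this example; the dashboard code later in the article simply uses the raw value as the width.

```python
def line_width(rating, min_width=1.0, max_width=6.0):
    """Map a 1-10 rating linearly onto a line width in pixels."""
    rating = max(1, min(10, rating))  # clamp ratings outside the 1-10 scale
    return min_width + (rating - 1) * (max_width - min_width) / 9


for rating in (1, 5, 10):
    print(rating, round(line_width(rating), 2))
```

Rescaling like this keeps the thickest line readable even if the rating scale in the data changes later.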

Required packages and data import

from jupyter_dash import JupyterDash
import dash
import dash_bootstrap_components as dbc
import dash_core_components as dcc
import dash_html_components as html
import networkx as nx
import plotly.graph_objs as go
import pandas as pd
from colour import Color

df_nodes = pd.read_csv(r'nodes.csv', sep=';')
df_edges = pd.read_csv(r'edges.csv', sep=';')

Now finally for the code: first, we import all the necessary packages and the data. Please refer to the linked documentation for details on Dash’s core, HTML and bootstrap components. Don’t worry: when you see the example for the general Dash layout approach later, it will become much clearer that these components are nothing but Lego-like building blocks that you use to assemble your dashboard.

The data is imported from two CSV files which have, at their core, the data structure previously described. The actual data in the GitHub repository has some more elements (see “The attributes”), but how much detail you add to your knowledge graph, and where, is ultimately a design choice.
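For readers without the repository at hand, here is a hedged sketch of what such semicolon-separated files could look like. The column names are taken from how the later code accesses the data frames (`NODES`, `NODE_TYPE`, `NODE_DESC`, `NODE_ORG`, `SOURCE`, `TARGET`, `VALUE`, `TYPE`, `DPMT`). The rows themselves, the unit name `org_1` and the `DPMT` values are placeholders, not the real data. The files are simulated in memory with `io.StringIO`.

```python
import io

import pandas as pd

# Placeholder content standing in for nodes.csv and edges.csv.
nodes_csv = io.StringIO(
    "NODE_TYPE;NODES;NODE_DESC;NODE_ORG\n"
    "H;name_1;dptmt_1a;org_1\n"
    "H;name_6;dptmt_1;org_1\n"
    "P;project1;project1;P\n"
    "C;skill_1;skill_1;C\n"
)
edges_csv = io.StringIO(
    "TYPE;SOURCE;TARGET;VALUE;DPMT\n"
    "H;name_1;name_6;1;dptmt_1a\n"
    "H;name_6;name_7;1;dptmt_1\n"
    "P;name_6;project1;5;dptmt_1\n"
    "C;name_1;skill_1;8;dptmt_1a\n"
)

df_nodes_demo = pd.read_csv(nodes_csv, sep=";")
df_edges_demo = pd.read_csv(edges_csv, sep=";")

# Sanity check: edge end points that have no row in the nodes table.
end_points = set(df_edges_demo.SOURCE) | set(df_edges_demo.TARGET)
missing = end_points - set(df_nodes_demo.NODES)
print(missing)
```

A check like the last one is worth running after every data update, since an edge that points to an unknown node (here `name_7`) carries no description or type for the plot.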

Function definition: creating the graph plot object

This is the largest piece of code. This one function prepares all the data for the main plot in the dashboard: the part of the knowledge graph to be visualized according to the settings of the interactive filter elements in the dashboard and the business logic.

The arguments passed to the function

def f_define_plot(nodes_data, edges_data, cl_node_description, cl_node_org, cl_node_type):

5 values are passed to this function:

  • The 2 complete data frames with the info on all nodes and edges of the knowledge graph; the reason is that only within this function are the elements corresponding to the interactive filter settings identified and passed into the plot.
  • — nodes_data -> the entire imported nodes information
  • — edges_data -> the entire imported edges information
  • 3 filter settings as defined in the dashboard layout later on in the code
  • — cl_node_description: as you may recall from above, this is either the department individuals work for, or only the repeated node name in case of nodes with the type “project” or “competence”
  • — cl_node_type: the selection of the node types the user wants to see in the graph. E.g. he/she may want to see the hierarchy mixed with the project layer, with the competence layer, or with both
  • — cl_node_org: this is, in the given case, a forward-looking feature.

For the time being, the example only comprises departments within one and the same organisational unit (aka: legal entity). If the graph is to be extended to cover the structure of several entities, those can be selected via this feature. Think e.g. in terms of several sales units in different countries with a similar setup (aka: Marketing UK, Marketing France, Marketing Germany…). As all nodes of the given example belong to the same unit, there’s no actual use for this element right now, but it makes it easy to extend the graph to a wider scope.

The 3 filter settings are lists containing the details from the interactive dashboard elements, hence the naming “cl_xxx” for “criteria list”.

The business logic: filtering what the dashboard plot shows

The biggest chunk of the plot definition function: here, the nodes and edges are filtered out so that only those that should be visualized according to filter settings and business logic are passed forward to the building of the actual plot.

This is the part where you can intervene by changing the business logic. E.g. I made the conscious choice that, when displaying hierarchy-level info, the next higher level is automatically included (so as to make clear how every selected individual is connected to the next higher level).

Or, e.g. ALL hierarchy-layer nodes that are connected to selected projects are shown, not only those from departments that are currently actively selected. The rationale behind this is that projects may primarily take their resources from 1 or 2 departments, but it is also important to know whether the project is completely siloed or not … and who these “silo-breakers” are.
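The "silo-breaker" rule can be illustrated in isolation with the simplified example data from above. This is a sketch of the idea only, not the article's implementation; the helper name `silo_breakers` is hypothetical.

```python
# Department per employee and project edges, taken from the simplified tables.
department = {"name_2": "dptmt_1a", "name_4": "dptmt_1b", "name_6": "dptmt_1"}
project_edges = [
    ("name_2", "project1"), ("name_4", "project1"),
    ("name_6", "project1"), ("name_6", "project2"),
]


def silo_breakers(project, selected_departments):
    """Return project members whose department is NOT among the selected ones."""
    members = {source for source, target in project_edges if target == project}
    return {m for m in members if department[m] not in selected_departments}


# Only dptmt_1a is selected in the filter, yet project1 also involves
# people from two other departments: these stay visible in the graph.
breakers = silo_breakers("project1", {"dptmt_1a"})
print(sorted(breakers))
```

With only `dptmt_1a` selected, `name_4` and `name_6` are reported, which matches the intention of showing everybody connected to a selected project.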

For the implemented logic to work as intended, the data MUST adhere to the definitions of the data structure as laid out above. A different data structure can, of course, be chosen. But mind that this will impact the logic in this part of the code, which would need to be modified accordingly.

The code works a lot with sets and set functions — which come in very handy in the graph context.
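As a warm-up for the set logic, here is a minimal sketch with made-up data showing the two operations the filter code leans on most: union (`|`) and difference (`-`). The variable names are chosen for this example and do not appear in the dashboard code.

```python
# Nodes that match the filter settings (made-up selection).
selected = {"name_1", "name_2"}

# A handful of (SOURCE, TARGET) edges.
edges = [
    ("name_2", "name_1"), ("name_1", "name_6"),
    ("name_2", "project1"), ("name_4", "name_6"),
]

# Keep every edge that touches at least one selected node.
touching = [(s, t) for s, t in edges if s in selected or t in selected]

sources = {s for s, _ in touching}
targets = {t for _, t in touching}

connected = sources | targets   # union: all nodes on the touching edges
extra = connected - selected    # difference: nodes pulled in via an edge only

print(sorted(connected))
print(sorted(extra))
```

The `extra` set is where the business decisions live: which of these additional nodes (superiors, projects, skills) should stay in the plot and which should be dropped again.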

Finally: the code is far from being optimized. It’s more the result of trial & error, and optimizing the logic is on the agenda for some time later. So should you be confused: it’s not your fault…. the implementation of the logic may still be confusing.

For the rest of this block, I’ll leave it to the comments in the code itself:

# BEFORE FIRST QUERY !!!! Otherwise problem thru C and P values in org-field:
# The criteria list must be extended by the acronyms for competence and project, as
# they CAN potentially be included in the node_type criteria list and
# would erroneously be filtered out again via the org field.
cl_node_org_extended = cl_node_org + ['C', 'P']

# add 'H' (Hierarchy) in any case to the list of selected node types; otherwise,
# a pure setting to Project or Competence does not trace back to the individuals
# represented in the H-entries
if 'H' not in cl_node_type:
    cl_node_type.append('H')
# define query string
s_query = 'NODE_TYPE in @cl_node_type and NODE_DESC in @cl_node_description and NODE_ORG in @cl_node_org_extended'
df_nodes_filtered = df_nodes.query(s_query)

l_nodes_aux1 = df_nodes_filtered.NODES.to_list()

# Creating a set of edges where either SOURCE or TARGET node is equal to filtered nodes:
# CAUTION: This set can still contain nodes that GO BEYOND THE FILTER CRITERIA
# defined in the node types list as ANY edge leading to one of the filtered nodes
# qualifies a non-filtered node to enter the set
aux_set1 = set()
for index in range(0, len(df_edges)):
    if (df_edges['SOURCE'][index] in l_nodes_aux1 or df_edges['TARGET'][index] in l_nodes_aux1):
        aux_set1.add(df_edges['SOURCE'][index])
        aux_set1.add(df_edges['TARGET'][index])

# Determine which edges and nodes qualify to be represented in the graph according
# to the settings. Cleaning the set from all nodes that have a node class (NODE_DESC)
# or belong to an organisation other than those defined in the filter lists:
# Objective: create a list with all nodes in aux_set1 which have a node class
# or ORG that is NOT in the criteria lists
l_aux = []
for i in aux_set1:
    df_match = df_nodes.query('NODES == @i')
    if df_match.empty:
        continue
    # one combined test, so that a node failing both checks is listed only once
    # (a double entry would make the remove() below fail on the second attempt)
    if (df_match.iloc[0].NODE_DESC not in cl_node_description
            or df_match.iloc[0].NODE_ORG not in cl_node_org_extended):
        l_aux.append(i)

# cycle thru the list and remove these "unwanted" nodes from the set
for i in l_aux:
    aux_set1.remove(i)
# aux_set1 now only contains nodes as defined by the filter criteria lists

# Define a dfs with all edges and nodes that are to be visualized
# The difference vs. the "filtered" list is that this df also contains nodes
# that the filtered nodes connect to but which should normally be filtered out.
# Edges are more complicated, as they may contain unwanted elements, either
# as SOURCE or TARGET
df_edges_graph =df_edges[(df_edges.SOURCE.isin(aux_set1) | df_edges.TARGET.isin(aux_set1)) & df_edges.TYPE.isin(cl_node_type)]
aux_set_source = {nodes for nodes in df_edges_graph.SOURCE}
aux_set_target = {nodes for nodes in df_edges_graph.TARGET}

# combining the two sets, containing all nodes that were in the filtered df_edges_graph
aux_set2 = aux_set_source | aux_set_target

# create the difference between the two sets: result shows again potentially
# "unwanted" nodes
aux_set3 = aux_set2 - aux_set1

# finally, we must differentiate: nodes on the H-level should be included
# (showing persons that other persons corresponding to the filter criteria
# are connected to). Those, we do want to see - but not Projects or Competences
# that are not explicitly mentioned in the filter criteria.
# Determining which nodes from aux_set3 ARE Hierarchy elements:
l_aux = []
for i in aux_set3:
    if df_nodes.query('NODES == @i').empty:
        continue
    else:
        if df_nodes.query('NODES == @i').iloc[0].NODE_TYPE == 'H':
            l_aux.append(i)
# cycle thru the list and remove these H-nodes from the set, so that
# aux_set3 keeps only the "unwanted" non-hierarchy nodes
for i in l_aux:
    aux_set3.remove(i)

# using aux_set3 now to remove any row from the edges-df which contains the
# "unwanted" nodes, either as Source or as Target:
df_edges_graph = df_edges_graph[~df_edges_graph.SOURCE.isin(aux_set3)]
df_edges_graph = df_edges_graph[~df_edges_graph.TARGET.isin(aux_set3)]

# Now we still need to create the df for all nodes that should appear in the graph:
# As we have treated H and P/C-nodes differently if they have entered
# as source or target of one of the selected nodes, we also must update
# the set used to filter the node-df: aux_set2 may contain more
# H-nodes, e.g. additional sources or targets.
aux_set4 = aux_set2 - aux_set3
df_nodes_graph = df_nodes[df_nodes.NODES.isin(aux_set4)].copy()  # copy, as a column is added to this df later
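Since the article itself notes that the logic is not yet optimized, here is one possible simplification, shown as a sketch on made-up data: the row-by-row loop that collects all nodes on edges touching the filtered nodes can be expressed with pandas boolean masks. The names `df_edges_demo`, `filtered_nodes`, `loop_set` and `mask_set` are chosen for this example.

```python
import pandas as pd

# Made-up edge data and filter result for the comparison.
df_edges_demo = pd.DataFrame({
    "SOURCE": ["name_2", "name_1", "name_4", "name_5"],
    "TARGET": ["name_1", "name_6", "name_6", "name_4"],
})
filtered_nodes = ["name_1", "name_2"]

# Loop version, following the structure of the code above.
loop_set = set()
for index in range(len(df_edges_demo)):
    source = df_edges_demo["SOURCE"][index]
    target = df_edges_demo["TARGET"][index]
    if source in filtered_nodes or target in filtered_nodes:
        loop_set.add(source)
        loop_set.add(target)

# Mask version: select all touching rows at once, then collect both columns.
mask = df_edges_demo.SOURCE.isin(filtered_nodes) | df_edges_demo.TARGET.isin(filtered_nodes)
touching = df_edges_demo[mask]
mask_set = set(touching.SOURCE) | set(touching.TARGET)

print(mask_set == loop_set)
```

Both variants yield the same set here. The mask version mainly pays off on larger edge tables, where looping over rows in Python becomes slow.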

Build the graph

The result of the previous step are two data frames, df_nodes_graph and df_edges_graph, which contain the information of the graph resulting from the combination of interactive filter selection and business logic. Those are the nodes and edges to be visualized in the plot of the dashboard.

These two data frames are passed, in a first step, to the networkX package to build the actual graph object.

# The network graph (G-object) is initially built with the FILTERED edge and node
# information from the previous step
#######################################################################################
G = nx.from_pandas_edgelist(df_edges_graph, 'SOURCE', 'TARGET', ['SOURCE','TARGET', 'VALUE', 'TYPE', 'DPMT'], create_using=nx.MultiDiGraph())
#######################################################################################

# Workaround to the indexing problem with the NODES-column: copy the NODES name
# information to a column with a new name, allowing to serve the original column as index
df_nodes_graph['aux_name'] = df_nodes_graph['NODES']

# setting several node attributes (to be used as hovertext-info when hovering the mouse over the node in the graph)
nx.set_node_attributes(G, df_nodes_graph.set_index('NODES')['aux_name'].to_dict(), 'NODE_NAME')
nx.set_node_attributes(G, df_nodes_graph.set_index('NODES')['NODE_TYPE'].to_dict(), 'NODE_TYPE')
nx.set_node_attributes(G, df_nodes_graph.set_index('NODES')['NODE_DESC'].to_dict(), 'NODE_DESC')
nx.set_node_attributes(G, df_nodes_graph.set_index('NODES')['NODE_ORG'].to_dict(), 'NODE_ORG')

# Determine 'look' of the graph
################################
pos = nx.layout.spring_layout(G)
################################

# feed positioning info derived from the layout-method to the graph nodes
for node in G.nodes:
    G.nodes[node]['pos'] = list(pos[node])

This part of the code determines especially:

  • The edge attributes that can be referenced when building the plot: e.g. VALUE, TYPE, DPMT can be used to determine color, line thickness etc. in the plot. The data must, of course, be present in the initial import.
  • The look of the graph: there is no unique way to visualise a graph… it can be visualized in endless different ways. Chosen here was the “spring-layout”, which is the usual default and yields “intuitively correct” plots because it tries to group adjacent nodes together. But this part of the code is a perfect point for intervention if you want to get completely different-looking results by changing only a single line in the code. If you want to try it out: the supported layout options are explained here.
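To get a feeling for how much the layout choice matters, the following sketch computes positions for the same small graph with three NetworkX layout functions. The graph is made up for the example, and the `seed` argument is added here only to make the spring layout reproducible between runs.

```python
import networkx as nx

# Small made-up hierarchy, same graph type as in the article's code.
G_demo = nx.MultiDiGraph()
G_demo.add_edges_from([
    ("name_2", "name_1"), ("name_3", "name_1"),
    ("name_1", "name_6"), ("name_4", "name_6"),
])

layouts = {
    "spring": nx.spring_layout(G_demo, seed=42),  # force-directed layout
    "circular": nx.circular_layout(G_demo),       # all nodes on one circle
    "shell": nx.shell_layout(G_demo),             # nodes on concentric shells
}

# Store each variant on the nodes, analogous to the 'pos' attribute above.
for name, pos in layouts.items():
    for node in G_demo.nodes:
        G_demo.nodes[node]["pos_" + name] = list(pos[node])

print(G_demo.nodes["name_6"]["pos_circular"])
```

Each layout function returns a dictionary mapping a node to its coordinates, so swapping the layout in the dashboard code comes down to replacing the one line that computes `pos`.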

Build the Plotly scatter trace object from nodes and edges

The data from the graph object is now recoded to serve as input for the Plotly Scatter Trace object (…the low-level approach mentioned above).

The code was mostly copy&pasted from Medium articles and Stack Overflow, as I am unfamiliar with the arcane details of how to build this object. But obviously there are two major parts: creating the trace for the nodes first and then the traces for the edges connecting these nodes.

As you can also see in the code, more additional details go into the nodes, e.g. 3 different colors for the different node types / graph layers H (Hierarchy), P (Project) and C (Competence). And this part of the code also defines how graph attributes are used to serve as text input for hover text (on nodes only so far). In other words: here, you determine which additional information shows on the screen when you hover your mouse over a given node in the dashboard plot.

When using the graph object to create the edges-part of the ScatterTrace object, the weight of the trace (aka thickness of the line) is determined by the VALUE attribute of the graph. In other words: this is a concrete implementation of the idea to e.g. visualize the skill level of a person connected to a certain skill via a VALUE between 1 and 10 in the edges import data, with thicker lines indicating a higher skill level.

traceRecode = []

### Setting up the basic structure of the Scatter object
### The initially empty lists from the initialization will be successively filled !!!
node_trace = go.Scatter(x=[], y=[], hovertext=[], text=[], mode='markers+text', textposition="middle center", hoverinfo="text", marker={'size': 30})
index = 0
for node in G.nodes():
    x, y = G.nodes[node]['pos']
    hovertext = ("NODE_NAME: " + str(G.nodes[node]['NODE_NAME']) + "<br>" + "NODE_DESC: " + str(G.nodes[node]['NODE_DESC'])
                 + "<br>" + "NODE_TYPE: " + str(G.nodes[node]['NODE_TYPE']) + "<br>" + "NODE_ORG: " + str(G.nodes[node]['NODE_ORG']))
    text = G.nodes[node]['NODE_NAME']
    node_trace['x'] += tuple([x])
    node_trace['y'] += tuple([y])
    node_trace['hovertext'] += tuple([hovertext])
    node_trace['text'] += tuple([text])
    index = index + 1
### appending color info for nodes
### Probably 10.000 ways to do this more intelligently, e.g. via
### a) integrating this step in the loop above or
### b) using a predefined dictionary to map colors (with "red" as default value)
l_node_color = []
for node in G.nodes():
    if G.nodes[node]['NODE_TYPE'] == "H":
        node_color = "grey"
    elif G.nodes[node]['NODE_TYPE'] == "P":
        node_color = "blue"
    elif G.nodes[node]['NODE_TYPE'] == "C":
        node_color = "green"
    else:
        node_color = "red"
    l_node_color.append(node_color)
node_trace.marker.color = l_node_color

traceRecode.append(node_trace)

## Define "trace" of edges for the plotly object
colors = ['black']
index = 0
for edge in G.edges:
    x0, y0 = G.nodes[edge[0]]['pos']
    x1, y1 = G.nodes[edge[1]]['pos']
    weight = float(G.edges[edge]['VALUE']) # Could also be e.g. "LEVEL"
    trace = go.Scatter(x=tuple([x0, x1, None]), y=tuple([y0, y1, None]),
                       mode='lines',
                       line={'width': weight},
                       marker=dict(color=colors),
                       line_shape='spline',
                       opacity=1)
    traceRecode.append(trace)
    index = index + 1
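The code comment above hints at option b), a predefined dictionary to map colors. A minimal sketch of that variant could look as follows. The helper names `node_color` and `hover_text` are hypothetical, and the node attributes are passed in as a plain dictionary so that the example runs without a graph object.

```python
# Color per node type, with "red" as fallback for unknown types.
NODE_COLORS = {"H": "grey", "P": "blue", "C": "green"}
DEFAULT_COLOR = "red"

HOVER_FIELDS = ("NODE_NAME", "NODE_DESC", "NODE_TYPE", "NODE_ORG")


def node_color(node_type):
    """Look up the color for a node type, falling back to the default."""
    return NODE_COLORS.get(node_type, DEFAULT_COLOR)


def hover_text(attributes):
    """Build the hover text: one 'FIELD: value' entry per line (HTML break)."""
    return "<br>".join(f"{field}: {attributes[field]}" for field in HOVER_FIELDS)


# Stand-in for the attribute dictionary of one graph node (made-up values).
example_node = {
    "NODE_NAME": "name_1", "NODE_DESC": "dptmt_1a",
    "NODE_TYPE": "H", "NODE_ORG": "org_1",
}

print(node_color(example_node["NODE_TYPE"]))
print(hover_text(example_node))
```

Adding a further layer to the graph then only needs a new dictionary entry instead of another `elif` branch.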

Build the actual plot / figure based on the Scatter Trace object

The graph info recoded into Scatter Trace objects is now used to build the actual Plotly figure, which is the object returned from the function call.

figure = go.Figure(
    data=traceRecode,
    layout=go.Layout(showlegend=False, hovermode='closest',
                     margin={'b': 4, 'l': 4, 'r': 4, 't': 4},
                     xaxis={'showgrid': False, 'zeroline': False, 'showticklabels': False},
                     yaxis={'showgrid': False, 'zeroline': False, 'showticklabels': False},
                     height=800,
                     clickmode='event+select'
                     ))
return figure

The main program

The main program is surprisingly simple when compared with the plot function — but it still contains quite a number of moving parts that need to be geared up correctly to make everything work.

Define the dashboard layout

The first step is the initialization of the dashboard parameters: these initialization values determine what graph is shown when the dashboard is initially run. The choice here is to initially show:

  • only Hierarchy nodes (node_type = ‘H’)
  • only from the department ‘H_GA’: you may recall from earlier that the node description carries the department information for the H-type nodes
  • all nodes on the ORG-level: you may also recall that this is a forward-looking implementation; currently, all nodes in the example data belong to the same organisational unit / legal entity. If this changes, this part of the code could be adjusted to only show one particular organisational unit initially.
l_initial_node_type = ['H']
l_initial_node_desc = ['H_GA']
l_initial_org = df_nodes['NODE_ORG'].unique().tolist()
l_initial_org.remove('C') # Competences must be removed from ORG list
l_initial_org.remove('P') # Projects must be removed from ORG list

To better understand the following code, first a sketch that tries to summarize the basics of Plotly Dash dashboards (…but you should consult other articles that go more in depth): it shows how Dash is basically a Lego-like plug&play system in which you first define a grid of rows and columns (black boxes with red markings) and then choose the components (green boxes) that go into the grid compartments.

Basic Plotly Dash dashboard design approach

These components can be interactive elements like the radio buttons or dropdown lists in row1 which serve, via the callback function explained later, to update plot components like the bar chart or scatter plot in row 2 …. or basically any other kind of HTML-code that you fill them with (row3).

Now for the concrete code: the Jupyter Dash app is initialized, directly with an external style sheet for a consistent look, and the body of the HTML page is defined. It holds the different dashboard elements:

  • The dashboard title
  • The interactive filter & selection elements determining which part of the graph to show
  • — Labelled checklist to select the shown layer type: H, C, P
  • — Dropdown-List to select the organisational unit(s) to show; forward looking implementation and with no effect until nodes are imported from more than one organisational unit
  • — Dropdown-List to select node description. These denote the corresponding department for H-level-nodes (aka: employees) or simply the skills and projects that should be displayed
  • The plot or figure itself that shows the knowledge graph for the active selection (and business logic)

Mind the “id”-parameter set for each interactive component in the actual code below !

Having a unique ID per component allows this parameter to function as the join that lets the callback function define the flow of data between the different dashboard components !

Also mind how the actual graph plot (id=”ONDA”) calls in a first step the function defined above, passing the 5 function arguments with their initialization values. The secret of the callback function later in the code lies in the fact that it re-runs the function with new arguments as soon as a change takes place in any of the inputs controlled via the interactive components described above.

app = JupyterDash(__name__, external_stylesheets=[dbc.themes.LUX])
# easy to change overall look - just try out other themes

body = html.Div([
html.H1("Visualization of orga xyz knowledge graph")

# Checklist to select graph layers to include in the plot
, dbc.Row([dbc.Col(html.Div([dbc.Label("Display the following Node Types"),dbc.Checklist(options=[{"label": "Hierarchy", "value": "H" },
{"label": "Projects", "value": "P"},{"label": "Competences", "value": "C"}], value=["H"], id="f_select_node_type", inline=True, switch=True,),]),width=4)

# Dropdown List to select which organisational units to include in the graph
, dbc.Col(html.Div([dbc.Label("Display the Nodes for the following organizational units: "),
dcc.Dropdown(options=[{'label': i, 'value': i} for i in l_initial_org], value = l_initial_org, id = 'f_select_org',multi=True)]), width=4)])

# Progress bar "perverted" to horizontal divider / ruler
, dbc.Row(dbc.Col(html.Div(dbc.Progress(value=100, color="info"))))

# Dropdown List to select departments and/or skills and/or projects to include in the graph
, dbc.Row(dbc.Col(html.Div([dbc.Label("Display the following Node Classes"),
dcc.Dropdown(options=[{'label': i, 'value': i} for i in l_initial_node_desc], placeholder = 'Select from the Node Classes (options depending on selected Node Type(s))', value = l_initial_node_desc, id = 'f_select_node_class',multi=True)])))

# Progress bar "perverted" to horizontal divider / ruler
, dbc.Row(dbc.Col(html.Div(dbc.Progress(value=100, color="info"))))

########################################################################################
## the actual plot of the knowledge graph
## (filter lists passed by keyword so that each one reaches the matching function parameter)
########################################################################################
, dbc.Row(dbc.Col(html.Div(dcc.Graph(id="ONDA", figure=f_define_plot(df_nodes, df_edges, cl_node_description=l_initial_node_desc, cl_node_org=l_initial_org, cl_node_type=l_initial_node_type)))))
########################################################################################
])

Here is a screenshot of the running dashboard, showing the 3 interactive filter elements with their IDs and the actual graph. In the result, you can easily see the underlying hierarchical department structure, often with several dots connected to a “lead” dot.

Screenshot from running dashboard

Define the interactivity: callback functions

But how exactly is the updating done ? Enter the callback function:

Here, again, is first an attempt at a visual abstraction, based on the earlier dashboard grid example:

CallBack function: the basic mechanics

This abstraction looks like this in actual code:

# pushing the change in the node_type selection to trigger
# a corresponding change in the node_class selection
# (only node classes shown which belong to the chosen network type(s))
@app.callback(
    dash.dependencies.Output(component_id='f_select_node_class', component_property='options'),
    [dash.dependencies.Input(component_id='f_select_node_type', component_property='value')])
def update_node_classes(f_select_node_type):
    current_node_type = f_select_node_type
    df_aux = df_nodes[df_nodes.NODE_TYPE.isin(current_node_type)]
    return [{'label': i, 'value': i} for i in df_aux.NODE_DESC.unique()]

# callback for actual graph update
@app.callback(
    dash.dependencies.Output('ONDA', 'figure'),
    [dash.dependencies.Input('f_select_node_class', 'value'), dash.dependencies.Input('f_select_org', 'value')])
def update_output(f_class, f_org):
    cl_node_description = f_class
    cl_node_org = f_org
    ######## node-types to be displayed not dynamically updated from selector
    ######## switch, but derived from the current node-class selection !
    aux_set2 = set()
    for node_class in cl_node_description:
        if df_nodes.query('NODE_DESC == @node_class').empty:
            continue
        else:
            aux_set2.add(df_nodes.query('NODE_DESC == @node_class').iloc[0].NODE_TYPE)
    cl_node_type = list(aux_set2)
    ########
    return f_define_plot(df_nodes, df_edges, cl_node_description, cl_node_org, cl_node_type)

Overall, it is rather easy to identify the structure of the callback function as shown in the abstraction:

  • What is updated ? -> ID in dash.dependencies.Output plus the property to be updated
  • Based on what change ? (aka: from which other component with what value ?) -> ID in dash.dependencies.Input plus the property to watch, most of the time “value”, i.e. the current value of this element after the user input
  • And how ? (aka: how is the update calculated from the changed input data) -> the update_xxx function defined with def right after the callback decorator.

The callback that uses the Type-selector to filter the displayed node_classes (actually the node_descriptions: my terminology was unfortunately inconsistent during coding at this point) is pretty straightforward.

The update function for the graph is a bit more complex, though: only 2 arguments go into the function, but it returns a function call with 5 arguments (the call to the plot function). The reason is that two arguments never change (the complete edge and node info passed to the plot function). But the plot function call cannot simply take the node types set in the selector: the selector only updates the node_classes (aka descriptions) available in the dropdown list, so the node types that are fed into the function call must be dynamically generated from the current dropdown selection of node_classes. That is precisely what the loop in update_output does.
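The derivation of node types from the selected node classes can also be written without an explicit loop. What follows is only a sketch, not the code used in the dashboard: the helper name node_types_for_classes and the demo data are made up for illustration, and it assumes that df_nodes has the NODE_DESC and NODE_TYPE columns used in the callback above. Unlike the loop, which keeps only the type of the first matching row, it collects every type found for a selected class. As long as each description belongs to exactly one type, both approaches should yield the same set of types.

```python
import pandas as pd


def node_types_for_classes(df_nodes, selected_classes):
    """Return the sorted NODE_TYPE values covered by the selected NODE_DESC entries."""
    # A cleared dropdown may deliver None or an empty list: treat both as "nothing selected"
    selected = selected_classes or []
    mask = df_nodes["NODE_DESC"].isin(selected)
    return sorted(df_nodes.loc[mask, "NODE_TYPE"].unique())


# Tiny made-up stand-in for the article's df_nodes (hypothetical values)
df_demo = pd.DataFrame({
    "NODE_DESC": ["Sales", "Finance", "Python", "Project Alpha"],
    "NODE_TYPE": ["H", "H", "C", "P"],
})

print(node_types_for_classes(df_demo, ["Sales", "Python"]))
```

Inside update_output, the loop could then be replaced by a single call such as cl_node_type = node_types_for_classes(df_nodes, f_class). Unknown descriptions are simply ignored, just as in the loop.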

If all this seems a bit complex: it is, and it takes a bit of getting used to. However, Dash’s debug mode (switched on with debug = True when running the app, here inside Jupyter Lab) offers an excellent visualization of how the callback components in the dashboard influence one another:

This shows very well, again by ID-name, which component feeds updating information into which other component… and whether it works as expected (indicated by the green circle).

Run the dashboard

Finally getting there ! The only remaining thing to do: update the Dash app object with the HTML layout defined above and run it (in this case as a dedicated tab within Jupyter Lab; in a production environment, the app would need to run on some internal server):

app.layout = html.Div([body])
app.run_server(mode="jupyterlab", debug = True)

And that’s all, Folks ! May the discovery journey deliver new insights !

Next steps

Well, only almost finished… there are always next steps after a project. The ones I came up with after finishing this one were:

Potential challenges

Process ! This dashboard is an IT-solution. And IT-solutions have the tendency to fade into oblivion if not properly maintained. Which means in our case mainly a process to maintain the freshness of the data. It is self-evident that the dashboard will only create value for its users if it reflects the current state of affairs in the organisation.

So processes and responsibilities must be defined to ensure the freshness and correctness of the data after the first collection, which can be quite a challenge against the backdrop of always limited resources and possibly other priorities.

Potential improvements

The need for freshness may also be an opportunity. E.g. the competence-layer of the graph is something that could lend itself to AI-automation: if content generated within the organization can

  • be connected to specific individuals and
  • be tagged as expressing competence in certain areas

the graph could be automatically generated from this content. Content meeting these requirements could be e.g. email repositories or articles on internal wikis or intranets.
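To make the idea a bit more tangible, here is a minimal sketch of how already tagged content could be turned into edges for the competence layer. Everything in it is an assumption made for illustration: the record fields author and tags, the function name competence_edges and the threshold min_mentions are hypothetical, and the hard part (the tagging itself) is taken as given.

```python
from collections import Counter


def competence_edges(documents, min_mentions=2):
    """Turn tagged documents into sorted (person, competence, weight) edges."""
    counts = Counter()
    for doc in documents:
        # Count each tag once per document, even if it is listed twice
        for tag in set(doc["tags"]):
            counts[(doc["author"], tag)] += 1
    # Keep only person/competence pairs backed by enough documents
    return sorted(
        (person, tag, n)
        for (person, tag), n in counts.items()
        if n >= min_mentions
    )


# Made-up example records (e.g. wiki articles with their tags)
docs = [
    {"author": "Ada", "tags": ["python", "graphs"]},
    {"author": "Ada", "tags": ["python"]},
    {"author": "Ben", "tags": ["python"]},
]

print(competence_edges(docs))  # [('Ada', 'python', 2)]
```

The threshold is there to avoid declaring somebody competent on the basis of a single document. The resulting tuples would still have to be mapped to whatever columns the edge table of the dashboard expects.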

On a more esthetic level, the design, layout and look of the dashboard can certainly still be improved. User feedback would need to be collected to find out whether the selection mechanism (codified in the dashboard layout and the callback functions) is sufficiently intuitive, or whether the discovery process would become more user friendly with some changes.

Finally, on the lower level of pure technicalities, the dashboard could still be improved e.g. via:

  • Middle Hover on edges: there was a very nice implementation of this in one Medium article. In the code above, only nodes contain additional hover info, and an enrichment to also display more info on the relationship (edge) could be helpful
  • Dropdown list improvement: to my bewilderment, the Dash dropdown component does not seem to support a feature that is standard in other applications, e.g. Excel, PowerBI or Microstrategy: the option to select or deselect ALL or NONE of the entries in the options list in one go. (Looking for list entries via text matching, aka option filtering, does work: dcc.Dropdown is searchable by default.) That would make for a useful feature request, I assume.
  • Form, color and size of graph elements: my choices clearly show that I am not a designer and the entire look can certainly be improved
  • Code optimization: the code is probably awful in some parts and certainly not optimized. The highest priority certainly goes to the business logic and the use of further style parameters and attributes in the plot generation function
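For the dropdown point, a select ALL / NONE behaviour can be approximated with a small helper of one's own. The sketch below only shows the selection logic as a plain function. The function name apply_bulk_selection and the action keywords "all" and "none" are made up for illustration, and the options are assumed to have the same label/value structure as in the layout code above.

```python
def apply_bulk_selection(options, action, current):
    """Return the new list of selected values for a multi-select dropdown."""
    if action == "all":
        # Select every value currently offered by the dropdown
        return [opt["value"] for opt in options]
    if action == "none":
        # Clear the selection completely
        return []
    # Any other action leaves the selection untouched
    return list(current)


# Made-up options in the same format as the dropdowns in the layout
demo_options = [
    {"label": "Sales", "value": "Sales"},
    {"label": "Finance", "value": "Finance"},
]

print(apply_bulk_selection(demo_options, "all", []))  # ['Sales', 'Finance']
print(apply_bulk_selection(demo_options, "none", ["Sales"]))  # []
```

In the dashboard, such a helper could be called from an additional callback that writes to the “value” property of the dropdown, triggered for example by two extra buttons. That wiring is not part of the code in this article and would still need to be built and tested.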

Resources and links

And finally: if you stayed with me until here, you have my full respect. And if you liked what you read, give me a thumbs-up. And if you have any comments, remarks or corrections to make: they are all appreciated !


syrom

happy about my past, glad about the present and curious about the future.