CytOnda: using Dash Cytoscape for organizational network discovery

syrom
12 min read · May 8, 2022

Intro

This article builds on and extends the previous article “ONDA: Plotly Dash solution for interactive organisational knowledge network discovery”. In that article, I showed how to use Plotly Dash in combination with networkx to create a graph that combines more levels of organizational information than the “standard orgchart”, which merely shows the hierarchical dependencies between employees. In the given example, the graph additionally showed how skills and project participation are distributed within the network of employees, but any other dimension besides “skills” and “projects” can be used as a layer with this approach.
I already wrote in that article that the graphics part of the code was REALLY low level: a Plotly scatter plot object was assembled piece by piece, node by node and edge by edge, from the information in the networkx graph object.

What changed?

After publishing the previous article, Dave Gibbon from Plotly contacted me on LinkedIn (BIG THX for that!) and made me aware that Plotly Dash provides a higher-level way to create graph visualizations, namely Dash Cytoscape, an adaptation for Dash of a graph visualization package initially developed to visualize (…as the name suggests!) genetic information. He also made me aware that there is a fairly easy way to publish the resulting dashboard via Heroku so that anyone can access it. It took a while to find the time to rewrite the code, but this article finally shows you the result. What changed was basically this:

Conceptual changes vs. previous version

The new code skips one step on the way from the tabular data (the nodes and edges of the organizational network) to an intuitive graph visualization: Cytoscape builds the graph directly from that data, so the hand-assembled Plotly scatter plot is no longer needed. The code was also rewritten so that it can be published via Heroku.

Was it worth it?

The simple answer: YES
The slightly longer answer:
- A much better look of the resulting graph, mainly due to Cytoscape supporting the “breadthfirst” layout type; this layout produces results that come very close to the “intuitive expectation” learned from normal orgcharts.
- Better “discoverability” of the graph, as Cytoscape lets the user interactively drag & rearrange nodes in the visualization it produces (see the “grabbable” option in the nodes dictionary explained further down)
- Less code, thanks to Cytoscape building the graph simply from a list of dictionaries representing the nodes and edges
- Easier customization via the Cytoscape stylesheet; e.g. the arrows showing the direction of the relationship between nodes make the graph both better looking and easier to understand.
- …plus I discovered the tapNodeData / tapEdgeData properties of the Dash Cytoscape component, which allowed me to replace the mouse-over info from the previous ONDA solution with a clearer info window that gives additional information on the node or edge the user clicks on

But pictures say more than a thousand words:

ONDA (dashboard from previous article):

ONDA dashboard

CytOnda (described in this article):

CytOnda Dashboard result

And, as mentioned above, you can discover the entire look and feel of this approach live on Heroku. Plus: at the end of this article, you’ll find a link to the code on GitHub (plus the sample data), so that you can reproduce everything you see here.

This result from CytOnda comes surprisingly and pleasantly close to the abstract representation that I created to visualize the entire project’s objective before starting it:

ONDA project objective

OK, “Skills” were originally shown in orange instead of green… but, hey, who’s perfect?

What changed in detail?

The function definition

Now for the actual code changes versus the original ONDA code. As explained in the previous article, most of the magic happens in a function called “f_define_elements”. It takes as arguments the entire graph information in the form of two data frames (all nodes, all edges) plus the values of the 3 input components that control the user input: the node types available for selection, the node classes selected from these types (aka: concrete departments, projects, skills…) and finally the organisational entity.
The first part of the function, referred to as business logic in the previous article, remained unchanged.
Its result is two data frames that contain only those nodes and edges from the complete knowledge graph that correspond to the selection made in the interactive components, i.e. the elements of the graph to be visualized:
- df_nodes_graph
- df_edges_graph

Cytoscape takes in as arguments a list of dictionaries, representing the nodes and edges it has to visualize. Transforming the two dataframes resulting from the business logic into the list of dictionaries that Cytoscape takes in as input is more concise than the previous approach and shortens the function quite a bit:

    # ... second part of f_define_elements, after the business logic ...
    nodes = []
    for d in df_nodes_graph.to_dict(orient="records"):
        # if/else in order to handle 'H'-nodes differently: display not only
        # the node id, but also the node description aka the department name
        if d.get("NODE_TYPE") == "H":
            label = d.get("NODES") + ' (' + d.get("NODE_DESC") + ')'
        else:
            label = d.get("NODES")
        # 'grabbable' sits next to 'data': Cytoscape reads it at element level
        nodes.append({'data': {'id': d.get("NODES"), 'label': label,
                               'type': d.get("NODE_TYPE"),
                               'description': d.get("NODE_DESC")},
                      'grabbable': True})

    edges = []
    for d in df_edges_graph.to_dict(orient="records"):
        edges.append({'data': {'source': d.get("SOURCE"), 'target': d.get("TARGET"),
                               'weight': d.get("VALUE"), 'type': d.get("TYPE")}})

    elements = nodes + edges

    return elements

The details to note in this part of the code:
- df.to_dict(orient='records'): the “orient='records'” argument returns a list with one dictionary per data frame row, which the “for” loop can iterate over
- The 'grabbable': True entry marks the nodes as draggable by the user. Cytoscape reads this flag at the top level of each node dictionary (next to 'data', not inside it), and nodes are grabbable by default, so the entry mainly makes the intent explicit. Either way, dragging is an important detail: it makes the resulting graph more interactive and thus much easier to explore.
- The if/else statement for nodes changes the label recorded in the dictionary, depending on whether the node is a hierarchical node or not. The label of a hierarchical node contains both the node ID AND the node description, denoting e.g. the employee name AND the department he/she belongs to. This makes the graph easier to understand. For other node types, this differentiation does not exist, as node name and node description are the same.
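To try the transformation without the full dashboard, here is a minimal sketch that works on plain row dictionaries, i.e. the shape that df.to_dict(orient="records") returns. The function name records_to_elements and the sample rows are made up for illustration; the column names are the ones used in the code above.

```python
def records_to_elements(node_records, edge_records):
    """Turn row dictionaries into Cytoscape elements (nodes first, then edges)."""
    nodes = []
    for row in node_records:
        label = row["NODES"]
        if row["NODE_TYPE"] == "H":
            # hierarchy nodes also show their description (the department)
            label = f'{row["NODES"]} ({row["NODE_DESC"]})'
        nodes.append({"data": {"id": row["NODES"], "label": label,
                               "type": row["NODE_TYPE"],
                               "description": row["NODE_DESC"]}})
    edges = [{"data": {"source": row["SOURCE"], "target": row["TARGET"],
                       "weight": row["VALUE"], "type": row["TYPE"]}}
             for row in edge_records]
    return nodes + edges


# Made-up sample rows, shaped like df.to_dict(orient="records") output
sample_nodes = [
    {"NODES": "Alice", "NODE_TYPE": "H", "NODE_DESC": "Sales"},
    {"NODES": "Python", "NODE_TYPE": "C", "NODE_DESC": "Python"},
]
sample_edges = [{"SOURCE": "Alice", "TARGET": "Python", "VALUE": 3, "TYPE": "C"}]

elements = records_to_elements(sample_nodes, sample_edges)
for element in elements:
    print(element)
```

Printing the elements like this is a quick way to check what Cytoscape will actually receive before wiring the function into the dashboard.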

The Cytoscape Stylesheet

The stylesheet was a very pleasant, albeit “underdocumented”, discovery in Cytoscape. It controls the look of the graph (e.g. the shape of nodes, the text displayed beside nodes or edges, the color and width of edges etc.) through rules that are applied to the dictionary entries describing the graph.
Sounds abstract? Well, take a look and it becomes far less daunting:

l_stylesheet = [
    # Selector for all nodes
    {'selector': 'node', 'style': {'content': 'data(label)'}},
    # Conditional selector only for nodes with a certain type value
    {'selector': 'node[type="H"]',
     'style': {'shape': 'square',
               'background-color': 'grey'}},
    {'selector': 'node[type="P"]',
     'style': {'shape': 'circle',
               'background-color': 'blue'}},
    {'selector': 'node[type="C"]',
     'style': {'shape': 'star',
               'background-color': 'green'}},
    # Selector for all edges
    {'selector': 'edge',
     'style': {'curve-style': 'bezier', 'width': 'data(weight)',
               'target-arrow-shape': 'triangle-tee'}},
    # Conditional selector only for edges of a certain type
    {'selector': 'edge[type="H"]',
     'style': {'line-color': 'grey'}},
    {'selector': 'edge[type="P"]',
     'style': {'line-color': 'blue'}},
    {'selector': 'edge[type="C"]',
     'style': {'line-color': 'green'}}]

The stylesheet definition is a list of dictionaries, with each dictionary describing how the selected elements of the Cytoscape graph should look.

Two things, though, came as a surprise to me while coding, as they were not or not very clearly explained in the documentation:

1) Conditional styles
If nothing else is specified, the look defined under the “style” key applies to ALL elements chosen via the “selector” entry. E.g., according to the first rule in the code block above, the content displayed beside ALL nodes is the value of the “label” key in the Cytoscape data (…as a reminder: the graph info is fed to Cytoscape as a list of dictionaries, one dictionary per node and one per edge to be displayed).
What I found out more by chance: when adding a condition on a data attribute to the “selector” entry (e.g.: 'selector': 'node[type="P"]'), one can adjust the look only for those elements that match the condition.
As you can see in the code block above, only H-type nodes (“hierarchy”) receive a square shape in grey, C-type nodes (“competences”) receive a star shape in green and P-type nodes (“projects”) receive a circle shape in blue.
I found this a very efficient and straightforward way to quickly define the overall look of the graph (with only the “layout” option of the overall graph call being even shorter and having more impact).
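Since the conditional rules all follow the same pattern, they can also be generated from a small lookup table. The following is only a sketch of that idea, not the code used in CytOnda: the names TYPE_LOOKS and build_type_rules are invented here, while the shapes and colours are copied from the stylesheet above.

```python
# Look per node/edge type: (node shape, colour), copied from the stylesheet above
TYPE_LOOKS = {
    "H": ("square", "grey"),
    "P": ("circle", "blue"),
    "C": ("star", "green"),
}


def build_type_rules(type_looks):
    """Return one conditional node rule and one conditional edge rule per type."""
    rules = []
    for type_code, (shape, colour) in type_looks.items():
        rules.append({"selector": f'node[type="{type_code}"]',
                      "style": {"shape": shape, "background-color": colour}})
        rules.append({"selector": f'edge[type="{type_code}"]',
                      "style": {"line-color": colour}})
    return rules


type_rules = build_type_rules(TYPE_LOOKS)
for rule in type_rules:
    print(rule["selector"], rule["style"])
```

The general 'node' and 'edge' rules would still be placed in front of these generated rules. Adding a further layer to the graph then only needs one more entry in the lookup table.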

2) Arrows and weights for edges
If you look at the selector for the edges in the stylesheet code block above, you see the following definition applied to ALL edges:
'style': {'curve-style': 'bezier', 'width': 'data(weight)', 'target-arrow-shape': 'triangle-tee'}
This makes sure that
- the “weight” key of the edge dictionaries (aka the intensity of the relationship between the nodes that the edge connects) is used as the line width in the graph
- an arrow shows the direction of the edge (which was not possible in the previous implementation)
I was very lucky not to miss the one line in the documentation stating that the curve-style used by default in Cytoscape (“haystack”) does NOT support arrows, and that the curve-style must be set to “bezier” (or another arrow-capable style) for this detail to work.
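Using data(weight) directly means that the raw weight value becomes the line width. If the weights in your own data span a wide range, Cytoscape's mapData() style mapper can scale them linearly into a fixed range of widths instead. A minimal sketch, assuming weights between 1 and 10 (the helper name edge_base_style and the width range are my own choices for illustration):

```python
def edge_base_style(min_weight, max_weight, min_width=1, max_width=8):
    """Rule for all edges: arrows plus a width scaled linearly from the weight."""
    width = f"mapData(weight, {min_weight}, {max_weight}, {min_width}, {max_width})"
    return {
        "selector": "edge",
        "style": {
            "curve-style": "bezier",  # needed for the arrows to show up
            "target-arrow-shape": "triangle-tee",
            "width": width,
        },
    }


edge_rule = edge_base_style(min_weight=1, max_weight=10)
print(edge_rule["style"]["width"])
```

The returned dictionary has the same structure as the 'edge' rule in the stylesheet above and could take its place in the list.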

The dashboard layout

The basics of the dashboard layout are the same as in the ONDA code in the previous article. A grid structure is defined that is populated with components that either serve as input (the parameters the user can set) or as output (the resulting graph and information on selected items).

Compared with the previous version, there were two major changes.

The graph call

Within the body-definition, the actual Cytoscape component call looks like this:

################### graph component
, dbc.Row(dbc.Col(html.Div([cyto.Cytoscape(
    id='ONDA',
    # breadthfirst layout ensures an almost hierarchical layout !!!!
    layout={'name': 'breadthfirst'},
    style={'width': '100%', 'height': '650px'},
    elements=f_define_elements(df_nodes, df_edges, l_initial_node_desc,
                               l_initial_org, l_initial_node_type),
    stylesheet=l_stylesheet)])))
##############################

The details to note here are:
- cyto.Cytoscape: the actual call of the Cytoscape component with several arguments:
  - id: the ID needed to make the interactivity between components work via the “callbacks” explained below
  - layout: the “breadthfirst” layout option makes the resulting graph look “more rectangular” and “very orgchart-like”. As said in the previous article: if there is one option in the code that completely changes the look and feel of the resulting graph by changing a single argument… then this is it!
  - style: the size of the graph within the dashboard
  - elements: calls “the BIG function” explained above, which returns the data to be visualized, extracted from the overall knowledge graph based on the interactive settings. The function needs to return the data in the format Cytoscape expects, that is a list of dictionaries.
  - stylesheet: another list of dictionaries, as also described above, that controls the look of the graph elements
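The layout dictionary accepts more keys than just the name. For “breadthfirst”, Cytoscape supports for example a “roots” selector (which nodes sit at the top of the tree) and a “spacingFactor”. CytOnda itself only sets the name; the helper below and the node id “CEO” are purely hypothetical and just sketch how such a dictionary could be assembled:

```python
def breadthfirst_layout(root_ids=None, spacing_factor=1.0):
    """Build a layout dictionary for the Cytoscape 'layout' argument."""
    layout = {"name": "breadthfirst", "spacingFactor": spacing_factor}
    if root_ids:
        # Cytoscape selector matching the chosen root nodes by their id
        layout["roots"] = ", ".join(f'[id = "{node_id}"]' for node_id in root_ids)
    return layout


print(breadthfirst_layout())
print(breadthfirst_layout(root_ids=["CEO"], spacing_factor=1.5))
```

The result would be passed as layout=breadthfirst_layout(...) in the cyto.Cytoscape call.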

The new “info window” components

In the previous version, additional information on the nodes and edges became visible when hovering the mouse over the element. In the current version, this was replaced by two additional “info windows” that display node or edge info for the last element the user has clicked on.
So the body definition for the dashboard's HTML grid structure was extended with the following code, creating a heading and the space for these two windows:

, dbc.Row([dbc.Col(html.H5('Click on node for details')), dbc.Col(html.H5('Click on edge for details'))])
# additional row to display tap info for node and edge clicks -> see corresponding callbacks
, dbc.Row([dbc.Col(html.Div(html.Pre(id='node_tap'))), dbc.Col(html.Div(html.Pre(id='edge_tap')))])

Callbacks

Callbacks are the functions that define the relationship, and thus the interactivity, between the different components of the Dash dashboard.
Nothing changed in the callbacks defined for the components already used in the previous ONDA approach. As a reminder, here again is the schematic view of how a callback works: it takes a property of one component (identified by its ID) as input and uses it to change a property of another component (also identified by its ID), in the way the decorated function defines. Or, put simply in terms of the example below: how changing the value in the dropdown box changes the data displayed in the bar chart:

Callback function: schematic explanation

As two new “information windows” were added to the dashboard layout, two new callbacks had to be defined that control which additional information is displayed after clicking on either a node or an edge in the graph.
For the node-click info window, this gives (via the 'tapNodeData' property):

# callback to show tap info for node
# (new_line is defined elsewhere in the full code)
@app.callback(dash.dependencies.Output('node_tap', 'children'),
              [dash.dependencies.Input('ONDA', 'tapNodeData')])
def displayTapNodeData(data):
    d_node_info_display_json = json.loads(json.dumps(data))
    l_node_info_display = []
    try:
        for k, v in d_node_info_display_json.items():
            if k in l_node_attribs4display:
                l_node_info_display.append(k + ": " + v + new_line)
    except (AttributeError, TypeError):
        # data is None until the user has clicked on a node
        l_node_info_display = "No node clicked yet"
    return l_node_info_display

For the edge-click info window, this gives (via the 'tapEdgeData' property):

# callback to show tap info for edge
@app.callback(dash.dependencies.Output('edge_tap', 'children'),
              [dash.dependencies.Input('ONDA', 'tapEdgeData')])
def displayTapEdgeData(data):
    d_edge_info_display_json = json.loads(json.dumps(data))
    l_edge_info_display = []
    if d_edge_info_display_json is None:
        l_edge_info_display = "No edge clicked yet"
    else:
        for k, v in d_edge_info_display_json.items():
            if k in l_edge_attribs4display:
                # str() is needed as e.g. the weights are numeric and
                # can thus not be concatenated to a string directly
                l_edge_info_display.append(k + ": " + str(v) + new_line)
    return l_edge_info_display

l_edge_attribs4display and l_node_attribs4display are lists containing the dictionary keys chosen to be displayed in the information window:
l_node_attribs4display = ['id', 'type', 'description']
l_edge_attribs4display = ['source', 'target', 'weight']

This is a precaution, as the dictionaries for the nodes and edges of the graph can potentially contain a large number of additional attributes, which would risk overloading the info window.
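The formatting logic of both callbacks is very similar, so it can be pulled into one small helper that is easy to test without starting the dashboard. This is a sketch of that idea rather than the code in the repository; the function name format_tap_info is invented, and a plain "\n" stands in for the new_line variable.

```python
def format_tap_info(data, keys_to_show, placeholder):
    """Return display lines for a tapped element, or a placeholder before any tap."""
    if data is None:  # nothing has been clicked yet
        return placeholder
    # str() is implied by the f-string, so numeric values such as weights work too
    return [f"{key}: {value}\n" for key, value in data.items() if key in keys_to_show]


node_keys = ["id", "type", "description"]
edge_keys = ["source", "target", "weight"]

print(format_tap_info(None, node_keys, "No node clicked yet"))
print(format_tap_info(
    {"source": "Alice", "target": "Python", "weight": 3, "type": "C"},
    edge_keys, "No edge clicked yet"))
```

Each callback would then shrink to a single line that returns format_tap_info(data, ...) with its own key list and placeholder text.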

As already mentioned in the previous ONDA article, running Dash in debug mode in Jupyter Lab gives access to a great debug feature that shows the “interactivity flow” between the different components, as defined by the corresponding callback functions that link these components. The following screenshot shows how the “tap” info from the main “ONDA” graph component feeds node_tap and edge_tap info to the two new info windows at the bottom of the screen:

CytOnda: callback debug view in Jupyter Lab

Publishing the result on Heroku

The code in the previous article was written to run under Jupyter Lab. After receiving the hint that publishing the live dashboard via Heroku is fairly easy, I was surprised to find out that only minimal adjustments to the production code itself were required. It was sufficient to change the code in two places:

1) Instantiating the app-object

app = dash.Dash(__name__, title="ONDA Organizational Network Discovery", external_stylesheets=[dbc.themes.LUX], suppress_callback_exceptions=True)
server = app.server

to make the code run on Heroku instead of:

app = JupyterDash(__name__, external_stylesheets=[dbc.themes.LUX])

when running in a Jupyter Lab tab.

2) Running the app:

For Heroku:

app.layout = html.Div([body])
if __name__ == "__main__":
    app.run_server(debug=True)

instead of:

app.layout = html.Div([body])
app.run_server(mode="jupyterlab", debug=True)

when running the app in a Jupyter Lab tab.

Step-by-Step setup for Heroku

But it was not all that easy: registering and setting everything up for the code to run on Heroku was a bit more challenging, as it requires setting up a dedicated environment plus Git and pushing the code file, the data and some additional files (e.g. a “Procfile” and a “requirements.txt” file) to Heroku.

I used this very good step-by-step guide, which I found courtesy of Google:
https://www.datagiraffe.com/data-visualisation/deploying-a-plotly-dash-app-on-heroku-conda-environment/

If it worked for me, I assume it can work for most other dashboard newbs ;-)

Perspectives

One last time referring to the previous ONDA article: the CytOnda code only reads static sample data in tabular format from two files, with a data structure that is explained in detail in the previous article:
df_nodes = pd.read_csv(r'nodes.csv', sep=';')
df_edges = pd.read_csv(r'edges.csv', sep=';')

Probably the most challenging part of using this approach in a real enterprise would be to automatically populate the company’s organisational knowledge graph from live systems (e.g. an HR competence database or project and collaboration suites) instead of using this simple and static approach.
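Whatever the source of the data, one check is worth automating before the graph is drawn: Cytoscape cannot draw an edge whose source or target node is missing from the node list. The sketch below uses only Python's standard csv module and two made-up miniature files; the column names are the ones used in the code earlier in this article, and the real nodes.csv and edges.csv may well contain more columns.

```python
import csv
import io


def find_dangling_edges(nodes_file, edges_file, delimiter=";"):
    """Return the edge rows whose SOURCE or TARGET is missing from the node table."""
    node_ids = {row["NODES"] for row in csv.DictReader(nodes_file, delimiter=delimiter)}
    return [row for row in csv.DictReader(edges_file, delimiter=delimiter)
            if row["SOURCE"] not in node_ids or row["TARGET"] not in node_ids]


# Made-up miniature files standing in for nodes.csv and edges.csv
nodes_csv = io.StringIO("NODES;NODE_TYPE;NODE_DESC\nAlice;H;Sales\nPython;C;Python\n")
edges_csv = io.StringIO(
    "SOURCE;TARGET;VALUE;TYPE\nAlice;Python;3;C\nAlice;Rust;1;C\n")

dangling = find_dangling_edges(nodes_csv, edges_csv)
print(dangling)  # the edge Alice -> Rust has no matching node
```

With real files, the two StringIO objects would be replaced by open('nodes.csv', newline='') and open('edges.csv', newline='').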

Sources:

Previous article: “ONDA: Plotly Dash solution for interactive organisational knowledge network discovery”
Complete Code to run the CytOnda dashboard on Heroku: https://github.com/syrom/CytOnda
Heroku-Link to live Dashboard: http://cytonda.herokuapp.com/
