A visual language for exploratory graph data analytics

Weidong Yang
Kineviz
5 min read · Nov 17, 2022

Compared with solutions based on traditional relational data models, solutions based on graph models, especially the LPG (Labeled Property Graph), not only improve on existing approaches but also make it easy to solve problems that were previously very difficult. Connections made explicit through the graph data model strengthen our ability to drive visual exploratory data analysis. We can intuitively express entities as nodes and connections as relationships in the graph. However, simply representing raw graph data in visual form most often results in a confusing tangle of nodes and edges, or something that resembles a galaxy. It’s awe-inspiring, but it is hard to gain insight from. As a result, effective visual analytics on graph data is often limited to graphs of tens to a few hundred nodes.

To go beyond this limitation and unlock the full power of exploratory visual analytics of graph data, it’s important to use a set of tools that transform and simplify graph visualizations for readability and comprehension. Code-based tools like Cypher or JS are ill-suited to such tasks. They are not accessible to the majority of analysts, and they make multi-step solutions hard to manage and communicate cross-functionally.

To unlock the effectiveness of visual exploratory analytics of graph data, we need a comprehensive framework for processing and transforming graph data visually. It needs to be intuitive, support a wide range of data transformations, and offer no-code operations.

Additionally, the framework should satisfy these requirements:

  • Be flexible in data modeling and transformation
  • Lend itself to rapid reasoning and intuitive insights
  • Support inference, abstraction, and summarization
  • Allow operations to be chained in a way that can be visualized and easily understood
  • Support parallel processing over graph data

We defined the following five operators to fulfill these requirements. The operators can be chained and are easy to understand:

  1. Map
  2. Aggregate
  3. Extract
  4. Link
  5. Shortcut

Map

Similar to the map function common in functional languages, Map takes in a set of data and maps it to a new set of data. In the context of LPG, Map is constrained to a set of nodes of the same category: it takes each node’s properties and computes a new property that is assigned back to that node. The operation is performed node by node, in parallel.

The equivalent Cypher would be:

MATCH (n:Category) SET n.new_prop = f(targeted_prop, props)

targeted_prop is a property the user can choose as input for a set of built-in functions. props contains the full set of properties of the node, and new_prop stores the resulting value. The Map transform implemented in GraphXR provides a set of predefined functions; users can also write their own JS functions.

Here is an example of mapping email to domain for Person nodes:
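The same mapping can be sketched in plain Python. This is a minimal sketch rather than GraphXR’s implementation: the dictionary-based node layout, the `map_nodes` helper, the `email_to_domain` function, and the sample addresses are all assumptions made for illustration.

```python
# Hypothetical in-memory nodes; this layout is an assumption for illustration only.
nodes = [
    {"id": 1, "category": "Person", "props": {"name": "John Doe", "email": "john@example.com"}},
    {"id": 2, "category": "Person", "props": {"name": "Mary Jane", "email": "mary@example.org"}},
    {"id": 3, "category": "Team", "props": {"name": "team_A"}},
]


def map_nodes(nodes, category, targeted_prop, new_prop, f):
    """Compute f(targeted value, all props) for each node of `category`
    and store the result on that same node as `new_prop`."""
    for node in nodes:
        if node["category"] != category:
            continue
        props = node["props"]
        # Each node is handled independently of the others,
        # so this loop could be run in parallel.
        props[new_prop] = f(props.get(targeted_prop), props)
    return nodes


def email_to_domain(email, props):
    """Return the part after the last '@', or None if there is no usable email."""
    if not email or "@" not in email:
        return None
    return email.rsplit("@", 1)[1].lower()


map_nodes(nodes, "Person", "email", "domain", email_to_domain)
for node in nodes:
    print(node["category"], node["props"])
```

Only the two Person nodes gain a `domain` property; the Team node is left untouched because Map is restricted to a single category.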

Aggregate

Aggregate is the graph version of reduce. It leverages the connectivity of the graph, essentially reducing property values stored in neighboring nodes into one “aggregate” value on the target node. This is performed over a set of nodes of the same category linked to nodes of another category via a particular relationship.

MATCH (n:Category1)-[r:REL]-(m:Category2)
WITH n, collect(m.target_prop) AS target_props
SET n.new_prop = f(target_props)

The operator finds all m:Category2 nodes that are connected to an n:Category1 node, then picks the target property from each m node and passes the collected values to a function. The function’s output is set as a new property on node n. GraphXR’s implementation provides a set of functions, and users can also write custom ones.

Here is an example of Aggregate being used to create a new property on Team, total players, by applying the count function to the connected Player nodes, which yields 5.
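A rough Python sketch of this count, under stated assumptions: nodes are stored in a dictionary keyed by id, edges are (source, relationship, target) triples, and the `aggregate` helper, the `IN_TEAM` relationship name, and the `points` property are hypothetical names introduced for illustration.

```python
# Hypothetical sample data: one Team node linked to five Player nodes.
nodes = {"t1": {"category": "Team", "props": {"name": "team_A"}}}
for i in range(1, 6):
    nodes[f"p{i}"] = {
        "category": "Player",
        "props": {"name": f"player_{i}", "points": 10 * i},
    }

# Edges are (source id, relationship, target id) triples.
edges = [(f"p{i}", "IN_TEAM", "t1") for i in range(1, 6)]


def aggregate(nodes, edges, target_cat, neighbor_cat, rel, target_prop, new_prop, f):
    """For each node of target_cat, collect target_prop from its neighbors of
    neighbor_cat reached via rel, then store f(values) as new_prop."""
    collected = {nid: [] for nid, n in nodes.items() if n["category"] == target_cat}
    for src, r, dst in edges:
        if r != rel:
            continue
        # The Cypher pattern is undirected, so check both orientations.
        for target_id, neighbor_id in ((src, dst), (dst, src)):
            if target_id in collected and nodes[neighbor_id]["category"] == neighbor_cat:
                collected[target_id].append(nodes[neighbor_id]["props"].get(target_prop))
    # Unlike the Cypher MATCH, target nodes without neighbors still get f([]).
    for nid, values in collected.items():
        nodes[nid]["props"][new_prop] = f(values)
    return nodes


aggregate(nodes, edges, "Team", "Player", "IN_TEAM", "name", "total_players", len)
aggregate(nodes, edges, "Team", "Player", "IN_TEAM", "points", "total_points", sum)
print(nodes["t1"]["props"])
```

Swapping `len` for `sum`, `max`, or any other function over a list changes the aggregate without touching the traversal, which is the sense in which Aggregate behaves like reduce.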

Extract

Extract pulls certain properties out into another category of nodes. This is very useful both for grouping nodes that share the same property values and for shifting perspective on the data.

An equivalent Cypher operation is:

MATCH (n:Category) WITH n, n.prop AS orig_prop
MERGE (m:NewCategory {new_prop: orig_prop})
MERGE (n)-[:NEW_REL]->(m)

An example:

Here we have 6 Player nodes, each in either team_A or team_B. Extract creates the Team nodes team_A and team_B, linked to the Player nodes via the IN_TEAM relationship.
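The Player and Team example can be sketched in Python as follows. The data layout, the `extract` helper, the `team` property name, and the generated node ids are assumptions for illustration, not GraphXR’s API.

```python
# Hypothetical sample data: six Player nodes, each with a `team` property.
nodes = {
    f"p{i}": {"category": "Player", "props": {"name": f"player_{i}", "team": team}}
    for i, team in enumerate(
        ["team_A", "team_B", "team_A", "team_B", "team_A", "team_B"], start=1
    )
}
edges = []  # (source id, relationship, target id) triples


def extract(nodes, edges, category, prop, new_category, rel):
    """Create one new_category node per distinct value of prop and link
    every source node to the node that represents its value."""
    value_to_id = {}
    # Iterate over a snapshot because new nodes are added inside the loop.
    for nid, node in list(nodes.items()):
        if node["category"] != category or prop not in node["props"]:
            continue
        value = node["props"][prop]
        if value not in value_to_id:
            # Like MERGE: create the node only the first time a value is seen.
            new_id = f"{new_category}:{value}"
            nodes[new_id] = {"category": new_category, "props": {"name": value}}
            value_to_id[value] = new_id
        edges.append((nid, rel, value_to_id[value]))
    return nodes, edges


extract(nodes, edges, "Player", "team", "Team", "IN_TEAM")
teams = sorted(nid for nid, n in nodes.items() if n["category"] == "Team")
print(teams)
print(len(edges), "IN_TEAM edges")
```

Two Team nodes are created for six players, and each player ends up with exactly one IN_TEAM edge, so the grouping that was implicit in a property becomes visible as structure.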

Link

Link creates edges between pairs of nodes that match certain criteria, for example, nodes whose values for certain properties are identical. This plays a key role in “link prediction”, which estimates whether two nodes are likely to form a link in the future. In its simplest form, it can be expressed in Cypher as:

MATCH (n:Category1)
MATCH (m:Category2)
WHERE n.prop1 = m.prop2
MERGE (n)-[:NEW_REL]-(m)

The WHERE clause can be generalized as:

WHERE f(n.prop1, m.prop2)=true

An example is:

The dashed edge is the result of a Link on condition (i.e. sharing the same email address).
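A minimal Python sketch of such a Link, including the generalized predicate f, might look like this. The Person and Customer categories, the `contact` property, the `SAME_EMAIL` relationship name, and the `link` and `same_email` helpers are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical sample data: Person nodes plus Customer records from another source.
nodes = {
    "a1": {"category": "Person", "props": {"name": "John Doe", "email": "jdoe@example.com"}},
    "a2": {"category": "Person", "props": {"name": "Mary Jane", "email": "mj@example.com"}},
    "b1": {"category": "Customer", "props": {"contact": "JDoe@Example.com"}},
    "b2": {"category": "Customer", "props": {"contact": "other@example.com"}},
}
edges = []  # (source id, relationship, target id) triples


def link(nodes, edges, cat1, prop1, cat2, prop2, rel, f=lambda a, b: a == b):
    """Add a rel edge between every (cat1, cat2) pair for which f(prop1, prop2) holds."""
    left = [(nid, n["props"][prop1]) for nid, n in nodes.items()
            if n["category"] == cat1 and prop1 in n["props"]]
    right = [(nid, n["props"][prop2]) for nid, n in nodes.items()
             if n["category"] == cat2 and prop2 in n["props"]]
    existing = set(edges)
    # Naive pairwise comparison: fine for a sketch, quadratic on large inputs.
    for (lid, lval), (rid, rval) in product(left, right):
        edge = (lid, rel, rid)
        if lid != rid and edge not in existing and f(lval, rval):
            edges.append(edge)
            existing.add(edge)
    return edges


def same_email(a, b):
    """Generalized predicate: compare addresses ignoring case and surrounding spaces."""
    return a.strip().lower() == b.strip().lower()


link(nodes, edges, "Person", "email", "Customer", "contact", "SAME_EMAIL", same_email)
link(nodes, edges, "Person", "email", "Customer", "contact", "SAME_EMAIL", same_email)
print(edges)
```

Running the operator twice leaves a single edge, mirroring the idempotent behavior of MERGE in the Cypher form.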

Link is useful for bringing different data sets together, or for connecting data from APIs and/or SQL sources to existing graph data.

Shortcut

Shortcut connects the neighbors of an intermediate node by creating edges directly between them. The corresponding Cypher is:

MATCH (n:Category1)-[:REL1]->(:Category2)-[:REL2]->(m:Category3)
MERGE (n)-[:NEW_REL]->(m)

Shortcut is essential for simplifying a graph. In this example, Shortcut simplifies ‘John Doe sent an email which was received by Mary Jane’ to ‘John Doe emailed Mary Jane’:

Once the inferred relationship is created, we can remove the original Email nodes. Shortcut can also carry over properties from the Email nodes, or count the number of emails, and set the results as properties on the [:EMAILED] relationship.
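The email example can be sketched in Python as a two-hop traversal that counts paths and then drops the intermediate nodes. The `SENT` and `RECEIVED_BY` relationship names, the sample emails, and the `shortcut` and `drop_category` helpers are assumptions for illustration; only `EMAILED` comes from the text above.

```python
from collections import Counter

# Hypothetical sample data: two people exchanging three emails.
nodes = {
    "john": {"category": "Person", "props": {"name": "John Doe"}},
    "mary": {"category": "Person", "props": {"name": "Mary Jane"}},
    "e1": {"category": "Email", "props": {"subject": "Hello"}},
    "e2": {"category": "Email", "props": {"subject": "Re: Hello"}},
    "e3": {"category": "Email", "props": {"subject": "Lunch?"}},
}
edges = [
    ("john", "SENT", "e1"), ("e1", "RECEIVED_BY", "mary"),
    ("mary", "SENT", "e2"), ("e2", "RECEIVED_BY", "john"),
    ("john", "SENT", "e3"), ("e3", "RECEIVED_BY", "mary"),
]


def shortcut(nodes, edges, rel_in, mid_category, rel_out, new_rel):
    """Return {(source, new_rel, target): path count} for every path
    source -[rel_in]-> intermediate -[rel_out]-> target."""
    incoming, outgoing = {}, {}
    for src, rel, dst in edges:
        if rel == rel_in and nodes[dst]["category"] == mid_category:
            incoming.setdefault(dst, []).append(src)
        if rel == rel_out and nodes[src]["category"] == mid_category:
            outgoing.setdefault(src, []).append(dst)
    counts = Counter()
    for mid, sources in incoming.items():
        for source in sources:
            for target in outgoing.get(mid, []):
                counts[(source, new_rel, target)] += 1
    return dict(counts)


def drop_category(nodes, edges, category):
    """Remove all nodes of a category together with the edges that touch them."""
    kept = {nid: n for nid, n in nodes.items() if n["category"] != category}
    kept_edges = [e for e in edges if e[0] in kept and e[2] in kept]
    return kept, kept_edges


emailed = shortcut(nodes, edges, "SENT", "Email", "RECEIVED_BY", "EMAILED")
nodes, edges = drop_category(nodes, edges, "Email")
print(emailed)
```

The path count plays the role of the email count stored on the new relationship; other properties of the intermediate nodes could be carried over in the same loop.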

Conclusion

Certain combinations of data transforms are used so often that we provide them as standalone operations in GraphXR, as we do with Merge. Merge finds matching values of a selected property and combines them. The long-form version of this would be Extract to pull out a unique property as its own category of node(s), followed by Shortcut to bypass and remove the intermediate nodes. By chaining the five operators above, complex transformations can be broken into simple steps, extending the breadth of exploratory data analysis and making it more quickly accessible to cross-functional teams.

The visual transform language defined here enables one to go from raw data to rapid insights. All operators allow users to treat data as objects and connections, which makes data analytics intuitive and easy to understand. The language can also be applied across multiple data sources, enabling data analysts to focus on the data rather than on data pipelines, data cleaning, reconciliation, and modeling. Those steps remain important, but they are not necessary until the step-by-step analytics is operationalized. During the exploration stage, speed is key: analysts need to apply data transforms quickly to meet their needs.


Weidong is an entrepreneur, scientist, programmer and artist. He founded Kineviz and Kinetech Arts.