Augment Intelligence with Graph Power
In my two decades of working in the Data Warehousing, Business Intelligence and Analytics field, nothing has bothered me more than being tied to a relational data model.
While this abstraction has its strengths in filtering and aggregating or rolling up large amounts of row/record shaped data, the major issue I have with it is that it hides the context and connectivity of the data points across records or tables.
Of course, you always have the option to
- drill through your data back and forth,
- filter here and there,
- make a comparison analysis,
- calculate sophisticated KPIs and so on.
But you’d still end up with a limited view (tabular) of your information.
As an example, let’s compare a typical BI dashboard:
with a Graph representation of the same data:
With this Graph, we can see and investigate the relations between the entities in our fraud use case: persons, shared credit cards, phone numbers etc.
We see not just records but entities in their context.
In fact, we can do more — we can also use the power of the Neo4j graph engine and the Neo4j graph algorithms to query the graph and to calculate KPIs that help us to understand the fraud and find suspicious elements or actors.
Graph algorithms capture the topological information in one or more scalar metrics per node or relationship. They can be seen as an unsupervised learning method which labels our data to belong into certain groups.
- Page Rank:
measures the transitive influence or connectivity of nodes
- Betweenness Centrality:
amount of influence a node has over the flow of information in a graph, how it connects different clusters
- Triangle Counting / Clustering Coefficient:
number of triangles each node in the graph participates in
- Community Detection:
sets of connected nodes
In the following example we, compute pageRank on a derived graph (person to person) based on shared identities.
The next step is to integrate the results into our BI-dashboard, to have all results under at our fingertips in one consolidated view.
Qlik has released the server-side extension (SSE) protocol for Qlik Sense and QlikView in 2017. It allows us to connect and integrate the Qlik associative engine (QIX) with any other third-party system, such as a calculation engine and/or database. There are already some SSE implementations to integrate Qlik with R or Python to use it for advanced analytics tasks like forecasting or predictive analytics and machine learning.
- Qlik data load process (LOAD Script)
- any Qlik UI expression (measures in charts, tables; expression for labels, etc.)
- TIQ Solutions Neo4j Datablending Extensions for Qlik Sense
The communication and data flow between Qlik’s BI web frontend and the SSE backend is handled by Qlik’s engine API on one side and the gRPC protocol on the other side. The SSE will process all function calls and integrates with Neo4j via Bolt protocol.
The SSE protocol provides a couple of functions you can implement besides your own defined functions in your SSE code. In case of our Neo4j SSE we implemented these functions:
Let’s consider an SSE call we can add in a Qlik UI expression, e.g., in a table column or chart as a measure. The query result should give us the Page Rank by Person for the most suspicious fraudsters (see the Where Clause in Cypher statement):
The function call gets a Cypher statement as parameter (green) which will be processed in Neo4j. The QIX engine injects the dimensional data from the field in Qlik (blue).
The data is then sent to the SSE in groups of rows (bundeled rows). In the case of our data set with the amount of around 100k persons, the QIX engine will split it into 3 or more calls.
The SSE will inject the list of persons from each group into the Cypher
WHERE clause place holder
IN [#] (yellow) and executes the Cypher statement against Neo4j.
The resulting measure
max(p.pagerank) will then be matched to the incoming rows by person’s name. What’s important to mention here that the Cypher result set needs to include the dimension p.full_name, which is needed to map the results to the existing data.
This also works the same way for a multi dimensional table or chart. In that case, the sequence of dimensions needs to be the same for the function call parameters and Cypher result set and in the
WHERE clause placeholders the dimensional position needs to be added:
WHERE … x IN [#1] AND y IN [#2]
Finally, we can use a Cypher query to select the fraudsters from the whole dataset:
And then we can integrate the KPIs from the Graph algorithms into the Dashboard (see table and scatter plot) that gives us more insights:
This integration shows how we can add the Graph power to an existing Qlik Sense Dashboard. I think there are many more use cases where we could break up the limitation of relational BI by adding a layer of context via graph data and injecting more complex calculations to leverage hidden insights.