Augment Intelligence with Graph Power

In my two decades of working in the Data Warehousing, Business Intelligence and Analytics field, nothing has bothered me more than being tied to a relational data model.

Ralf Becher
Apr 12, 2019 · 5 min read

While this abstraction has its strengths in filtering and aggregating or rolling up large amounts of row/record shaped data, the major issue I have with it is that it hides the context and of the data points across records or tables.

Of course, you always have the option to

  • drill through your data back and forth,
  • filter here and there,
  • make a comparison analysis,
  • calculate sophisticated KPIs and so on.

But you’d still end up with a limited view (tabular) of your information.

As an example, let’s compare a typical BI dashboard:

Image for post
Image for post
A Fraud Analysis BI Dashboard in Qlik Sense

with a Graph representation of the same data:

Image for post
Image for post
Same Fraud Data as in Dashboard modeled as a Graph in Neo4j, showing one component of the whole Graph

With this Graph, we can see and investigate the relations between the entities in our fraud use case: persons, shared credit cards, phone numbers etc.
We see not just records but entities in their context.

In fact, we can do more — we can also use the power of the Neo4j graph engine and the Neo4j graph algorithms to query the graph and to calculate KPIs that help us to understand the fraud and find suspicious elements or actors.

Graph algorithms capture the topological information in one or more scalar metrics per node or relationship. They can be seen as an unsupervised learning method which labels our data to belong into certain groups.

Let’s use the following (free ) to investigate the fraudulent behavior:

  • Page Rank:
    measures the transitive influence or connectivity of nodes
  • Betweenness Centrality:
    amount of influence a node has over the flow of information in a graph, how it connects different clusters
  • Triangle Counting / Clustering Coefficient:
    number of triangles each node in the graph participates in
  • Community Detection:
    sets of connected nodes

In the following example we, compute pageRank on a derived graph (person to person) based on shared identities.

Image for post
Image for post
Example Cypher Call in Neo4j to calculate and store the Page Rank between Persons in the Fraud Graph
Image for post
Image for post
Graph result where I marked Persons with a high Page Rank, perhaps these acting as agents in a Fraud Ring

The next step is to integrate the results into our BI-dashboard, to have all results under at our fingertips in one consolidated view.

Qlik has released the server-side extension () protocol for Qlik Sense and QlikView in 2017. It allows us to connect and integrate the Qlik associative engine (QIX) with any other third-party system, such as a calculation engine and/or database. There are already some SSE implementations to integrate Qlik with or to use it for advanced analytics tasks like forecasting or predictive analytics and machine learning.

Recently, my company has implemented and released a Neo4j SSE. It is used to call and live-query the Graph Database from Qlik in these scenarios:

  • Qlik data load process (LOAD Script)
  • any Qlik UI expression (measures in charts, tables; expression for labels, etc.)
  • TIQ Solutions Neo4j Datablending Extensions for Qlik Sense
Image for post
Image for post
Neo4j Datablending and Live-Querying with Qlik Sense

The communication and data flow between Qlik’s BI web frontend and the SSE backend is handled by Qlik’s engine API on one side and the gRPC protocol on the other side. The SSE will process all function calls and integrates with Neo4j via .

The SSE protocol provides a couple of functions you can implement besides your own defined functions in your SSE code. In case of our Neo4j SSE we implemented these functions:

Image for post
Image for post
TIQ Neo4j SSE — implemented functions

Let’s consider an SSE call we can add in a Qlik UI expression, e.g., in a table column or chart as a measure. The query result should give us the Page Rank by Person for the most suspicious fraudsters (see the Where Clause in Cypher statement):

Image for post
Image for post
TIQ Neo4j SSE — example call in a Qlik expression

The function call gets a Cypher statement as parameter (green) which will be processed in Neo4j. The QIX engine injects the dimensional data from the field in Qlik (blue).

The data is then sent to the SSE in groups of rows (bundeled rows). In the case of our data set with the amount of around 100k persons, the QIX engine will split it into 3 or more calls.

The SSE will inject the list of persons from each group into the Cypher WHERE clause place holder IN [#] (yellow) and executes the Cypher statement against Neo4j.

The resulting measure max(p.pagerank) will then be matched to the incoming rows by person’s name. What’s important to mention here that the Cypher result set needs to include the dimension p.full_name, which is needed to map the results to the existing data.

This also works the same way for a multi dimensional table or chart. In that case, the sequence of dimensions needs to be the same for the function call parameters and Cypher result set and in the WHERE clause placeholders the dimensional position needs to be added:
WHERE … x IN [#1] AND y IN [#2]

Finally, we can use a Cypher query to select the fraudsters from the whole dataset:

Image for post
Image for post
Use a Cypher Query via SSE to select Fraudsters by Graph KPIs in Qlik Sense

And then we can integrate the KPIs from the Graph algorithms into the Dashboard (see table and scatter plot) that gives us more insights:

Image for post
Image for post
Screencast: Integrated Graph KPIs in Qlik Sense Dashboard Objects, TIQ Network Extension on the right side

This integration shows how we can add the Graph power to an existing Qlik Sense Dashboard. I think there are many more use cases where we could break up the limitation of relational BI by adding a layer of context via graph data and injecting more complex calculations to leverage hidden insights.

If you want to learn more, see the links below. If you are interested to try this out, please ping me via or

References

Image for post
Image for post
Free O’Reilly eBook on Graph Algorithms

Neo4j Developer Blog

Developer Content around Graph Databases, Neo4j, Cypher…

Thanks to Michael Hunger

Ralf Becher

Written by

Datapunk and Qlik Luminary, Head of R&D @Vizlib, Certified Neo4j Professional, Tableau Hacker, Analytics and Big Data, Clickhouse, Snowflake, Node.js, k8s

Neo4j Developer Blog

Developer Content around Graph Databases, Neo4j, Cypher, Data Science, Graph Analytics, GraphQL and more.

Ralf Becher

Written by

Datapunk and Qlik Luminary, Head of R&D @Vizlib, Certified Neo4j Professional, Tableau Hacker, Analytics and Big Data, Clickhouse, Snowflake, Node.js, k8s

Neo4j Developer Blog

Developer Content around Graph Databases, Neo4j, Cypher, Data Science, Graph Analytics, GraphQL and more.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch

Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore

Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store