Graphs
When you need to represent a piece of data, the first thing that comes to mind is probably a table of rows and columns. There are many other ways to do it, though. A graph, for example, represents the same data in terms of nodes (or vertices) and edges.
Both representations can work for data analysis. The key to choosing between them is the kind of question we need to answer: if we are looking for patterns, a graph is often easier to work with and more computationally efficient than a table.
Example:
In financial services, graphs are widely used to build fraud prevention applications because they make it easy to detect patterns such as circular payment chains in transactions between accounts. Answering the same kind of question in other ways is harder.
In my personal experience, I have seen some naturally graph-shaped questions “solved” in the SQL paradigm using PL/SQL. It “works”, but it leads to cyclic queries that are complex for developers to write and maintain, and that do not perform well.
Graph Analysis at a Glance
To see the power of graphs, we can compare them with the “relational model” by trying to answer the following question:
Is there money flow between Bob and Charlie?
Even with this very small piece of data, that is hard to answer, because you have to:
- First, relate CustomerID c2 (Bob) in the CUSTOMER table to his AccountID a1 in the ACCOUNT table.
- Second, do the same for c3 (Charlie) and his account a4.
- Third, trace their transactions in the TRANSACTION table. From a bird's-eye view, you will notice that there are no direct transactions between them, right?
But if we represent the same data as a graph, we can see that there is in fact an indirect cash flow between Bob and Charlie.
If you follow the direction of the lines (edges) to trace the relationships between people (nodes) and their accounts (nodes), it becomes clear:
- Bob transfers $20,000 to Dave.
- Dave transfers the same amount to one of Alice's accounts.
- Alice gives $30,000 to Charlie, who returns $10,000 to Alice.
We can draw two insights here:
- First, yes, there is a money flow between Bob and Charlie.
- Second, there is a kind of circular payment chain pattern between Charlie and Alice.
Detecting these patterns is the key to building fraud prevention applications, and many variations of them exist at very large scale. For this kind of question, graphs let us approach a solution better than a relational model does.
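To make the two insights concrete, here is a minimal Python sketch of how they could be checked once the transfers are stored as directed edges. It is an illustration under assumptions, not a production fraud detector: people stand in for their accounts, and the names `TRANSFERS`, `build_adjacency`, `find_flow` and `find_cycle_through` are hypothetical helpers made up for this example.

```python
from collections import deque

# Directed edges taken from the example: (sender, receiver, amount in dollars).
# Assumption: each person stands in for their account to keep the sketch small.
TRANSFERS = [
    ("Bob", "Dave", 20_000),
    ("Dave", "Alice", 20_000),
    ("Alice", "Charlie", 30_000),
    ("Charlie", "Alice", 10_000),
]


def build_adjacency(transfers):
    """Map each sender to the list of receivers they paid."""
    adjacency = {}
    for sender, receiver, _amount in transfers:
        adjacency.setdefault(sender, []).append(receiver)
        adjacency.setdefault(receiver, [])
    return adjacency


def find_flow(adjacency, source, target):
    """Breadth-first search: return the shortest payment chain, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def find_cycle_through(adjacency, start):
    """Return a chain that leaves `start` and comes back to it, or None."""
    for neighbor in adjacency.get(start, []):
        way_back = find_flow(adjacency, neighbor, start)
        if way_back is not None:
            return [start] + way_back
    return None


adjacency = build_adjacency(TRANSFERS)
flow = find_flow(adjacency, "Bob", "Charlie")
cycle = find_cycle_through(adjacency, "Alice")
print(flow)   # ['Bob', 'Dave', 'Alice', 'Charlie']
print(cycle)  # ['Alice', 'Charlie', 'Alice']
```

Both questions become a short traversal over the edges. The same search in the opposite direction, from Charlie to Bob, finds nothing, because edge direction matters.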
Graph Data Structure
A data structure is a way of storing data so that a computer can work with it efficiently.
The graph data structure is a collection of points (vertices or nodes) and lines (edges) that relate those points. Their labels represent “the kind of” thing each one is, and their properties represent its “features”.
In this graph, used to model a very simple bank transaction system:
Account is a kind of node with number and type properties, and Transaction is a kind of edge with date and amount properties and a well-defined direction.
In both cases, the label names naturally match the kind of thing they abstractly represent, that is, Accounts and Transactions respectively.
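As a rough illustration of labels and properties, the sketch below models Account nodes and a Transaction edge with plain Python dataclasses. The classes `Node`, `Edge` and `PropertyGraph` are hypothetical names for this example, not the API of any graph database. The property values (account numbers, types, date) and the second account `a2` are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                      # the "kind of" node, e.g. "Account"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                     # the edge points from source...
    target: str                     # ...to target
    label: str                      # the "kind of" edge, e.g. "Transaction"
    properties: dict = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, source, target, label, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, label, properties))

    def outgoing(self, node_id, label=None):
        """Edges that leave node_id, optionally filtered by label."""
        return [
            edge for edge in self.edges
            if edge.source == node_id and (label is None or edge.label == label)
        ]


graph = PropertyGraph()
# Placeholder values: only the property names come from the text.
graph.add_node("a1", "Account", number="001", type="checking")
graph.add_node("a2", "Account", number="002", type="savings")
graph.add_edge("a1", "a2", "Transaction", date="2024-01-15", amount=20_000)

for edge in graph.outgoing("a1", label="Transaction"):
    print(edge.source, "->", edge.target, edge.properties)
```

Keeping the label separate from the properties lets you ask for "all Transaction edges leaving this Account" without inspecting every property.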