Graph Network Analysis

Chituyi
5 min read · Oct 10, 2023

Big data is full of relationships, whether implied or explicit. How do you expose them?

Entity-relationship analysis

Machine learning algorithms generally ask you to show them fraudulent transactions and their characteristics so that they can classify future fraud based on what they learned from those examples. So what happens when you do not have enough examples to teach the model what fraud looks like? In practice, the training dataset will be imbalanced, and you will have to either add a penalty term that controls how the model treats the class with fewer observations, or upsample or downsample the data. While these methods can work, graph analytics can reveal tight communities that may be fraud rings or money launderers.
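To make the "penalty term" idea concrete, here is a minimal sketch of one common weighting scheme, where each class is weighted inversely to its frequency. The function name and the toy labels are made up for illustration, and this is only one of several ways to handle imbalance.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class as n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy labels: 98 legitimate (0) and 2 fraudulent (1) transactions.
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)  # the rare fraud class gets a much larger weight
```

A model that accepts per-class weights would then penalize a missed fraud far more heavily than a missed legitimate transaction.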

Importantly, you can zoom in and out on the community a fraudulent transaction belongs to and start your analysis there! Hence graph analytics complements predictive analytics well on the way to prescriptive analytics.

Network graphs are a great way to understand complex data. They can show you how different things are connected, even if those connections are indirect or difficult to see. Graph analytics provides a powerful and flexible tool for analyzing and visualizing complex interconnected data, leading to deeper insights and more accurate predictions.

Well, 🤔 I still feel community structure detection remains one of the best reasons to consider graph analytics.

Look at how quickly you can spot communities and MVPs (most valuable nodes) in your data to back up a business strategy or launch a new product line by unveiling unseen data relationships. 😊
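As a rough illustration of spotting the most connected nodes, the sketch below ranks nodes by degree centrality with Networkx. The salesperson and customer names are invented toy data, and degree centrality is just one possible way to define "most valuable".

```python
import networkx as nx

# Hypothetical toy data: which salesperson sold to which customer.
edges = [
    ("alice", "c1"), ("alice", "c2"), ("alice", "c3"), ("alice", "c4"),
    ("bob", "c4"), ("bob", "c5"),
    ("carol", "c6"),
]
G = nx.Graph(edges)

# Degree centrality: the fraction of the other nodes a node is directly linked to.
centrality = nx.degree_centrality(G)

# Rank nodes from most to least connected and keep the top three.
top_nodes = sorted(centrality, key=centrality.get, reverse=True)[:3]
print(top_nodes)
```

In this toy graph "alice" comes out on top because she is linked to four of the eight other nodes.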

You can play with the tool on Streamlit by clicking this link, or directly here by maximizing the screen! 🔛

Code for the project.

Some history…

The term “graph database” is often credited to Emil Eifrem, a co-founder of Neo4j; the company behind Neo4j was founded in 2007. A graph database provides the underlying mechanisms for storing nodes and relationships, which a knowledge graph can leverage and which can be visualized as a graph structure.

What is a graph database?

A graph database is a type of NoSQL database that uses graph structures with nodes, edges, and properties for semantic queries. Graph databases are designed to store and query highly interconnected data. Unlike traditional relational databases, many graph databases do not require a predefined schema, which makes them more flexible as the data model evolves.
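The node, edge, and property model can be sketched in memory with Networkx. This is not a graph database, only an illustration of the data model; the people, the account, and the relationship names are all hypothetical.

```python
import networkx as nx

G = nx.DiGraph()

# Nodes carry properties (here: a label plus free-form attributes).
G.add_node("p1", label="Person", name="Ann")
G.add_node("p2", label="Person", name="Ben")
G.add_node("a1", label="Account", bank="XYZ")

# Edges carry a relationship type and their own properties.
G.add_edge("p1", "a1", rel="OWNS", since=2019)
G.add_edge("p2", "a1", rel="OWNS", since=2021)
G.add_edge("p1", "p2", rel="KNOWS")

# A simple "query": who owns account a1?
owners = sorted(
    G.nodes[src]["name"]
    for src, _dst, data in G.in_edges("a1", data=True)
    if data["rel"] == "OWNS"
)
print(owners)  # ['Ann', 'Ben']
```

Note that the account node has a `bank` property the person nodes lack, and nothing had to be declared up front for that to work.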

Graph databases can be used to solve a variety of problems, including:

Social network analysis. Graph databases can be used to analyze social networks, such as Twitter and Facebook, to identify influencers, communities, and trends.

Fraud detection. Graph databases can be used to detect fraudulent transactions by identifying patterns in financial data.

Recommendation systems. Graph databases can be used to build recommendation systems that recommend products, services, or content to users based on their past behavior and interests.

Knowledge graphs. Graph databases can be used to build knowledge graphs, which are networks of entities and relationships that can be used to answer questions and perform reasoning tasks.

When to use Neo4j and Pyvis or Dash Cytoscape:

Neo4j is an enterprise-ready graph database and one of the most popular graph databases in the world. Its Community Edition is open source, licensed under the GNU General Public License (GPL) v3, which means you can use, modify, and distribute it for free, including in commercial applications, subject to the terms of that license. Pyvis is a Python library for interactive graph visualization. Dash Cytoscape can be used to create dynamic and interactive graph visualizations that can be embedded in Dash applications, which makes it a good choice when your front end is already a Dash app.

Neo4j is a good choice for applications that need to store and query connected data with high performance and scalability. Pyvis is a good choice for applications that need interactive and customizable visualizations, and it is usually paired with Networkx for the analysis.

Advantages of graph visualizations:

Clarity. Graph visualizations can be used to represent complex relationships concisely.

Interactivity. Graph visualizations can be made interactive, which allows users to explore the data and identify patterns and relationships.

Scalability. Graph visualizations can be used to visualize large and complex datasets.

Disadvantages of graph visualizations:

Requires expertise. Creating effective visualizations and interpreting them correctly takes a certain level of expertise. Without it, there’s a risk of misinterpreting the data.

Overcomplication. Graphs can become cluttered and hard to understand if they try to represent too much data at once. This can lead to confusion and misinterpretation.

Graph networks can be used for business advantage in a variety of ways, including:

Customer segmentation. Graph networks can be used to segment customers based on their behavior and interests. This information can then be used to target customers with relevant products and services.

Product recommendation. Graph networks can be used to build product recommendation systems that recommend products to customers based on their past purchases and interests.

Fraud detection. Graph networks can be used to detect fraudulent transactions by identifying patterns in financial data.

Supply chain management. Graph networks can be used to manage supply chains by tracking the movement of goods and materials through a network of suppliers and manufacturers.

Risk assessment. Graph networks can be used to assess risk by identifying interconnected entities and relationships.
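To give one of the use cases above some shape, here is a minimal sketch of product recommendation on a customer-product graph. It scores products bought by customers who share purchases with the target customer. The purchase data and the `recommend` helper are hypothetical, and the sketch assumes customer and product names never collide.

```python
from collections import Counter

import networkx as nx

# Hypothetical purchase records: (customer, product).
purchases = [
    ("ann", "laptop"), ("ann", "mouse"),
    ("ben", "laptop"), ("ben", "mouse"), ("ben", "keyboard"),
    ("cat", "mouse"), ("cat", "keyboard"), ("cat", "monitor"),
]
G = nx.Graph(purchases)


def recommend(graph, customer, top_n=2):
    """Score unseen products by how often similar customers bought them."""
    owned = set(graph[customer])
    scores = Counter()
    for product in owned:
        for peer in graph[product]:          # customers who bought the same product
            if peer == customer:
                continue
            for candidate in graph[peer]:    # everything that peer bought
                if candidate not in owned:
                    scores[candidate] += 1
    return [item for item, _count in scores.most_common(top_n)]


recs = recommend(G, "ann")
print(recs)  # ['keyboard', 'monitor']
```

The same two-hop walk (customer to product to other customer to other product) is the kind of traversal graph databases are built to run quickly.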

Network visualizations for ML feature engineering and selection:

Node importance. In a graph, some nodes are more important than others based on their connections. Something I have tried with success is ranking nodes by their connections and eliminating the top ones. Let me explain: once you visualize your graph data, identify the features with the most connections, list them, eliminate them, and then visualize again. Repeat the process until only a few relationships remain. When to stop is a judgement call based on experience and domain knowledge. You’ll want to involve several colleagues from Operations.
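The rank-and-eliminate loop described above can be sketched roughly like this. The stopping rule is really a judgement call, so the `max_edges` threshold here is only a stand-in, and the function name and toy edges are invented for the example.

```python
import networkx as nx


def peel_hubs(graph, max_edges):
    """Repeatedly drop the highest-degree node until few edges remain."""
    g = graph.copy()  # leave the original graph untouched
    removed = []
    while g.number_of_edges() > max_edges:
        hub, _degree = max(g.degree, key=lambda pair: pair[1])
        removed.append(hub)
        g.remove_node(hub)
    return g, removed


# Hypothetical toy graph with one obvious hub.
edges = [
    ("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"),
    ("a", "b"), ("c", "d"), ("d", "e"),
]
G = nx.Graph(edges)

remaining, removed = peel_hubs(G, max_edges=3)
print(removed)                 # nodes eliminated, in order
print(list(remaining.edges))   # relationships that survive
```

The `removed` list doubles as the ranked list of best-connected nodes to review with the Operations team.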

You can engineer a new feature by taking each node’s adjacency list (or its row in the adjacency matrix) and counting its entries. This count is the node’s degree, which can signify the strength of connection or the relevance of that feature in the data set.
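Reading this feature as a count of each node's direct neighbours, a minimal sketch looks like the following. The account names are hypothetical toy data.

```python
# Hypothetical adjacency list: node -> list of directly connected nodes.
adjacency = {
    "acct_1": ["acct_2", "acct_3", "acct_4"],
    "acct_2": ["acct_1"],
    "acct_3": ["acct_1", "acct_4"],
    "acct_4": ["acct_1", "acct_3"],
}

# The new feature is simply the length of each node's neighbour list.
degree_feature = {node: len(neighbors) for node, neighbors in adjacency.items()}
print(degree_feature)  # {'acct_1': 3, 'acct_2': 1, 'acct_3': 2, 'acct_4': 2}
```

The resulting dictionary can be joined back onto a tabular dataset as one extra numeric column per node.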

Community Detection. Nodes belonging to the same community or cluster might share similar characteristics. Community detection algorithms can be used to create new categorical features indicating the community each node belongs to.
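Here is a small sketch of turning community membership into a categorical feature. It uses Networkx's greedy modularity algorithm, which is only one of several community detection methods, and the graph is a toy example of two triangles joined by a single bridge.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical toy graph: two tight triangles joined by one bridge edge (c-d).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
G = nx.Graph(edges)

# Each detected community is a set of nodes.
communities = greedy_modularity_communities(G)

# Map every node to the index of its community: a ready-made categorical feature.
community_id = {
    node: idx
    for idx, members in enumerate(communities)
    for node in members
}
print(community_id)
```

The integer labels themselves carry no meaning, so they are best treated as categories (for example, one-hot encoded) rather than as numbers.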

Identify new features. In fraud detection, graph analytics can be used to identify patterns of fraudulent transactions. These patterns can then be turned into new features for training a machine learning model to detect fraud.
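One possible pattern-based feature, sketched under my own assumptions: link accounts to the identifiers they share (devices, phone numbers) and record how many accounts end up in the same connected group. The records and the `acct_` naming convention are hypothetical.

```python
import networkx as nx

# Hypothetical records: (account, shared identifier such as a device or phone).
links = [
    ("acct_1", "device_A"), ("acct_2", "device_A"),
    ("acct_2", "phone_X"), ("acct_3", "phone_X"),
    ("acct_4", "device_B"),
]
G = nx.Graph(links)

# Feature: number of accounts in the same connected component.
ring_size = {}
for component in nx.connected_components(G):
    accounts = [n for n in component if n.startswith("acct_")]
    for acct in accounts:
        ring_size[acct] = len(accounts)

print(ring_size)
```

Accounts 1 to 3 are chained together through a shared device and a shared phone, so each gets a ring size of 3, while the isolated account gets 1. A large value is not proof of fraud, only a signal worth feeding to the model.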

In conclusion, graph databases provide a powerful tool for managing and analyzing complex interconnected data. They offer significant advantages over traditional relational databases when dealing with highly connected data. The use of network visualizations for feature engineering and selection can lead to more robust machine learning models. Furthermore, businesses can leverage graph networks to gain a competitive advantage by better understanding their data relationships.

Key takeaways!

Graph analysis is good at identifying patterns and relationships in data that would be difficult to see using other types of visualizations.

Graph analysis can be used to communicate complex data in a way that is easy to understand (You can draw insights by visual inspection).

Graph analysis enables detection of communities and Most Valuable Nodes (these can be salespeople, markets, products, etc.).

Check out free ML projects with Code to get started here!

https://dallo7.github.io/

#MLDemocratizer!

Chituyi

Building data Pipelines for ML and AI to aid Supply Chain Agility and improve Customer Intimacy. https://dallo7.github.io/