How to Perform Fraud Detection with Personalized Page Rank

Along with python package Network

Antoine Moreau
Sicara's blog
3 min readJan 9, 2019

--

Read the full article on Sicara’s blog here.

This article shows how to perform fraud detection with Graph Analysis. Thanks to Personalized Page Rank algorithm and Networkx python package.

Fraud detection is a major field of interest for data science. As fraud is a rare event, the main challenge is to find a way to bring to light abnormal behavior. That is why graph analysis is a useful approach to perform fraud detection. Many algorithms exist to extract information from graphs. In this article, we will study one of them: the Personalized Page Rank algorithm. To manipulate our graphs and compute this algorithm we will use the python package Networkx.

Page Rank Algorithm

Page Rank is a well-known algorithm developed by Larry Page and Sergey Brin in 1996. They were studying at Standford University and it was part of a research project to develop a new kind of search engine. They then successfully founded Google Inc.

This algorithm assigns a numerical weighting to every node of a connected network. This measure represents the relative importance of a node within the graph (its rank).

To compute Page Rank a random walk is performed. This random walk is defined as follow :

  • The walker starts at a random node in the graph.
  • At each iteration, the walker follows an outgoing edge to one of the next nodes with a probability α or jumps to another random node with a probability 1-α.

The PageRank theory holds that the imaginary walker who is randomly walking on links will eventually stop. The probability, at any step, that the person will continue is called the damping factor α.

As an example, for Google, the network is composed of websites that point to each other through links. The page rank measure of each web page is then the probability that a person randomly surfing on the internet would finally arrive on this specific page.

In mathematical terms, The Page Rank of a node is the stationary measure of the Markov Chain described by the random walk.

Page Rank measure of a node in the graph

On the animation below you can visualize a Random Walk performed on a connected graph with a damping factor set to 0.85.

Random walk on a graph — (α = 0.85)

On the above example, one would predict that the node ‘c’ is the one with the higher rank. This is the most central node. On the contrary, the node ‘h’ is more likely to have a low rank.

Page Rank measure for each node of the graph

How to compute Page Rank with Python and Networkx

The python package Networkx gives the possibility to perform graph analysis. A lot of algorithms are implemented in this package (community detection, clustering…), pagerank is one of them.

With the python script below, thanks to Networkx, we will first generate a random graph and then apply pagerank function.

Personalized Page Rank Algorithm

We have seen that the Page Rank is a representation of the importance of nodes within a network. Personalized Page Rank gives the possibility to bring out nodes in a graph that are central from the perspective of a set of specific nodes.

Read the full article on Sicara’s blog here.

--

--