Detection system for identifying abuse and fraud using artificial intelligence across peer-to-peer distributed content or payment networks
Adbank has developed and filed an updated patent application for an AI-powered fraud detection system, which it plans to implement on the adbank advertising network.
Traditional fraud detection systems are built to run within the specific advertising platform they were designed for. The introduction of blockchain technology into the advertising ecosystem creates an opportunity to police fraud and abuse differently: transactions recorded on a public ledger can be audited by anyone, not only by the platform operator.
“Modern online advertising networks are built upon an assumption that the platform will police the network and honestly report pricing to participants in the network. The underlying trust problem between advertisers and publishers can be resolved using a public ledger employing blockchain technology, recording the transfer of value for all to see.” (patent submission)
Because ad fraud is a persistent problem that needs to be addressed, adbank has created a new system and method for detecting and mitigating abuse and fraud on advertising platforms using artificial intelligence. By drawing on advertising databases, user logs, ad images, rendered web pages, and blockchain transactions, and by applying supervised and unsupervised machine learning algorithms, adbank’s anti-fraud system can detect many types of fraud.
In the adbank anti-fraud system, the neural network mechanism periodically scans the blockchain data structure and conducts automated (or semi-automated) reviews of transactions and transaction behaviour. Transactions (or a randomly selected subset thereof) may be automatically reviewed for suspicious rendering behaviour (e.g., based on automated image viewport / webpage code analysis), suspicious loading behaviour (e.g., repeated loading potentially indicative of bot loading or bot click-throughs), and/or suspicious auction bidding behaviour (e.g., artificially bidding up prices, price collusion to reduce a price).
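The loading check lends itself to a simple illustration. The sketch below is not the neural network the patent describes. It is a hand-written heuristic, assuming hypothetical transaction records with `viewer`, `ad_id`, and `timestamp` (seconds) fields, that flags any viewer who loads the same ad more than `max_loads` times within a sliding time window. This is the kind of repetition the text associates with bot loading. The field names and thresholds are illustrative assumptions.

```python
from collections import defaultdict

def flag_repeated_loads(transactions, max_loads=3, window=60.0):
    """Return (viewer, ad_id) pairs loaded more than max_loads times within `window` seconds.

    Hypothetical heuristic: record fields and both thresholds are illustrative assumptions.
    """
    load_times = defaultdict(list)
    for tx in transactions:
        load_times[(tx["viewer"], tx["ad_id"])].append(tx["timestamp"])

    flagged = set()
    for key, times in load_times.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Advance the left edge until the window spans at most `window` seconds.
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 > max_loads:
                flagged.add(key)
                break
    return flagged


# Usage: v1 loads ad a1 four times in 30 seconds; v2 loads it once every 100 seconds.
records = [{"viewer": "v1", "ad_id": "a1", "timestamp": t} for t in (0, 10, 20, 30)]
records += [{"viewer": "v2", "ad_id": "a1", "timestamp": t} for t in (0, 100, 200, 300)]
print(flag_repeated_loads(records))  # {('v1', 'a1')}
```

When reviewing a randomly selected subset, sampling by viewer rather than by individual transaction keeps each sampled viewer's load history intact. Otherwise, a per-transaction sample would thin out exactly the repetitions this check looks for.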
Using artificial intelligence to detect and flag common patterns of abuse and fraud resolves the problem of having to trust the advertising platform operator with little or no transparency. Adbank’s improved technical system uses a combination of machine learning classification and a decentralized blockchain ledger system.
Example: Detection of ad injection (one of many)
Suppose a malicious actor injects advertisements into a legitimate publisher’s website, via methods such as cross-site scripting or SQL injection, in order to fraudulently benefit from the legitimate publisher’s traffic.
The technical details:
As shown in figure 2, image features are extracted using a convolutional neural network (CNN) and then processed by a dense neural network (DNN). In this embodiment, the DNN outputs a feature vector of length 300.
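To make the image branch concrete, here is a minimal pure-Python sketch, assuming a single-channel (grayscale) image given as a list of rows. It uses untrained random weights, a handful of 3×3 filters, ReLU, and global average pooling, followed by a dense layer that projects to the length-300 feature vector. The filter count, the pooling choice, and the function name `image_features` are illustrative assumptions, not the patent's architecture.

```python
import random

def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation without padding, as CNN layers compute it."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(v):
    return v if v > 0.0 else 0.0

def image_features(image, n_filters=4, out_dim=300, seed=0):
    """Conv (3x3) -> ReLU -> global average pooling -> dense ReLU layer of size out_dim."""
    rng = random.Random(seed)  # untrained random weights, for illustration only
    kernels = [[[rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(3)]
               for _ in range(n_filters)]
    pooled = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        values = [relu(v) for row in fmap for v in row]
        pooled.append(sum(values) / len(values))  # one summary value per filter
    dense_w = [[rng.gauss(0.0, 0.1) for _ in range(n_filters)] for _ in range(out_dim)]
    return [relu(sum(w * p for w, p in zip(row, pooled))) for row in dense_w]


# Usage: an 8x8 toy "screenshot" becomes a 300-value feature vector.
image = [[(i * j) % 5 / 4.0 for j in range(8)] for i in range(8)]
vec = image_features(image)
print(len(vec))  # 300
```

A real system would use a trained, much deeper CNN over colour screenshots. The sketch only shows the shape of the computation: spatial filters, a nonlinearity, pooling, then a dense projection.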
In parallel, the website’s code is split into a sequence of discrete tokens using a known HTML lexer. These tokens are converted into numeric vectors of length 300 using a custom embedding. The embedding is learned with the same techniques as the popular “Word2Vec” embedding, from a corpus of HTML documents that includes both known fraudulent and known legitimate websites. The resulting sequence of token vectors is then processed by a recurrent neural network (RNN), whose output is a feature vector of length 300.
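The token branch can be sketched in the same spirit, with three stand-ins. Python's standard `html.parser.HTMLParser` replaces the unnamed HTML lexer. A table of fixed random vectors (`RandomEmbedding`, a hypothetical name) replaces the learned Word2Vec-style embedding. A plain Elman RNN, whose final hidden state serves as the feature vector, replaces whatever RNN variant the patent uses. The weights are untrained and the dimension is reduced for speed; passing `dim=300` and `hidden_dim=300` matches the sizes in the text.

```python
import math
import random
from html.parser import HTMLParser

class TokenCollector(HTMLParser):
    """Turns HTML into a flat token sequence: tags, attribute names, and text words."""
    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(f"<{tag}>")
        self.tokens.extend(f"@{name}" for name, _ in attrs)

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        self.tokens.extend(data.split())

def tokenize_html(html):
    parser = TokenCollector()
    parser.feed(html)
    parser.close()
    return parser.tokens

class RandomEmbedding:
    """Placeholder for a learned embedding: one fixed random vector per distinct token."""
    def __init__(self, dim, seed=0):
        self.dim, self.rng, self.table = dim, random.Random(seed), {}

    def __call__(self, token):
        if token not in self.table:
            self.table[token] = [self.rng.gauss(0.0, 0.1) for _ in range(self.dim)]
        return self.table[token]

def elman_rnn(vectors, hidden_dim, seed=1):
    """h_t = tanh(W_x x_t + W_h h_{t-1}); returns the final hidden state."""
    if not vectors:
        return [0.0] * hidden_dim
    rng = random.Random(seed)  # untrained random weights, for illustration only
    in_dim = len(vectors[0])
    w_x = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w_h = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
    h = [0.0] * hidden_dim
    for x in vectors:
        h = [math.tanh(sum(a * b for a, b in zip(w_x[i], x))
                       + sum(a * b for a, b in zip(w_h[i], h)))
             for i in range(hidden_dim)]
    return h


# Usage
tokens = tokenize_html('<div class="ad"><a href="x">Buy now</a></div>')
print(tokens)  # ['<div>', '@class', '<a>', '@href', 'Buy', 'now', '</a>', '</div>']
embed = RandomEmbedding(dim=16)
features = elman_rnn([embed(t) for t in tokens], hidden_dim=16)
print(len(features))  # 16
```

Keeping tag and attribute tokens separate from visible text lets the model see structural oddities, such as an unexpected `<script>` or `<iframe>`, that injected ads tend to introduce.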
The RNN can be modified so that the combined (“hybrid”) website and image features are provided to the centralized fraud detection neural network as a data structure with a number of time series, a number of values per time step, and a number of time steps. Each of these three quantities is tuneable, which allows the operating characteristics of the centralized fraud detection neural network to be adjusted.
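One way to picture the tuneable input structure is as a nested array of shape (time series, time steps, values per step). The helper below reshapes a flat feature list into that layout. It also checks that the three tuning parameters multiply out to the number of features available. `to_time_series` is a hypothetical illustration, not the patent's data format.

```python
def to_time_series(features, n_series, n_steps, values_per_step):
    """Reshape a flat list into [series][time step][value]."""
    expected = n_series * n_steps * values_per_step
    if len(features) != expected:
        raise ValueError(
            f"{n_series} x {n_steps} x {values_per_step} = {expected}, "
            f"but {len(features)} features were given"
        )
    it = iter(features)
    return [[[next(it) for _ in range(values_per_step)]
             for _ in range(n_steps)]
            for _ in range(n_series)]


# Usage: the same 600 hybrid features arranged two different ways.
features = list(range(600))
a = to_time_series(features, n_series=2, n_steps=10, values_per_step=30)
b = to_time_series(features, n_series=1, n_steps=20, values_per_step=30)
print(len(a), len(a[0]), len(a[0][0]))  # 2 10 30
print(a[1][0][0])  # 300
```

Changing the split trades sequence length against per-step width. More time steps give the recurrent layer a longer sequence to integrate over, while more values per step give it more information at each step.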
The final feature vector used for classification is obtained by concatenating the image feature vector with the token feature vector as shown, giving a vector of length 300 + 300 = 600. This vector is the input to a DNN consisting of 3 rectified linear unit (ReLU) layers of width 600, followed by a single sigmoid output unit. The output is a single value between zero and one, which represents the confidence that an advertisement injection is present in this example. The number of concatenations can also be tuned to maintain a target confidence level.
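The classification head can be sketched as a forward pass in plain Python. The sketch assumes He-initialised random (untrained) weights and zero biases. It uses a single sigmoid output unit, so the network returns one value between zero and one. `build_head` and `predict_injection` are hypothetical names. The default sizes match the text: 600 inputs and three ReLU layers of width 600.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random weights (He initialisation, a common choice for ReLU layers) and zero biases."""
    std = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_head(in_dim=600, width=600, n_hidden=3, seed=0):
    rng = random.Random(seed)
    hidden, n_in = [], in_dim
    for _ in range(n_hidden):
        hidden.append(init_layer(n_in, width, rng))
        n_in = width
    return {"in_dim": in_dim, "hidden": hidden, "output": init_layer(n_in, 1, rng)}

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_injection(head, image_vec, token_vec):
    x = list(image_vec) + list(token_vec)  # concatenation: e.g. 300 + 300 = 600
    if len(x) != head["in_dim"]:
        raise ValueError(f"expected {head['in_dim']} features, got {len(x)}")
    for weights, biases in head["hidden"]:
        x = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU layer
             for row, b in zip(weights, biases)]
    (w_row,), (b,) = head["output"]
    return sigmoid(sum(w * xi for w, xi in zip(w_row, x)) + b)


# Usage with the sizes from the text (600 -> 600 -> 600 -> 600 -> 1).
head = build_head()
p = predict_injection(head, [0.01] * 300, [0.02] * 300)
print(0.0 <= p <= 1.0)  # True
```

With untrained weights the output is meaningless as a probability. Only after training on labelled samples does the value become a usable confidence score.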
If the computed probability that an advertisement injection is present exceeds a threshold of 0.99, the sample is identified as representing malicious activity. The corresponding publisher profile can then be added to a data structure that stores a list of publisher profiles to be excluded by filtering.
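The thresholding and exclusion step amounts to a set update. This sketch assumes the classifier has already produced one score per publisher. A plain Python set plays the role of the data structure that holds excluded publisher profiles. The publisher IDs and function names are hypothetical.

```python
INJECTION_THRESHOLD = 0.99  # threshold stated in the text

def update_exclusion_list(scores, excluded, threshold=INJECTION_THRESHOLD):
    """Add publishers whose injection probability is above the threshold; return the new ones."""
    newly_flagged = {pid for pid, prob in scores.items() if prob > threshold}
    excluded |= newly_flagged
    return newly_flagged

def filter_publishers(candidates, excluded):
    """Drop excluded publishers from a list of candidates, preserving order."""
    return [pid for pid in candidates if pid not in excluded]


# Usage
excluded = set()
newly = update_exclusion_list({"pubA": 0.995, "pubB": 0.42, "pubC": 0.99}, excluded)
print(sorted(newly))  # ['pubA']
print(filter_publishers(["pubA", "pubB", "pubC"], excluded))  # ['pubB', 'pubC']
```

Because the text says "above" the threshold, the comparison is strict, so a score of exactly 0.99 is not flagged.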
To perform this task, the network is trained on HTML document samples that experts have labelled as containing or not containing injected advertisements. Samples containing injected advertisements are labelled 1, and all others are labelled 0.
The network’s error, or loss, on this task is the squared difference between the prediction and the label. During training, the gradients of this loss with respect to the network’s parameters are computed by backpropagation. The parameters are then adjusted to minimize the loss using a known optimization algorithm, in this case Adaptive Moment Estimation (Adam).
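To show how squared-error loss, backpropagation, and Adam fit together, the sketch below trains the smallest possible stand-in for the network: a single sigmoid unit on toy one-dimensional data. For one layer, backpropagation reduces to the chain rule written out by hand. The Adam update follows the standard formulation with bias-corrected moment estimates. The toy data, learning rate, and epoch count are assumptions chosen for illustration.

```python
import math

class Adam:
    """Adaptive Moment Estimation with bias-corrected first and second moments."""
    def __init__(self, n_params, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # running mean of gradients
        self.v = [0.0] * n_params  # running mean of squared gradients
        self.t = 0

    def step(self, params, grads):
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

def predict(params, x):
    *w, b = params  # weights followed by bias
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(samples, epochs=500, lr=0.1):
    n = len(samples)
    params = [0.0] * (len(samples[0][0]) + 1)
    opt = Adam(len(params), lr=lr)
    history = []  # mean squared loss before each update
    for _ in range(epochs):
        grads, total = [0.0] * len(params), 0.0
        for x, y in samples:
            p = predict(params, x)
            total += (p - y) ** 2
            # Chain rule: dL/dz = 2(p - y) * p(1 - p), since sigmoid'(z) = p(1 - p).
            dz = 2.0 * (p - y) * p * (1.0 - p)
            for i, xi in enumerate(x):
                grads[i] += dz * xi / n
            grads[-1] += dz / n
        history.append(total / n)
        opt.step(params, grads)
    return params, history


# Usage: label 1 for "injected" samples (large x), 0 otherwise.
data = [([0.0], 0), ([0.2], 0), ([0.8], 1), ([1.0], 1)]
params, history = train(data)
print(round(history[0], 2))  # 0.25
```

In a multi-layer network, backpropagation applies the same chain rule layer by layer, and Adam performs the identical per-parameter update on every weight.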
Training continues until the mean loss over all samples in an unseen test dataset falls below 0.01. It is considered successful if the classification error, measured as the fraction of falsely identified records, is also below 0.01.
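The stopping and success criteria can be written as two small checks. When counting falsely identified records, this sketch turns predictions into class labels with a 0.5 cut-off. That cut-off is an assumption, since the text does not say which one is used for evaluation; the 0.99 flagging threshold could be passed instead. Both targets are read as fractions.

```python
def evaluate(predictions, labels, threshold=0.5):
    """Return (mean squared loss, fraction of misclassified records) on a test set."""
    n = len(labels)
    mean_loss = sum((p - y) ** 2 for p, y in zip(predictions, labels)) / n
    errors = sum((p > threshold) != bool(y) for p, y in zip(predictions, labels))
    return mean_loss, errors / n

def training_succeeded(mean_loss, error_rate, loss_target=0.01, error_target=0.01):
    """Both criteria from the text: mean loss and error rate each below 0.01."""
    return mean_loss < loss_target and error_rate < error_target


# Usage on a toy held-out set.
mean_loss, err = evaluate([0.99, 0.02, 0.97, 0.01], [1, 0, 1, 0])
print(round(mean_loss, 6), err)  # 0.000375 0.0
print(training_succeeded(mean_loss, err))  # True
```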
What this means for the adbank advertising ecosystem:
Adbank’s fraud detection system addresses several technical problems, including how to design computational approaches that improve the accuracy of fraud detection while using computing resources efficiently. The system also increases trust in the adbank advertising network through periodic audits that automatically flag advertisements the system classifies as suspicious.