Ad Fraud is a Real Problem. How Can We Avoid It and Maximize Search Accuracy?
When we describe the BASE system and mention that businesses can offer rewards to customers for viewing their offers, a common question we get is: “How are you going to prevent users from clicking through the ads just to collect the rewards?”
Our solution to reducing ad fraud and detecting bots is a combination of best-practice techniques. These include customer verification during the registration process, limiting the number of clicks per day, limiting the amount of reward per day, and checking the average time between clicks.
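To make these checks concrete, here is a minimal Python sketch of the three limits applied to one user's clicks for a single day. The thresholds (`MAX_CLICKS_PER_DAY`, `MAX_REWARD_PER_DAY`, `MIN_AVG_SECONDS_BETWEEN_CLICKS`) and the `Click` record are hypothetical placeholders chosen for illustration, not BASE's actual values or data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds, for illustration only (not BASE's real limits).
MAX_CLICKS_PER_DAY = 50
MAX_REWARD_PER_DAY = 5.0              # assumed reward unit
MIN_AVG_SECONDS_BETWEEN_CLICKS = 10.0


@dataclass
class Click:
    """Hypothetical record of one ad click."""
    timestamp: datetime
    reward: float


def failed_checks(clicks: list[Click]) -> list[str]:
    """Return the names of the daily checks violated by one user's clicks from a single day."""
    failures = []
    if len(clicks) > MAX_CLICKS_PER_DAY:
        failures.append("click_limit")
    if sum(c.reward for c in clicks) > MAX_REWARD_PER_DAY:
        failures.append("reward_limit")
    if len(clicks) >= 2:
        # Average gap between consecutive clicks, in seconds.
        times = sorted(c.timestamp for c in clicks)
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        if sum(gaps) / len(gaps) < MIN_AVG_SECONDS_BETWEEN_CLICKS:
            failures.append("click_rate")
    return failures


# Usage: 30 clicks two seconds apart, each paying 0.2.
start = datetime(2018, 1, 1, 12, 0, 0)
bot_clicks = [Click(start + timedelta(seconds=2 * i), reward=0.2) for i in range(30)]
print(failed_checks(bot_clicks))  # ['reward_limit', 'click_rate']
```

Returning the names of the failed checks, rather than a single yes/no, keeps each rule independent so its threshold can be tuned separately.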
Beyond these standard techniques, the team is researching more advanced approaches tailored to the unique aspects of decentralized search. In this post, we’ll share some of them, which can also be used indirectly for fraud detection.
A search application must decide which resources are the most relevant when given a search query. Relevance includes considerations such as what results to show out of the total pool of possible resources and what order to show these results in.
We all know PageRank as the algorithm that made Google the most popular search engine. PageRank answered the question of “relevance” by providing an efficient method for utilizing the underlying structure of the web. As the success of PageRank indicates, it turns out that the number (and the quality) of web links to a page is a useful measurement for the relative importance or “relevance” of a web resource.
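For readers unfamiliar with how link structure turns into a score, the following is a minimal power-iteration sketch of the classic PageRank formulation with the commonly used damping factor of 0.85. It is a textbook simplification for illustration, not Google's production algorithm and not part of BASE.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    `links` maps each page to the pages it links to. Pages with no outgoing
    links ("dangling" pages) spread their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from the "random jump" term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank, split evenly, to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its rank uniformly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Usage: page "c" is linked to by both other pages.
web = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # c
```

Page `c`, which receives links from both other pages, ends up with the highest rank, while `b`, which nobody links to, keeps only the baseline share.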
So how does BitClave answer the question of relevance?
In the BitClave Active Search Ecosystem (BASE), we want to create a direct connection between businesses and customers, so that customers can find the products or services they are looking for and businesses can present the best offer to the right customer. For users, the platform should provide the best recommendations for products and services. For businesses, the platform should provide the audiences most likely to purchase their offerings.
For the ecosystem, these interactions need to be facilitated in a decentralized manner that is sustainable (i.e., one that provides support services without middlemen and offers stakeholders an incentive structure that improves on centralized digital advertising platforms). So, in addition to providing the most relevant search results for users, search activity on BASE needs to maximize the return on businesses’ advertising budgets.
So, in what order should we show search results on BitClave? And how can we avoid encouraging users to search exclusively for rewards instead of relevant services, while deterring ad fraud from bots optimized to find the highest cost-per-click offers?
Ordering results by the reward a user receives for viewing an ad is not the answer. It would favor users who are hunting for the highest payout rather than a relevant product, which encourages behavior that completely undermines the value of the BASE ecosystem to businesses. It is much better to order results in a way that maximizes the chance that the customer makes a purchase.
This brings us to the art of recommendation systems. Recommendation systems use big data and machine learning to make recommendations that maximize the chance of a desired outcome. Recommendations differ for every user, based on that user’s personal profile and past behavior. Netflix is a familiar example: every user receives personalized movie recommendations.
One common approach in recommendation systems is Collaborative Filtering. At a high level, to recommend something to a user, we find similar users and see what those users liked. While PageRank answers the question of “relevance” by asking websites which other websites they think are the most relevant, Collaborative Filtering answers the question of “relevance” by asking users with similar preferences to rank content.
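The core idea can be sketched as a similarity-weighted average: a user's predicted sentiment for an item is the average of other users' sentiments for that item, weighted by how similar each of those users is to the target user. This is one common user-based variant, shown under the assumption that similarities in the range -1 to 1 have already been computed; the function name and data layout are hypothetical.

```python
from typing import Optional


def predict_sentiment(similarities: dict[str, float],
                      sentiments: dict[str, dict[str, float]],
                      item: str) -> Optional[float]:
    """User-based collaborative filtering (hypothetical helper).

    similarities: similarity of each other user to the target user (-1..1).
    sentiments:   each other user's sentiment per item (-1..1).
    Returns the similarity-weighted average sentiment for `item`,
    or None if no user with a nonzero similarity has rated it.
    """
    numerator = 0.0
    denominator = 0.0
    for user, sim in similarities.items():
        rating = sentiments.get(user, {}).get(item)
        if rating is None or sim == 0:
            continue
        numerator += sim * rating
        denominator += abs(sim)  # absolute value keeps the result in -1..1
    return numerator / denominator if denominator else None


# Usage: Alice is similar to the target user, Carol is dissimilar; both liked "x".
sims = {"alice": 0.9, "carol": -0.5}
sents = {"alice": {"x": 1.0}, "carol": {"x": 1.0}}
print(predict_sentiment(sims, sents, "x"))  # about 0.29 (0.4 / 1.4)
```

Because Carol's tastes run opposite to the target user's, her liking of `x` pulls the prediction down rather than up.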
Let’s apply this to BASE using the marketplace for blockchain investors and innovators: a market segment where ICO investors search for ICO projects to invest in, and vice versa. This marketplace includes ICO projects from different domains like decentralized exchanges, gaming, search, IoT, insurance, asset management, and more.
For every investor, the activity ledger maintains anonymized information, including their areas of interest (based on search history) and their preferences (based on activity, such as whether the investor liked or previously invested in an ICO project). When a new investor performs a search, we rank the matching ICO projects according to how investors with profiles similar to the current investor rated them.
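One way to derive such preferences from ledger activity is to map each action to a sentiment weight and clamp each investor's total per project to the range -1 to 1. The action names and weights in `ACTION_WEIGHTS` are hypothetical, and the `(investor, ico, action)` tuple layout is an assumed stand-in for the actual ledger format.

```python
from collections import defaultdict

# Hypothetical mapping from ledger actions to sentiment weights.
# Actions not listed here (e.g. plain views or searches) carry no sentiment signal.
ACTION_WEIGHTS = {"invested": 1.0, "liked": 0.5, "disliked": -1.0}


def build_sentiments(events):
    """events: iterable of (anonymized_investor_id, ico_id, action) tuples.

    Returns {investor: {ico: sentiment}}, with each sentiment clamped to [-1, 1].
    """
    totals = defaultdict(lambda: defaultdict(float))
    for investor, ico, action in events:
        weight = ACTION_WEIGHTS.get(action)
        if weight is not None:
            totals[investor][ico] += weight
    return {
        investor: {ico: max(-1.0, min(1.0, score)) for ico, score in icos.items()}
        for investor, icos in totals.items()
    }


# Usage
events = [
    ("u1", "A", "liked"),
    ("u1", "A", "invested"),   # 0.5 + 1.0 = 1.5, clamped to 1.0
    ("u1", "B", "disliked"),
    ("u2", "A", "viewed"),     # ignored: no sentiment weight
]
print(build_sentiments(events))  # {'u1': {'A': 1.0, 'B': -1.0}}
```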
Let’s take a simple example
Let’s say Bob is a new investor searching for an ICO project, and ICO E and ICO F meet the search criteria. We see that Alice and Bob are “similar” investors because they have the same opinions about ICOs A, B, C, and D, so we assume they will have similar opinions about ICOs E and F. Since Alice liked ICO F more than ICO E, we will show ICO F above ICO E in Bob’s search results, assuming that Bob will also prefer ICO F.
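More formally, the similarity metric we use is called “cosine similarity” and is defined as follows:

$$\text{similarity}(A, B) = \cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$

where A and B are vectors of Alice’s and Bob’s sentiments (from -1 for complete dislike to 1 for complete like) toward the n ICO projects they have both rated. If the similarity is 1, Alice and Bob like the same ICO projects; if it is -1, they have opposite opinions about them.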
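The Alice and Bob comparison can be sketched in Python as follows. The sentiment values are invented for illustration, and ranking the candidate ICOs by a single neighbor's similarity-weighted sentiment is a deliberately simplified assumption; a real system would typically aggregate over many similar investors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length sentiment vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for a zero vector; treat as "no information"
    return dot / (norm_a * norm_b)


# Invented sentiments (-1 = complete dislike, 1 = complete like).
alice = {"A": 1.0, "B": -1.0, "C": 0.5, "D": 1.0, "E": -0.5, "F": 1.0}
bob = {"A": 1.0, "B": -1.0, "C": 0.5, "D": 1.0}

# Compare only the ICOs both investors have rated: A, B, C, D.
shared = sorted(set(alice) & set(bob))
sim = cosine_similarity([alice[i] for i in shared], [bob[i] for i in shared])
print(round(sim, 6))  # 1.0

# Rank the ICOs that matched Bob's search by Alice's similarity-weighted sentiment.
candidates = ["E", "F"]
ranking = sorted(candidates, key=lambda ico: sim * alice[ico], reverse=True)
print(ranking)  # ['F', 'E']
```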
This is a simplified example to illustrate the concept of Collaborative Filtering and cosine similarity.
High-quality recommendation systems must be trained on large datasets. In BASE, all important information about searches, offers, views, and purchases is stored on the anonymized activity ledger and is available to anyone. BitClave uses this data to train the BASE recommendation engine and to order search results. As the ledger grows, rankings and recommendations should become increasingly accurate.
Let’s return to our original topic of ad fraud detection. If a customer orders offers by reward amount and clicks “too fast”, the system would treat that customer as more likely to be a “scammer” than a customer who orders search results by relevance.
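A minimal sketch of such a heuristic might combine the two signals into a score between 0 and 1. The `Session` structure, the 3-second `FAST_CLICK_SECONDS` threshold, and the equal weighting of the two signals are all hypothetical choices for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold below which a click counts as "too fast".
FAST_CLICK_SECONDS = 3.0


@dataclass
class Session:
    """Hypothetical summary of one search session."""
    sorted_by_reward: bool            # did the user order offers by reward amount?
    click_gaps_seconds: list[float]   # time between consecutive ad clicks


def suspicion_score(sessions: list[Session]) -> float:
    """Heuristic score in [0, 1]: the average of
    (a) the share of sessions ordered by reward and
    (b) the share of clicks faster than FAST_CLICK_SECONDS."""
    if not sessions:
        return 0.0
    reward_share = sum(s.sorted_by_reward for s in sessions) / len(sessions)
    gaps = [g for s in sessions for g in s.click_gaps_seconds]
    fast_share = sum(g < FAST_CLICK_SECONDS for g in gaps) / len(gaps) if gaps else 0.0
    return (reward_share + fast_share) / 2


# Usage
scammer_like = [Session(True, [1.0, 1.5, 2.0]), Session(True, [0.8])]
relevance_seeker = [Session(False, [40.0, 65.0]), Session(False, [])]
print(suspicion_score(scammer_like), suspicion_score(relevance_seeker))  # 1.0 0.0
```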
Taking this one step further, we could even place “honeypot” offers in the search results to detect scammers more clearly. A “honeypot” is a security technique that lures attackers trying to break into network environments they are not authorized to access, so that they can be detected. This is just one of many potential methods for detecting scammers.
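A honeypot check can be as simple as flagging any account that clicks a decoy offer. In this sketch, the decoy IDs in `HONEYPOT_OFFERS` and the `(user, offer)` click format are hypothetical, and how decoys would be designed so that honest users rarely click them is left open.

```python
# Hypothetical decoy offer IDs, placed in results where honest users are unlikely to click them.
HONEYPOT_OFFERS = {"offer-decoy-1", "offer-decoy-2"}


def flag_honeypot_clickers(clicks):
    """clicks: iterable of (anonymized_user_id, offer_id) pairs.

    Returns the set of users who clicked at least one decoy offer.
    """
    return {user for user, offer in clicks if offer in HONEYPOT_OFFERS}


# Usage
clicks = [
    ("u1", "offer-42"),
    ("u2", "offer-decoy-1"),
    ("u3", "offer-decoy-2"),
    ("u2", "offer-7"),
]
print(sorted(flag_honeypot_clickers(clicks)))  # ['u2', 'u3']
```

Treating a flag as one signal among others, rather than as proof of fraud, leaves room for honest users who click a decoy by mistake.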
Fighting scammers will be a continuous effort, and building trust will take time. Initially, businesses will be more cautious and will offer smaller rewards to customers. Over time, as they gain confidence in the system and the customer data, they will make more customized, higher-value offers.