AppLike Fraud Paper 2018 — 8 types of mobile advertising fraud and how to fight them

Advertising fraud is expected to cause $8 billion of damage (5–10% of all ad spend) to the industry in 2018, and up to 50% of total yearly ad spend is expected to go to mobile.

AppLike is a direct user acquisition platform, i.e. we run our own apps worldwide and sell CPI traffic from those apps. We acquire millions of users for our apps every month. Of these, we mark as much as 15–20% as fraudulent. And because we work with all the major UA networks and players in the market, I suspect that the market-wide number is even higher, especially since detection is sometimes difficult!

It is important for AppLike to detect and eliminate fraud in order to fulfill the needs of our clients — mobile advertisers — and to provide clean and sustainable mobile app downloads to those clients.

Alongside various pieces on the subject of fraud, published by big ad-tech firms like Singular, Appsflyer and Adjust, this report seeks to share unique details about our anti-fraud activities and provide a different perspective on this challenging topic.

As I see it, one key issue is that fraud prevention is generally considered only from a user acquisition viewpoint. For example, it is common for mobile advertisers to analyze ad delivery, network sources and how users interact with the advertisements. But there is no focus on understanding what the potentially fraudulent user actually does on their device. At AppLike, we believe that combining both aspects significantly improves fraud detection.

In the Android mobile app economy, fraudsters take two major approaches:

  • Ad fraud (for example: click spamming, bot farms, multiple ad overlays, faked geo sources, or app installs faked without any real device)
  • In-app or device manipulation fraud (for example: faked in-app purchases, simulated devices [bot farms], app manipulation to access special features or to level up game accounts, fake affiliate referrals and many others)

Most of these methods are somewhat interlinked, so can hardly be considered independently.

Where does AppLike have touchpoints with fraudulent behavior?

To understand why AppLike needs to deal with all the different types of fraud mentioned, first I would like to explain our business model. AppLike is an Android app recommendation engine which helps app publishers to target new app downloads. We run several different apps worldwide and do user acquisition through all major UA networks on marketing side. Users download one of our apps, sign up with their age and gender, and share some interesting app usage data. Based on that data our system recommends new apps to users. After downloading and actively using those apps, users are able to collect a virtual currency which can be redeemed as Amazon coupons, PayPal money or various kinds of voucher.

AppLike partners with hundreds of app publishers worldwide and provides new app downloads to them in more than 25 countries.

Where we encounter fraudulent user behavior:

  1. On the marketing side: We have seen all kinds of attempts to deliver fake downloads for one of our apps, or even steal a “real install” by wrongly attributing an install from another source to their own.
  2. Manipulation of our app with code injection and reverse engineering to earn our virtual currency more easily.
  3. Running our apps on virtual devices, e.g. emulators, rebuilding our app with scripts that connect to the backend API, or running several app instances in parallel on the same device.

All of these result in bad user quality for advertisers on our platform, so it’s key that we beat them.

Let’s look in detail at the most commonly occurring types of fraud attempt on the AppLike platform and how we counter them. (Be aware: there is no single easy solution that deals with all of these issues, which is exactly why I want to go into a little more detail here.)

1. Click spamming or click injection

When click spamming happens, bad “publishers” steal app-install attributions from other publishers and get paid for the resulting installs. How does this happen? Several approaches are possible. First, fraudulent “publishers” sign up to various ad networks. From those networks, they pull the tracking links for app download campaigns, which advertisers provide so that publishers can promote app-install offers. Normally such links sit behind advertisements on a publisher’s site: a user clicks the ad, the tracking link is executed, and the user is redirected to the Play Store (as we only run an Android solution, the Play Store is always the relevant destination); after the app is downloaded from the Play Store, the installation is attributed. A fraudulent “publisher” instead simulates ad clicks by having spyware apps installed on many devices. Through these spyware apps, the scammers execute enormous numbers of click requests in the background (click spamming) to “fish” for attributions. If one of those devices then installs an app normally, via a different ad, the installation can be attributed to the fraudulent “publisher” because of last-click attribution.

This attack is fairly easy to detect, because such sub-publishers usually generate far more clicks on average than others. For example, we have detected a single “publisher” generating more than 400 clicks a day on one device for one campaign.
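A check like this can be sketched as a simple counter keyed on publisher, device and day. The 400-clicks figure comes from the example above; the class and method names are illustrative assumptions, not our production implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simple click-spamming filter: count clicks per (publisher, device, day)
// and flag publishers that exceed a daily per-device threshold.
public class ClickSpamFilter {
    public static final int MAX_CLICKS_PER_DEVICE_PER_DAY = 400;

    private final Map<String, Integer> counts = new HashMap<>();
    private final Set<String> flaggedPublishers = new HashSet<>();

    public void recordClick(String publisherId, String deviceId, String day) {
        String key = publisherId + "|" + deviceId + "|" + day;
        int n = counts.merge(key, 1, Integer::sum);
        if (n > MAX_CLICKS_PER_DEVICE_PER_DAY) {
            flaggedPublishers.add(publisherId);
        }
    }

    public boolean isFlagged(String publisherId) {
        return flaggedPublishers.contains(publisherId);
    }
}
```

In practice a counter like this would run as a streaming aggregation rather than in memory, but the rule itself stays this simple.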

A more advanced fraud method is to have a malicious Android app installed on many devices. Such an app can “listen” for all new app installations via an Android system broadcast, which delivers the information almost in real time. So imagine that a user clicks on a regular ad: the click is saved by the tracking provider’s system, and following download and installation, attribution takes place after the app has been opened for the first time. The problem is the time delay between completed installation and first opening of the app. This delay allows the malicious app to receive the new-installation event even before the app is first opened, so another click can be generated by the fraudulent app that received the broadcast. Ultimately this new installation is attributed to the scammer.

I would like to explain two of the major approaches which we use at AppLike to foil attempts at this type of fraud. The first is a basic machine learning model which is trained on three features: the time difference between click and measured attribution, the package size of our app, and the network connection type. The model flags outliers with a significantly “too short” click-to-attribution time, giving us a good indicator of fraud.
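Stripped of the machine learning part, the same idea can be expressed as a rule: estimate the minimum plausible download time from the APK size and connection type, and flag attributions that arrive faster than that. The bandwidth figures and safety margin below are illustrative assumptions, not our trained model's values.

```java
// Rule-based sketch of the click-to-attribution-time check.
public class CttOutlierCheck {
    // Rough upper-bound throughput per connection type, in bytes/second.
    private static double maxBytesPerSecond(String networkType) {
        switch (networkType) {
            case "wifi": return 10_000_000.0;  // ~80 Mbit/s, generous
            case "4g":   return 5_000_000.0;
            case "3g":   return 1_000_000.0;
            default:     return 10_000_000.0;  // unknown type: be permissive
        }
    }

    // True if the click-to-attribution gap is too short to have allowed a
    // real download and install of an APK of the given size.
    public static boolean isSuspicious(long clickTsMillis, long attributionTsMillis,
                                       long apkSizeBytes, String networkType) {
        double fastestDownloadSec = apkSizeBytes / maxBytesPerSecond(networkType);
        double gapSec = (attributionTsMillis - clickTsMillis) / 1000.0;
        // Even on the fastest connection, installing and first-opening the
        // app adds at least a few seconds on top of the download.
        double minimumPlausibleSec = fastestDownloadSec + 5.0;
        return gapSec < minimumPlausibleSec;
    }
}
```

A machine-learned model does the same thing per channel and per app, with thresholds it derives from the data instead of hard-coded constants.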

Secondly, Google released a new install referrer API this year, which provides the timestamp of the real click source. Our in-house tracking system can compare this with the timestamp of the attributed click, identify fraudulent users, and then deal with them appropriately.
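The core of that cross-check: compare the click timestamp reported by the install referrer with the click timestamp our tracking attributed, and flag large mismatches. The tolerance value here is an assumption for illustration.

```java
// Referrer-timestamp cross-check: an injected click was generated after the
// real click that Google saw, so the two timestamps diverge noticeably.
public class ReferrerCheck {
    // Maximum tolerated difference between the referrer click timestamp and
    // our tracked click timestamp, in seconds (illustrative value).
    public static final long TOLERANCE_SECONDS = 60;

    public static boolean isClickInjected(long referrerClickTsSec, long trackedClickTsSec) {
        return Math.abs(referrerClickTsSec - trackedClickTsSec) > TOLERANCE_SECONDS;
    }
}
```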

2. Bot farms

I would guess that everybody has a picture in mind of a room stacked full of cheap smartphones.

Essentially, a bot farm is when scammers put many devices into one room and simulate new app installs or in-app behavior. However, bot-farm methods have become a great deal more sophisticated. From what we have seen, I would assume that fraud businesses now use virtual cloud servers. Most bot-farm traffic we see comes from emulators or low-priced Chinese Android phones, and our emulator detection (subsection 4) helps to catch this kind of fraud. From a scammer’s perspective, a key challenge with this approach is obtaining a great many IP addresses from the right country to simulate such a huge number of devices. This is not always easy, meaning that we are able to detect many attempts simply from the IP address used. If a commercial VPN provider is involved, it is helpful to consult an IP lookup table, like the reliable one provided by MaxMind Inc.
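IP-based detection ultimately boils down to matching client addresses against known datacenter or VPN ranges from such a lookup table. A minimal sketch, with hand-rolled IPv4 parsing and made-up CIDR ranges standing in for a real database:

```java
import java.util.List;

// Match an IPv4 address against a list of CIDR ranges (in production these
// would come from a lookup database such as MaxMind's, not a hard-coded list).
public class IpRangeCheck {
    private static int parseIpv4(String ip) {
        String[] o = ip.split("\\.");
        return (Integer.parseInt(o[0]) << 24) | (Integer.parseInt(o[1]) << 16)
             | (Integer.parseInt(o[2]) << 8)  |  Integer.parseInt(o[3]);
    }

    public static boolean inCidr(String ip, String cidr) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (parseIpv4(ip) & mask) == (parseIpv4(parts[0]) & mask);
    }

    public static boolean isDatacenterIp(String ip, List<String> cidrs) {
        for (String c : cidrs) {
            if (inCidr(ip, c)) return true;
        }
        return false;
    }
}
```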

We encountered one example where all of our IP validation rules failed. Device fraud analytics detected an emulator, but all IP sources belonged to US carriers. To investigate further, one of our brilliant student placements ran penetration tests against the logged IP addresses, using Kali Linux along with some other tools. It turned out that the addresses were affected by a then-common router vulnerability: the routers were controlled by hackers and being used as proxy servers.

3. Fake geocode source

Most of the major mobile user acquisition networks offer geo targeting for user acquisition. This is usually quite effective, though sometimes the IP address of the generated ad click has a different geocode than the IP address identified during first boot-up of the app after installation. Statistically, this should not happen very often without fraudulent behavior taking place: after all, how fast can someone travel between countries in person? Most attribution providers help to mark those users, but sometimes you must pay for those conversions anyway. Our in-house tracking system includes various mechanisms to prevent this type of fraud. First, we consult IP geo lookup databases at several stages in the user’s funnel, and then apply further conditions: how many times the country changed, the time difference between click, install and those funnel events, and the distance between the different locations. Second, we take a more statistical approach, training a machine learning model on the datasets for all the different channels and marketing networks to mark outliers in real time.
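The distance condition is essentially an "impossible travel" rule: if covering the distance between the click IP's location and the first-open IP's location within the elapsed time would require superhuman speed, the conversion is flagged. The 900 km/h cap (roughly airliner speed) is an illustrative assumption.

```java
// "Impossible travel" check based on great-circle distance between two
// IP-geolocated points and the elapsed time between the two events.
public class GeoPlausibility {
    private static final double EARTH_RADIUS_KM = 6371.0;
    private static final double MAX_SPEED_KMH = 900.0;  // roughly airliner speed

    // Haversine great-circle distance between two lat/lon points, in km.
    public static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True if the implied travel speed exceeds what a real person could manage.
    public static boolean isImpossibleTravel(double lat1, double lon1,
                                             double lat2, double lon2,
                                             double elapsedHours) {
        if (elapsedHours <= 0) return true;
        return haversineKm(lat1, lon1, lat2, lon2) / elapsedHours > MAX_SPEED_KMH;
    }
}
```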

4. Special device meta information

Every Android device manufacturer hardcodes certain information about the smartphone into the Android operating system, and every app can read this data via the Android API. In general, this information can be used as a decent indicator of fraudulent activity. Just keep in mind that fraudsters tend to manipulate this information, though many are simply too lazy to do so comprehensively.

Some device or model names are a near-certain giveaway. For example, when you see Redmi Note 3, AndyWin, VirtualBox or “To Be Filled By O.E.M.”, the chances are very high that you have found a fraudulent user.

If you look at all the devices from a single marketing source and can observe many identical devices, this is easily identifiable as fraud. We have noticed that this sometimes happens for specific sub-publishers within one marketing source. Of course, the task is not manageable manually if you generate huge numbers of new app installs every day. So our in-house system using various rule sets comes into play:

  1. Validate if all device information fits together correctly according to the current information in our database. For example, some fraudulent users change the device manufacturer but not the display resolution, which we collect as in-app information from the Android APIs.
  2. Check if device information data sets have previously displayed unusual behavior.
  3. Avoid false positives by analyzing device manipulation attempts on the client side (for more details see: subsection 5)
  4. Check the statistical frequency at which the device occurs in other channels. We have trained a small machine learning model which is updated every week. Furthermore, it is important to take totally new device information into account — though even when manufacturers bring new phones to market, they won’t occur in only one channel.
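Rule 1 above can be sketched as a lookup against a device profile database plus a blocklist of known emulator names. The entries below are tiny illustrative stand-ins for a real database of thousands of device profiles:

```java
import java.util.Map;
import java.util.Set;

// Cross-check reported device metadata: known emulator names are rejected
// outright, and known models must report the resolution their hardware has.
public class DeviceMetaCheck {
    private static final Set<String> BLOCKLIST =
        Set.of("AndyWin", "VirtualBox", "To Be Filled By O.E.M.");

    // model -> expected display resolution (illustrative subset of a real DB)
    private static final Map<String, String> KNOWN_RESOLUTIONS = Map.of(
        "Pixel 2", "1080x1920",
        "SM-G950F", "1440x2960"
    );

    public static boolean isSuspicious(String model, String reportedResolution) {
        if (BLOCKLIST.contains(model)) return true;
        String expected = KNOWN_RESOLUTIONS.get(model);
        // A known model reporting a resolution that doesn't match its
        // hardware profile is the classic half-hearted manipulation.
        return expected != null && !expected.equals(reportedResolution);
    }
}
```

Unknown models deliberately pass here; as rule 4 notes, genuinely new devices are handled statistically rather than by a hard rule.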

5. No real device / no real user, or device manipulation

Detecting whether app start-up takes place on a real smartphone can be a tough challenge. We have learned many ways to approach this over the years, and here is a short summary of some of them:

  • We use very basic conditional methods. For example, analyzing the device model names and display resolution and verifying them against a comprehensive device and profile database (subsection 4). Outliers can easily be identified as fraudulent. However, this approach alone is easily circumvented.
  • We considered the methods used for manipulating device meta information: Most attempts are made via root access on operating system level. The challenge is that not all rooted devices are actually fraudulent. So we looked further, checking at Linux file system level to see which files have been changed and in what way. As well as finding the manipulations at OS level, it’s important that your own implementation is not attackable; ensure it is protected against reverse engineering, for example. This is much more complicated than it should be, because Android does not offer any hardware security modules. If you are also interested in building such a system, we recommend that you look into encryption algorithms and the Android native development kit (NDK).
  • Here’s another more sophisticated approach: We know that scammers use the Xposed framework to inject code into the Java execution stack and thereby change the return values of system functions. We developed a stable way of detecting those attempts and avoiding manipulation. The biggest challenge for developers is that reverse engineering is easy on Android. Xposed is easy to detect in Java code, though the relevant code snippet can also easily be disabled by Xposed, or by changing the Dalvik bytecode. So once again we moved some parts from the core logic to native C code, to make reverse engineering much harder.
  • Google came up with their own solution to protect apps against security threats: SafetyNet is a great additional option to improve device manipulation detection. The service is integrated into Google Play Services so that it has more permissions to access various interfaces from the Linux OS than regular apps. The system appeared to be stable until Magisk, the Xposed enhancement, was released. The unavoidable conclusion, as always: No method is 100% safe and most security protection features need some further adjustment. Perhaps we will publish a new post going into more detail about SafetyNet and Magisk.
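For illustration, a Java-level Xposed check of the kind described above might look for Xposed classes on the classpath or for its hooking frames in the current stack trace. As noted, checks like this are trivially disabled by Xposed itself, which is exactly why the hardened versions belong in native code; this sketch only shows the basic idea.

```java
// Naive Java-level Xposed detection (easily defeated; shown for illustration).
public class XposedCheck {
    // Xposed's bridge class is loaded into every hooked process.
    public static boolean isXposedClassPresent() {
        try {
            Class.forName("de.robv.android.xposed.XposedBridge");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Hooked methods show Xposed frames in the call stack.
    public static boolean hasXposedStackFrames() {
        for (StackTraceElement el : new Throwable().getStackTrace()) {
            if (el.getClassName().startsWith("de.robv.android.xposed")) {
                return true;
            }
        }
        return false;
    }

    public static boolean isLikelyHooked() {
        return isXposedClassPresent() || hasXposedStackFrames();
    }
}
```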

6. App manipulation, code injection or app simulation scripts

Like most other developers, I consider a generally open system like Android to be beneficial for the tech ecosystem; however, it also has some disadvantages. It is not that difficult to use reverse engineering to manipulate Java apps. Some years ago, we detected a user on our platform who was repeatedly sending the same app-usage timestamp for different apps to get more rewards. This was not caused by a software bug, so we tried to figure out how it could happen. Several tools on the market make it easy to convert an Android package kit (APK) to a JAR file containing Java class files. Ultimately, the programming code is viewable and can be manipulated or even injected. To mitigate this, Android offers a great tool called ProGuard, which obfuscates the source code. There is no perfectly secure solution on the market, but this tool makes manipulation much harder.

Another way to simulate app installs, or even real user behavior, is to analyze network traffic (sniffing) and find out how communication with the backend works. If a fraudulent user can see the network HTTP requests, they can easily write a script that behaves exactly like the original app. Every app developer should use SSL-encrypted server connections at a minimum. But even SSL connections can be monitored using certain tools on the market (for example, Charles Proxy).

So we decided to take the next step, implementing certificate pinning to prevent those tools from working.
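At its core, pinning means hashing the server certificate's public key (its SubjectPublicKeyInfo bytes) and comparing it against pins baked into the app. In a real Android app this logic sits inside a custom X509TrustManager or a library's pinning support; the sketch below shows only the comparison, using the common "sha256/&lt;base64&gt;" pin format, and the pin values in the test are illustrative.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Set;

// Core of certificate pinning: derive a pin from the server certificate's
// public-key bytes and compare it against the set shipped with the app.
public class CertPinner {
    public static String pinOf(byte[] publicKeyInfo) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(publicKeyInfo);
            return "sha256/" + Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // SHA-256 is always available
        }
    }

    public static boolean isPinned(byte[] publicKeyInfo, Set<String> expectedPins) {
        return expectedPins.contains(pinOf(publicKeyInfo));
    }
}
```

With pinning in place, a proxy like Charles presents its own certificate, the pin no longer matches, and the connection is refused before any request is sent.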

7. Dump real app installation, e.g. copy user profiles to emulators

Alongside attempts at fraud on the marketing side, we also face scammers trying to cheat our coin reward system by running our apps on emulators, which allows them to run multiple app instances in parallel. In the past, we only checked for emulators on first app boot-up, so fraudulent users were dumping “officially” generated installs from rooted mobile phones and then importing the dump file to restart them on emulators. Our system was fooled. So we decided to add “emulator detection checks” (subsection 5) that analyze the user’s behavior at various key points. However, for most app publishers (depending on your business model) this type of fraud is not particularly common.

8. Faking affiliate programs

Each of AppLike’s apps offers users an affiliate program through which to invite friends to that specific app. Users get a sign-up bonus for every invited friend who uses an AppLike app, and they earn lifetime rewards whenever an invited friend collects coins for playing games or using apps. On the one hand this helps AppLike grow its organic reach in a very cost-effective manner; on the other hand it attracts fraudulent users who try to simulate those invited friends to earn more coins. We came up with two mechanisms to prevent this:

  • Finding specific abnormalities in those affiliate groups. For example, there are often similarities across the email addresses, or some device model names may recur frequently. A notably high number of affiliations can also be a significant indicator. All this data is put into different rule sets.
  • AppLike’s legitimate user acquisition anti-fraud system is also useful for detecting affiliate fraud, because affiliate attribution logic is very similar to marketing user acquisition attribution. If somebody tries to generate tons of fake installs, it doesn’t matter if the source is a marketing network or an affiliate.
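One such rule from the first mechanism, sketched: collapse each invited friend's email local part to a digit-free stem and flag the group when too many addresses share one stem (e.g. john1@, john2@, john3@ …). The method and threshold are illustrative assumptions, not our exact rule set.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Flag affiliate groups whose invited-friend emails are suspiciously similar.
public class AffiliateAnomalyCheck {
    public static boolean isSuspiciousGroup(List<String> emails, int maxSameStem) {
        Map<String, Integer> stems = new HashMap<>();
        for (String email : emails) {
            // Local part of the address, or the whole string if no '@'.
            String local = email.substring(0, Math.max(email.indexOf('@'), 0));
            // Stripping digits collapses john1, john2, john3 to one stem.
            String stem = local.replaceAll("[0-9]", "").toLowerCase();
            int n = stems.merge(stem, 1, Integer::sum);
            if (n > maxSameStem) return true;
        }
        return false;
    }
}
```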

We are delighted that our anti-fraud activities and ongoing anti-fraud operations seem to pay off. Singular named AppLike one of the top 20 anti-fraud media sources for app publishers, and Appsflyer named us as a top player in their Power Index for North America and Europe.

If you’re feeling overwhelmed by all that, I’d like to sum up with six easy-to-remember tips to help you avoid fraudulent losses:

6 basic anti-fraud recommendations

  1. Double check suspicious behavior using various solutions, third party providers and even real people! One solution is not enough, unfortunately.
  2. Don’t trust marketing promises that simply say, “we handle fraud really professionally”. Ask for real examples of how your business partners approach fraud.
  3. In most cases, mobile analytics and attribution providers don’t offer enough security with their solutions, even though they sell them that way.
  4. Trust your gut feeling: If numbers look too ambitious, be careful! Sometimes easy solutions can combat many fraud attempts without much effort. Of course, machine learning algorithms are nice, but do you have someone in your team who can easily put them in place?
  5. Programmatic traffic via DSPs etc. is the biggest challenge in the ecosystem.
  6. Finally: Talk to the real nerds in your company! If you don’t have them, hire some!

Made it this far and would love to work with us on the next big fraud and ad-tech challenges?

Drop me a line carlo@applike.info