Yingtong Dou
Jul 9 · 9 min read

In this story, I introduce my recent research with Huawei and the University of Rochester on characterizing download fraud in mobile App markets. As far as I know, it is the first research work to address this problem. From this story, you will learn about:

  • Three types of download fraud and their properties;
  • Eighteen features and their contributions to detecting specific fraud types;
  • How fraudsters inject fake downloads, and what characterizes them;
  • Five suggestions on mitigating download fraud in App markets and similar platforms.

For more details, please refer directly to our online preprint.


Background

Mobile application markets like Google Play and the App Store are an indispensable part of today’s mobile Internet and smart devices. They distribute millions of Apps with various functions, such as social forums, arcade games, and weather tools. App markets make huge profits, and they attract plenty of fraudulent activity as well.

Search Google for keywords like “buy App downloads” and “App store optimization”, and you will find plenty of websites offering bundled services that inject fake downloads and installs on Google Play or the App Store. A recent report from Datavisor, a third-party anti-fraud tech company, claims that nearly 10% of downloads and installs in mobile markets are fake and that they cause losses of up to 300 million every year.

Besides fake downloads, other fraudulent activities in App markets, such as spam reviews, top-chart ranking fraud, and malware Apps, have been widely studied. For those problems, researchers can crawl information from the front end and then dissect the fraudulent activities using semantic, temporal, or code-level features.

However, fake downloads and their behavioral information are difficult to access from the front end. Only App markets can obtain such data, from server-side logs. Another challenge in studying fake downloads is the ground truth, since fake downloads are usually mixed with legitimate ones. Moreover, fraudsters can imitate regular users, which makes the ground truth even harder to acquire.

To get a comprehensive understanding of download fraud in mobile App markets, we work with the Huawei App Store (the second largest App market in China). We release a honeypot App on the market and purchase fake downloads from fraudster agencies. Next, I will show how we harness the honeypot, synthesize information from fraudsters and the literature, and uncover download fraud activities in App markets.


The Honeypot App

The workflow of how we set up the honeypot and track download fraud activities.

Setting up a honeypot is a common approach to acquiring the ground truth of attacks in security research. For the download fraud problem, given access to the server-side logs, a honeypot is a simple and efficient way to capture fraud activities with high fidelity. It also guarantees that our study is based entirely on fraudulent activities in the wild, with no prior human knowledge involved.

We launched an isolated gaming App on the Huawei App Store from June to August 2018 as the honeypot. We contacted four fraudster agencies via various channels and purchased four fake download injection services for the honeypot. The table below summarizes the attributes of the four purchased fraudulent activities.


Three Types of Fraud

Based on the information we gathered from the black market while setting up the honeypot, we classify download fraud activities in mobile App markets into three categories according to their goals.

Type I: Boosting Front End Downloads

This type of fraud employs automated scripts to click the download button on the front end (such as the App market’s portal website). It is similar to click fraud, which is rampant in online advertising. All the fake downloads we purchased for the honeypot App fall into this category because, unlike the other two types of fraud, they are very cheap and do not require long-term cooperation.

The first type of fraud only clicks the download button, without simulating any device information. It increases the download count shown at the front end but does not affect the back-end logs, because no download actually takes place. Thus, we can filter these fake downloads by the “Source” and “Device ID” attributes shown in Table II.
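Here is a minimal sketch of such a filter. The record layout, the field names `source` and `device_id`, and the value `"web_portal"` are assumptions made for illustration; they are not the actual schema of the server-side logs or of Table II.

```python
from typing import Optional, TypedDict


class DownloadRecord(TypedDict):
    app_id: str
    source: str               # hypothetical values: "web_portal", "client"
    device_id: Optional[str]  # hashed IMEI, or None/"" when no device info was sent


def is_type_one_fake(record: DownloadRecord) -> bool:
    """Assumed rule: a front-end click that carries no device information."""
    return record["source"] == "web_portal" and not record["device_id"]


def split_records(records):
    """Separate suspected Type I fakes from the remaining records."""
    fakes = [r for r in records if is_type_one_fake(r)]
    kept = [r for r in records if not is_type_one_fake(r)]
    return fakes, kept


records = [
    {"app_id": "honeypot", "source": "web_portal", "device_id": None},
    {"app_id": "honeypot", "source": "client", "device_id": "a3f1"},
    {"app_id": "honeypot", "source": "web_portal", "device_id": ""},
]
fakes, kept = split_records(records)
print(len(fakes), len(kept))  # 2 1
```

The rule is deliberately conservative: a record is dropped only when both signals agree, so a front-end download that does carry a device ID is kept.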

Note that the device ID is a hash of the device IMEI, and our experiment is completely anonymous, with no personal information involved.
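To illustrate what hashing an IMEI into an anonymous device ID can look like, here is a small sketch. The choice of salted SHA-256, the salt value, and the sample IMEI are all assumptions of mine; the story does not say which hashing scheme the market actually uses.

```python
import hashlib


def anonymize_imei(imei: str, salt: str = "example-salt") -> str:
    """Return a hex digest that stands in for the device ID (assumed scheme)."""
    return hashlib.sha256((salt + imei).encode("utf-8")).hexdigest()


# A made-up IMEI used only as sample input.
device_id = anonymize_imei("490154203237518")
print(len(device_id))  # 64 hex characters for SHA-256
```

The same IMEI always maps to the same ID, so downloads can still be grouped per device without storing the IMEI itself.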

Type II: Optimizing App Search Ranking

The second type of fraud is technically more sophisticated than the first. It employs download bots (physical Android smartphones) in device farms like the one shown below.

Device Farms in China (Photo from AppsFlyer)

The fraudsters control multiple devices simultaneously under a unified framework and assign specific download and install tasks to them. According to our investigation, this is the most prevalent type of download fraud, since it usually fabricates very legitimate-looking fake downloads, and App markets usually count them as valid downloads.

The ultimate goal of the injected fake downloads is to manipulate the recommender and search algorithms of App markets. The download count is a crucial feature of those algorithms and can influence the ranking of an App among Apps with similar functions.

Type III: Enhancing User Acquisition & Retention Rate

The last type of download fraud involves no advanced technology; it simply hires human workers to download and install Apps manually. It is more expensive than the first two types of fraud. However, the crowd workers can carry out post-install tasks that bots cannot simulate. This kind of fraud is known as crowdturfing.

The fraudster agencies on the black market provide crowdturfing services for App markets, but we could not capture any evidence of them in the App market’s server-side logs. Because those downloads are generated by real human beings on their own devices, most of the abnormal activity happens inside the Apps, where App markets cannot track it.

We crawl sixteen crowdturfing websites in China and summarize the top categories of Apps targeted by crowdturfing fake downloads, along with the popular post-install tasks listed on those websites.


Eighteen Features

Since Huawei assigns a unique device flag to every Huawei smartphone, we use this vendor flag as the ground truth for distinguishing download bots from legitimate devices. Specifically, we consider Apps for which half of the downloads come from non-vendor devices to be suspicious Apps soliciting fake downloads. Legitimate Apps have more than 90% of their downloads from vendor-verified devices.
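The labeling rule can be sketched in a few lines. The input format (pairs of App ID and a vendor-verified flag), the function name `label_apps`, and the decision to treat "half" as "at least half" are my assumptions for this illustration; Apps that meet neither threshold are simply left unlabeled here.

```python
from collections import defaultdict


def label_apps(downloads, suspicious_share=0.5, legit_share=0.9):
    """Label each App from (app_id, is_vendor_device) download pairs.

    Assumed reading of the thresholds: at least half of the downloads from
    non-vendor devices marks an App suspicious; more than 90% from
    vendor-verified devices marks it legitimate.
    """
    totals = defaultdict(int)
    vendor = defaultdict(int)
    for app_id, is_vendor_device in downloads:
        totals[app_id] += 1
        vendor[app_id] += int(is_vendor_device)

    labels = {}
    for app_id, total in totals.items():
        non_vendor = total - vendor[app_id]
        if non_vendor / total >= suspicious_share:
            labels[app_id] = "suspicious"
        elif vendor[app_id] / total > legit_share:
            labels[app_id] = "legitimate"
        else:
            labels[app_id] = "unlabeled"
    return labels


# Toy data: App "a" has 3 of 10 vendor downloads, "b" 19 of 20, "c" 8 of 10.
downloads = (
    [("a", True)] * 3 + [("a", False)] * 7
    + [("b", True)] * 19 + [("b", False)] * 1
    + [("c", True)] * 8 + [("c", False)] * 2
)
labels = label_apps(downloads)
print(labels)
```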

The figure below shows the eighteen features we selected according to previous literature and our information probed from fraudsters.

Eighteen selected features to identify fake downloads generated by download bots.

Based on the collected dataset, which includes 10 million download records and half a million Apps, we rank the importance of those features as calculated by Gini impurity.
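For readers unfamiliar with Gini impurity, the sketch below shows the idea on a single split: a feature is more important the more it reduces the impurity of the labels. This is a toy illustration on made-up data, not our actual pipeline; tree-based classifiers aggregate such decreases over many splits, and the feature name `night_time` is hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity, 1 - sum(p_k^2), of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(feature_values, labels):
    """Impurity decrease obtained by splitting on a binary feature."""
    left = [y for x, y in zip(feature_values, labels) if x]
    right = [y for x, y in zip(feature_values, labels) if not x]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


# Toy data: 1 = fake download, 0 = legitimate download.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "night_time": [1, 0, 1, 0, 1, 0],  # barely informative
    "new_device": [1, 1, 1, 0, 0, 0],  # separates the classes perfectly
}
ranking = sorted(
    features,
    key=lambda name: gini_decrease(features[name], labels),
    reverse=True,
)
print(ranking)  # ['new_device', 'night_time']
```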

We also validate the selected features on a separate test dataset sampled during a later period. The table below shows the test performance with XGBoost as the classifier.

From the feature importance ranking, we have the following observations and conclusions.

  • The most informative feature, “New device?”, indicates that download bots usually reset their device IDs after a single download action.
  • App rating and App category are two other highly informative features. This reveals that Apps involved in download fraud differ in their attributes from regular Apps.
  • Many Apps involved in download fraud are newly released. This reflects App marketers’ intention to purchase fake downloads to help launch their Apps.
  • Apart from the first device feature, most of the App download statistics features carry more signal than the device behavior features in identifying fake downloads.
  • App statistics like installations, views, search counts, and client downloads help distinguish abnormal traffic.
  • Most of the device behavioral features and IP-based features contribute little to the classification task. This contradicts our early assumptions and shows that download bots can indeed simulate regular users’ behavior very well.
  • The total number of searches made by bots is similar to that of regular users, which indicates that bots can emulate regular users’ search behavior.
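To make the “New device?” signal concrete, here is a minimal sketch of how such a flag could be derived from a time-ordered download log. The log format (pairs of device ID and App ID) and the helper names are assumptions for illustration, not the feature extraction code used in the paper.

```python
def mark_new_devices(downloads):
    """Yield (device_id, app_id, is_new) for time-ordered downloads."""
    seen = set()
    for device_id, app_id in downloads:
        is_new = device_id not in seen
        seen.add(device_id)
        yield device_id, app_id, is_new


def new_device_ratio(downloads, app_id):
    """Share of an App's downloads that came from first-seen devices."""
    flags = [is_new for _, app, is_new in mark_new_devices(downloads) if app == app_id]
    return sum(flags) / len(flags) if flags else 0.0


# Toy log: d1 and d2 are returning devices; b1..b3 appear only once.
log = [
    ("d1", "news"), ("d2", "news"),
    ("d1", "weather"), ("d2", "weather"),
    ("d1", "game"), ("b1", "game"), ("b2", "game"), ("b3", "game"),
]
print(new_device_ratio(log, "game"))     # 0.75
print(new_device_ratio(log, "weather"))  # 0.0
```

An App whose downloads come mostly from never-seen-before device IDs, like "game" in this toy log, matches the ID-resetting pattern described in the first observation.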

Analyzing the server-side logs, we also made two interesting findings. The first is that not all anomalies are fraudulent. We find that some Apps with download bursts over a short period are actually in their promotion phase. On Google Trends, the search trends for these Apps are consistent with their download traffic. This indicates that promotion channels outside App markets may be causing the traffic spikes.
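One simple way to check whether a download burst is "consistent" with outside search interest is a correlation test, sketched below. The Pearson correlation, the 0.8 threshold, and the sample numbers are my assumptions for illustration; the search-interest series is assumed to be already exported as a plain list, and this is not necessarily how we compared the trends in the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no trend to compare against
    return cov / sqrt(var_x * var_y)


def burst_matches_interest(downloads, search_interest, threshold=0.8):
    """True when daily downloads move together with search interest."""
    return pearson(downloads, search_interest) >= threshold


daily_downloads = [100, 120, 900, 1500, 400]
promoted_interest = [10, 12, 80, 100, 35]   # interest rises with the burst
flat_interest = [10, 11, 10, 12, 11]        # no outside interest

print(burst_matches_interest(daily_downloads, promoted_interest))  # True
print(burst_matches_interest(daily_downloads, flat_interest))      # False
```

A burst that tracks outside interest is more likely a legitimate promotion, while a burst with no matching interest deserves a closer look.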

The other observation is that the categories suffering from download fraud vary over time. For example, a large number of sports betting Apps involved in download fraud were identified during the 2018 FIFA World Cup, whereas in December 2018 our detectors flagged only a few sports betting Apps.

I omit the comparative analysis of suspicious and normal behavior from this story; please refer directly to the paper.


Five Suggestions

At the end of our research, we collect the stances of three parties related to download fraud in mobile App markets: fraudster agencies, App marketers, and market operators. Please refer to the paper for the details.

Based on the research and investigation above, we propose five suggestions, from A to Z, to help market operators mitigate download fraud.

  • Adapting to the agility of fraudsters. Because fraud techniques in the wild evolve continually, it is better to design an effective and efficient detector that can detect fraud activities in real time and filter fake downloads immediately.
  • Building behavior signature databases. Our analysis shows that most fraudsters reset device IDs and IP addresses while injecting fake downloads. Given also the prevalence of cloud services and IPv6, IP and ID blacklisting is no longer a valid method for filtering bots and fake devices.
  • Crafting diversified anti-fraud mechanisms. Our investigation reveals the sophisticated intentions behind download fraud campaigns. Beyond this, many external factors cause abnormal traffic. Therefore, an anti-fraud system should consider, from multiple views, the motivations of App marketers who solicit fake downloads. A detector tailored to each fraud type would capture more fraud activities and reduce the false positive rate.
  • Devising fine-grained advertisement services. A reasonable App promotion mechanism and advertisement bidding system will attract more App marketers to legitimate promotion channels instead of cheating. For example, multi-layered and personalized advertisement pricing would give App marketers more choices. Designing better mechanisms for user-advertisement interaction could increase the click-through rate (CTR) of advertisements and, at the same time, redirect customers from fraudsters to the App market’s advertising system.
  • Elaborating clear incentives and sanctions. A strict examination of Apps before they are released on App markets could reduce the number of low-quality Apps, along with potential fraud activities. Apps with deceptive activities should be demoted according to their threat level. This will increase the cost of cheating and thus lower the probability of fraud. It also follows that proper incentives will divert more App marketers to legitimate promotion channels.

Key Takeaways

The key points of this story can be summarized as follows:

  • Fake downloads have different goals and injection approaches.
  • Discover anomalies in App download traffic, then combine information from multiple channels to filter the bot-generated downloads.
  • Crowdturfing fraud targeting post-install activities is still a challenge in today’s anti-fraud campaigns.
  • Designing personalized services matters more than blocking fake downloads.

This is my first Medium story. I will share more stories about my research and exploration related to anomaly detection and graph mining.

If you have any questions or suggestions about this story or our research, please feel free to leave a comment. I would appreciate it a lot!

Written by Yingtong Dou

CS PhD Student@UIC. Sharing stories about spam detection, social network analysis and graph mining.
