Web Scraping Fraud: Going, Going … Ongoing.

Despite rising awareness, web scraping remains a commonly used fraud technique. Bots are powering new attacks, and mobile is increasingly vulnerable.

Published in DataVisor, Sep 10, 2019

The Cambridge Analytica scandal gave new visibility to malicious web scraping, but the practice remains misunderstood. This is due in large part to the fact that web scraping isn’t always criminal; there are legitimate uses for it as well, which makes it extremely hard to tell legitimate and malicious instances apart. Worse, despite increasing awareness, web scraping is becoming more of a problem rather than less. As more fraudsters realize value from the practice, and as more (and less expensive) tools become available, the technique becomes more prevalent.

Regrettably, Facebook remains at the center of ongoing controversies. The Cambridge Analytica story “officially” broke in March 2018, and yet here we are again, barely a year and a half later, with Facebook and web scraping back atop the headlines. Last week, news emerged that a vast number of users’ phone numbers (estimates put the figure at roughly 419 million) were exposed as the result of web scraping fraud.

Facebook’s statement (as reported by CNET) on the matter makes clear the ongoing challenge of fraudulent web scraping:

“Web scraping is an industry-wide challenge and it continues to be difficult to prevent and often hard to detect once it’s happened. In this case, as we announced in April 2018, people’s publicly available phone numbers were scraped in violation of our policies. That is why we removed the ability to find friends using their phone number, because we learned that malicious actors abused this feature. As we said at the time, ‘given the scale and sophistication of the activity we had seen, we believed that most people on Facebook could have had their public profile scraped in this way.’ Since then, we’ve also been making changes to our platforms to reduce the risk of scraping.”

What IS Web Scraping Fraud?

As detailed in DataVisor’s Fraud Wiki, web scraping is the automated extraction of large volumes of data from web pages and applications. Common tools used in the practice include bots, off-the-shelf or custom-coded scripts, and third-party scraping services. As noted above, there are both legitimate and fraudulent forms of scraping. Using a bot, for example, to scrape web page data for future inclusion in search engine listings is considered a legal and valid form of scraping. However, using a bot to scrape images and text from social media sites to create legitimate-seeming fake accounts is illegal.

Bot-Powered Web Scraping

As reported by Forbes, bot attacks are on the rise, particularly on mobile. Citing numbers from recent ThreatMetrix research, the article notes that “companies were hit by 3 billion automated bot attacks in the last six months of 2018” and highlights the fact that “189 million bot attacks originated from mobile devices.”

Claire Zhou, writing for DataVisor early in 2019, highlighted this problem as well, when she noted that “fraudsters are constantly finding ways to commit fraud through the mobile channel and they are attacking industries including financial services, marketplaces, social commerce, and gaming with increasingly sophisticated techniques. The key point of vulnerability is at the user acquisition stage — app installs and new account creation.”

Fraudsters use bots to increase the speed and scale of their criminal activities, and to obfuscate their actions. In recent times, as reported by DataVisor, “fraudsters have intensified their attacks by turning to a new and more advanced type of bot called an advanced persistent bot (APB). APBs are capable of multiple obfuscation techniques, including realistically emulating human behavior, dynamically rotating IPs, and distributing fraud attacks across thousands of IP addresses.”

Types of Fraud Attacks Made Possible By Web Scraping

Fraudsters use web scraping techniques to execute several different attack types. These can include:

Application Fraud: Scraped personal data from social media sites can consist of everything from street addresses to occupations. This information can be used to complete and submit fraudulent loan and credit card applications.
Fake Product Listings: Data scraped from online marketplaces (e.g., Amazon, eBay, Overstock.com) can be used to create thousands of fake, deceptive, and malicious product listings on peer-to-peer (P2P) marketplaces such as Craigslist, OfferUp, and Wallapop. These listings can, in turn, be used to scam money from buyers, phish for personal information, or trick buyers into purchasing counterfeit products.
Digital Ad Fraud: Bots can be used to scrape the content of ad publisher sites and create fake ad publisher pages on other servers to create new ad slots. Fraudsters can then sell these new fraudulent ad slots to authorized resellers that are listed in the original publisher’s ads.txt file. The ad slots, built using scraped content, will then appear to be from a legitimate ad publisher.
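
To make the ads.txt mechanics concrete, here is a minimal Python sketch of the lookup a buyer might run before bidding. It assumes the standard comma-separated ads.txt record layout (ad system domain, seller account ID, relationship); the function names, sample file, domains, and IDs are all hypothetical. Note what the check does not do: it confirms only that a seller is listed, so inventory built from scraped content still passes when it is sold through a listed reseller, which is exactly the gap described above.

```python
from typing import List, NamedTuple


class AdsTxtRecord(NamedTuple):
    ad_system: str     # domain of the ad system, lowercased
    seller_id: str     # seller account ID within that ad system
    relationship: str  # "DIRECT" or "RESELLER"


def parse_ads_txt(text: str) -> List[AdsTxtRecord]:
    """Parse the data records of an ads.txt file, skipping comments and variables."""
    records = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = [f.strip() for f in line.split(",")]
        if "=" in fields[0] or len(fields) < 3:
            continue  # variable lines such as contact=..., or malformed records
        records.append(AdsTxtRecord(fields[0].lower(), fields[1], fields[2].upper()))
    return records


def is_authorized(records: List[AdsTxtRecord], ad_system: str, seller_id: str) -> bool:
    """Return True if the seller is listed for the given ad system."""
    return any(
        r.ad_system == ad_system.lower() and r.seller_id == seller_id
        for r in records
    )


# Hypothetical ads.txt content for an imaginary publisher.
SAMPLE_ADS_TXT = """\
# ads.txt for a hypothetical publisher
contact=adops@publisher.example
exchange.example, pub-1001, DIRECT, abc123
reseller.example, 77442, RESELLER
"""

records = parse_ads_txt(SAMPLE_ADS_TXT)
print(is_authorized(records, "reseller.example", "77442"))  # True
print(is_authorized(records, "unknown.example", "77442"))   # False
```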

Stages of Web Scraping Fraud

Stopping malicious web scraping is all about proactivity. The earlier you can address it, the less damage you’ll have to repair. Incidentally, this is advice Facebook could have used in 2015, when “staffers in its political advertising unit” first raised flags about Cambridge Analytica!

As with any data-driven fraud attack, there are stages, and there are degrees to which detection and prevention are possible at any given stage. With regard to web scraping, consider the following stages:

Data acquisition: During this stage, a fraudster is busy acquiring the data they’ll need to execute their envisioned attack. Accurate fraud detection is very difficult at this stage because there are legitimate use cases for scraping data, so it’s virtually impossible to distinguish between legitimate and malicious scraping.
Attack preparation: At this stage, the fraudster has obtained the necessary data via scraping and is laying the groundwork for using it in a pending attack. The actions are now illicit, and the steps the fraudster takes leave a digital footprint that, while likely subtle and cleverly obfuscated, sophisticated fraud management solutions such as DataVisor’s dCube can reveal through holistic data analysis.
The Attack: This is where scraped data is actively used for malicious purposes. Detection is still possible during this stage, but it’s reactive because the attack has already launched. Damage control is the only remaining option.

Stopping Web Scraping Fraud

Fraudsters rely on bots to perform web scraping at scale, and it is this reliance on coordinated actions that enables us to reveal their plots. Using holistic data analysis, we can expose those actions that — while appearing innocuous in isolation — are part of coordinated malicious efforts. By surfacing these revealing patterns, we can flag potentially fraudulent accounts and actions before they cause harm.
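
One way to picture how coordinated actions give bots away is a simple grouping exercise. The Python sketch below is an illustration only, not DataVisor’s method: it buckets accounts by a combination of shared attributes and flags any bucket that is suspiciously large. The field names (`user_agent`, `signup_hour`, `photo_hash`), the sample records, and the `min_size` threshold are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def coordinated_groups(
    accounts: Sequence[Dict[str, str]],
    keys: Sequence[str],
    min_size: int = 3,
) -> List[Set[str]]:
    """Return sets of account IDs that share identical values for every key in `keys`."""
    buckets = defaultdict(set)
    for acct in accounts:
        signature = tuple(acct.get(k) for k in keys)
        if None in signature:
            continue  # skip accounts missing any of the compared attributes
        buckets[signature].add(acct["id"])
    # Keep only groups large enough to suggest coordination.
    return [ids for ids in buckets.values() if len(ids) >= min_size]


# Hypothetical sign-up records: the first three use different IPs
# but share a user agent, sign-up hour, and profile photo hash.
ACCOUNTS = [
    {"id": "a1", "ip": "203.0.113.5", "user_agent": "UA-X",
     "signup_hour": "2019-09-01T03", "photo_hash": "p1"},
    {"id": "a2", "ip": "198.51.100.7", "user_agent": "UA-X",
     "signup_hour": "2019-09-01T03", "photo_hash": "p1"},
    {"id": "a3", "ip": "192.0.2.44", "user_agent": "UA-X",
     "signup_hour": "2019-09-01T03", "photo_hash": "p1"},
    {"id": "a4", "ip": "192.0.2.45", "user_agent": "UA-Y",
     "signup_hour": "2019-09-01T09", "photo_hash": "p2"},
]

flagged = coordinated_groups(ACCOUNTS, keys=["user_agent", "signup_hour", "photo_hash"])
print(flagged)
```

Each of the three flagged accounts uses a different IP address, so a rule that looks at IPs alone would see nothing unusual; the pattern appears only when the accounts are viewed together.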