One spring afternoon I was having lunch with Nick Briz at a small neighborhood diner near our studio in Chicago. We were throwing around ideas for Radical Networks, an upcoming conference in Brooklyn that we’ve been participating in for the last few years. The event brings together artists, educators, journalists, and activists from all over the world to foster discussion and engagement with topics of communication networks and Internet infrastructure through workshops, performances, invited speakers, and an art show.
We’d both participated in the art show since the festival’s inception, but this year I felt compelled to break into the speaker track. In particular, I was entertaining the idea of presenting on a question I’d had a few days prior: “What if websites borrowed compute resources from their visitors’ devices while they browsed as a means of distributed computing?”
This post is a report of my trip down this rabbit hole of an idea, and a summary of the talk that I ended up giving at Radical Networks as a result of that research.
Stepping Back, A Bit About Distributed Computing
Before we go too deep into the implications of borrowing users’ compute resources while they unsuspectingly browse the web, I want to touch on why it would be advantageous to do so in the first place. The example scenario that I’ve posed falls into a field of computer science called distributed computing. Distributed computing is the practice of dividing a problem into small chunks and running them on many different computers in parallel, significantly reducing the time needed to solve the problem. In general, distributed computing offers abundant resources: many CPUs, high network bandwidth, and a diverse set of IP addresses. For some tasks, distributed computing provides the opportunity for 1,000 computers to work together to solve a task 1,000x faster than one computer working alone.
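As a minimal sketch of that divide-and-combine pattern, the Python below splits a toy problem (counting primes below a limit) into independent chunks and merges the partial results. The function names, the chunk size, and the use of a local thread pool as a stand-in for remote machines are all illustrative assumptions, not part of the experiments in this post. Threads in CPython will not actually speed up CPU-bound work, so this only shows the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n: int) -> bool:
    """Trial division; deliberately simple so the work is worth splitting."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(start: int, stop: int) -> int:
    """One work unit: count the primes in [start, stop)."""
    return sum(1 for n in range(start, stop) if is_prime(n))


def split_range(limit: int, chunk_size: int) -> list[tuple[int, int]]:
    """Divide [0, limit) into independent (start, stop) chunks."""
    return [(lo, min(lo + chunk_size, limit)) for lo in range(0, limit, chunk_size)]


def distributed_count(limit: int, chunk_size: int = 1_000, workers: int = 4) -> int:
    chunks = split_range(limit, chunk_size)
    # Each chunk could be shipped to a different computer; a local thread
    # pool stands in for those machines here (an assumption for illustration).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda chunk: count_primes(*chunk), chunks)
        return sum(partials)


print(distributed_count(10_000))  # 1229 primes below 10,000
```

The important property is that no chunk depends on another, which is what makes the 1,000-computers scenario possible for this class of problem.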
Distributed computing has a rich history that dates back to ARPANET in the 1960s, with a slew of community and volunteer citizen science projects popping up in the late 1990s and early 2000s (partially thanks to the Berkeley Open Infrastructure for Network Computing, or BOINC, software). Projects like SETI@home, Folding@home, GIMPS, and many others allow computer users to donate idle time on their computers to cure diseases, study global warming, find large prime numbers, search for alien life, and do many other types of scientific research.
Opposite the idea of volunteer distributed computing is the concept of a botnet. A botnet, a portmanteau of “robot” and “network”, is a distributed compute network where the owners of the participating computers don’t know that their computers are participating in the network. Botnets are associated with hacking and criminal activity and are best known for their use in nefarious activities like distributed denial of service (DDoS) attacks, e-mail spamming, spyware, click fraud, and more recently, cryptocurrency mining. Botnet software is usually installed on a user’s machine as a trojan or worm and can persist for months or years without the owner knowing, all the while providing compute cycles and bandwidth to an anonymous third party. Occasionally these botnets grow in size until they control tens of millions of unsuspecting users’ computers and become informally recognized and named by members of the cybersecurity community.
Browser Based Botnets
A bit of digging revealed that this wasn’t a particularly new idea, and that folks had been talking openly about this technique since at least 2012. MWR Labs conducted research on the subject applied to distributed hash cracking on the web (an idea that I elaborated on in a demo during my talk, code here) and Jeremiah Grossman and Matt Johansen had a great talk at Black Hat USA in 2013 on the subject. Both research groups distributed their experiments to unsuspecting users in a notably devious and ingenious way: ad networks.
Traditional methods of distributed computing involve volunteers or viruses, but the landscape is quite different for browser-based botnets. With our approach, we need to distribute our code to as many web browsers as possible at once. We have a few options:
- Run a popular website
- Write a WordPress/Tumblr theme and embed our malicious code in the source
- Run a free proxy server (or Tor exit node), and inject our code into non-HTTPS traffic
- Be an ISP and do the same ^
- Embed our malicious code into popular websites with persistent cross-site scripting (XSS) (illegal)
- Buy a banner ad
Here’s the idea: advertising networks connect web content publishers (e.g. blogs, news sites, porn sites, forums) to advertisers. Advertisers pay the ad network per click (CPC) or per thousand impressions/views (CPM). The network skims money off the top before sending it along to the publishers who host the ads on their platforms. When an advertiser creates a dynamic third-party creative (a fancy name for an embeddable
Doing it Anonymously
Given that researchers had luck exploiting these techniques five years ago, I was curious if it was still possible to do so today, or if browsers and ad networks had wised up to these kinds of shenanigans. In preparation for my talk I found an ad network that supported iframes and wrote some pseudo-malicious bots. My goal was to survey the landscape and see what was possible in this domain, specifically utilizing some of the more modern web browser technologies that have evolved since 2012.
The bots that I was writing worked by communicating with a central command-and-control server that would coordinate the compute nodes, distribute tasks, log experiment results, etc. For this I needed a cloud server to run my back end Node.js code. Here is where I cheated a bit. There are tons of bulletproof and offshore VPSes available for purchase on the web, almost all of which accept Bitcoin as payment. But for convenience, and because as far as I could tell I wasn’t actually doing anything illegal, I chose to use Amazon Web Services (AWS). A nefarious hacker would have no problem finding an anonymous VPS or using someone else’s server that they had already compromised.
For added security I wanted to encrypt the communications between my malicious ad bots and the Node command-and-control server, so I also required an SSL/TLS certificate. Let’s Encrypt provides them for free, but like all SSL certificates, you need to own a domain name to get one. Fortunately, Namecheap.com recently announced a new Bitcoin payment method, so equipped with my anon email address, I created an account and registered a $0.88 “.website” domain paid for in Bitcoin.
Before I deployed the first ads, I wanted to configure some sort of analytics tracking to gather information about the types of users the ads were served to. I was primarily interested in geographic location as well as simple time-on-page and recurring visitor statistics. Google Analytics is the standard analytics tracker, but that doesn’t fit very nicely into my anonymous pipeline — plus, I’d rather not feed the Google beast. Matomo (formerly Piwik) is an open source analytics alternative that can be self-hosted on your own server.
The popunder.net advertising network offers minimum CPM (“cost per mille”, or price per 1,000 impressions) ad buys for $0.04, so I was able to conduct all of my experiments on a budget. Altogether, I spent less than $100 running ads intermittently over the course of one month.
What would you do with 100,000 web browsers and an afternoon?
The first ad simply logged IP addresses, user agents, and visit duration. I started the ad at 9AM CDT on a Thursday, right before heading to work, and ran it for ~3 hours, turning it off around lunchtime to analyze some of the results.
I was shocked to see that the ad had been served to 117,852 web browsers from 30,234 unique IP addresses. Surprisingly, a significant portion of the visitors stayed on the page serving the ad for quite a while, which could provide sizable CPU clock time. Some clients even reported back to the command-and-control server over 24 hours after the ad network had stopped serving the ad, meaning that some poor users still had the tab open. Including these outliers, the average time on ad was 15 minutes!
I summed the number of seconds that all browser clients ran the code served by the ad and the total added up to 327 days. That’s the equivalent of one computer running my ad on one web browser for nearly a year, all in just three hours real-time for just around $15 USD of Bitcoin. Hot. Damn.
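The 327-day figure is just a sum over per-client durations. A sketch of that bookkeeping follows, assuming a hypothetical log format with one `(client_id, seconds_connected)` pair per browser. The sample values are made up for illustration and are not data from the experiment. The last lines divide the approximate $15 spend by the reported 327 days for a rough cost per browser-day.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def total_browser_days(sessions):
    """Sum per-client run time (in seconds) and express the total in days."""
    total_seconds = sum(seconds for _client_id, seconds in sessions)
    return total_seconds / SECONDS_PER_DAY


# Hypothetical log entries: (client id, seconds the ad code ran).
sample_sessions = [("a", 900), ("b", 45), ("c", 86_400), ("d", 3_600)]
print(f"{total_browser_days(sample_sessions):.2f} browser-days in the sample")

# Figures reported in this post: ~$15 spent, 327 days of summed run time.
cost_per_browser_day = 15 / 327
print(f"~${cost_per_browser_day:.3f} per browser-day")
```

At roughly five cents per browser-day, the cost of the ad buy is small next to the compute time it reaches.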
So this whole thing worked; an ad network turned out to be a brilliant method of distribution. But how powerful was this network compared to, say, the beefy 4.2GHz CPU of the machine that I was using to develop it? To test this I wrote a hashing bot that calculated the SHA1 hash of random numbers in an infinite loop as quickly as possible.
The speed of the network offered a 100x increase from my home workstation for a nominal cost.
The browser clients that received this ad had 3.67 CPU cores on average, boding well for the possibility of multi-threaded exploitation in-browser. Collectively, the SHA1 botnet averaged 324 concurrently connected clients computing 8.5 million SHA1 hashes per second as an entire network.
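For a sense of what such a benchmark measures, here is a rough Python analogue of the hashing loop, run locally for a fixed duration instead of forever. It is not the bot’s code, which ran in the browser. The function name and the half-second duration are assumptions, and native `hashlib` throughput is not directly comparable to in-browser rates.

```python
import hashlib
import os
import time


def sha1_hashes_per_second(duration: float = 1.0) -> float:
    """Hash random 8-byte inputs for roughly `duration` seconds; return the rate."""
    count = 0
    start = time.perf_counter()
    deadline = start + duration
    while time.perf_counter() < deadline:
        hashlib.sha1(os.urandom(8)).digest()
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed


rate = sha1_hashes_per_second(0.5)
print(f"{rate:,.0f} SHA1 hashes/second on this machine")

# Per-client average implied by the network-wide figures reported above.
per_client = 8_500_000 / 324
print(f"{per_client:,.0f} hashes/second per connected client")
```

Dividing the network total by the number of connected clients gives roughly 26,000 hashes per second per browser. The network’s strength comes from the number of clients rather than the speed of any one of them.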
Monero Miner Bot
While conducting this research, I also found myself conducting, *ahem… cough, cough*, other research on The Pirate Bay 🏴‍☠️. I happened to have my system CPU monitor open because I was testing some botnet code a few minutes before and I noticed something peculiar. When I opened certain links on The Pirate Bay my CPU usage would spike to ~80% on all cores. When I navigated away from those links the usage would fall. Had I found an instance of the very abuse that I was studying live in the wild?
I profiled the suspicious pages using the Firefox developer tools and noticed there were six dedicated web worker threads running a script called
Digging deeper, I found a file called coinhive.min.js. Some Duck Duck Go’ing led me to coinhive.com. Coinhive appeared to be a company that was offering an alternative method of monetization on the web. Sites could use Coinhive to embed XMR miners into their web pages that would borrow their users’ CPUs instead of serving them advertisements. This is fairly unprecedented as far as I know, and Coinhive appeared to have launched just the week before. In fact, the first reports of it being used by The Pirate Bay didn’t even start to make waves on the net until the day after I stumbled across it.
The timing of Coinhive’s launch relative to my research was impeccable, and the interest that it sparked on the web was encouraging. I created an ad that ran a Coinhive.js miner and ran it for an hour and fifteen minutes. I was able to mine the equivalent of $4.20 🌲 in XMR at the time (~$3 after Coinhive’s cut), although the ad itself cost nearly $10 to run. The price of Monero has jumped ~300% since then, so this method may now be approaching profitability.
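The break-even arithmetic is short enough to write out. The sketch below uses only the figures quoted above ($4.20 mined, roughly $3 kept after Coinhive’s cut, about $10 of ad spend) and assumes, purely for illustration, that ad cost and hash rate stay fixed while only the XMR price moves.

```python
mined_usd = 4.20     # value of the XMR mined during the run, at the time
kept_usd = 3.00      # approximate payout after Coinhive's cut
ad_cost_usd = 10.00  # approximate cost of the ad run

# Share of the mined value retained by Coinhive, implied by the two figures.
implied_cut = 1 - kept_usd / mined_usd

# Loss on the run at the original XMR price.
net_usd = kept_usd - ad_cost_usd

# Factor by which the XMR price must rise for the payout to cover the ad.
break_even_multiplier = ad_cost_usd / kept_usd

print(f"implied cut: {implied_cut:.0%}")
print(f"net result of the run: ${net_usd:.2f}")
print(f"price multiple needed to break even: {break_even_multiplier:.2f}x")
```

Under those assumptions the payout covers the ad spend once XMR trades at a little over three times its price during the experiment.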
Botnets are most associated with distributed denial of service (DDoS) attacks. Botmasters use thousands of machines under their control to flood target servers with enough Internet traffic to render their services unusable or rent access to their botnet for others to do the same. Would the popunder.net ad network give me enough concurrent users to perform a DDoS against one of my own servers?
I rented another t2.micro AWS server and installed stock Nginx to serve a boilerplate website accessible on the net. I then launched a DDoS bot on the ad network that made concurrent HTTP requests to my Nginx server as quickly as possible in an attempt to knock it offline. Nginx was able to handle the ~22K requests per second generated by the bots. The service seemed to operate normally during the attack, which directed 9,850,049 1KB GET requests from 12,326 unique IP addresses at the server.
I had similar results with an Apache 2 server I set up. The default Apache server was able to fend off the bots and handle an average of ~26K requests per second. Both Nginx and Apache did use ~60–100% of their single CPU during the attack.
While the attacks didn’t succeed in rendering the services unusable (which is actually a pretty big relief), I was able to generate a 5.3GB Nginx logfile in just over an hour. The standard AWS micro instance has 8GB of storage, so for only a few dollars it would likely be trivial to fill the entire disk of a small website that has the default logging behavior enabled.
This is only speculative, but the t2.micro instance provides low network bandwidth and speed in comparison to AWS’s more expensive instances, which may have actually throttled the rate at which traffic could reach the server. I haven’t run the experiments on a larger instance, but it is possible that attacks would actually be more effective against servers with more network resources. AWS servers are also known for being stable against DDoS attacks, so perhaps attacking a VPS hosted on another platform would be more successful.
Finally, the bot I’m most excited to share: the WebTorrent bot. A few years ago a new protocol for peer-to-peer networking communications, called WebRTC, was introduced in the browser. WebRTC allows web browsers to exchange video, audio, or arbitrary data with each other directly without the need for a third-party server to shuffle the information back and forth. A few engineers quickly implemented the popular BitTorrent protocol over WebRTC and WebTorrent was born. WebTorrent allows users to seed and leech files with hundreds of peers entirely through their web browsers. While this technology brings a wealth of opportunities for distributed networking to the web, it also comes with some significant security concerns. Torrents can be downloaded and uploaded in the background of web pages unbeknownst to users, which can become particularly problematic if the content is illegal or otherwise unwelcome.
To measure the potential of such activity I created a torrent of 1GB of random noise data to seed entirely through the ad network. Users that were served the ad automatically downloaded the 1GB file from other users that also had the ad open in a browser tab. The health of the torrent was determined by the number of connected clients at any given time.
The ad ran for 24 hours reaching 180,175 browser clients from 127,755 unique IP addresses. 328.5 KB were uploaded every second by each browser on average, leading to a 702 Mbps upload speed for the entire network.
Clients had an average seed ratio of 2.24 (106.18 max) and uploaded 25 MB of data each (69.28 GB max). The entire network seeded (uploaded) a whopping 3.15 TB of data in a single day.
WebRTC doesn’t discriminate against metered or cellular network connections. I configured the ad network to only target desktop devices when serving this ad, but there is nothing stopping a malicious actor from using hundreds of gigabytes of network data from your cell phone over an LTE connection and racking up a $10,000 phone bill in the process.
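To put the metered-connection risk in numbers, the sketch below converts the average per-browser upload rate reported above into data used per hour, and into the time needed to burn through a given allowance. It assumes decimal units (1 KB = 1,000 bytes) and counts upload traffic only. The 100 GB allowance is an arbitrary example.

```python
UPLOAD_BYTES_PER_SECOND = 328.5 * 1_000  # average per browser, from the run above


def gigabytes_per_hour(bytes_per_second: float) -> float:
    """Data uploaded in one hour at a constant rate, in decimal gigabytes."""
    return bytes_per_second * 3_600 / 1e9


def hours_to_use(allowance_gb: float, bytes_per_second: float) -> float:
    """Hours needed to upload `allowance_gb` gigabytes at a constant rate."""
    return allowance_gb / gigabytes_per_hour(bytes_per_second)


print(f"{gigabytes_per_hour(UPLOAD_BYTES_PER_SECOND):.2f} GB uploaded per hour")
print(f"{hours_to_use(100, UPLOAD_BYTES_PER_SECOND):.0f} hours to upload 100 GB")
```

At the average rate a single tab uploads a little over a gigabyte per hour, so a forgotten tab could consume a typical mobile data plan within a day.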
I had my fun. Launching a series of research botnets in unsuspecting users’ browsers was pretty close to an all-time high (all in the name of science of course). I never had any intention of abusing strangers on the web or profiting from these endeavors in any way. My ads were limited in scope and duration and I did not expose the IP addresses or other identifiable information of any of the *victims* of the experiments. I set out to answer a few unsettling questions about the state of the web, the browser, and Internet advertising in an attempt to publish my findings in the open and encourage public discourse about browser based botnets. What I found was honestly horrifying, and I didn’t even tread into some of the deeper waters of modern web technologies.
2017 brought support for WebAssembly in all major browsers and the opportunity for near-native speeds of compiled bytecode running in a multi-threaded(-ish) environment with Web Workers. WebGL brings the capability of general-purpose GPU computing (GPGPU) with OpenGL shaders, and libraries like GPU.js and Deeplearn.js offer hardware-accelerated parallel programming in the browser, ripe for the exploitation of unsuspecting users’ tabs.
There is no doubt more research to be done to better understand the threat we may already be facing in our web browsers and will continue to face in the future. The techniques that I’ve demonstrated in this post are less of an exploit and more a feature of how the web inherently works. As a result, the steps that can be taken to defend yourself against the type of abuse I’m proposing are somewhat limited. My first suggestion is please, please, please BLOCK ADS. If you’ve somehow made it all the way to 2018 without using an ad blocker, 1) wtf… and 2) start today. In all seriousness, I don’t mean to be patronizing. An ad blocker is a necessary tool to preserve your privacy and security on the web and there is no shame in using one. Advertising networks have overstepped their bounds and it’s time to show them that we won’t stand for it.
Blocking ads defends you from the distribution mechanism that we discussed in this post, but you are still vulnerable to code that is hosted by CPU-greedy websites themselves, like The Pirate Bay. The best suggestion that I have for defending against these threats at the moment is to diligently monitor your computer’s CPU usage as you browse, responding to CPU spikes and irregularities as you see fit. It’s a good habit to keep your system monitor open during regular computer operation so that you can observe the CPU and network usage of your machine at an application level.
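If keeping a system monitor open is not practical, a small watcher script can do a cruder version of the same job. The sketch below is one possible approach, not a vetted tool: it polls the 1-minute load average with Python’s `os.getloadavg()` (available on Unix-like systems only), normalizes it by core count, and prints a warning above a threshold. The 0.8 threshold, the 5-second interval, and the function names are arbitrary assumptions, and it reports system-wide load rather than per-application usage.

```python
import os
import time


def is_spike(load_avg: float, cores: int, threshold: float = 0.8) -> bool:
    """True when the load average per core exceeds the threshold."""
    return load_avg / max(cores, 1) > threshold


def watch(interval: float = 5.0, samples: int = 12) -> None:
    """Poll the 1-minute load average and report when it looks high."""
    cores = os.cpu_count() or 1
    for _ in range(samples):
        one_minute, _five, _fifteen = os.getloadavg()  # Unix-like systems only
        if is_spike(one_minute, cores):
            print(f"High load: {one_minute:.2f} across {cores} cores")
        time.sleep(interval)
```

Calling `watch()` with the defaults polls for about a minute. A warning that appears shortly after opening a page, and clears after closing it, is the same signal described above for The Pirate Bay.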
In closing, I’ll leave you with a hypothetical situation: an attempt to loosely answer a question posed at the beginning of the post. What would happen if major websites borrowed CPU cycles from their users while they browsed their sites, much like I did with advertising bots? How much free compute might they be able to extract?
Google, YouTube, and Facebook are the top three most visited websites on the Internet according to 2016 Alexa rankings. Google.com (the search page itself, not all of the products offered by the company) receives 1.5 billion visitors a day with an average of 8 minutes per visit, or 22,831 years of “browser time” daily. Given the statistics I collected from ~30,000 samples in one of my advertisements, let’s assume each device has ~3.5 CPU cores. That makes Google’s estimated free daily compute resources equivalent to one CPU running 24/7 for 79,908 years. People would pitch a fit if Google.com greedily used 100% of their CPU resources, but would they notice if it used a mere 10%? Doing so would still yield nearly 8,000 years of compute each day. And remember, that’s not the power of Google’s server infrastructure, but rather a loose estimation of the amount of compute they could exploit from their users’ devices entirely for free by virtue of their site’s popularity. Minus, of course, the astronomical legal fees that could come with actually doing it when the public found out about it.
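For anyone who wants to check or adapt the estimate, the arithmetic can be written out as follows. The inputs are the figures quoted above. The variable names are mine, and the results are truncated rather than rounded so that they line up with the numbers in the text.

```python
MINUTES_PER_YEAR = 60 * 24 * 365

visitors_per_day = 1_500_000_000
minutes_per_visit = 8
cores_per_device = 3.5  # the assumption stated in the text
cpu_share = 0.10        # fraction of each device's CPU borrowed

# Total time visitors spend on the page each day, expressed in years.
browser_years = visitors_per_day * minutes_per_visit / MINUTES_PER_YEAR

# Scale by cores per device to get single-CPU years available per day.
core_years = browser_years * cores_per_device

# Years of compute per day if only a fraction of each CPU is used.
borrowed_years = core_years * cpu_share

print(f"{int(browser_years):,} years of browser time per day")
print(f"{int(core_years):,} single-CPU years per day at 100% usage")
print(f"{int(borrowed_years):,} single-CPU years per day at 10% usage")
```

Changing `cpu_share` or `minutes_per_visit` shows how sensitive the estimate is to those two guesses.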
If you are interested in learning more about this research, a recording of the Radical Networks talk is available to watch on YouTube. A copy of the slides is also available as a PDF on my website. You are welcome to use any resources from this post, the recording of the talk, or the slides in your own work (CC BY-SA).
The information contained in this post is to be used for research and education purposes only. I do not condone its use for illegal purposes. Don’t be a dick.