Hadoop YARN: Assessment of the Attack Surface and Its Exploits

Pascal G.
HackerNoon.com
22 min readNov 15, 2018

--

TL;DR

  • Rate of Hadoop YARN exploits is slowing but still at a concerning 350,000 events per day
  • 1065 servers are exposed and vulnerable
  • The geographic spread of vulnerable servers and the targets of the attacks is global and concentrated in regions with high cloud data center densities
  • Motivations behind the exploits range from planting Linux backdoors, infecting servers with IoT malware for scanning and DDoS, up to cryptomining campaigns
  • A Monero cryptomining campaign has been actively abusing exposed Hadoop YARN servers since April 2018 and mined for a total revenue of 566 XMR (about 60,000 USD) and is growing its revenues with an average of 2 XMR (212 USD) a day
  • In a window of less than 14 days, there was enough malware collected from Hadoop YARN exploit attempts to start a small zoo
  • Owners of Hadoop YARN servers should care, as they can fall victim to cryptomining abuse, causing loss of performance, instability and higher cloud utilization bills
  • Online businesses should care, too. They can be the target of DDoS attacks.
  • Consumers should care because they will not be able to shop during Cyber Monday if their favorite online shop falls victim to DDoS attacks

In my blog on DemonBot, I discussed how Hadoop YARN exploit attempts were ramping up. In the middle of October, our deception network recorded up to 1.5 million attempts per day. The good news is that the attempt rate steadily slowed down in the second half of last month — though unfortunately not to the point where we should pat ourselves on the back for exposing one of the many malicious campaigns that are taking advantage of exposed Hadoop YARN servers.

These last few days, the number of Hadoop Yarn exploit attempts slowed to an average of 350,000 attempts per day. That said, there is no sign of the threat going away any time soon and we should stay alert. In order to appreciate the risk and quantify the threat, I have been tracking Hadoop YARN campaigns and exploring the extent of the attack surface since my last blog. Understanding the potential for abuse and the types of threats that are emerging from the exposed servers allows one to better appreciate the risk.

The Attackers and their Victims

Between September and the first half of November, there have been more than 35 million exploit attempts registered by our deception network and over one-third of them originated from the US. Great Britain, Italy and Germany are the runners-up and, combined, they were good for more than half of the exploit attempts.

In absolute numbers, the U.S. generated nearly 12 million exploit attempts. Great Britain and Italy each were responsible for 6 million attempts, closely followed by Germany with 4.8 million attempts.

The exploit attempts were not specifically targeting a single region. The UK and Germany honeypots were hit twice as hard compared to the rest of the world. The average numbers for each region is between 1.6 and 3.2 million attempted exploits.

Hadoop YARN Attack Surface

To asses the attack surface, I performed a global scan for services listening on the Hadoop YARN port TCP/8088, taking care to exclude sensitive IP ranges as listed in Robert Graham’s masscan exclusion list. By November 8, the number of vulnerable Hadoop YARN servers exposed to the public was 1065. The vulnerable servers are scattered around the globe with higher concentrations in areas where the data center density is high.

Compare the above locations of vulnerable Hadoop YARN servers with the global data center map below:

The attack surface is global and limited to little over 1,000 servers, but it should not be ignored because of the high potential powerful big data servers typically provide for malicious agents.

Types of Abuse

Now that we have a good measure on the attack surface and the interest taken in it by malicious actors, it’s time to have a closer look at how these actors are attempting to take advantage of this situation.

The below graph shows different Hadoop YARN exploits recorded by our medium interaction honeypots over a period of 14 days. Each exploit payload contains a command sequence which is hashed into a unique fingerprint, allowing us to quantify and track campaigns over time. The exploit table in (*1) contains the details of each command sequence corresponding to the fingerprints in the graph.

The red bars in the command sequence graph above represent the attempted count per day from a new DemonBot campaign ‘YSDKOP,’ named after the names used for the malware binaries.

The two large peaks in different shades of blue represent multiple exploits related to a Hadoop YARN cryptomining campaign that has been running for at least 8 months now; first spotted in April 2018, it recently moved its download infrastructure to BitBucket.org. Guess it is more convenient to track different versions of cryptominer and its configuration files over time using Atlassian’s free and public service…

The other, shorter and less aggressive campaigns represented in the command sequence graph above were mostly infection attempts by Linux/IoT Botnets. Some that seemed worthy of a few words are discussed below.

The Bitbucket Crypto Miner

An ongoing Monero cryptomining campaign that has been known to actively abuse exposed Hadoop YARN servers since April of this year, mined a total of 566 XMR (about 60,000 USD) and is growing its revenue with an average rate of 2 XMR (212 USD) a day. The malicious agent or group is currently abusing three servers and maintains an average hash rate of 400kH/s over time.

Leveraging the Hadoop YARN vulnerability, a shell script is downloaded and executed from a public BitBucket account:

The ‘zz.sh’ script, archived in (*2) for reference, performs some cleaning up on the server before ultimately downloading a binary called ‘x_64’ from the same repository.

The x_64 binary is XMRig, an open source, high performance Monero CPU miner written in C++ (https://github.com/xmrig/xmrig).

The configuration file for XMRig is ‘w.conf’ and downloaded from the same BitBucket repository:

From the configuration file we find the pool wallet address:

The wallet address matches that of operations reported in the Stackoverflow and HortonWorks communities by Hadoop admins in May of this year; thousands of cryptomining jobs were causing issues with the cluster.

In August, the 360 Threat Intelligence Center published a report on what they called the “8220 mining gang,” also mentioning the same wallet address. According to the researchers, the mining gang was/is suspected to be of Chinese origin.

The same address also matches the wallet address used in a sample Nanopool report link in the readme of another cryptomining open-source software hosted on Github and called ‘Cpuhunter’.

The Nanopool wallet account that has been in use since April 10 can be tracked through this link.

The total XMR payments resulting from this illegal mining operation were, as of November 12, 566 XMR or about 60,000 USD.

YSDKOP, DemonBot in Hiding

YSDKOP bots are delivered through a Hadoop YARN exploit using the following payload:

The downloaded ‘bins.sh’ script downloads in its turn several binaries in a typical IoT loader kind of way:

The different binaries correspond to cross-compiled versions of the same source code for multiple platform architectures:

A quick glance over the strings of the i586 binary reveals the typical DemonBot markers:

This is an unaltered DemonBot hiding behind a random name YSDKOP.

Supra, DemonBot-ng

Infecting through the Hadoop YARN exploit payload below:

The downloaded script ’n’ contains code to download two binaries, one 32bit x86 and one 64bit x86:

Looking at the strings of the downloaded ‘Supra.x86_64’ binary, we see a close match with those of DemonBot, as do the decorated names in the unstripped binary.

Note the very similar string as previously discovered in the DemonBot source code, but this time with ‘Supra’ instead of ‘shelling’ in the first square brackets:

The new binary also contains indicators of an extension in the platform detection code. The original DemonBot checked for two platforms

  • Ubuntu/Debian, based on the existence of /usr/bin/apt-get, and
  • RHEL/Centos, based on the existence of /usr/bin/yum

Supra adds to the above two:

  • Gentoo: /usr/lib/portage
  • OpenSUSE: /usr/share/YaST2
  • OpenWRT: /etc/dropbear
  • UNKNOWN: /etc/opkg
  • Dropbear: /etc/ssh/
  • Telnet: /etc/xinet.d/telnet

The compile version used for this DemonBot version is identical to the original DemonBot: GCC (GNU) 4.2.1.

Hoho, a Botnet by Greek.Helios

Hadoop YARN exploit payload:

The binaries first appeared on the server on Oct 30, 2018:

The hoho.x86 binary contains the literal string:

The binary is packed with the UPX executable packer and matches mostly Mirai code.

prax0zma.ru

This campaign consists of a set of shell scripts which deletes system and other user accounts from a compromised server and creates two backdoor accounts with root privileges.

The backdoor account user names are ‘VM’ and ‘localhost’ and both have their password set to the hash ‘$1$OwJj0Fjv$RmdaYLph3xpxhxxfPBe8S1’.

A Malware Zoo

The Hadoop YARN exploits in table (*1) provided for a real Linux IoT malware zoo — most of the binaries are Mirai- related — not to our surprise…

Links that are still active:

Links that are inactive as of this writing:

Compromised Servers

Knowing the exposed servers, we can assess the activity of that set of servers that were compromised by correlating the server IP with our global deception network activity. Less than 5% of the list of exposed servers overlapped with servers in our deception network and has been seen performing malicious activity. This 5% is not the full picture though, since there is convincing evidence of actors actively abusing the servers for mining cryptocurrencies and because there is no scanning or exploiting activity, these servers do not show up in our deception network. The amount of compromised servers from the potential 1065 is still an unknown, but it is safe to say that at some point, all of those will fall–or have already fallen–victim to malicious activities.

The below graph shows the activity per port of known compromised servers. The activities target TCP ports 23, 2323, 22, and 2222 which are representative for your run-of-the-mill IoT exploits through telnet and SSH credential brute forcing. The other notorious port 5555 is known for TR069 and ADB exploits on IoT vulnerable devices. In the past 7 days, we witnessed an increased scanning activity targeting port 23.

This Mirai-like port 23 scanning behavior was mostly originating from a single server, good for over 35,000 scanning events during the last 7 days. The other compromised servers were good for a couple of events during limited time ranges.

In terms of regional targeting by compromised servers, Germany took most of the hits.

When…Not If

Although there is clear evidence of DDoS capable botnets attempting to compromise Hadoop YARN exposed servers, there was no immediate evidence of DDoS activity by the compromised servers. This does not eliminate the possibility and potential of DDoS attacks, however. The attack surface is just a little over 1065 servers. Compared to IoT botnets, who can run in the hundreds of thousands of devices, this seems of little threat. However, Hadoop (and cloud servers in general) provides much better connectivity and far more compute resources compared to IoT devices; only a few of these servers in a botnet can cause severe disruption to online businesses.

For those that are operating Hadoop clusters, a publicly exposed YARN service can and will at some point be exploited and abused for cryptomining. Besides affecting stability and performance, cloud servers with elastic compute resources can have an economic impact on the victim because of the surge in resource utilization.

Do note that you cannot get away with publicly exposed services, it is not a matter of IF but a matter of WHEN your service will be compromised and abused. In today’s Internet, cloud servers can perform full internet port scans in minutes, and application vulnerability scans in less than a day. For those of you who are not convinced yet, pay a visit to one of the (IoT) search engines such as https://shodan.io or https://fofa.so, who on a daily basis scan and scrape internet connected devices. Just type ‘jetty’ in the search field of those search engines and witness how many servers are indexed and easily discovered within seconds.

(*1) Hadoop YARN Exploits

(*2) zz.sh script

Originally published at blog.radware.com on November 15, 2018.

--

--