A Look Into Honeypot Data
I’ve been collecting data from honeypots for several years. During that time I never really did an in-depth analysis of the data. I do generate and publish high-level stats (published here), and every now and then I’ll casually browse the data and stumble onto some interesting payloads (one example). However, the data I’ve reported on and made available via API is focused on top targeted services and malicious IPs. While that high-level information has good value, there is much more data that can be used to produce additional metrics and extract useful information. In an effort to dive further into the data, I’ve started to apply additional parsing of payloads for various services. In this post, I’ll share data from HTTP, HTTP-Alt, WebLogic, and SIP services, and I plan to publish subsequent posts for other services.
A general note about the results: I still would not consider this an in-depth analysis. This is just an initial pass at the data, and even the scripts I wrote to extract the data are relatively crude. The output is fairly raw, but I’ve modified references to the destination IP addresses (the honeypot sensor) to x.x.x.x. Over time I’ll improve the extraction and analysis of the data.
As I mentioned, the focus in this post is on HTTP, HTTP-Alt, WebLogic, and SIP services (while SIP is not an HTTP protocol, SIP requests are very similar to HTTP). To keep it simple, I focused on the first line of the request payload, which, if well formed, will contain four components: Method, URL (path), Query String, and Protocol Version. Below are example requests; notice that the SIP request has basically the same structure as HTTP.
Example full HTTP request (HTTP-Alt & WebLogic are similar):
GET /test.php?query_parameter=test HTTP/1.1
Example SIP payload:
INVITE sip:email@example.com:5060;transport=udp;user=phone SIP/2.0
Via: SIP/2.0/UDP 126.96.36.199:5060;branch=z9hG4bK-688856-1---swc2hkgz04oz8ezx;rport
CSeq: 1 INVITE
Allow: INVITE, ACK, CANCEL, BYE, NOTIFY, REFER, MESSAGE, OPTIONS, INFO, SUBSCRIBE
User-Agent: Alcatel-Lucent 5060 MGC-8 188.8.131.52
Allow-Events: presence, kpml, talk
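The first-line split described above can be sketched in a few lines of Python. This is a minimal sketch and not the script used for this post: the names RequestLine and parse_request_line are hypothetical, and it assumes a well-formed first line has exactly three space-separated tokens, with the query string being whatever follows the first "?" in the second token.

```python
from typing import NamedTuple, Optional


class RequestLine(NamedTuple):
    method: str
    path: str
    query: str
    version: str


def parse_request_line(payload: str) -> Optional[RequestLine]:
    """Return the four components of the first line, or None if malformed."""
    lines = payload.splitlines()
    first_line = lines[0] if lines else ""
    parts = first_line.split(" ")
    # Assumption: exactly three non-empty tokens; extra spaces produce
    # empty tokens or a different count and are treated as malformed.
    if len(parts) != 3 or not all(parts):
        return None
    method, target, version = parts
    # Everything after the first "?" is treated as the query string.
    path, _, query = target.partition("?")
    return RequestLine(method, path, query, version)
```

Under these assumptions the SIP example parses the same way as the HTTP one, just with an empty query string, since the SIP URI contains no "?".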
For each service, I generated a list of values for each of the four components. There is also a fifth list, malformed requests, which contains requests that did not parse cleanly because they were not properly formed. In most cases, this means there were extra spaces in the first line or fewer than the expected four components. These malformed requests appeared to be the result of fuzzing, injection payloads, encoded payloads, or scanners probing any port regardless of the intended service.
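Building the five lists could look roughly like the following. Again, this is only a sketch under assumptions: tally_components is a hypothetical name, the input is assumed to be an iterable of first lines already extracted from the payloads, and "malformed" here simply means the line does not split into exactly three non-empty tokens.

```python
from collections import Counter

FIELDS = ("method", "path", "query", "version")


def tally_components(first_lines):
    """Count values per component and collect malformed lines separately."""
    counts = {name: Counter() for name in FIELDS}
    malformed = Counter()
    for line in first_lines:
        parts = line.split(" ")
        if len(parts) != 3 or not all(parts):
            # Extra spaces or missing components land in the fifth list.
            malformed[line] += 1
            continue
        method, target, version = parts
        path, _, query = target.partition("?")
        for name, value in zip(FIELDS, (method, path, query, version)):
            counts[name][value] += 1
    return counts, malformed
```

Calling counts["path"].most_common(10) on the result would then give a top ten list for the path component, and likewise for the others.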
This data consists of all requests to port 80 (HTTP) and port 443 (HTTPS).
See all HTTP method data here.
The path is a good indicator of what application is being targeted. It’s clear that phpMyAdmin is highly targeted. As you review that data you’ll see many more targeted apps.
See all HTTP path data here.
The empty value in row two represents requests where no query string was provided.
In row three, notice the parameter indicates an attempt to execute commands on the server. If executed, it would result in downloading and executing a script.
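A rough way to surface query strings like this one is to URL-decode them and look for common download or shell commands. The sketch below is a heuristic only: the function name and the keyword list are assumptions made for illustration, and a list this short will miss plenty of real payloads and occasionally flag harmless ones.

```python
import re
from urllib.parse import unquote_plus

# Assumed, illustrative keyword list: download tools and shell paths that
# often appear in command-injection attempts. Not exhaustive.
SUSPICIOUS = re.compile(r"\b(?:wget|curl|tftp)\b|/bin/(?:ba)?sh", re.IGNORECASE)


def looks_like_command_injection(query: str) -> bool:
    """Decode percent-encoding and '+' signs, then search for keywords."""
    decoded = unquote_plus(query)
    return bool(SUSPICIOUS.search(decoded))
```

Decoding first matters because many of these payloads are percent-encoded, so a plain text search on the raw query string would miss them.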
See all HTTP query string data here.
Note rows seven and ten are blank. My assumption is at least one of those is not actually blank, but contains a non-printable character.
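One way to check that assumption is to test each blank-looking value with Python's str.isprintable() and print it with ascii() so hidden characters become visible escapes. This is a small sketch; classify_blank is a hypothetical helper name.

```python
def classify_blank(value: str) -> str:
    """Distinguish truly empty values from ones that only look blank."""
    if value == "":
        return "empty"
    if not value.isprintable():
        # Control characters, tabs, carriage returns, zero-width characters.
        return "non-printable"
    if value.strip() == "":
        # Only ASCII spaces remain at this point.
        return "whitespace"
    return "visible"


# ascii() shows what is actually in the string as escape sequences.
for candidate in ["", "\x00", " ", "HTTP/1.1"]:
    print(ascii(candidate), classify_blank(candidate))
```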
See all HTTP protocol versions data here.
Note that each value is enclosed in [' ']. This is an artifact from the Python script I used to parse requests.
The top ten are not so interesting. To find more interesting payloads, review the full malformed requests data, linked below.
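For readers unfamiliar with where this kind of wrapper comes from: one common cause, which may or may not be what happened in this case, is writing out a one-element Python list instead of the string inside it. The helper names below are hypothetical and the snippet is only meant to illustrate the effect.

```python
def render_raw(values):
    """Stringify the list itself, which keeps the brackets and quotes."""
    return str(values)


def render_clean(values):
    """Join the elements so only the underlying text is written out."""
    return " ".join(values)


values = ["HTTP/1.1"]  # e.g. the result of a split or a regex findall
print(render_raw(values))    # ['HTTP/1.1']
print(render_clean(values))  # HTTP/1.1
```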
See all HTTP malformed requests data here.
This data consists of all requests to port 8080.
See all HTTP-Alt method data here.
See all HTTP-Alt path data here.
See all HTTP-Alt query string data here.
See all HTTP-Alt protocol version data here.
See all HTTP-Alt malformed requests data here.
This data consists of all requests to port 7001.
See all WebLogic method data here.
See all WebLogic path data here.
See all WebLogic query strings data here.
See all WebLogic protocol version data here.
See all WebLogic malformed requests data here.
This data consists of all requests to port 5060.
See all SIP method data here.
See all SIP path data here.
See all SIP protocol versions data here.
See all SIP malformed requests data here.
While there are no groundbreaking findings in this data, it does provide some basic patterns of what to expect will hit your web applications or SIP (VoIP) services. In a sense, this establishes the normal noise. By continuing to monitor the expected noise, any new patterns (e.g. probes or attacks) will quickly stand out. Those new patterns should prove to be more interesting as they may reveal new attack targets or exploits.
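Comparing a new batch of data against the established noise can be as simple as a set difference on the counted values. The following is a minimal sketch, assuming both periods have already been tallied into collections.Counter objects; new_patterns, min_count, and the example paths are hypothetical.

```python
from collections import Counter


def new_patterns(baseline, current, min_count=1):
    """Return values seen in the current period but never in the baseline."""
    return sorted(
        value
        for value, count in current.items()
        if value not in baseline and count >= min_count
    )


# Hypothetical counts for two collection periods.
baseline = Counter({"/": 300, "/phpmyadmin/index.php": 120})
current = Counter({"/": 280, "/new-target/login": 4})
print(new_patterns(baseline, current))
```

Raising min_count is one way to ignore one-off fuzzing noise and keep only values that show up repeatedly.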
In future posts I’ll expand this analysis to the contents of HTTP POST requests. The content or body of POST requests will contain more interesting payloads. I’ll also expand analysis to services that are aligned with IoT. Stay tuned.
How Can I Do This?
If you are interested in running your own honeypot to capture data and perform your own analysis, I have a few tools for you. HoneyPy is a low to medium interaction honeypot that can be configured to report data into HoneyDB or several other destinations. An alternative to running HoneyPy is the HoneyDB Agent; more details on getting started are available here.