Zeltser Challenge Day 1

The Zeltser Challenge

First and foremost, I wanted to thank David Cowen and his team for hosting me in their office yesterday for an impromptu forensic lunch. It’s always a blast to see what that team is developing, and always makes me wish I had more time to explore a new artifact or find a better way to express an idea that’s been bugging me for a while.

During the lunch I also announced that I’m taking on the Zeltser Challenge. To help get through each week, I’ve designated a theme for each day. The goal is not to have an easy out, but rather to help with content generation and focus. Sunday is left as a variable day: sometimes I’ll fill it with lengthy posts such as book reviews, other times with shorter posts about forensic concepts or thoughts on the latest what’s-what in the DFIR community. Comey’s letters would’ve been good Sunday fodder, for example :) Also, thanks to those who sent me a quick note yesterday to help me decide on Monday’s topic!

With that, Saturday’s topic is Scripting. This day is for discussing a QaD (“quick-and-dirty”) script that I’ve got lying around or an idea that I wanted to spend the day fleshing out. Most of the discussion will be quick code snippets, with hopefully a fully developed release every now and then.

Scripting Saturday: IP Geolocation

I probably have five or six IP geolocation scripts lying around that I’ve thrown together in a pinch, either because I couldn’t find ${previous_version} or because I was on a different machine and didn’t have access to my repo (yes, sometimes GitHub is unavailable). For today’s post, I thought it would be good to put down some thoughts on IP geolocation scripting.

Admittedly, IP geolocation is fresh in my mind as Dave Cowen and Matt Seyer discussed this on yesterday’s Forensic Lunch. Matt has put together a really neat capability to parse Windows Event Logs for IP addresses and geolocate that information. I’ll get to the value of this data at the end of the post, after the code.

For IP geolocation, I typically rely on the two free databases from MaxMind. MaxMind offers free, downloadable databases, available in both CSV and MaxMind DB binary formats. They are very portable (51.4MB for both MaxMindDBs as of the writing of this post) and easy to script against. The databases offer either city or country granularity, with the City DB obviously containing significantly more data (it accounts for 48.9 of the 51.4MB). The City database even now includes Accuracy Radius Data! All of this, for free.

Not only are the DBs free, but there are also MaxMind APIs for just about any language you’d want to use. For this QaD script, I’ll be using Python and the binary MaxMind DB file (.mmdb file extension). The only prerequisites for scripting against this data are to download the DBs, decompress them, and know their paths so you can reference them.

One note before getting to the code: you may be asking, why use downloadable databases? Won’t your data become outdated? Let me answer the questions in reverse:

  1. You do run the risk of potentially-outdated data. A few approaches to this: download the latest database files if it’s been a while since your last lookup, and if you’ve got something that just doesn’t feel right, then you can always verify against another source.
  2. I prefer offline databases for a simple reason: most of the time, I’m performing these lookups against dead logs in a closed environment. I don’t have Internet access, and I’m not moving my evidence to an Internet-accessible system.
  3. There’s an additional argument to be made about not performing lookups against the Internet during an active investigation, since every query tells the lookup service which addresses you’re interested in. It would take a lot of correlation on the lookup server for that to matter, but it’s not outside the realm of possibility. I usually stick to offline sources as much as I can.
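For the first point, a rough way to remind yourself that a database copy is getting old is to check the file’s modification time before running lookups. The sketch below assumes the file’s mtime reflects when you downloaded it, and the 30-day threshold is an arbitrary illustration, not a MaxMind recommendation. The function names are made up for this example.

```python
import os
import time

SECONDS_PER_DAY = 86400.0


def db_age_days(path, now=None):
    """Return how many days ago the file at `path` was last modified."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / SECONDS_PER_DAY


def is_stale(path, max_age_days=30, now=None):
    """True if the database file is older than `max_age_days` (arbitrary default)."""
    return db_age_days(path, now) > max_age_days
```

Calling `is_stale('/path/to/my/GeoLite2-City.mmdb')` at the top of a script is enough to print a warning before trusting the results.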

MaxMind in Python

There are two Python packages I’ve primarily seen used to access MaxMind DBs:

  • geoip2: Provides access to the GeoIP2 web services and databases. Can be installed using pip or easy_install. More info here.
  • maxminddb: Provides a simple MaxMindDB reader extension. This is a lot “slimmer” than geoip2, and simply performs DB lookups. More info here. This is the package I’ll be using for today’s quick script.

I usually like to begin a script with an error message if the package isn’t yet installed. This helps when shipping a script to a colleague, or when reminding myself that I haven’t yet done my pre-reqs:

#!/usr/bin/env python
try:
    import maxminddb
except ImportError:
    # Point the user at the missing dependency
    print("Could not find the maxminddb package. Please install it using \"pip install maxminddb\"")

Once the module is imported, looking up an IP address is really simple, per the GitHub README. First, use the open_database function to return a Reader object, then call its get method to return information about an IP address. Here’s an example:

>>> ip_info = maxminddb.open_database('/path/to/my/GeoLite2-City.mmdb')
>>> ip_info.get('8.8.8.8')
{u'city': {u'geoname_id': 5375480, u'names': {u'ru': u'\u041c\u0430\u0443\u043d\u0442\u0438\u043d-\u0412\u044c\u044e', u'fr': u'Mountain View', u'en': u'Mountain View', u'de': u'Mountain View', u'zh-CN': u'\u8292\u5ef7\u7ef4\u5c24', u'ja': u'\u30de\u30a6\u30f3\u30c6\u30f3\u30d3\u30e5\u30fc'}}, u'country': {u'geoname_id': 6252001, u'iso_code': u'US', u'names': {u'ru': u'\u0421\u0428\u0410', u'fr': u'\xc9tats-Unis', u'en': u'United States', u'de': u'USA', u'zh-CN': u'\u7f8e\u56fd', u'pt-BR': u'Estados Unidos', u'ja': u'\u30a2\u30e1\u30ea\u30ab\u5408\u8846\u56fd', u'es': u'Estados Unidos'}}, u'registered_country': {u'geoname_id': 6252001, u'iso_code': u'US', u'names': {u'ru': u'\u0421\u0428\u0410', u'fr': u'\xc9tats-Unis', u'en': u'United States', u'de': u'USA', u'zh-CN': u'\u7f8e\u56fd', u'pt-BR': u'Estados Unidos', u'ja': u'\u30a2\u30e1\u30ea\u30ab\u5408\u8846\u56fd', u'es': u'Estados Unidos'}}, u'subdivisions': [{u'geoname_id': 5332921, u'iso_code': u'CA', u'names': {u'ru': u'\u041a\u0430\u043b\u0438\u0444\u043e\u0440\u043d\u0438\u044f', u'fr': u'Californie', u'en': u'California', u'de': u'Kalifornien', u'zh-CN': u'\u52a0\u5229\u798f\u5c3c\u4e9a\u5dde', u'pt-BR': u'Calif\xf3rnia', u'ja': u'\u30ab\u30ea\u30d5\u30a9\u30eb\u30cb\u30a2\u5dde', u'es': u'California'}}], u'location': {u'latitude': 37.386, u'accuracy_radius': 1000, u'time_zone': u'America/Los_Angeles', u'longitude': -122.0838, u'metro_code': 807}, u'postal': {u'code': u'94035'}, u'continent': {u'geoname_id': 6255149, u'code': u'NA', u'names': {u'ru': u'\u0421\u0435\u0432\u0435\u0440\u043d\u0430\u044f \u0410\u043c\u0435\u0440\u0438\u043a\u0430', u'fr': u'Am\xe9rique du Nord', u'en': u'North America', u'de': u'Nordamerika', u'zh-CN': u'\u5317\u7f8e\u6d32', u'pt-BR': u'Am\xe9rica do Norte', u'ja': u'\u5317\u30a2\u30e1\u30ea\u30ab', u'es': u'Norteam\xe9rica'}}}

OK, not the prettiest output, but you get the idea. Note that if you perform lookups against the MaxMind Country DB, you’ll obviously get significantly less data. Example:

>>> ip_info = maxminddb.open_database('GeoLite2-Country.mmdb')
>>> ip_info.get('8.8.8.8')
{u'country': {u'geoname_id': 6252001, u'iso_code': u'US', u'names': {u'ru': u'\u0421\u0428\u0410', u'fr': u'\xc9tats-Unis', u'en': u'United States', u'de': u'USA', u'zh-CN': u'\u7f8e\u56fd', u'pt-BR': u'Estados Unidos', u'ja': u'\u30a2\u30e1\u30ea\u30ab\u5408\u8846\u56fd', u'es': u'Estados Unidos'}}, u'continent': {u'geoname_id': 6255149, u'code': u'NA', u'names': {u'ru': u'\u0421\u0435\u0432\u0435\u0440\u043d\u0430\u044f \u0410\u043c\u0435\u0440\u0438\u043a\u0430', u'fr': u'Am\xe9rique du Nord', u'en': u'North America', u'de': u'Nordamerika', u'zh-CN': u'\u5317\u7f8e\u6d32', u'pt-BR': u'Am\xe9rica do Norte', u'ja': u'\u5317\u30a2\u30e1\u30ea\u30ab', u'es': u'Norteam\xe9rica'}}, u'registered_country': {u'geoname_id': 6252001, u'iso_code': u'US', u'names': {u'ru': u'\u0421\u0428\u0410', u'fr': u'\xc9tats-Unis', u'en': u'United States', u'de': u'USA', u'zh-CN': u'\u7f8e\u56fd', u'pt-BR': u'Estados Unidos', u'ja': u'\u30a2\u30e1\u30ea\u30ab\u5408\u8846\u56fd', u'es': u'Estados Unidos'}}}

Did you notice the beauty of the output? That’s right, we get a nested Python dictionary in return, which maps directly onto JSON! This allows us to quickly start working with the output using a structure we’re familiar with. For example:

>>> ip1 = ip_info.get('8.8.8.8')
>>> print(ip1['continent']['names']['en'])
North America
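Chained key lookups like the one above raise a KeyError when a field is missing, and not every record carries every field (the Country output earlier has no city, for instance). As a minimal sketch, a small helper can flatten a record into the handful of fields you care about and tolerate gaps. The function name and chosen fields are illustrative, and it assumes the reader returns None for an address that isn’t in the database.

```python
def summarize(record, lang="en"):
    """Flatten a MaxMind City/Country record into a few fields.

    `record` is the dict returned by a lookup, or None when the
    address was not found (assumed behaviour of the reader).
    Missing fields come back as None instead of raising KeyError.
    """
    if record is None:
        return None
    country = record.get("country", {})
    city = record.get("city", {})
    location = record.get("location", {})
    return {
        "country": country.get("iso_code"),
        "city": city.get("names", {}).get(lang),
        "latitude": location.get("latitude"),
        "longitude": location.get("longitude"),
        "accuracy_radius": location.get("accuracy_radius"),
    }
```

With the City database, `summarize(ip_info.get('8.8.8.8'))` would reduce the large record shown earlier to five keys.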

We can also prettify our output, if need be:

>>> import json
>>> print(json.dumps(ip1['continent']['names'], indent=4))
{
    "ru": "\u0421\u0435\u0432\u0435\u0440\u043d\u0430\u044f \u0410\u043c\u0435\u0440\u0438\u043a\u0430",
    "fr": "Am\u00e9rique du Nord",
    "en": "North America",
    "de": "Nordamerika",
    "zh-CN": "\u5317\u7f8e\u6d32",
    "pt-BR": "Am\u00e9rica do Norte",
    "ja": "\u5317\u30a2\u30e1\u30ea\u30ab",
    "es": "Norteam\u00e9rica"
}

Now that we know how easy it is to perform MaxMindDB lookups, we can quickly put together some Python to iterate over a list and output geolocation information. Based on your wants and needs, you could also choose to display only the fields you’re interested in. Either way, knowing that we have a reliable, easy-to-use offline geolocation capability certainly makes the job easier.
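Here is one way that loop might look, as a quick-and-dirty sketch rather than a finished tool. It reads one IP address per line and writes CSV rows of IP, country code, and English city name. The names geolocate and write_report and the input file layout are assumptions for this example. It also assumes that the reader returns None for an address it doesn’t know and raises ValueError for a string that isn’t an IP address. It uses Python 3 style print-free output via the csv module, so it should behave the same on either interpreter.

```python
import csv
import sys


def geolocate(ips, reader):
    """Yield (ip, country_iso, city) for each non-empty line in `ips`.

    `reader` is anything with a get(ip) method, such as the object
    returned by maxminddb.open_database().
    """
    for line in ips:
        ip = line.strip()
        if not ip:
            continue  # skip blank lines
        try:
            record = reader.get(ip) or {}
        except ValueError:
            # Assumption: a string that is not an IP address raises ValueError
            record = {}
        country = record.get("country", {}).get("iso_code")
        city = record.get("city", {}).get("names", {}).get("en")
        yield ip, country, city


def write_report(db_path, ip_list_path, out=sys.stdout):
    """Look up every address in `ip_list_path` and write CSV to `out`."""
    import maxminddb  # imported here so geolocate() works without the package

    reader = maxminddb.open_database(db_path)
    try:
        with open(ip_list_path) as handle:
            writer = csv.writer(out)
            writer.writerow(["ip", "country", "city"])
            writer.writerows(geolocate(handle, reader))
    finally:
        reader.close()
```

A call such as `write_report('/path/to/my/GeoLite2-City.mmdb', 'ips.txt')` prints the report to the terminal, where ips.txt is a hypothetical file of addresses pulled from your logs.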

A Note on GeoIP Analysis

As a last little tidbit on geolocation analysis, I think there is an important lesson for any analyst who may be utilizing this type of data. Geolocation is not an indication of evil. IP addresses are not reliable indicators, and their geographic origins can sometimes be treated with even less confidence. I could publish this blog post from dozens of countries in the world for next to nothing in cost, if not for free. Don’t raise an alarm because you see a country you don’t do business with, or God forbid, China (this joke will die one day).

That being said, if you are performing analysis on logs or artifacts with IPs, geolocation can be a starting point if it’s all you’ve got. Event logs are a fantastic source where geolocation can help narrow the field. For example, let’s say an analyst is examining RDP event logs and discovers external IP addresses. The analyst cannot make an assumption about evil vs. false positive; however, the analyst can deduce that external RDP connectivity was enabled at some point. That alone may be a good place to start digging.
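Before geolocating anything, it can help to separate the external addresses from internal ones, since private and loopback ranges won’t appear in a geolocation database anyway. A minimal sketch using Python 3’s standard ipaddress module follows. The function name is made up for this example, and "external" is taken here to mean whatever the module considers globally routable.

```python
import ipaddress


def external_ips(addresses):
    """Return the globally routable addresses from `addresses`, in input order.

    Strings that are not valid IPv4/IPv6 addresses are silently skipped.
    """
    result = []
    for text in addresses:
        try:
            ip = ipaddress.ip_address(text.strip())
        except ValueError:
            continue  # not an IP address at all
        if ip.is_global:
            result.append(str(ip))
    return result
```

The resulting list is what you would feed into the geolocation lookups, keeping the noise from internal hosts out of the report.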

Thanks for reading, and until tomorrow, happy forensicating!