Regipy: Automating registry forensics with Python

Lately I’ve been doing a lot of forensic investigations and tool development that involve registry hives. I tried a lot of tools, and none of them could satisfy all of my requirements:

  • Has JSON output
  • Can be used as a Python library, with Python 3 support and the following features:
      • Recurse over a registry hive from a given path and yield all subkeys and their respective values
      • Get a subkey by path
      • Expose some of the binary structures (NK & VK entries for example)
      • Apply transaction logs automatically, and at scale (over hundreds of collected hives and transaction logs)
  • Has a plugin system, with an easy and intuitive way to add more plugins
  • Can run on any operating system
  • Open source

So I decided to create my own tool.

Regipy: an OS-independent Python library for parsing offline registry hives


Features:

Multiple command line tools:

  • registry-dump: Dump the entire content of the hive to ndjson. If the -t flag is specified, the output file will be a timeline instead.
  • registry-parse-header: Parse the REGF header of the file and validate its checksum.
  • registry-run-plugins: Identify the hive type and run all supported plugins. The results are written to a JSON file.
      • Execute the registry-plugins-list command to get a list of supported plugins.
  • registry-diff: Compare two registry hives (like the regshot tool) and write the output to CSV or to the screen.
  • registry-transaction-logs: Recover a registry hive using transaction logs. The restored hive can then be compared to the original hive to see the differences.
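
To make the registry-parse-header description more concrete, here is a minimal sketch of what validating a REGF header can look like. This is not regipy's implementation: the field offsets and the XOR-32 checksum rule follow the commonly documented layout of the REGF base block, the function names are hypothetical, and the header used at the end is synthetic.

```python
import struct

REGF_SIGNATURE = b"regf"
CHECKSUM_OFFSET = 508  # the checksum covers the first 508 bytes (127 dwords)


def regf_checksum(header: bytes) -> int:
    """XOR the first 127 little-endian dwords of the base block."""
    checksum = 0
    for (dword,) in struct.iter_unpack("<I", header[:CHECKSUM_OFFSET]):
        checksum ^= dword
    # Two results are reserved and remapped, per the commonly documented rule.
    if checksum == 0xFFFFFFFF:
        return 0xFFFFFFFE
    if checksum == 0:
        return 1
    return checksum


def parse_regf_header(header: bytes) -> dict:
    """Parse a few base block fields and report whether the checksum matches."""
    if len(header) < 512 or header[:4] != REGF_SIGNATURE:
        raise ValueError("not a REGF base block")
    primary_seq, secondary_seq = struct.unpack_from("<II", header, 4)
    major, minor = struct.unpack_from("<II", header, 20)
    (stored_checksum,) = struct.unpack_from("<I", header, CHECKSUM_OFFSET)
    return {
        "primary_sequence": primary_seq,
        "secondary_sequence": secondary_seq,
        "version": f"{major}.{minor}",
        "checksum_valid": stored_checksum == regf_checksum(header),
    }


# Synthetic header for demonstration only (not a real hive).
fake = bytearray(512)
fake[:4] = REGF_SIGNATURE
struct.pack_into("<II", fake, 4, 7, 7)   # matching sequence numbers
struct.pack_into("<II", fake, 20, 1, 5)  # version 1.5
struct.pack_into("<I", fake, CHECKSUM_OFFSET, regf_checksum(bytes(fake)))
info = parse_regf_header(bytes(fake))
```

When the primary and secondary sequence numbers differ, the hive was probably not flushed cleanly, which is one hint that the transaction logs are worth applying.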

Plugins:

It’s really easy to write new plugins, thanks to the utilities that ship with regipy. A short example of how to create a new plugin can be found here. These are the currently existing plugins:

  • Installed services (from system hive)
  • Amcache (both win7 and win8+ formats)
  • Network routes
  • Computer name
  • Ntuser hive persistence
  • Software hive persistence
  • User Assist
  • Shimcache (Mandiant’s Shimcache parser)
  • Installed software
  • Image file execution options

Use as a library:

  • Get a specific registry key by path, get all subkeys and values for that key
  • Recurse over the registry hive from a given path (or root) and yield all subkeys
  • Execute specific plugins
  • Apply registry transactions from a log file on a specific hive
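
As a rough illustration of the first two points, the sketch below wraps the library calls in one helper. The names RegistryHive, recurse_subkeys, get_key and iter_subkeys are taken from the project README as I recall it and may differ between regipy versions, so treat them as assumptions; print_subkeys itself is a hypothetical helper, not part of regipy.

```python
def print_subkeys(hive_path, key_path=None):
    """Print subkeys of a hive: the whole tree, or those directly under key_path."""
    # Imported lazily so this sketch can be loaded without regipy installed.
    # Assumption: regipy exposes RegistryHive in regipy.registry.
    from regipy.registry import RegistryHive

    hive = RegistryHive(hive_path)
    if key_path is None:
        # Recurse over the whole hive; as_json=True asks for serializable entries.
        for entry in hive.recurse_subkeys(as_json=True):
            print(entry)
    else:
        # Fetch one key by path and list its direct subkeys.
        for subkey in hive.get_key(key_path).iter_subkeys():
            print(subkey.name)
```

A call such as print_subkeys('NTUSER.DAT', 'Software') (the path is a placeholder) would list the keys directly under Software, while omitting the second argument walks the entire hive.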

Hopefully, the DFIR community will find this library/tool useful as well.

I learned a lot about registry structure from working on this project:

  • The registry is a complex binary structure with a lot of edge cases:
      • Data can be resident inside the VK record itself if it is small enough
      • Sometimes the data is small enough and yet it is not resident
      • Data bigger than 16344 bytes is stored in a different type of structure (e.g. shimcache data)
  • The transaction logs are really important and can make the difference between having findings in a case or not.
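
The data storage cases above can be sketched as a small classifier. This is an illustration under stated assumptions, not regipy's code: it assumes the commonly documented convention that the high bit of the VK data-size field marks resident data, it ignores the hive version (the big data structure only exists in newer hive formats), and the names are hypothetical.

```python
from typing import Tuple

RESIDENT_FLAG = 0x80000000   # high bit of the VK data-size field (assumed convention)
SIZE_MASK = 0x7FFFFFFF       # remaining bits hold the real size
BIG_DATA_THRESHOLD = 16344   # sizes above this use a different structure


def classify_vk_data(raw_data_size: int) -> Tuple[str, int]:
    """Return (storage kind, real size in bytes) for a raw VK data-size field."""
    if raw_data_size & RESIDENT_FLAG:
        # Resident: the bytes live inside the VK record itself.
        return "resident", raw_data_size & SIZE_MASK
    if raw_data_size > BIG_DATA_THRESHOLD:
        # Too large for a single cell: stored in a big data structure.
        return "big_data", raw_data_size
    # Stored in a separate cell, even when the value is small enough to be resident.
    return "cell", raw_data_size
```

The point of checking the flag rather than the size is the second case in the list: a 4-byte value without the flag is still read from a separate cell.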

A practical example of analysis using regipy and pandas

Load the hive timeline to group by timeframes and detect interesting registry modifications:

Load the output of registry-dump NTUSER_modified.DAT -t -o ./timeline.csv into pandas:

import pandas as pd

# Show long registry paths without truncating them
pd.set_option('display.max_colwidth', 800)
df = pd.read_csv('/tmp/testdata/timeline.csv')
# Parse the timestamps and use them as a chronologically sorted index
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp').sort_index(ascending=True)

We get a dataframe that looks like this:

We want to group events by a specific time span, for example 1 hour:

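# Hourly bins shifted by 30 minutes; offset= replaces base=, which was removed in pandas 2.0
df_grouped = df.groupby(pd.Grouper(freq='60min', offset='30min', label='right')).sum(numeric_only=True)
# Keep only the timeframes in which a handful of values were modified
df_grouped[(df_grouped['values_count'] > 0) & (df_grouped['values_count'] < 5)]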
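
To see what this grouping does, here is a standard-library sketch of the same bucketing: hourly bins shifted by 30 minutes, each labeled by its right edge. It assumes UTC timestamps and the left-closed bins that pandas uses by default for minute frequencies; unlike pandas, it does not emit empty bins. The helper names and the events are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def right_edge(ts, width=timedelta(hours=1), offset=timedelta(minutes=30)):
    """Return the right edge of the left-closed bin [edge - width, edge) holding ts."""
    bins_before = (ts - EPOCH - offset) // width  # floor division on timedeltas
    return EPOCH + offset + (bins_before + 1) * width


def sum_by_bin(events):
    """Sum values_count per bin for (timestamp, values_count) pairs."""
    totals = defaultdict(int)
    for ts, values_count in events:
        totals[right_edge(ts)] += values_count
    return dict(totals)


# Made-up events for illustration only.
utc = timezone.utc
events = [
    (datetime(2012, 4, 4, 1, 45, tzinfo=utc), 1),
    (datetime(2012, 4, 4, 2, 10, tzinfo=utc), 2),
    (datetime(2012, 4, 4, 2, 30, tzinfo=utc), 4),
]
totals = sum_by_bin(events)
```

The first two events share the bin labeled 02:30, while the event at exactly 02:30 opens the next bin, labeled 03:30, because each bin includes its left edge and excludes its right one.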

Now we have a small data frame with a couple of interesting timeframes:

Let’s take a look at a specific timeframe:

df[df.index >= '2012-04-04 02:30:00+00:00']

We get the following result (the result is sorted by date, and only the latest entries are shown):

We see that \Software\Microsoft\Windows\CurrentVersion\Run was modified, and also \Software\Microsoft\legitimate_subkey\legitimate_subkey… Interesting :)


What’s next?

There’s a lot of work to be done:

  • Recover data from unallocated cells
  • Improve performance even more (someone said Rust?)
  • Write more plugins, and I'm also expecting some pull requests from the community

How to install

Regipy supports only Python 3.7.

Regipy releases can be found on PyPI: https://pypi.org/project/regipy/

To get the latest stable release, just use:

pip install regipy

The source code is on GitHub: https://github.com/mkorman90/regipy