Regipy: Automating registry forensics with Python
Lately I’ve been doing a lot of forensic investigations and tool development that involve registry hives. I tried a lot of tools, and none of them could satisfy all of my requirements:
- Has JSON output
- Can be used as a Python library, with Python 3 support and the following features:
  - Recurse over a registry hive from a given path and yield all subkeys and their respective values
  - Get a subkey by path
  - Expose some of the binary structures (NK and VK entries, for example)
- Apply transaction logs automatically and at scale (over hundreds of collected hives and transaction logs)
- Has a plugin system, with an easy and intuitive way to add more plugins
- Can run on any operating system
- Open source
So I decided to create my own tool.
Regipy: an OS-independent Python library for parsing offline registry hives
Features:
Multiple command line tools:
- registry-dump: Dump the entire content of the hive to ndjson. If the -t flag is specified, the output file will be a timeline instead.
- registry-parse-header: Parse the REGF header of the file and validate its checksum.
- registry-run-plugins: Identify the hive type and run all supported plugins. The results are written to a JSON file.
  - Execute the registry-plugins-list command to get a list of supported plugins.
- registry-diff: Compare two registry hives (like the regshot tool) and write the output to CSV or to the screen.
- registry-transaction-logs: Recover a registry hive using transaction logs. The restored hive can then be compared to the original hive to see the differences.
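As a rough illustration of what a header check like the one in registry-parse-header involves, here is a minimal sketch that reads a few fields of the REGF base block and validates its checksum using only the standard library. The offsets and the XOR checksum rule are taken from publicly available descriptions of the REGF format, not from regipy's source code, and all function names are made up for this post.

```python
import struct
from datetime import datetime, timedelta, timezone

REGF_SIGNATURE = b"regf"
CHECKSUM_OFFSET = 508  # the checksum covers the first 508 bytes of the base block


def regf_checksum(header: bytes) -> int:
    """XOR the first 127 little-endian DWORDs of the base block."""
    checksum = 0
    for (dword,) in struct.iter_unpack("<I", header[:CHECKSUM_OFFSET]):
        checksum ^= dword
    # Two results are reserved by the format and get remapped.
    if checksum == 0xFFFFFFFF:
        return 0xFFFFFFFE
    if checksum == 0:
        return 1
    return checksum


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a Windows FILETIME (100 ns ticks since 1601-01-01) to UTC."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=filetime // 10)


def parse_regf_header(header: bytes) -> dict:
    """Parse a handful of base block fields (hypothetical helper)."""
    if len(header) < 512 or header[:4] != REGF_SIGNATURE:
        raise ValueError("not a REGF base block")
    # Fields at offset 4: two sequence numbers, FILETIME, major and minor version.
    primary, secondary, last_written, major, minor = struct.unpack_from(
        "<IIQII", header, 4
    )
    (stored_checksum,) = struct.unpack_from("<I", header, CHECKSUM_OFFSET)
    return {
        "primary_sequence": primary,
        "secondary_sequence": secondary,
        "last_written": filetime_to_datetime(last_written),
        "version": (major, minor),
        "checksum_valid": stored_checksum == regf_checksum(header),
        # Differing sequence numbers mean the hive was not cleanly flushed.
        "is_dirty": primary != secondary,
    }
```

To try it on a real hive, read the first 512 bytes of the file and pass them to parse_regf_header. A hive reported as dirty is exactly the case where the transaction logs matter.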
Plugins:
It’s really easy to write new plugins, thanks to the utilities that ship with regipy. A short example of how to create a new plugin can be found in the project's repository. These are the currently existing plugins:
- Installed services (from system hive)
- Amcache (both win7 and win8+ formats)
- Network routes
- Computer name
- Ntuser hive persistence
- Software hive persistence
- User Assist
- Shimcache (Mandiant’s Shimcache parser)
- Installed software
- Image file execution options
Use as a library:
- Get a specific registry key by path, get all subkeys and values for that key
- Recurse over the registry hive from a given path (or root) and yield all subkeys
- Execute specific plugins
- Apply registry transactions from a log file on a specific hive
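A minimal sketch of the first two library use cases follows. It relies on RegistryHive, get_key, iter_subkeys, get_values and recurse_subkeys as I recall them from the project's README, so check the names against the version you have installed. The wrapper functions are hypothetical helpers written for this post, and regipy is imported inside them so the snippet can be loaded even where the library is not installed.

```python
def dump_key(hive_path: str, key_path: str) -> dict:
    """Return the subkey names and values of a single key (hypothetical helper)."""
    # Assumption: regipy exposes RegistryHive in regipy.registry.
    from regipy.registry import RegistryHive

    hive = RegistryHive(hive_path)
    key = hive.get_key(key_path)
    return {
        "subkeys": [subkey.name for subkey in key.iter_subkeys()],
        # list() works whether get_values returns a list or a generator.
        "values": list(key.get_values(as_json=True)),
    }


def walk_hive(hive_path: str):
    """Yield every subkey of the hive, starting from the root (hypothetical helper)."""
    from regipy.registry import RegistryHive

    hive = RegistryHive(hive_path)
    yield from hive.recurse_subkeys(as_json=True)
```

For example, dump_key('NTUSER.DAT', r'\Software\Microsoft\Windows\CurrentVersion\Run') would list the values of the Run key, assuming the hive file sits in the working directory.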
Hopefully, the DFIR community will find this library/tool useful as well.
I learned a lot about registry structure from working on this project:
- The registry is a complex binary structure with a lot of edge cases:
  - Data can be resident inside the VK record itself, if the data is small enough
  - Sometimes the data is small enough and yet it is not resident
  - Data bigger than 16344 bytes is stored in a different type of structure (e.g. shimcache data)
- The transaction logs are really important and can make a great difference between having findings in a case or not.
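The data storage cases above can be sketched as a small decision function. This is not regipy's code. It assumes the commonly documented VK layout, in which the high bit of the data size field marks resident data held in the 4-byte data offset field, and it takes the 16344-byte limit mentioned above as the point where the separate large-data structure takes over. The function and constant names are invented for illustration.

```python
import struct

RESIDENT_FLAG = 0x80000000   # high bit of the VK data size field
BIG_DATA_THRESHOLD = 16344   # larger values use a different structure


def describe_vk_data(data_size_field: int, data_offset_field: int) -> dict:
    """Classify where the data of a VK record lives (hypothetical helper)."""
    size = data_size_field & 0x7FFFFFFF
    if data_size_field & RESIDENT_FLAG:
        # Resident: the bytes are stored in the offset field itself (at most 4).
        raw = struct.pack("<I", data_offset_field)[:size]
        return {"storage": "resident", "size": size, "data": raw}
    # Not resident, even if the data would have been small enough to fit.
    storage = "big_data" if size > BIG_DATA_THRESHOLD else "cell"
    return {"storage": storage, "size": size, "data_offset": data_offset_field}
```

Note that a 4-byte value without the flag is still reported as living in a separate cell, which is the "small enough and yet not resident" case.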
A practical example of analysis using regipy and pandas
Load the hive timeline to group by timeframes and detect interesting registry modifications:
Load the output of registry-dump NTUSER_modified.DAT -t -o ./timeline.csv to pandas:
import pandas as pd
pd.set_option('max_colwidth', 800)
df = pd.read_csv('/tmp/testdata/timeline.csv')
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.index = df['timestamp']
df.drop(columns='timestamp', inplace=True)
df = df.sort_index(ascending=True)
We get a dataframe that looks like this:
We want to group events by a specific time span, for example 1 hour:
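# One-hour bins that start at half past the hour, labelled by their right edge.
# offset='30min' replaces base=30, which newer pandas releases no longer accept.
df_grouped = df.groupby(pd.Grouper(freq='60min', offset='30min', label='right')).sum(numeric_only=True)
df_grouped[(df_grouped['values_count'] > 0) & (df_grouped['values_count'] < 5)]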
Now we have a small data frame with a couple of interesting timeframes:
Let’s take a look at a specific timeframe:
df[df.index >= '2012-04-04 02:30:00+00:00']
We get the following result (the result is sorted by date, and only the latest entries are shown):
We see that \Software\Microsoft\Windows\CurrentVersion\Run was modified, and also \Software\Microsoft\legitimate_subkey\legitimate_subkey… Interesting :)
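To see how the hourly binning in the grouping step behaves, it can help to run it on a few synthetic rows first. The timestamps and counts below are made up for illustration. Only the values_count column name comes from the timeline used above, and the sketch assumes pandas 1.1 or later for the offset argument.

```python
import pandas as pd

# Synthetic stand-in for the timeline: four key modifications with made-up counts.
df = pd.DataFrame(
    {"values_count": [1, 2, 3, 4]},
    index=pd.to_datetime(
        [
            "2012-04-04 02:10",
            "2012-04-04 02:40",
            "2012-04-04 03:20",
            "2012-04-04 03:45",
        ],
        utc=True,
    ),
)

# One-hour bins starting at half past the hour, labelled by their right edge.
grouper = pd.Grouper(freq="60min", offset="30min", label="right")
hourly = df.groupby(grouper).sum()

# Keep only the quiet timeframes: some activity, but fewer than 5 values.
quiet = hourly[(hourly["values_count"] > 0) & (hourly["values_count"] < 5)]
```

The 02:40 and 03:20 rows land in the same bin (labelled 03:30) and add up to 5, so that bin is filtered out and only the 02:30 and 04:30 bins remain in quiet.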
What’s next?
There’s a lot of work to be done:
- Recover data from unallocated cells
- Improve performance even more (someone said Rust?)
- Write more plugins, hopefully with some pull requests from the community
How to install
Regipy supports only Python 3.7
Regipy releases can be found on PyPI: https://pypi.org/project/regipy/
To get the latest stable release, just use:
pip install regipy
The source code is on GitHub: https://github.com/mkorman90/regipy