Regipy: Automating registry forensics with Python

Martin Korman
Mar 6

Lately I’ve been doing a lot of forensic investigations and tool development that involve registry hives. I tried many tools, and none of them satisfied all of my requirements:

  • Has JSON output
  • Can be used as a Python library, with Python 3 support and the following features:
      • Recurse over a registry hive from a given path and yield all subkeys and their values
      • Get a subkey by path
      • Expose some of the binary structures (NK and VK entries, for example)
      • Apply transaction logs automatically and at scale (over hundreds of collected hives and transaction logs)
  • Has a plugin system, with an easy and intuitive way to add more plugins
  • Can run on any operating system
  • Is open source

So I decided to create my own tool.

Regipy: an OS-independent Python library for parsing offline registry hives


Features:

Multiple command-line tools:

  • registry-dump: Dump the entire content of the hive to ndjson. If the -t flag is specified, the output file will be a timeline instead.
  • registry-parse-header: Parse the REGF header of the file and validate its checksum.
  • registry-run-plugins: Identify the hive type, run all supported plugins, and output the results as a JSON file.
  • registry-plugins-list: Get a list of the supported plugins.
  • registry-diff: Compare two registry hives (like the regshot tool) and write the output to CSV or to the screen.
  • registry-transaction-logs: Recover a registry hive using transaction logs. The restored hive can then be compared to the original hive to see the differences.
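
To make the header validation concrete, here is a minimal sketch of the checksum check based on the publicly documented regf format, not on regipy's own code. It assumes the checksum is stored at offset 508 (0x1FC) of the base block and is the XOR of the 127 little-endian 32-bit words before it, with the documented special cases that a result of 0xFFFFFFFF is stored as 0xFFFFFFFE and a result of 0 is stored as 1. The function names are hypothetical.

```python
import struct

REGF_SIGNATURE = b"regf"
CHECKSUM_OFFSET = 508  # 0x1FC; the checksum covers all bytes before this offset


def regf_header_checksum(header: bytes) -> int:
    """XOR of the first 127 little-endian 32-bit words (508 bytes) of a REGF base block."""
    checksum = 0
    for (dword,) in struct.iter_unpack("<I", header[:CHECKSUM_OFFSET]):
        checksum ^= dword
    # The documented format never stores 0xFFFFFFFF or 0 as the checksum
    if checksum == 0xFFFFFFFF:
        return 0xFFFFFFFE
    if checksum == 0:
        return 1
    return checksum


def validate_regf_header(header: bytes) -> bool:
    """True if the block starts with 'regf' and its stored checksum matches."""
    if len(header) < CHECKSUM_OFFSET + 4 or header[:4] != REGF_SIGNATURE:
        return False
    (stored,) = struct.unpack_from("<I", header, CHECKSUM_OFFSET)
    return stored == regf_header_checksum(header)


# Usage with a synthetic, otherwise empty 4096-byte base block
block = bytearray(4096)
block[:4] = REGF_SIGNATURE
struct.pack_into("<I", block, CHECKSUM_OFFSET, regf_header_checksum(bytes(block)))
print(validate_regf_header(bytes(block)))  # True
```

On a real hive, you would read the first 4096 bytes of the file and pass them to validate_regf_header.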

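Because registry-dump emits ndjson (one JSON object per line), its output is easy to post-process with plain Python. The sketch below shows the idea behind a regshot-style comparison of two dumps. It is not how registry-diff is implemented, and it assumes each record carries a 'path' field and a 'values' field; check the actual field names in your own dump before relying on it.

```python
import json
from typing import Any, Dict, Iterable, List


def load_dump(lines: Iterable[str]) -> Dict[str, Any]:
    """Map each key path to its values; assumes hypothetical 'path' and 'values' fields."""
    snapshot = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        snapshot[record["path"]] = record.get("values")
    return snapshot


def diff_snapshots(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, List[str]]:
    """Regshot-style comparison: key paths added, removed, or with changed values."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }
```

With real dumps, pass open file handles to load_dump (iterating over a file yields its lines), each inside a with open(...) block.
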
Plugins:

Writing new plugins is easy, thanks to the utilities regipy provides. A short example of how to create new plugins can be found here. These are the currently available plugins:

  • Installed services (from the SYSTEM hive)
  • Amcache (both Windows 7 and Windows 8+ formats)
  • Network routes
  • Computer name
  • NTUSER hive persistence
  • SOFTWARE hive persistence
  • UserAssist
  • Shimcache (Mandiant’s Shimcache parser)
  • Installed software
  • Image File Execution Options
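
The general shape of such a plugin system can be sketched in a few lines. This is a hypothetical illustration, not regipy's actual plugin API: every class and function name below is invented, and a flat dict (key path to values) stands in for a parsed hive. Each plugin declares which hive type it supports, and a runner executes only the plugins that match the identified hive.

```python
from typing import Any, Dict, List, Type

# Toy hive model for this sketch: key path -> {value name: data}
Hive = Dict[str, Dict[str, Any]]


class Plugin:
    """Hypothetical base class: subclasses set NAME and COMPATIBLE_HIVE and fill self.entries."""

    NAME = ""
    COMPATIBLE_HIVE = ""

    def __init__(self, hive: Hive) -> None:
        self.hive = hive
        self.entries: List[Dict[str, Any]] = []

    def run(self) -> None:
        raise NotImplementedError


PLUGINS: List[Type[Plugin]] = []


def register(cls: Type[Plugin]) -> Type[Plugin]:
    """Class decorator that adds a plugin to the global plugin list."""
    PLUGINS.append(cls)
    return cls


@register
class ComputerNamePlugin(Plugin):
    NAME = "computer_name"
    COMPATIBLE_HIVE = "system"

    def run(self) -> None:
        # For simplicity this hardcodes ControlSet001 instead of resolving the current control set
        key = self.hive.get(r"\ControlSet001\Control\ComputerName\ComputerName", {})
        if "ComputerName" in key:
            self.entries.append({"name": key["ComputerName"]})


def run_matching_plugins(hive_type: str, hive: Hive) -> Dict[str, List[Dict[str, Any]]]:
    """Run every registered plugin whose COMPATIBLE_HIVE matches the identified hive type."""
    results = {}
    for plugin_cls in PLUGINS:
        if plugin_cls.COMPATIBLE_HIVE == hive_type:
            plugin = plugin_cls(hive)
            plugin.run()
            results[plugin_cls.NAME] = plugin.entries
    return results
```

With this design, adding a plugin means writing one decorated subclass; the runner itself never changes.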

Use as a library:

  • Get a specific registry key by path, get all subkeys and values for that key
  • Recurse over the registry hive from a given path (or root) and yield all subkeys
  • Execute specific plugins
  • Apply transaction logs to a specific hive
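
Regipy performs these operations on the binary hive itself. To illustrate only the traversal logic, the sketch below uses a toy in-memory tree (a hypothetical nested dict where each key has "values" and "subkeys") instead of a real hive. One function resolves a backslash-separated path, case-insensitively as the registry does, and a generator walks the tree depth-first and yields each key's path and values. The function names are illustrative, not regipy's API.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

# Toy stand-in for a hive key: {"values": {...}, "subkeys": {name: key, ...}}
Node = Dict[str, Any]


def get_key(root: Node, path: str) -> Optional[Node]:
    """Resolve a backslash-separated key path from the root; None if any part is missing."""
    node = root
    for part in filter(None, path.split("\\")):
        # Registry key names are case-insensitive, so compare lowercased names
        children = {name.lower(): child for name, child in node["subkeys"].items()}
        node = children.get(part.lower())
        if node is None:
            return None
    return node


def recurse_subkeys(node: Node, path: str = "") -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Depth-first generator yielding (path, values) for a key and all of its descendants."""
    yield path or "\\", node["values"]
    for name, child in node["subkeys"].items():
        yield from recurse_subkeys(child, f"{path}\\{name}")
```

Because recurse_subkeys is a generator, even a very large hive can be streamed key by key without building the full list in memory.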

Hopefully, the DFIR community will find this library/tool useful as well.

I learned a lot about registry structure from working on this project:

  • The registry is a complex binary structure with a lot of edge cases:
      • Data can be resident inside the VK record itself, if it is small enough
      • Sometimes the data is small enough and yet it is not resident
      • Data larger than 16344 bytes is stored in a different type of structure (e.g. the Shimcache value)
  • Transaction logs are really important: applying them can make the difference between having findings in a case and having none.
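
The cases above come down to how a VK record describes its data. A minimal sketch, based on the publicly documented regf format rather than on regipy's code: the high bit (0x80000000) of the 32-bit data-size field marks resident data, which is kept in the record's own data-offset field (so at most 4 bytes). Otherwise the data sits in a separate cell (the "small but not resident" case), and values above 16344 bytes go through a separate big data structure. The 16344-byte threshold is assumed to apply to hive format 1.4 and later; the sketch does not check the hive version.

```python
from typing import Tuple

RESIDENT_FLAG = 0x80000000   # high bit of the VK data-size field
BIG_DATA_THRESHOLD = 16344   # bytes; assumes hive format 1.4 or later


def classify_vk_data(data_size_field: int) -> Tuple[str, int]:
    """Return (storage, length) for the raw 32-bit data-size field of a VK record."""
    length = data_size_field & 0x7FFFFFFF
    if data_size_field & RESIDENT_FLAG:
        # Resident: up to 4 bytes stored in the VK record's own data-offset field
        return "resident", length
    if length > BIG_DATA_THRESHOLD:
        # Too large for one cell: stored through a separate big data structure
        return "big_data", length
    # Small but not resident: the data lives in a separate cell
    return "cell", length


for raw in (0x80000004, 4, 16344, 16345):
    print(hex(raw), classify_vk_data(raw))
```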

A practical example of analysis using regipy and pandas

We load the hive timeline, group it by timeframes, and look for interesting registry modifications.

Load the output of registry-dump NTUSER_modified.DAT -t -o ./timeline.csv into pandas:

```python
import pandas as pd

pd.set_option('display.max_colwidth', 800)

# Load the timeline written by registry-dump -t -o ./timeline.csv
df = pd.read_csv('./timeline.csv')
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Index by timestamp (dropping the column) and sort chronologically
df = df.set_index('timestamp').sort_index(ascending=True)
```

We get a dataframe indexed by timestamp, sorted in ascending order.

Next, we group the events by a fixed time span, for example one hour, and keep only the windows with a few modifications:

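```python
# Sum numeric columns per one-hour window starting at hh:30, labelled by the window's right edge
# (offset='30min' replaces the base=30 argument, which newer pandas versions no longer accept)
df_grouped = df.groupby(pd.Grouper(freq='60min', offset='30min', label='right')).sum(numeric_only=True)

# Keep windows with between 1 and 4 modified values (one combined mask avoids chained indexing)
df_grouped[(df_grouped['values_count'] > 0) & (df_grouped['values_count'] < 5)]
```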
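
To make the grouping step explicit, here is a stdlib-only sketch of what that Grouper call computes, useful for checking window boundaries by hand. It assumes left-closed windows (the pandas default for minute frequencies) starting at hh:30 and labelled by their right edge; the function names and sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Tuple

WINDOW = timedelta(hours=1)
OFFSET = timedelta(minutes=30)  # windows start at hh:30, like offset='30min'


def window_label(ts: datetime) -> datetime:
    """Right edge of the left-closed one-hour window [hh:30, hh+1:30) containing ts."""
    # Shift back by the offset, floor to the hour, then shift forward again
    start = (ts - OFFSET).replace(minute=0, second=0, microsecond=0) + OFFSET
    return start + WINDOW


def sparse_windows(events: Iterable[Tuple[datetime, int]],
                   low: int = 0, high: int = 5) -> Dict[datetime, int]:
    """Sum values_count per window and keep the windows where low < total < high."""
    totals: Counter = Counter()
    for ts, values_count in events:
        totals[window_label(ts)] += values_count
    return {label: total for label, total in sorted(totals.items()) if low < total < high}


# Usage with a few made-up, UTC-aware timestamps
events = [
    (datetime(2012, 4, 4, 2, 45, tzinfo=timezone.utc), 2),
    (datetime(2012, 4, 4, 3, 10, tzinfo=timezone.utc), 1),
    (datetime(2012, 4, 4, 5, 0, tzinfo=timezone.utc), 10),
]
print(sparse_windows(events))  # one window ending at 03:30 with a total of 3
```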

Now we have a small dataframe with a couple of interesting timeframes.

Let’s take a look at a specific timeframe:

```python
# Show all entries from 2012-04-04 02:30 UTC onwards
df[df.index >= '2012-04-04 02:30:00+00:00']
```

The result is sorted by date. Looking at the latest entries:

We see that \Software\Microsoft\Windows\CurrentVersion\Run was modified, and also \Software\Microsoft\legitimate_subkey\legitimate_subkey… Interesting :)


What’s next?

There’s a lot of work to be done:

  • Recover data from unallocated cells
  • Improve performance even more (someone said Rust?)
  • Write more plugins (I’m also expecting some pull requests from the community)

How to install

Regipy supports only Python 3.7.

Regipy releases can be found on PyPI: https://pypi.org/project/regipy/

To get the latest stable release, use:

pip install regipy

The source code is on GitHub: https://github.com/mkorman90/regipy.

DFIR Dudes

Yet another DFIR blog, authored by Martin Korman & Hadar Yudovich

Written by Martin Korman, Malware Analyst and Forensic Investigator.
