Find traces of bad guys in evidence

Dec 5, 2016

tldr: docker run -v ./reports:/a -v ./all-evidence:/b -v ./output:/c -it pcbje/gbl

You have been called in to help a company make sense of a situation. Someone has gained access to the company’s financial system, and all assets have now been moved to a tax haven. We have less than six hours to find the person responsible, or the money is lost. The company has acquired the laptop that was used to transfer the funds, but has not been able to determine who used it. We also have access to backups of all employee computers.

Our task is to identify the user of the laptop. But how?

A place to start is to extract all identifiable information (names, email addresses, etc.) from the acquired laptop and search for these in all other evidence.

If an employee has been sending emails using fake information, but has used the same information on both the target laptop and other devices, this may let us connect the dots.

Please excuse the lack of gender variation. I’ve yet to figure out how to draw female characters.

In the scenario above, we may want to take a closer look at Phil, as a suspicious name occurs on both his own laptop and on the laptop responsible for the money transfers.
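
The correlation idea can be sketched in a few lines of Python. This is only an illustration, not how Gransk works internally: the device names and entity sets are invented for the example, and `shared_entities` is a hypothetical helper.

```python
def shared_entities(target, others):
    """Return {device: entities also seen on the target} for devices with overlap."""
    hits = {}
    for device, entities in others.items():
        overlap = target & entities
        if overlap:
            hits[device] = overlap
    return hits


# Hypothetical data: entities found on the seized laptop and in employee backups.
laptop = {"john.doe@example.com", "J. Doe"}
backups = {
    "phil-laptop": {"phil@example.com", "john.doe@example.com"},
    "anna-laptop": {"anna@example.com"},
}

matches = shared_entities(laptop, backups)
for device, overlap in sorted(matches.items()):
    print(device, sorted(overlap))
```

Any device that shares at least one entity with the target laptop is worth a closer look; a device with no overlap simply drops out of the result.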

From a technical perspective

The approach I suggest consists of two steps:

1: Extract all entities (names, emails, etc.) from the documents using Gransk.

2: Search for all extracted entities in all the other data using Bulk_extractor and Lightgrep.

Gransk is a document processing tool I’ve developed that, among other things, extracts entities from documents using the Named Entity Recognition library Polyglot.
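
To give a feel for what step 1 produces, here is a minimal sketch that pulls email addresses out of text. It is a stand-in, not Gransk or Polyglot: it uses a simplified regular expression instead of Named Entity Recognition, and the sample text, the address and the `extract_emails` helper are assumptions made up for the example.

```python
import re

# Simplified email pattern (an assumption); real addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the unique email-like strings in text, lowercased and sorted."""
    return sorted({m.lower() for m in EMAIL_RE.findall(text)})


# Hypothetical report text with the same address written in two ways.
report = "Contact Terry at Terry.Johnson@example.com or terry.johnson@example.com."
emails = extract_emails(report)
print(emails)
```

Lowercasing and de-duplicating keeps the entity list short, which matters when the list is later used as a set of search patterns.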

Bulk_extractor is a data scanner capable of processing data such as archives and PDF documents. Lightgrep is a multi-pattern matching library, so we can search for many entities approximately as fast as searching for a single entity.
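
The shape of a multi-pattern search over raw data can be sketched with Python's standard `re` module. This is not Lightgrep's API, and a backtracking regex engine will not match its performance; the sketch only shows the idea of compiling every entity into one matcher and scanning the data once. The entity list, the byte string and both helper functions are assumptions for illustration.

```python
import re


def build_matcher(entities):
    """Compile all entities into one case-insensitive alternation over bytes."""
    # Longest first so a longer entity wins over one that is its prefix.
    ordered = sorted(set(entities), key=len, reverse=True)
    alternation = b"|".join(re.escape(e.encode("utf-8")) for e in ordered)
    return re.compile(alternation, re.IGNORECASE)


def scan(matcher, data):
    """Yield (byte offset, matched text) for every hit in data."""
    for m in matcher.finditer(data):
        yield m.start(), m.group().decode("utf-8", errors="replace")


# Hypothetical entities and a fake "memory image" with binary noise around them.
entities = ["Terry Johnson", "terry.johnson@example.com"]
image = b"\x00\x01TERRY JOHNSON\x00mail: terry.johnson@example.com\xff"
hits = list(scan(build_matcher(entities), image))
print(hits)
```

Working on bytes rather than decoded text means the scan does not care whether the evidence is a document, an archive member or a memory dump, and the byte offsets tell an analyst where to look.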

These libraries and steps may be installed and performed manually, but if you’re able to use Docker, I’ve made it easy for you. Just modify and execute the following command:

$ docker run -v ./reports:/a -v ./all-evidence:/b -v ./output:/c -it pcbje/gbl

Take an example from the M57-Patents scenario. Here we check the contents of four detective reports against a memory image:

Command and output from combining Gransk, bulk_extractor and lightgrep

The result is written to a text file, ./output/lightgrep.txt.

From these results we see that the name “Terry Johnson” occurs in at least one detective report and in the given memory image.

Concluding remarks

My use case is to correlate a large set of entities against huge documents, and I do think this has the potential to save analysts and investigators a lot of time and effort. However, I would love to hear from you if you think this may have other use cases as well!

Make sure to check out Gransk on GitHub!

Written by Petter Chr. Bjelland