Hunting for PowerShell Using Heatmaps

Josh Liburdi
9 min read · Jan 5, 2017


This post is an example of how visualizations can be used in threat hunting; it isn’t an explicit play-by-play guide for finding malicious PowerShell activity in your network, but is written to show how easy it can be to create and use visualizations as a hunting aid.

Warning: Python code ahead! Python is my preferred method for creating visualizations (thanks Plotly, Seaborn, matplotlib) and snippets of code will be shown throughout this post. Before that scares you away, consider that the level of experience required to do the tasks described below is closer to ‘Python hacker’ than ‘Python developer’—the code shown here fits within 20 lines, including module imports.

Why hunt for PowerShell?

Why not?

Kidding aside, the reason is that attackers commonly use PowerShell when they gain access to target networks. From a threat intelligence point of view this nugget of information is as basic as it gets, but it’s still enough to guide our efforts: attackers commonly use PowerShell, so we should look at PowerShell activity in our network. Would more information be useful? Absolutely! If we had it, then we could use it to create detailed hypotheses that describe multiple ways attackers may use PowerShell, but for the sake of example, let’s pretend we’re going into this nearly blind. (By the way, there’s no shortage of freely available resources that describe how attackers use it—if you’re interested, then load up your favorite authors/vendors/projects and search away.)

There’s also some inherent difficulty to hunting for malicious use of administrative tools (PowerShell, WMI, etc.)—from a log perspective, attackers tend to blend in with administrators. This creates a challenge: can we hunt for PowerShell use and accurately separate administrator activity from attacker activity? Challenges like this are fun because they typically force you to rethink your approach and iterate through various techniques until you discover one that works.

Why use heatmaps to hunt for PowerShell?

Why not?

This time I’m not kidding. The work described below originally started with an urge to find an interesting use for heatmaps as a threat hunting aid. Under normal circumstances, a hunting use case should dictate which technique you use; however, taking the reverse approach was personally useful in the context of learning what works and what doesn’t work when using heatmaps.

If you’ve never seen one before, a heatmap is a matrix of values that describe relationships (via color) between two value sets. If that sounds complex, then just imagine an Excel document that has a header row and a header column with some color splashed in to describe the data on the spreadsheet — that’s a heatmap.

Examples of heatmaps created in Microsoft Excel. Credit, clockwise: http://policeanalyst.com, http://tomhinkle.com/, http://wikipedia.org.

When would one use a heatmap? They’re useful when you want to see relationships or trends in a dataset and can be used to identify high or low outliers.

Bring on the data!

The data described in this post was collected from a network that had a targeted attack carried out against it, so it contains both normal and malicious PowerShell artifacts. The data itself is process execution metadata—including fields that describe the host each process was executed on, the user account that executed the process, and the command line arguments used by the process—that originates from PowerShell processes. This is an important point to consider: we are not looking at every process from the network; we have a filtered dataset that is isolated to PowerShell processes. Appropriately filtering the dataset ensures that the data in our graphs won’t be misinterpreted.

To get the PowerShell process metadata into a usable state for generating a heatmap, we have to take it from the original source (in this case, a CSV file) and turn it into a Pandas DataFrame (DF). Pandas was built for data manipulation and analysis—as someone who previously managed data in Python scripts via lists and dictionaries, I find DFs much easier to work with. Let’s take a look at how easy it is to store the source CSV file as a DF and do some basic inspection.

>>> import pandas as pd
>>> powershell_pdf = pd.read_csv('powershell.csv')
>>> len(powershell_pdf.index)
12796

Pretty simple, huh? Pandas takes the CSV header and assigns it to columns in the DF; each subsequent row in the CSV is stored as a row containing the CSV field values. Our DF has 12,796 rows, which corresponds to how many events were in the CSV file. This means that we’re dealing with ~13k unique PowerShell execution events.

That seems like a lot, but we can use other Pandas functions to get a sense of scale for these events, such as determining how many hosts are contained in the dataset.

>>> powershell_pdf['computer_name'].nunique()
14

Now we know that we’re dealing with ~13k unique PowerShell events across 14 unique hosts …

>>> powershell_pdf['computer_name'].unique()
array(['SRV-DC', 'USR-07', 'SRV-EMAIL', 'SRV-FILE', 'USR-01', 'USR-03', 'SRV-SQL', 'USR-09', 'USR-00', 'USR-02', 'USR-06', 'USR-08', 'USR-05', 'SRV-SHRPT'], dtype=object)

… and that the hosts are either USR systems (workstations) or SRV systems (servers).

In a real world scenario, we would want to separate this data into groups: separate workstations from servers and potentially sub-group servers based on class (domain controller, file server, etc).
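As a sketch of what that grouping could look like—assuming only the USR-/SRV- naming convention shown above (the sample rows here are hypothetical, not the real dataset)—boolean indexing on the hostname prefix does the job:

```python
import pandas as pd

# Hypothetical sample in the shape of the dataset described above
powershell_pdf = pd.DataFrame({
    'computer_name': ['USR-01', 'SRV-DC', 'USR-07', 'SRV-FILE', 'USR-03'],
})

# Split on the naming convention: USR-* hosts are workstations, SRV-* hosts are servers
workstations = powershell_pdf[powershell_pdf['computer_name'].str.startswith('USR-')]
servers = powershell_pdf[powershell_pdf['computer_name'].str.startswith('SRV-')]
```

Each group could then be analyzed (and graphed) on its own, with servers optionally sub-grouped further by class.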

Ideally, data loaded onto a heatmap should be as alike (i.e., homogeneous) as possible. Our dataset is diverse (i.e., heterogeneous) and this could create poor results—using a heterogeneous dataset may skew the graph if the elements in it are very unalike. (In this context, comparing the traits of workstations and servers could be similar to comparing the traits of apples and oranges; we have to be sure that that’s what we really want to do and, if not, consider if it’s better to compare the two groups separately.)

Given how small our dataset is, we’ll continue with what we have so far.

Show me the heatmap(s)!

Now that we have data loaded in the proper format and understand that we’re working with a suboptimal dataset (once again, the mixture of workstations and servers could create problems for us later), we’re almost ready to build a heatmap. However, we’re missing a critical element: we need to determine which columns we will use as the X axis and the Y axis. The columns we choose become intersections on the heatmap — this means that if X and Y are related, then we’ll see that as a result on the graph. Here are the columns we can use:

>>> list(powershell_pdf.columns.values)
['process_guid', 'computer_name', 'username', 'command_line', 'timestamp']

Our stated goal was to track PowerShell activity in the network, so it’s reasonable to focus on either the computer_name (i.e., the host where PowerShell was executed) or the username (i.e., the user account that ran PowerShell) for one axis—this would allow us to see PowerShell execution by host or by account.

Choosing the column for the other axis is where we get to experiment, and the choice depends entirely on what kinds of results we think would be useful for separating malicious use (attackers) from normal use (administrators). A simple method for figuring out what might be useful to display is to say it in a common language, then translate it to code: ‘I want to see which user accounts (X) executed PowerShell on hosts (Y)’, ‘I want to see the time of day (X) when PowerShell was executed on hosts (Y)’, ‘I want to see which PowerShell scripts (X) were executed on hosts (Y).’

This approach is applicable to any dataset: one axis represents the focus of your attention and the other axis represents a trait that you would like to see relationships/trends for.

To start, we’ll look at PowerShell execution across both computer_name and username; color on the graph (and subsequent graphs) represents the sum of executions per X and Y. Now we just need to reorganize our data and create a heatmap — Pandas, Seaborn, and matplotlib allow us to produce a graph with a few lines of code. (You might notice that the graphs below have their X and Y axes flipped from what is described above—it’s more legible this way.)

>>> import seaborn as sns 
>>> import matplotlib.pyplot as plt
>>> powershell_pdf_hm = pd.DataFrame({'count': powershell_pdf.groupby(['username','computer_name']).size()}).reset_index()
>>> powershell_pdf_hm = powershell_pdf_hm.pivot(index='computer_name', columns='username', values='count').fillna(0)
>>> hm1 = sns.heatmap(powershell_pdf_hm.T)
>>> plt.show()
A heatmap that describes the number of PowerShell executions by 4 accounts across 14 hosts.

We have a heatmap, but there’s a problem (and if you carefully considered the warnings I shared earlier, then you’ll immediately understand why): a SYSTEM account executed PowerShell ~8k times on host SRV-DC, which is far greater than any other username/computer_name pair. This is the homogeneous/heterogeneous issue I mentioned earlier making itself known. It’s reasonable to think that the number of PowerShell executions on workstations should have a similar range (which, according to the graph, they appear to); it isn’t reasonable to think that the number of PowerShell executions across all hosts will have a similar range.

Even with a dataset this small, we should have separated these hosts into groups. If you’re going to use heatmaps as a hunting aid, then you have to be aware of how outliers can skew the graph and make it unusable—separating the data into groups is one method of potentially dodging this problem.

Since that particular username/computer_name pair is an extreme outlier, we’ll remove SRV-DC from the dataset so that we can get a better sense of the activity.

A heatmap that describes the number of PowerShell executions by 4 accounts across 13 hosts.

The range on this heatmap is better, but it still illustrates the same point: a SYSTEM account executes PowerShell on nearly all of the hosts in the dataset while other accounts rarely execute it. If a SYSTEM account executed PowerShell ~8k times on SRV-DC, then that means there are ~4k unique PowerShell execution events represented on this heatmap—it’s cramming a lot of data into a useful, easily interpretable image.

From this point, we can quickly iterate the original DF to multiple heatmaps using only a few lines of code. Based on our analysis, we can process data in the DF, add new columns to it, and generate heatmaps guided by our findings.
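For example, deriving an hour-of-day column from the timestamp field feeds directly into an hour-by-host heatmap like the ones below. A sketch, assuming timestamp is in a format pd.to_datetime can parse (the sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical sample rows; the real DF comes from powershell.csv
powershell_pdf = pd.DataFrame({
    'computer_name': ['USR-07', 'USR-07', 'SRV-FILE'],
    'timestamp': ['2016-11-01 18:12:44', '2016-11-01 18:47:02', '2016-11-01 09:30:11'],
})

# Derive a new column from an existing one...
powershell_pdf['hour'] = pd.to_datetime(powershell_pdf['timestamp']).dt.hour

# ...then reuse the same groupby/pivot pattern from before
hourly = pd.DataFrame({'count': powershell_pdf.groupby(['hour', 'computer_name']).size()}).reset_index()
hourly = hourly.pivot(index='computer_name', columns='hour', values='count').fillna(0)
```

Passing hourly to sns.heatmap() produces the graph, exactly as in the earlier snippet.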

Here are a few more heatmaps for you to look at. Can you spot the malicious activity?

A heatmap that describes the number of PowerShell executions by the UTC hour of execution time across 13 hosts.
A heatmap that describes the number of PowerShell script executions by the UTC hour of execution time across 5 hosts. The exclusion of a host from the graph means that no PowerShell scripts were executed on that host.
A heatmap that describes the number of PowerShell script executions involving 7 scripts across 5 hosts.
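The script-level views above need a script name column that isn’t in the original CSV. One way to derive it—a sketch assuming .ps1 references appear in command_line; the regex and helper name are mine, not taken from the original analysis:

```python
import re

import pandas as pd

def extract_script(command_line):
    """Return the first .ps1 script name found in a command line, else None."""
    match = re.search(r'([\w-]+\.ps1)', str(command_line), re.IGNORECASE)
    return match.group(1).lower() if match else None

# Hypothetical command lines for illustration
powershell_pdf = pd.DataFrame({'command_line': [
    r'powershell.exe -ExecutionPolicy Bypass -File C:\scripts\Backup-Share.ps1',
    'powershell.exe -NoProfile -Command Get-Process',
]})
powershell_pdf['script'] = powershell_pdf['command_line'].apply(extract_script)
```

With the new script column in place, the same groupby/pivot/heatmap pattern yields the script-by-host view.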

By now I think that the value of using visualizations (like heatmaps) as a hunting aid is clear: when used thoughtfully, they concisely summarize huge amounts of information into a graph that is (fingers crossed) easy to interpret. The real power here isn’t in one’s ability to interpret a heatmap, but in the ability to research, create, and iterate one—if anything, this is as good a reminder as any that if you work in the incident detection/response or alert investigation space, then you should have some scripting experience. That said, the technical barrier to entry for doing this type of hunting is low—it only requires access to source log records and a basic level of Python knowledge.

The last thing to consider is that a heatmap doesn’t have to be limited to hunting for PowerShell activity or even process execution activity; this technique could be applied to any dataset that you want to see relationships in.

PS: Yes, the attacker ran malicious instances of PowerShell on workstation USR-07 between 18:00-18:59 UTC via a unique script.
