MITRE ATT&CK via Jupyter Notebook for Beginners

Aaron May
Apr 8, 2022

In this tutorial, we’ll download the MITRE ATT&CK data set as an Excel file and perform basic data cleaning and exploratory data analysis (EDA). The end result is a bar chart counting MITRE sub-techniques per data component (entries such as “Command: Command Execution”).

Step 1: Import Python Modules

In this tutorial we’ll use Pandas, PyJanitor and Plotly modules.

import pandas as pd # for data acquisition and manipulation
import janitor as jn # for data cleaning tasks
import plotly.express as px # for visualization
import plotly.io as pio # for configuring how figures are rendered
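Beginners often hit an ImportError at this step, or later when pd.read_excel opens the .xlsx file, because pandas relies on the separate openpyxl package to read that format. As a minimal sketch, assuming you only want a list of what still needs installing, the hypothetical helper missing_modules below uses the standard library’s importlib.util.find_spec to look up each import name without importing it. Note that PyJanitor is installed with pip as pyjanitor but imported as janitor.

```python
from importlib.util import find_spec

# Import names (not pip package names) used in this tutorial.
# "janitor" is the import name of the pyjanitor package; "openpyxl" is the
# engine pandas uses to read .xlsx workbooks.
REQUIRED_MODULES = ["pandas", "janitor", "plotly", "openpyxl"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    # For top-level names, find_spec returns None when the module is not installed
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required modules are available.")
```

Anything reported as missing can be installed with pip (for example, pip install pyjanitor openpyxl) before re-running the import cell.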

Step 2: Define Settings

When performing initial analysis of a data set, it’s often helpful to remove the Pandas display restrictions for columns and rows.

We’ll also have Plotly render its figures in the browser.

# Pandas
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# Plotly
pio.renderers.default = "browser"
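Setting these options globally means every later cell that prints a large DataFrame will print all of it. As an alternative sketch, pandas’ pd.option_context applies the same settings only inside a with block and restores the previous values when the block exits. The helper name show_all is hypothetical.

```python
import pandas as pd


def show_all(df):
    """Return the text rendering of df with the row/column display limits lifted.

    The options change only inside the `with` block; pandas restores the
    previous values afterwards, so other cells keep the default truncation.
    """
    with pd.option_context("display.max_rows", None, "display.max_columns", None):
        # repr() honours display.max_rows / display.max_columns
        return repr(df)


example = pd.DataFrame({"a": range(100)})
print(show_all(example))
```

In Jupyter, the same with block can wrap a display(df) call instead of returning a string.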

Step 3: Download the MITRE Enterprise ATT&CK Data Set

There are various ways to access the MITRE data sets. In this tutorial, we download the Enterprise ATT&CK v10.1 Excel workbook and load each sheet into its own Pandas DataFrame.

url_attack = 'https://attack.mitre.org/docs/enterprise-attack-v10.1/enterprise-attack-v10.1.xlsx'
df_datasources = pd.read_excel(url_attack, sheet_name='datasources')
df_tactics = pd.read_excel(url_attack, sheet_name='tactics')
df_techniques = pd.read_excel(url_attack, sheet_name='techniques')
df_relationships = pd.read_excel(url_attack, sheet_name='relationships')
df_mitigations = pd.read_excel(url_attack, sheet_name='mitigations')
df_software = pd.read_excel(url_attack, sheet_name='software')
df_groups = pd.read_excel(url_attack, sheet_name='groups')
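Each pd.read_excel call above fetches the workbook again, so the file is downloaded seven times. pd.read_excel also accepts a list for sheet_name, in which case it returns a dict of DataFrames keyed by sheet name, so one call can read all seven sheets. A minimal sketch, assuming the sheet names used above; ATTACK_SHEETS and load_sheets are hypothetical names.

```python
import pandas as pd

# Sheet names taken from the read_excel calls in this step
ATTACK_SHEETS = [
    "datasources", "tactics", "techniques", "relationships",
    "mitigations", "software", "groups",
]


def load_sheets(source, sheets=ATTACK_SHEETS):
    """Read the requested sheets from one workbook with a single read_excel call.

    `source` may be a URL, a local path, or a file-like object. Passing a
    list as sheet_name makes pandas return {sheet_name: DataFrame}.
    """
    return pd.read_excel(source, sheet_name=list(sheets))
```

Usage: frames = load_sheets(url_attack), then df_techniques = frames['techniques'], df_datasources = frames['datasources'], and so on for the remaining sheets.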

Step 4: Clean-up Column Names

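Running df_techniques.head() in our notebook shows that the column names contain spaces. We can fix them with the PyJanitor function jn.clean_names(), which, among other things, lowercases the names and replaces spaces with underscores, so a column such as “data sources” becomes data_sources.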

df_datasources = jn.clean_names(df_datasources)
df_tactics = jn.clean_names(df_tactics)
df_techniques = jn.clean_names(df_techniques)
df_relationships = jn.clean_names(df_relationships)
df_mitigations = jn.clean_names(df_mitigations)
df_software = jn.clean_names(df_software)
df_groups = jn.clean_names(df_groups)
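To see roughly what jn.clean_names() does to a header, here is a pure-pandas approximation. It is only a sketch: PyJanitor’s function handles more cases and offers options this version lacks, so treat the hypothetical simple_clean_names as an illustration rather than a replacement. The example headers are made up for the demo.

```python
import pandas as pd


def simple_clean_names(df):
    """Return a copy of df with lowercase, underscore-separated column names.

    Hypothetical helper: a rough approximation of janitor.clean_names()
    for plain headers such as "ID" or "data sources".
    """
    cleaned = (
        df.columns.str.strip()                     # drop stray leading/trailing spaces
        .str.lower()                               # "ID" -> "id"
        .str.replace(r"\s+", "_", regex=True)      # "data sources" -> "data_sources"
    )
    out = df.copy()  # leave the original frame untouched
    out.columns = cleaned
    return out


raw = pd.DataFrame(columns=["ID", "name", "data sources", "last modified"])
print(list(simple_clean_names(raw).columns))
```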

Step 5: Sampling the Data

Let’s take a look at the data. The screenshots below show samples of the techniques, data sources, and relationships tables. Having direct access to MITRE ATT&CK data as a table gives us options for filtering and customizing it to suit our threat detection research and development needs.

Techniques:

Data Sources:

Relationships:
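To pull samples like these in your own notebook, head() shows the first rows, while sample() draws a random subset that can reveal rows further down the table. A sketch with a hypothetical peek helper; the toy frame and the random_state value are arbitrary.

```python
import pandas as pd


def peek(df, n=5, seed=42):
    """Return df's shape and up to n randomly chosen rows.

    Hypothetical helper. A fixed random_state makes the sample repeatable,
    so re-running the cell returns the same rows.
    """
    rows = df.sample(n=min(n, len(df)), random_state=seed)
    return df.shape, rows


toy = pd.DataFrame({"id": ["TX1", "TX2", "TX3"], "name": ["a", "b", "c"]})
shape, rows = peek(toy, n=10)
print(shape)
print(rows)
```

Usage: peek(df_techniques), peek(df_datasources), or peek(df_relationships, n=10).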

Step 6: Perform Data Pre-Processing Tasks

Looking at the techniques data, we can see that the data_sources column packs all of a technique’s data sources into a single comma-separated string (screenshot below). That format isn’t suitable for our goal of visualizing sub-technique counts by data source. To use the data source information in a visualization, we’ll split the string into a list of data sources and then use the Pandas explode() function to give each data source its own row.

# Convert the comma-separated string of data sources to a list
df_techniques['data_sources'] = df_techniques['data_sources'].str.split(',')
# Use the Pandas explode function to expand each list into separate rows
df_techniques = df_techniques.explode('data_sources').reset_index(drop=True)
# Strip the leading space left over after splitting on ','
df_techniques['data_sources'] = df_techniques['data_sources'].str.strip()
# Keep unique (data source, technique ID) pairs, then count techniques per data source
viz_data = (
    df_techniques[['data_sources', 'id']]
    .drop_duplicates()
    .groupby('data_sources')
    .size()
    .reset_index(name='count')  # name the count column 'count' for the chart
)
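The split-and-explode pattern is easiest to check on a few rows. Splitting on “,” leaves a leading space on every data source after the first, so “ Process: Process Creation” and “Process: Process Creation” would be counted as different groups unless the values are stripped. The sketch below runs the pipeline on made-up rows (the IDs TX1 to TX3 are hypothetical); the None row shows that techniques without data sources drop out, because groupby ignores missing keys by default.

```python
import pandas as pd

# Hypothetical toy rows shaped like the techniques sheet after clean_names()
toy = pd.DataFrame({
    "id": ["TX1", "TX2", "TX3"],
    "data_sources": [
        "Command: Command Execution, Process: Process Creation",
        "Process: Process Creation",
        None,  # a technique with no data sources listed
    ],
})

toy["data_sources"] = toy["data_sources"].str.split(",")
toy = toy.explode("data_sources").reset_index(drop=True)
# Without this strip, " Process: Process Creation" would form its own group
toy["data_sources"] = toy["data_sources"].str.strip()

counts = (
    toy[["data_sources", "id"]]
    .drop_duplicates()
    .groupby("data_sources")   # missing keys (the None row) are dropped
    .size()
    .reset_index(name="count")
)
print(counts)
```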

Before:

After:
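The grouping above counts every row of the techniques sheet. If that sheet lists parent techniques alongside sub-techniques and you want the chart to count sub-techniques only, one option is to filter before grouping. ATT&CK sub-technique IDs carry a dotted suffix (for example, T1059.001 under T1059), so a sketch can filter on that pattern. The helper only_sub_techniques is hypothetical, the toy rows are made up, and your copy of the workbook may also include a dedicated sub-technique flag column worth checking.

```python
import pandas as pd


def only_sub_techniques(df, id_col="id"):
    """Keep rows whose ATT&CK ID has a dotted suffix, e.g. T1059.001.

    Assumes sub-technique IDs are the only IDs in id_col containing a ".".
    """
    mask = df[id_col].astype(str).str.contains(".", regex=False)
    return df[mask]


# Made-up rows; the data source values are placeholders
toy = pd.DataFrame({
    "id": ["T1059", "T1059.001", "T1059.003", "T1566"],
    "data_sources": ["Source A", "Source A", "Source B", "Source C"],
})
print(only_sub_techniques(toy))
```

Usage: df_techniques = only_sub_techniques(df_techniques) before the split-and-explode step.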

Step 7: Visualize the Data

In our final step, we use Plotly to visualize the data. As the chart shows, a significant portion of MITRE techniques are related to the data sources “Command: Command Execution” and “Process: Process Creation.” In a future post, we’ll analyze the OSSEM data set to better understand the relationships between techniques, data sources, and event IDs.

# Plot the 25 data sources with the most techniques (matches the chart title)
fig_te_by_ds = px.bar(
    viz_data.sort_values('count', ascending=False).head(25),
    x='data_sources',
    y='count',
    title='MITRE ATT&CK: Sub-Technique Count by Data Source (Top 25)',
    labels={'count': 'Technique Count', 'data_sources': 'Data Source'},
)
fig_te_by_ds.show()
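Because the title promises a specific number of data sources, it helps to derive both the slice and the title from a single N so they cannot drift apart. A sketch with hypothetical top_n and chart_title helpers on made-up counts; DataFrame.nlargest gives the same rows as sorting in descending order and calling head(), apart from how ties are ordered.

```python
import pandas as pd


def top_n(viz_data, n=25, value_col="count"):
    """Return the n rows with the largest value_col, largest first."""
    return viz_data.nlargest(n, value_col)


def chart_title(n):
    """Build the chart title from the same n used to slice the data."""
    return f"MITRE ATT&CK: Sub-Technique Count by Data Source (Top {n})"


# Made-up counts for illustration
toy = pd.DataFrame({
    "data_sources": ["Source A", "Source B", "Source C", "Source D"],
    "count": [5, 12, 3, 8],
})
print(top_n(toy, n=2))
print(chart_title(2))
```

Usage: with N = 25, pass top_n(viz_data, N) as the data and chart_title(N) as the title argument to px.bar.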

Hope you found this post helpful.
