Automating Mapping to ATT&CK: The Threat Report ATT&CK Mapper (TRAM) Tool
TRAM is a web-based tool that automates the extraction of adversary behaviors for the purpose of mapping them to ATT&CK.
Written by Sarah Yoder and Jackie Lasky
It is exciting to see an increasing number of published cyber threat intelligence reports that include ATT&CK mappings, and we want to make creating these mappings easier and faster for analysts. We know that the process can be challenging, since we go through a similar one for every new report we add to Group or Software pages in ATT&CK. It takes time for an analyst to become familiar with all 266 techniques and to understand the subtleties of how intelligence is mapped to them. New reports are released daily, which means we have a never-ending backlog of reports we’d like to add to our own mappings. In an effort to lessen the workload for future analysts on our team, and to help the rest of the community, we decided to start developing a way to help automate this process.
Our resulting tool, the Threat Report ATT&CK Mapper (TRAM), aims to provide a streamlined approach for analyzing reports and extracting ATT&CK techniques. Our hope is that automating mapping to ATT&CK can reduce analyst fatigue, increase ATT&CK coverage, and improve the consistency and accuracy of threat intelligence mappings. We are excited to now share a public beta of TRAM with the ATT&CK community.
How did we get here?
As we started to think about ways to find techniques in reporting, we started with what seemed easiest: fuzzy string searches. We created a command-line tool that looked for techniques via their name. What we quickly learned is that this method either worked really well or not at all. For example, “Mshta” was very high fidelity, but “DLL Search Order Hijacking” was very low. At this point we knew we needed a better way to find techniques, so we decided to apply a more advanced Natural Language Processing (NLP) process to our problem.
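To make the name-matching idea concrete, here is a minimal sketch of a fuzzy search for technique names in a sentence. It is not TRAM’s original command-line tool: the function name `find_technique_names`, the three-name sample list, and the 0.85 similarity cutoff are all assumptions chosen for illustration, and the matching uses Python’s standard `difflib` module.

```python
import difflib
import re

# Hypothetical sample; a real run would load the full list of ATT&CK technique names.
TECHNIQUE_NAMES = ["Mshta", "DLL Search Order Hijacking", "Masquerading"]


def find_technique_names(sentence, names=TECHNIQUE_NAMES, cutoff=0.85):
    """Return technique names that approximately appear in the sentence."""
    words = re.findall(r"[A-Za-z0-9]+", sentence.lower())
    hits = []
    for name in names:
        name_words = name.lower().split()
        target = " ".join(name_words)
        size = len(name_words)
        # Slide a window with as many words as the technique name has.
        for start in range(len(words) - size + 1):
            window = " ".join(words[start:start + size])
            score = difflib.SequenceMatcher(None, window, target).ratio()
            if score >= cutoff:
                hits.append(name)
                break
    return hits
```

A sentence such as “The actor used mshta.exe to launch the payload.” matches “Mshta” because the name appears nearly verbatim. A sentence that describes planting a malicious library next to an application matches nothing, even though it describes DLL search order hijacking, which is the kind of miss that a name-based search cannot avoid.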
The following is our NLP process:
- First, we need to have data to train with, essentially an answer key of right answers for each item we want a model for. In our case, we want a model for each technique, so we use the Procedure Examples from the ATT&CK site for each technique.
- Next, we have to get the data into a “clean” state to be processed. This means turning our text into the simplest version we can so the computer can better understand it. For example, masquerade, masquerading, and masqueraded all share the same root meaning, so we want to build our models on that root and not on the tense the word was used in. Similarly, we have to tokenize the text. This means splitting up the text into smaller units, often words, called tokens. These tokens allow the computer to find patterns in the data, such as the number of words in a sentence or how often two words appear next to each other.
- Now we can start building a specific model, or pattern, for each technique. We currently use Python’s scikit-learn library to do this. We use a method called Logistic Regression, which is well suited to classification, to predict which techniques might be in a given sentence. Our method is considered supervised learning, since we know what the output should be (i.e., a specific technique) for each training example.
- Before we can use the models on new data, we need to test them on data where we know the right answers. To do this, we tested on reporting we had already mapped to ATT&CK to see if our models would perform well enough to be useful.
- Once we confirm the models can find the data we expect, we can use them on data the computer has never seen (e.g., a report fresh off a website).
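The steps above can be sketched end to end in a few dozen lines. This is an illustration under stated assumptions, not TRAM’s code: it swaps scikit-learn for a small hand-written logistic regression trained with stochastic gradient descent so that it runs with only the standard library, it uses a crude suffix-stripping `stem` function in place of a real stemmer, and the six training sentences are made up rather than taken from ATT&CK Procedure Examples. It trains a single binary model for one technique, Masquerading.

```python
import math
import re

SUFFIXES = ("ing", "ed", "es", "e", "s")  # crude stand-in for a real stemmer


def stem(word):
    """Strip one common suffix so different tenses share a root."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def tokenize(sentence):
    """Lowercase the text, split it into word tokens, and stem each token."""
    return [stem(word) for word in re.findall(r"[a-z]+", sentence.lower())]


def featurize(sentence, vocab):
    """Turn a sentence into bag-of-words counts over a fixed vocabulary."""
    counts = [0.0] * len(vocab)
    for token in tokenize(sentence):
        if token in vocab:
            counts[vocab[token]] += 1.0
    return counts


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(examples, epochs=200, lr=0.1):
    """Fit one binary logistic regression model with stochastic gradient descent."""
    vocab = {}
    for sentence, _ in examples:
        for token in tokenize(sentence):
            vocab.setdefault(token, len(vocab))
    rows = [(featurize(sentence, vocab), label) for sentence, label in examples]
    weights, bias = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for x, label in rows:
            score = bias + sum(w * v for w, v in zip(weights, x))
            error = sigmoid(score) - label
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return vocab, weights, bias


def predict(model, sentence):
    """Return the model's probability that the sentence describes the technique."""
    vocab, weights, bias = model
    x = featurize(sentence, vocab)
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Made-up labeled sentences: 1 = describes Masquerading, 0 = does not.
TRAINING = [
    ("The malware masqueraded as a legitimate svchost binary", 1),
    ("The dropper was masquerading as an Adobe updater", 1),
    ("The actor renamed the tool to masquerade as a system file", 1),
    ("The group sent spearphishing emails with attachments", 0),
    ("The backdoor collects screenshots from the victim", 0),
    ("Credentials were dumped from memory with a custom tool", 0),
]

masquerading_model = train(TRAINING)
```

Calling `predict(masquerading_model, "The implant masquerades as svchost")` yields a higher probability than the same call on “The group sent emails”, because “masquerades” stems to the same token the positive examples were trained on. With a data set this small the probabilities themselves mean little; the point is only the shape of the pipeline.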
Luckily, we save these models in a “cached” format using Python’s pickle module. That means we don’t have to go through this whole process every time we want to use the tool! If you’d like to find out more about our NLP process, check out our BSides DC presentation on TRAM from back in October.
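A build-once, load-afterwards cache of this kind can be sketched with the standard `pickle` module. The helper names (`save_models`, `load_models`, `get_models`), the file name, and the dictionary standing in for trained models are assumptions for illustration; TRAM’s own file layout may differ.

```python
import os
import pickle
import tempfile


def save_models(models, path):
    """Write the trained models to disk."""
    with open(path, "wb") as handle:
        pickle.dump(models, handle)


def load_models(path):
    """Read models back. Only unpickle files you created yourself,
    because loading a pickle can execute arbitrary code."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def get_models(path, build):
    """Load cached models if the file exists; otherwise build and cache them."""
    if os.path.exists(path):
        return load_models(path)
    models = build()
    save_models(models, path)
    return models


# Demo with a stand-in "model" (a plain dict) and a temporary cache file.
with tempfile.TemporaryDirectory() as cache_dir:
    cache_path = os.path.join(cache_dir, "models.pkl")
    first = get_models(cache_path, lambda: {"T1036": [0.1, 0.2]})
    # The cache file now exists, so this build function is never called.
    second = get_models(cache_path, lambda: {"should": "not be built"})
```

In the demo, `second` equals `first` because the second call reads the cached file instead of rebuilding.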
How do I use TRAM?
TRAM is a locally run web tool that allows users to submit a webpage URL (sorry, no PDFs yet). If TRAM is able to retrieve and parse the page, report analysis may take close to a minute since there is a lot happening under the hood.
Note: If a “Needs Review” card pops up right away, this generally means the website did not like us trying to scrape it, or something on the site could not be parsed. We are looking into this issue and hope to have a fix soon.
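For a sense of what retrieving and parsing a page involves, here is a minimal standard-library sketch that downloads HTML, drops script and style content, and splits the remaining text into sentences. It is an assumption-laden illustration rather than TRAM’s scraper: the names `fetch_html`, `TextExtractor`, and `html_to_sentences` are hypothetical, the browser-like User-Agent header is a guess at what some sites expect, and the sentence splitter is a simple regular expression.

```python
import re
import urllib.request
from html.parser import HTMLParser


def fetch_html(url, timeout=30):
    """Download a page. Some sites reject automated requests, so this can fail."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_sentences(html):
    """Extract text from HTML and split it on sentence-ending punctuation."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
```

Each returned sentence could then be passed to a per-technique model. A failure in `fetch_html`, or a page whose markup yields no text, corresponds to the situations described in the note above.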
Once you see a card in the “Needs Review” column, it’s time to start analyzing!
When TRAM’s Logistic Regression model predicts that it has found a technique, it highlights the relevant text and shows the predicted technique in a box to the right. Since our current data set is very limited, our models are not 100% accurate, so the tool requires an analyst to review and “Accept” or “Reject” each technique prediction. Behind the scenes, when the “Accept” button is clicked, that sentence and technique go to the “True Positives” table in the database; when “Reject” is clicked, the sentence goes to the “False Positives” table. We can then use these tables to rebuild the models. As more reports are fed to the tool, reviewed by analysts, and used to rebuild the models, we expect these predictions to get more accurate.
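The accept/reject feedback loop can be sketched with Python’s built-in `sqlite3` module. The table and column names below (`true_positives`, `false_positives`, `sentence`, `technique`) and the helper functions are assumptions for illustration and are not taken from TRAM’s actual schema.

```python
import sqlite3


def open_review_db(path=":memory:"):
    """Open a database and create the two review tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS true_positives (sentence TEXT, technique TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS false_positives (sentence TEXT, technique TEXT)"
    )
    return conn


def record_review(conn, sentence, technique, accepted):
    """Store an analyst decision: accepted predictions and rejected ones
    go to separate tables."""
    # The table name is one of two fixed strings, never user input.
    table = "true_positives" if accepted else "false_positives"
    conn.execute(
        f"INSERT INTO {table} (sentence, technique) VALUES (?, ?)",
        (sentence, technique),
    )
    conn.commit()


def training_rows(conn):
    """Return labeled rows for a rebuild: 1 for accepted, 0 for rejected."""
    accepted = conn.execute("SELECT sentence, technique FROM true_positives")
    rows = [(sentence, technique, 1) for sentence, technique in accepted]
    rejected = conn.execute("SELECT sentence, technique FROM false_positives")
    rows += [(sentence, technique, 0) for sentence, technique in rejected]
    return rows
```

After a few calls to `record_review`, the output of `training_rows` can be grouped by technique and used as fresh positive and negative examples the next time the models are rebuilt.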
Whether a sentence is highlighted or not, if a technique needs to be added manually, users can do this by clicking the sentence and then clicking the grey “Add Missing Technique” button in the box that appears. Start typing the ATT&CK technique you want to map, and click it when it appears. If the sentence was not previously highlighted, highlighting will now appear. As with accepted and rejected techniques, when a missing technique is added, it is put in the “True Negatives” table, which is also taken into account when rebuilding the models.
Once an analyst has reviewed the entire report, TRAM’s results can be exported as a PDF by clicking the “Export PDF” button at the top center of the page. Exporting will create a PDF containing a raw text version of the report and a table listing each ATT&CK technique with its corresponding sentence. An example of a partial table is shown below.
What are the next steps for TRAM?
We are happy to share the tool as it stands today to begin helping those mapping to ATT&CK. However, we know there is a lot more that can be done. TRAM is currently a functional prototype that is continuously being improved and developed. We have several features we would like to implement over the next several months. As these features are added, we’ll continue to announce any new changes and keep our public repository up to date.
Some of our next steps include:
- Ability to ingest additional file types (e.g., .doc, .pdf, .txt)
- Additional output formats (e.g., CSV, JSON, STIX)
- Ability to support multiple users simultaneously
- Dashboard & Analytics (e.g., top 10 techniques seen from reporting, technique frequency over time, etc.)
How do I get TRAM?
The full source code for TRAM can be found at https://github.com/mitre-attack/tram. The README will help walk through getting the tool running.
Feel free to download it and start testing it out. Since this is a beta, we know it has bugs and issues. Please help us keep track of these by using the GitHub issue tracker.
How can I contribute to TRAM?
In line with one of the core tenets of the ATT&CK framework, we want TRAM to be community driven. While it was developed with the MITRE ATT&CK team’s use case in mind, we know others will have different needs. We encourage pull requests and hope that if you use TRAM in some cool way, you’ll share it back with the larger community.
©2019 The MITRE Corporation. ALL RIGHTS RESERVED. Approved for public release. Distribution unlimited 19–01075–17.