Our TRAM Large Language Model Automates TTP Identification in CTI Reports

Jon Baker
MITRE Engenuity
Published Aug 29, 2023 · 8 min read

Written by James Ross & Jackie Lasky.

The cybersecurity community has been working for years to solve the problem of automatically identifying adversary tactics, techniques, and procedures (TTPs) in cyber threat intelligence (CTI) reports. With some advances in machine learning and artificial intelligence, we offer a solution that is measurably effective at solving that problem.

Mapping TTPs found in CTI reports to MITRE ATT&CK® is difficult, error prone, and time-consuming. The Threat Report ATT&CK Mapper (TRAM) was designed to help. Our early work focused on creating a data annotation tool and using supervised learning methods to extract and predict TTPs. This latest update to TRAM improves the quality of the training data and makes effective use of fine-tuned Large Language Models (LLMs) for model training and predictions.

In partnership with Center Participants including CrowdStrike, Inc., HCA — Information Technology & Services, Inc., Lloyds Banking Group plc, and JPMorgan Chase Bank, N.A., we identified a core set of use cases to drive this round of TRAM development and focused on improvements to text classification to achieve three goals:

  1. Streamline the generation of customized training sets through data annotation,
  2. Provide high quality training data, and
  3. Bundle advantages of LLMs into TRAM, specifically
    a. Models come pre-trained
    b. They can be tailored for different use cases (e.g., generating text, predicting text),
    c. They can predict text not included in the training data.

The research team collaborated with Participants to build products in each of these areas:

  • Data annotation: Recommended annotation tool features and Data Annotation Best Practices Guide on our GitHub Wiki — build model training data like a pro!
  • High-quality model training data: Annotated 150 reports containing 4,070 technique-labeled sentences out of 19,011 total samples, completed by ATT&CK experts.
  • TRAM tool updates: Including a new prediction model based on SciBERT — also available as a Jupyter notebook.

We are excited to share our new areas of work and ways you might best leverage our annotated data and model development for your own industry goals.

Then vs. Now: How has our approach changed this time around and how do the results look?

The initial versions of TRAM attempted to predict all the ATT&CK techniques available in the training data based on text from CTI reports. This broad approach relies on having:

  1. An example of each ATT&CK technique, and
  2. Enough examples of each to tell the difference between them.

We learned that a narrower approach could help achieve better results in predicting adversary TTPs. This led us to select 50 ATT&CK techniques for this effort.

Previous versions of TRAM did not consider LLMs and relied solely on supervised learning classification methods, such as logistic regression, for prediction. In March 2023, Senior Cybersecurity Engineer Jackie Lasky wrote in the Center’s blog post, Next Stop For TRAM, about the use of LLMs in TRAM to reduce the time and effort required to integrate new intelligence into cyber operations. We explored the advantages of incorporating LLMs and how they could improve automated identification of adversary TTPs in threat intelligence reports. We experimented with adapting machine learning architectures such as BERT and GPT-2 for text classification, fine-tuned them on cyber threat intelligence, and ultimately selected SciBERT.
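To make the baseline concrete, here is a minimal sketch (not TRAM's actual code) of the earlier supervised approach: TF-IDF features feeding a logistic regression classifier, assuming scikit-learn. The sentences and technique labels are made up for illustration.

```python
# Illustrative sketch of the earlier supervised approach:
# TF-IDF features plus logistic regression (assumes scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set: sentence -> ATT&CK technique ID.
sentences = [
    "The malware established persistence via a scheduled task",
    "Credentials were dumped from LSASS memory",
    "The actor created a scheduled task to run the payload daily",
    "The tool read LSASS process memory to harvest credentials",
]
labels = ["T1053.005", "T1003.001", "T1053.005", "T1003.001"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(sentences, labels)

# The model can only match vocabulary it has seen during training.
print(clf.predict(["A scheduled task launched the implant at startup"])[0])
```

This style of model is effective when the report reuses familiar words and phrases, which is exactly the limitation the LLM approach addresses.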

This shift to an LLM meant that our previous training data would no longer work. The previous data set simply didn’t match how LLMs use data for training and development. So, we began by ensuring that our annotation approach met the basic requirements for LLMs and that the annotations we created would be reusable.

We focused on annotating adversary TTPs in threat intelligence reports to use as model training data and fine-tuning LLMs. By being strategic in our approach for annotating data, we have enabled TRAM to apply and further refine LLMs to make automating predictions more accessible and future-ready.

LLMs benefit from being pre-trained on vast amounts of data. Because these models arrive pre-built, they require a much smaller amount of domain-specific training data to perform other tasks. These models can later be fine-tuned for specific use cases, like classifying adversary TTPs in CTI reports. A main benefit of using LLMs is the ability to predict text not included in the training data, as compared to our previous models that were limited to finding familiar words and phrases. We integrated LLM code into the TRAM web tool along with providing Jupyter notebooks for machine learning (ML)-focused users.
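The fine-tuning setup can be sketched roughly as follows using the Hugging Face `transformers` library. The checkpoint name is the publicly released SciBERT model, but the number of labels and training configuration here are illustrative; TRAM's actual training code is in our repository.

```python
# Hypothetical sketch of setting up SciBERT for sentence classification.
NUM_TECHNIQUES = 50  # TRAM's fine-tuned model covers 50 common ATT&CK techniques

def build_classifier(model_name: str = "allenai/scibert_scivocab_uncased"):
    """Load the pre-trained SciBERT checkpoint and add a classification head."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=NUM_TECHNIQUES
    )
    return tokenizer, model

# Fine-tuning would then train the model on technique-labeled sentences
# (e.g., via transformers.Trainer) before using it for prediction.
```

Because the base model already encodes general language knowledge, the fine-tuning step needs far fewer labeled sentences than training a classifier from scratch.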

To confirm our assumption that LLMs could outperform logistic regression, we needed to analyze and compare results. Precision, Recall, and F1 are metrics commonly used to compare the effectiveness of models:

  • Precision (P) measures the ratio of true positives (correct predictions) to all positive predictions (true positives plus false positives) in a sample set.
  • Recall (R) measures the ratio of true positives (correct predictions) to all actual positives (true positives plus false negatives, i.e., incorrect exclusions) in a sample set.
  • F1-score (F1) is the harmonic mean of precision and recall, balancing the two scores.
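These three metrics follow directly from the confusion-matrix counts. A small worked example (the counts below are made up for illustration and are not our measured results):

```python
# Compute Precision, Recall, and F1 from confusion-matrix counts.
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """tp = true positives, fp = false positives, fn = false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts: 88 correct predictions, 10 false alarms, 12 misses.
p, r, f1 = precision_recall_f1(tp=88, fp=10, fn=12)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # -> P=0.90 R=0.88 F1=0.89
```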
Figure 1: Precision, Recall, and F1-score comparison between SciBERT and Logistic Regression models

The fine-tuned SciBERT model improves on the logistic regression model in all but one of the areas where we measured Precision, Recall, and F1-score. For TRAM users, this means our new LLM identified the correct ATT&CK technique 88 times out of 100 and missed 12 techniques out of 100 samples.

How can I get started?

If this is your first time — head over to the Center for Threat-Informed Defense/TRAM GitHub repository and follow the instructions in the README to install TRAM.

Figure 2: Center-for-threat-informed-defense/TRAM GitHub Repository

If you’ve used TRAM before, you’re familiar with launching the web UI and uploading a JSON, .docx, .pdf, or even .txt report for automatic analysis. The logistic regression and SciBERT models process the report in the background. When the review button is lit, you are presented with a list of sentences to confirm the associated ATT&CK ID or add a few of your own. The export function produces the full report with identified techniques in popular formats like JSON, .pdf, or .docx.

Figure 3: TRAM Upload Report View
Figure 4: TRAM Report Reviewing
Figure 5: TRAM Report Analysis View

TRAM offers dual functionality, integrating the fine-tuned SciBERT LLM to predict 50 of the most common ATT&CK techniques while also empowering analysts to add their own identified techniques to reports. For those interested in integrating the model results into an existing workflow, we packaged the same LLM functionality into a Jupyter notebook.

The TRAM tools meet our goal of predicting which ATT&CK TTPs are present in a threat intelligence report. By design, TRAM can be extended to ingest a larger, more comprehensive data set, or even a customized collection to identify organization-specific indicators of interest.

How do I use TRAM to train models with my own data?

Jupyter notebooks offer users a step-by-step process to execute the code behind the analysis. Whether run locally or hosted online via Google Colab, you can import your own data and use Google’s GPU-enabled systems to run our LLM training code. Inside the notebooks you’ll find guided tips for running each cell and interpreting the results. Learn more on our GitHub repository!

Figure 6: TRAM SciBERT model Jupyter Notebook on Google Colab

Data in Disarray: Embracing the right format for LLM training

We refined how we annotated data and optimized where we collected it to ensure that the reports we analyze are relevant and contain TTPs of interest. We also collaborated with Feedly for Threat Intelligence, a tool for collecting and sharing targeted open-source intelligence, to use their enterprise Feedly AI engine to intelligently collect data for our annotation effort. Feedly for Threat Intelligence lets you filter the types of articles you want to view and then use its API to easily download the data in JSON format, including the full text of the report and its metadata.
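As a rough sketch, pulling articles programmatically might look like the following. The endpoint and parameters reflect our understanding of Feedly's public streams API — consult their documentation before relying on this, and note that the token and stream ID are placeholders.

```python
# Hypothetical sketch of downloading articles from the Feedly API.
def fetch_reports(token: str, stream_id: str, count: int = 20) -> list[dict]:
    """Fetch up to `count` articles from a Feedly stream as JSON items."""
    import requests

    resp = requests.get(
        "https://cloud.feedly.com/v3/streams/contents",
        headers={"Authorization": f"Bearer {token}"},
        params={"streamId": stream_id, "count": count},
        timeout=30,
    )
    resp.raise_for_status()
    # Each item includes the article text (when available) plus metadata.
    return resp.json().get("items", [])
```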

Figure 8: In this example, we are using Feedly AI to filter on articles that are considered “threat intelligence reports” and contain information related to “new malware” and “threat actors” while excluding “mobile” threat intelligence reports.
Figure 9: In this image, we are using Feedly AI to try and search for threat intelligence reports that may contain the ATT&CK technique T1110.004 to help find more positive examples for that technique.

We added 150 reports with 4,070 technique-labeled sentences out of 19,011 total samples to the previous training data set. All training data is available for you to work with on our GitHub repository in several convenient formats.
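Loading the labeled sentences for your own experiments is straightforward. The schema below is a made-up illustration — the real training data and its exact format live in our GitHub repository:

```python
import json

# Illustrative sample only; see the TRAM repo for the real data and schema.
sample = json.loads("""
[
  {"text": "The dropper wrote a run key for persistence", "label": "T1547.001"},
  {"text": "Analysts observed normal DNS traffic", "label": null}
]
""")

# Unlabeled sentences serve as negative examples during training.
labeled = [s for s in sample if s["label"] is not None]
print(f"{len(labeled)} of {len(sample)} sentences carry a technique label")
```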

Visit our GitHub wiki to learn more about our annotation process and how you can get started with our annotated data set.

Final Thoughts: Are we there yet?

The overall goal has always been to automate the mapping of CTI reports to ATT&CK. TRAM began by offering analysts a way to view and modify results from automatic predictions in a web browser. Our latest release benefits from incorporating LLMs, achieving better performance and providing a means to continue improving through the collection of data in a consistent and structured way.

Today, our LLM classifier in TRAM accurately predicts 50 of the over 600 ATT&CK (sub-)techniques — we still need more data to provide a more comprehensive solution. We now know how that data needs to be properly annotated to increase the likelihood of model success. With the addition of more data, we have never been closer to our goal of being able to identify ATT&CK TTPs accurately and efficiently in CTI reports.

Have data you want to contribute? Want to get involved?

Here are a few ways you can get involved with TRAM and help advance threat-informed defense:

  • Use TRAM and offer feedback. GitHub issues with questions, bug reports, and feature requests are appreciated.
  • Contribute training data. We encourage you to map CTI reports to ATT&CK and contribute to TRAM’s training data set. See our guidance for contributing training data.
  • Contribute to open-source TRAM. We welcome GitHub pull requests. See our contributor information for guidance.
  • Share ideas and ML research. Effectively applying ML to identify ATT&CK techniques in CTI reports is a research problem. Help us advance research into this ML application by sharing your own ideas and research.

If you have any form of annotated ATT&CK data that you think might be useful for TRAM, please reach out to ctid@mitre-engenuity.org! Stay tuned and follow the Center to learn about more projects!

About the Center for Threat-Informed Defense

The Center is a non-profit, privately funded research and development organization operated by MITRE Engenuity. The Center’s mission is to advance the state of the art and the state of the practice in threat-informed defense globally. Comprised of participant organizations from around the globe with highly sophisticated security teams, the Center builds on MITRE ATT&CK®, an important foundation for threat-informed defense used by security teams and vendors in their enterprise security operations. Because the Center operates for the public good, outputs of its research and development are available publicly and for the benefit of all.

© 2023 MITRE Engenuity. Approved for Public Release. Document number CT0075.


Jon Baker
MITRE Engenuity

Director and co-Founder, Center for Threat-Informed Defense