Community Evaluating Free Telemetry 💸 🌎 Following the ATT&CK Evals Methodology ⚔️
In late 2019, the ATT&CK Evaluations team evaluated 21 endpoint security vendors to provide insight and transparency into their true capabilities to detect adversary behavior mapped to ATT&CK. The methodology used was based on APT29 techniques, for which several organizations shared open source intelligence to help with the development of the emulation plan.
On April 21st, 2020, the ATT&CK evals team released the results of that evaluation, the emulation plan, all payloads used for Day 1 and Day 2, and a Do-It-Yourself Caldera plugin. They are now a great resource not only to emulate an adversary, but also to learn several ways to detect its behavior based on the detection context provided by every single endpoint vendor.
On the same day (April 21st, 2020) 😆 I decided to organize a detection hackathon leveraging the resources that were released to bring other researchers around the world together and help develop detection rules over free telemetry that I generated after running the official emulation plan.
One thing that I started to realize after the first round of initial contributions (2020/05/02) was that I was technically running a similar “ATT&CK Evaluation” process, but over free telemetry 🤔. What? How come?
In this post, I will show you some of the work being done with the community, current detection contributions and the beginnings of a free telemetry evaluation process. In addition, I will show you how I was able to automatically create a Jupyter Notebook to interactively run queries mapped to each emulation step. Also, all the information in this post will only be related to data from the APT29 Day 1 since that’s what we have covered so far. I hope this post inspires and motivates others to join us for Day 2 😉 🍻
- Day 2 Registration Form (Time TBD): https://bit.ly/APT29EventDay2
ATT&CK Evaluations? Tell me more.. (Simplified)
First, I believe it is important to get familiar with some of the main steps behind an ATT&CK evaluation. I highly recommend reading ATT&CK Evaluations: Understanding the Newly Released APT29 Results by Frank Duff to get a better understanding of the evaluation process and the categorization of detections created for the APT29 scenario.
These are some of the main steps taken to perform an evaluation that I documented after reading the blog post:
- Identify the adversary that will be emulated
- Build out the adversary profile through open source threat intelligence
- Map adversary behavior to ATT&CK techniques
- Develop the emulation plan (Step-by-step playbook broken down to the procedure level)
- Develop a baseline and detection criteria (MITRE team)
- Execute emulation plan (MITRE team)
- Capture detections from endpoint vendor
- Review and validate detections (MITRE team assigns detection categories)
- Normalize results for consistency across vendor evaluations
- Produce and release all results
What About a Free Telemetry Evaluation 🤔?
Technically, MITRE released everything one would need in order to start at the “Execute Emulation Plan” step, go through the next ones and generate customized results. However, that was not my initial plan 😆 All I wanted to do was to emulate the adversary, generate data and share it with the community 🌎 to create detection research opportunities.
Enter Detection Hackathon
On May 2nd, 2020, several members of the community joined Day-1 of the event where I showed how I had planned to run it. I had created a GitHub repository with two projects and organized all the emulation steps and sub-steps as GitHub issues.
Next, each sub-step had specific adversary procedures and detection criteria that were previously defined by the MITRE team and used as part of the evaluation process for all endpoint vendors. The idea was to use them as guidance to inspire some initial detections. However, there was more that we could do once we had a detection mapped to a detection criterion > sub-step > emulation step.
Enter Detection Categories
As I mentioned before, the MITRE team reviews and validates all the detections provided by all endpoint vendors. During that process, the MITRE team assigns detection categories to label detections based on the level of details provided about a specific adversary behavior.
According to the MITRE team:
This doesn’t imply that every technique executed should aim for a technique detection. Some techniques may only warrant basic telemetry, or even limited visibility based on the vendor’s detection strategy.
With all that information, I decided to only assign the Telemetry detection category to every detection created from the telemetry provided during the hackathon that could show that the adversary behavior happened 💡
..and Free Telemetry 💸 ?
Great question 😆! All the event log providers present in the APT29 datasets are either built-in (e.g. Windows Security Auditing) or 3rd party open providers (e.g. Sysmon). I also collected some PCAPs which were then converted to Zeek files thanks to Nate “neu5ron” Guagenti 🙏.
In the image below you can see the top host event providers present in the APT29 dataset for Day 1:
Assigning the Telemetry detection category to Free Telemetry is why I believe we could also perform a “Free Telemetry Evaluation” 💥 💥
What about the other detection categories?
The datasets that I created are just plain raw data without any additional context or enrichment that would allow the labeling of other categories.
These categories were defined prior to the evaluation and are hierarchical. As you move through the categories from none through technique, left to right, the analyst is provided more detail on what happened
Therefore, I believe it makes sense to only use the Telemetry category. Also, remember that detection categories allow organizations to choose what they care about the most from an endpoint vendor. I have worked with organizations that want telemetry only. Some only want alerts mapped to techniques 🤔.
Conducting a Free Telemetry Evaluation 🤠 ☑️
Now that we are all on the same page about the concept of a “Free Telemetry Evaluation”, I would like to show you how I am currently performing one, along with some of the initial detections and potential contributions to open source projects such as Sigma so far.
Prepare Emulation Environment
As I mentioned before, the MITRE team released the emulation plan, all payloads used for Day 1 and Day 2, and a Do-It-Yourself Caldera plugin. Therefore, I needed to create a similar environment to run the emulation plan and generate free telemetry. I documented the whole setup in this blog post:
- Mordor Labs 😈 — Part 1: Deploying ATT&CK APT29 Evals Environments via ARM Templates 🚀 to Create Detection Research Opportunities 🌎!
Execute Emulation Plan
Next, I needed to execute the emulation plan. I would like to thank Jamie Williams so much for all the help during this step. I learned a lot and was able to also expedite the whole process for anyone that would like to do it manually. I wrote one blog post for each scenario and also created videos 😉
Collect Free Telemetry
As I mentioned before, I collected host and network data leveraging built-in and 3rd party open event providers. The following image shows how I collected event logs from all Windows endpoints involved in the emulation. I also collected PCAP via the NetworkWatcherAgentWindows extension. This extension allows you to create packet capture sessions to track traffic to and from a virtual machine. I provide more details about it in the following post:
- Mordor Labs 😈 — Part 1: Deploying ATT&CK APT29 Evals Environments via ARM Templates 🚀 to Create Detection Research Opportunities 🌎!
Create Detections (Community Effort ❤️)
As I mentioned before, I organized a detection hackathon to bring other researchers in the community together to collaborate and develop some detections over free telemetry. We used the GitHub issues and projects features to track detection rules in the form of queries, screenshots or notes.
It was amazing to see the initial conversations and contributions:
Going from host to network datasets 😍 💜
and even helping others to develop queries. I ❤️ our InfoSec community 🌎
Then, I took all the current detection rules, and started documenting them in YAML format to programmatically create an “All Results” report.
For example, I went from this GitHub issue (Step > Sub-Step) format
To this YAML File (Format is still a work in progress. Feedback is appreciated)
I did the same for every single detection contributed by the community and all of the ones I also created for Day 1. It was fun! 😆
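To make the Step > Sub-step > detection mapping more concrete, here is a minimal sketch of what one documented step could look like once loaded into Python. The field names (`substeps`, `procedure`, `criteria`, `detections`, `category`, `contributors`, `query`) and the placeholder values are assumptions for illustration only; they are not the project's actual YAML schema. It uses a plain dictionary so it runs with the standard library alone; with PyYAML installed, `yaml.safe_load` would return a comparable structure from a YAML file.

```python
import json

# Hypothetical record layout: Step > Sub-step > detections.
# Field names and values are placeholders, not the real schema or data.
STEP_RECORD = {
    "step": "1",
    "substeps": [
        {
            "substep": "1.A.1",
            "procedure": "<adversary procedure text from the emulation plan>",
            "criteria": "<detection criteria defined by the MITRE team>",
            "detections": [
                {
                    "category": "Telemetry",
                    "contributors": ["<contributor handle>"],
                    "query": "SELECT Image FROM events WHERE EventID = 1",
                }
            ],
        }
    ],
}

REQUIRED_SUBSTEP_KEYS = {"substep", "procedure", "criteria", "detections"}
REQUIRED_DETECTION_KEYS = {"category", "contributors", "query"}


def find_schema_problems(record):
    """Return a list of problems; an empty list means the record is well formed."""
    problems = []
    for sub in record.get("substeps", []):
        label = sub.get("substep", "?")
        missing = REQUIRED_SUBSTEP_KEYS - sub.keys()
        if missing:
            problems.append(f"{label}: missing {sorted(missing)}")
        for index, detection in enumerate(sub.get("detections", [])):
            missing = REQUIRED_DETECTION_KEYS - detection.keys()
            if missing:
                problems.append(f"{label} detection {index}: missing {sorted(missing)}")
    return problems


if __name__ == "__main__":
    print(json.dumps(STEP_RECORD, indent=2))
    print(find_schema_problems(STEP_RECORD))
```

A small check like `find_schema_problems` is handy when many people contribute files by hand, since a missing key is caught before the report is generated.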
Review and Validate Detections
Once I was done documenting all the initial detections for Day 1, I started to perform an additional validation process. I wanted to:
- Validate that detections align with the detection criteria.
- Make sure queries would work 😉: I needed to somehow be able to run those queries against the APT29 datasets in a programmatic way. I will get to it in a little bit ⏰ after showing you how I am building the initial report.
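As a rough idea of what "running a query programmatically" means, the sketch below loads a few made-up events into an in-memory table and executes a detection written as a SQL string. It uses Python's built-in `sqlite3` purely as a stand-in for SparkSQL so that it runs without a Spark installation. The table name `events` and the sample values are assumptions; only the field names (`Channel`, `EventID`, `Image`, `ParentImage`) follow common Sysmon conventions.

```python
import sqlite3

# Made-up events; values are illustrative and not taken from the APT29 datasets.
EVENTS = [
    {"Channel": "Microsoft-Windows-Sysmon/Operational", "EventID": 1,
     "Image": "cmd.exe", "ParentImage": "powershell.exe"},
    {"Channel": "Microsoft-Windows-Sysmon/Operational", "EventID": 1,
     "Image": "notepad.exe", "ParentImage": "explorer.exe"},
    {"Channel": "Security", "EventID": 4688,
     "Image": None, "ParentImage": None},
]

# A detection expressed as a SQL string, similar in shape to a shared query.
QUERY = """
SELECT Image, ParentImage
FROM events
WHERE Channel = 'Microsoft-Windows-Sysmon/Operational'
  AND EventID = 1
  AND lower(ParentImage) LIKE '%powershell.exe'
"""


def run_query(events, query):
    """Load events into an in-memory table and return the rows the query matches."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (Channel TEXT, EventID INTEGER, Image TEXT, ParentImage TEXT)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (:Channel, :EventID, :Image, :ParentImage)",
        events,
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_query(EVENTS, QUERY):
        print(row)
```

A query that returns at least one row against the dataset is a reasonable first signal that it works; whether the rows actually match the detection criteria still needs a human review.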
Produce “All Results” Report
I put together a basic Python script to loop over every single YAML file and use a Jinja template to create a table similar to the one provided on the ATT&CK website as part of the results for each endpoint vendor:
I worked on the initial Python script below to accomplish the following:
- Create detection files for each detection query available in each step file.
- Create an “All Results” table similar to the image above where I aggregate every single detection with some metadata for a similar look/view.
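The two tasks above boil down to a loop over steps and sub-steps that fills in a row template. The following is only a sketch of that idea, not the script used for the hackathon: it swaps Jinja for the standard library's `string.Template` so it runs without extra packages, and the column names, the `note` field and the sample data are assumptions.

```python
from string import Template

# One table row per detection; column layout is hypothetical.
ROW = Template("| $substep | $procedure | $criteria | $category | $note |")

# Placeholder data, not real results.
SAMPLE_STEPS = [
    {
        "step": "1",
        "substeps": [
            {
                "substep": "1.A.1",
                "procedure": "<procedure text>",
                "criteria": "<criteria text>",
                "detections": [
                    {"category": "Telemetry", "note": "detections/1.A.1_0.md"}
                ],
            },
            {
                "substep": "1.A.2",
                "procedure": "<procedure text>",
                "criteria": "<criteria text>",
                "detections": [],
            },
        ],
    }
]


def render_report(steps):
    """Return the report as text, one row per detection (or one 'None' row if a sub-step has none)."""
    lines = [
        "| Step | Procedure | Criteria | Detection Type | Detection Notes |",
        "| --- | --- | --- | --- | --- |",
    ]
    for step in steps:
        for sub in step["substeps"]:
            # Sub-steps without detections still get a row so gaps stay visible.
            for detection in sub["detections"] or [None]:
                lines.append(
                    ROW.substitute(
                        substep=sub["substep"],
                        procedure=sub["procedure"],
                        criteria=sub["criteria"],
                        category=detection["category"] if detection else "None",
                        note=detection["note"] if detection else "",
                    )
                )
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_report(SAMPLE_STEPS))
```

Keeping a row for sub-steps with no detection mirrors how the vendor results show gaps, which matters later when counting coverage.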
This is the Jinja template I used to create the main report table:
This is the Jinja template I used to create detection files to capture the query and output mapped to the specific adversary behavior (Step > Sub-step):
You might be asking yourself: Why a detection file? If you take a look at the “Detection Notes” section of an “All Results” report, there are links (i.e ).
If you click on those links, they take you to a screenshot of the vendor's tool. All I have is telemetry and queries 😆, so I can only show you the following information about each detection:
I actually ❤️ to see the logic more than just a screenshot of a vendor's GUI, but I know there are several reasons why a screenshot of an alert or simply the output data would be enough for an endpoint vendor to show (e.g. intellectual property).
Display “All Results” Report
After running the Python script, I get the following report 😱. All based on free telemetry.
What can we do with this report or information behind it now?
Calculate Coverage Results 💰 (Day 1)
Now that we have a report with every single detection mapped to each step of the emulation plan, we can get some insights into how much one could cover from a telemetry perspective against the APT29 emulation plan using free telemetry 🎊 . In the image below you can see how many sub-steps are covered from a telemetry perspective for each emulation step.
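For readers who want to reproduce this kind of count, here is a minimal sketch of the calculation: a sub-step counts as covered when it has at least one detection in the Telemetry category. The record layout and the sample data are assumptions for illustration; they are not the actual Day 1 results.

```python
# Placeholder data, not real results.
SAMPLE_STEPS = [
    {
        "step": "1",
        "substeps": [
            {"substep": "1.A.1", "detections": [{"category": "Telemetry"}]},
            {"substep": "1.A.2", "detections": []},
        ],
    },
    {
        "step": "2",
        "substeps": [
            {"substep": "2.A.1", "detections": []},
        ],
    },
]


def coverage_by_step(steps, category="Telemetry"):
    """Return {step: (covered_substeps, total_substeps)} for the given category."""
    coverage = {}
    for step in steps:
        substeps = step["substeps"]
        covered = sum(
            1
            for sub in substeps
            if any(d["category"] == category for d in sub["detections"])
        )
        coverage[step["step"]] = (covered, len(substeps))
    return coverage


if __name__ == "__main__":
    for step, (covered, total) in coverage_by_step(SAMPLE_STEPS).items():
        percent = 100 * covered / total if total else 0
        print(f"Step {step}: {covered}/{total} sub-steps covered ({percent:.0f}%)")
```

Counting sub-steps rather than detections avoids inflating a step's coverage when several queries target the same behavior.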
I was impressed but not surprised that one could cover a lot with free telemetry 😉. I see this as an example of how we could also complement the coverage of other endpoint solutions and enhance the detection strategy of an organization. I would map what it is that your vendor is missing and start a project to address the gaps, potentially with free telemetry (and vice versa).
What else can we do to allow anyone in the community to benefit from everything that I have shown you so far?
Enter Jupyter Notebooks
As part of the review and validation process, I wanted to be able to take the queries mapped to each detection and run them all against the APT29 dataset. I already had everything in YAML. Why not do the following?:
- Loop over every single YAML file that contains detection queries.
- Use python and nbformat to create a notebook and add every single detection query to it.
- Download the APT29 dataset programmatically to the notebook and process it as a dataframe to perform additional analysis.
- Use SparkSQL to query the dataframe (Spark dataframe APIs via Python). For the initial round of the detection hackathon, I encouraged participants to share their queries in a SQL-like format (SparkSQL) to expedite this process 😉. We are going to provide a workshop next time (Day 2) to get everyone familiarized with it. That will be a main requirement for Day 2.
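The steps above can be sketched roughly as follows. This is not the script from the project: to stay runnable with the standard library only, it builds the notebook as a plain dictionary in the nbformat v4 JSON layout instead of calling `nbformat.v4.new_notebook`, `new_markdown_cell` and `new_code_cell`, which produce equivalent objects. The record layout, the sample query and the assumption that a `spark` session already exists inside the generated notebook are all illustrative.

```python
import json

# Placeholder data, not real detections.
SAMPLE_STEPS = [
    {
        "step": "1",
        "substeps": [
            {
                "substep": "1.A.1",
                "criteria": "<criteria text>",
                "detections": [
                    {"query": "SELECT Image FROM events WHERE EventID = 1"}
                ],
            }
        ],
    }
]


def markdown_cell(text):
    """Build a markdown cell in nbformat v4 layout."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Build an unexecuted code cell in nbformat v4 layout."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def build_notebook(steps):
    """Create one markdown cell per sub-step and one code cell per query."""
    cells = [markdown_cell("# APT29 detections (sketch)")]
    for step in steps:
        for sub in step["substeps"]:
            cells.append(
                markdown_cell(f"## {sub['substep']}\n\n**Criteria:** {sub['criteria']}")
            )
            for detection in sub["detections"]:
                # Assumes the notebook defines `spark` and registers the dataset
                # as a table before these cells run; queries containing triple
                # quotes would need escaping.
                cells.append(
                    code_cell(
                        f"df = spark.sql('''\n{detection['query']}\n''')\ndf.show(10, False)"
                    )
                )
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


if __name__ == "__main__":
    notebook = build_notebook(SAMPLE_STEPS)
    print(json.dumps(notebook, indent=1)[:400])
```

Writing the returned dictionary to a file ending in `.ipynb` with `json.dump` should give a notebook that Jupyter can open, with each query in its own cell so it can be run and inspected individually.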
I added the following code to the Python script that I created to produce the initial report table to also create a notebook and be able to run queries.
Once the code runs, we get the following notebook with every single detection with their respective detection criteria and queries mapped to specific emulation steps (steps and sub-steps) 💥
An interactive notebook that anyone could use to run every single query and validate the results 🍻 This is the beauty of open source projects ❤️
Oh and you can also find the code provided by Jose Luis Rodriguez to create the bar-chart to show the current coverage of free telemetry from a telemetry detection category perspective. Thank you brother ❤️
How Can I Use All These Resources ? 🚀
All the resources that you see so far are available in the Threat Hunter Playbook’s website and you can even run the notebook from there through open cloud infrastructure provided by the Binder team.
All Results Report Table
An easy way to explore all the detections mapped to emulation steps.
Run APT29 Evals Notebook
If you want to interact with all the queries provided in the results for day 1, click on the little rocket on the top-right of your screen > Binder
That should take you to the BinderHub site where you will see the threat hunter playbook’s Jupyter notebook server being launched.
Next, you will be presented with a notebook’s interface similar to the one I showed you before where you will be able to run the notebook. As you can see, everything is through your browser. Nothing to install locally 😉 🍻
That’s It? There is more.. 😆 Contributions!
Finally, even though it has been so much fun to put all this together for the community, I believe it is important to also show how we are planning to contribute back to other open source projects, as this is one of the main goals of these collaboration efforts. As a result of Day 1, we will contribute to projects like Sigma and submit rules that are not currently present in the project.
It was amazing to see the community creating rules to share during Day 1:
If you want to see the current rules that will be pushed to Sigma for review soon and keep up with future ones, I would bookmark this link to the detection hackathon repository:
Maybe a quick look at some of the current ones? 😆 😱
What’s Next?
Detection Hackathon — Day 2 ⏳ 🏹
Everything that I showed in this post was just data from Day 1. We still have data for Day 2 and it would be awesome to have you in the next round. If you like what you see so far with day 1 and want to participate, feel free to register for Day 2 by filling out this form (Date is still TBD)
We have learned from the initial round (Day 1) and we are going to:
- Extend the time for the live session from 4 hours to maybe 8 hours but split in two days (weekends). Hackathon can run for days, but the live sessions are key to ask questions and learn more about the process.
- Provide an initial training to go over Jupyter Notebooks and the basics of Python and PySpark (A free 1–2 hours workshop)
- Have other SIEMs sponsored or provided by the community to explore the data in a more familiar way while participants get used to Jupyter Notebooks and PySpark (It is always good to have options)
- Automate how we go from GitHub issues to YAML files (I will try)
- Live stream music through a separate channel (not the live session call) to allow participants to mute the music if they want to without muting the main communication channel 😉 (Music is a must!)
- I believe we could also use these resources for training purposes for other query languages besides SparkSQL (e.g. Kusto Query Language)
- Having data mapped to adversary procedures created from open source threat intelligence is a great material to link queries and techniques to one adversary profile. Not just one query mapped to one adversary technique but several queries mapped to a complete behavior or adversary profile.
- If you get to use this material in your own training, make sure you attribute the work to the Open Threat Research Community! These are the types of collaboration efforts and projects that this new community movement will prepare and run for the InfoSec community. Stay tuned for more news about it!
That’s it for this post! I hope this resource was helpful to those who were wondering how free telemetry could be used for these types of ATT&CK evaluations and wanted to learn more about the current results of the detection hackathon I help run on behalf of the Open Threat Research community. I hope you can join us next time for Day 2!
- All Results Report: https://threathunterplaybook.com/evals/apt29/report.html
- Jupyter Notebook: https://threathunterplaybook.com/notebooks/campaigns/apt29Evals.html
- Datasets: https://github.com/hunters-forge/mordor/tree/master/datasets/large/apt29
- Detection hackathon registration form: https://bit.ly/APT29EventDay2