Threat Hunter Playbook ⚔ + Mordor Datasets 📜 + BinderHub 🌎 = Open Infrastructure 🏗 for Open Hunts 🏹 💜
It has been almost three years since I started documenting detections publicly, and I always wondered “How could I share detections in a more practical and interactive way so that anyone in the world can access, run and validate each analytic all from the same place?”
I did not want to share just a static document with rules or queries that anyone could copy and paste. I wanted to encourage collaboration and allow others to query the very datasets I was using while developing each analytic. I also figured that if others could test detections without building an analytic platform on their own, it would help those around the world who do not have the infrastructure, resources, or skills to do so.
In this post, I will show you how I was able to integrate detections from the Threat Hunter Playbook initiative and pre-recorded datasets from Mordor with the amazing BinderHub project to empower the community and allow others all over the world to interactively run each detection and produce the same results in a public computing environment all from a web-browser.
I believe that before jumping into the technical aspects of how I accomplished this, it is important to first understand what each project besides BinderHub provides. There is a lot of documentation for each of them so I will provide a short summary and links for you to read more about them.
What is the Mordor project?
This project is part of the Threat Hunters Forge community, and started as a way for me to share datasets with my brother Jose Luis Rodriguez while doing research together. We realized that each of us was spending too much time simulating the same adversary technique variation while generating the same security events from a behavior perspective.
For example, what happens when an adversary queries the values of a Windows registry key? Whether it is done via PowerShell or C#, the adversary will trigger several of the same Windows Security events shown below. Of course, that only happens if the right access control entry (ACE) is set in the system access control list (SACL) of the registry object 😉
We figured that if we standardized our simulation environment and documented the telemetry enabled, one of us could run the simulation, collect the data, and share it with the other for further analysis. Then we started thinking about others who had similar constraints, or who could not even simulate the technique in the first place, so we decided to share every single dataset with the world, and that's how the Mordor repository was born 😈📜 For more information about how we take snapshots of the data while simulating the adversarial technique, or how we consume the pre-recorded datasets, you can go to the project's Wiki or watch the following video:
What is the Threat Hunter Playbook project?
The Threat Hunter Playbook is another initiative from the Threat Hunters Forge community to share hunting strategies and inspire new detections. Its official Twitter handle is @HunterPlaybook, and I use it to share a detection notebook every other week. The project provides specific chains of security events that you can use to develop data analytics in your preferred tool or query format. It also follows the structure of the MITRE ATT&CK framework, categorizing post-compromise adversary behavior in tactical groups.
In addition, the project documents detection strategies in the form of interactive notebooks to provide an easy and flexible way to save, replicate, and visualize the analytics and expected output.
What is a Notebook?
Think of a notebook as a document, accessible via a web interface, that allows you to save the input (i.e. live code) and output (i.e. code execution results / evaluated code output) of interactive sessions, as well as the notes needed to explain the methodology and steps taken to perform specific tasks (i.e. data analysis).
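Under the hood, a Jupyter notebook is just a JSON document that stores input and output side by side. The minimal sketch below (with hypothetical cell contents) mirrors the nbformat v4 layout:

```python
import json

# A minimal sketch of what an .ipynb file stores (nbformat v4):
# both the input (source) and the saved output of each cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["## Detection notes go here"],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(1 + 1)"],
            # Saved output travels with the notebook, so readers see results
            # without re-running anything
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}

# The whole document is plain JSON, so it can be saved and shared like any file
serialized = json.dumps(notebook, indent=1)
print(len(notebook["cells"]))
```

Because the saved outputs live inside the file, a shared notebook doubles as both documentation and a reproducible record of an analysis session.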
Also, each notebook comes with links to pre-recorded datasets from the Mordor project to validate the detection of the specific adversarial technique being analyzed. This is very helpful because I feel that this feature provides additional context to the analytic being shared.
Mordor Datasets and the Threat Hunter Playbook?
I decided to start adding Mordor dataset links to each detection. I hope it encourages others to take a look at the dataset being used and potentially help me improve the detection logic. There are several events that I might not have even considered when creating a detection (always! 😆).
For example, for the notebook “Remote Interactive Task Manager LSASS Dump”, I consider the following data sources useful for developing an analytic:
- Security 4778 : A session was reconnected to a Window Station
- Sysmon 1: Process Creation
- Sysmon 10: Process Access
- Sysmon 11: File Create
However, if I take a look at the metadata provided by the Mordor dataset used for this detection, there are some other events that might be interesting to explore to enhance the detection or create additional analytics 😉
What am I trying to solve then?
So far, I believe and hope that sharing detections via notebooks, along with links to the pre-recorded datasets used during the development of each analytic, is helpful. While some people might just want to copy and paste the analytic provided, others might want to run some validation tests and go deeper into the analytic from a data perspective. I believe this also helps researchers who are just starting in the industry and might want to jump straight into the analysis of data, using the analytics provided as references or examples for future detection development.
Even though this sounds great (I hope!), it takes the following steps to fully utilize the notebook and Mordor dataset approach:
- Clone https://github.com/hunters-forge/ThreatHunter-Playbook repo
- Stand up a Jupyter Notebook Server to host all the notebooks provided by the Threat Hunter Playbook project.
- Install Python libraries such as PySpark and OpenHunt in your Jupyter Notebook server to run the data analytics in the notebooks.
- Clone the https://github.com/hunters-forge/mordor repo inside of the notebooks directory or download each dataset interactively while you run each analytic.
- Decompress every Mordor dataset (they are .tar.gz files).
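The last step can be sketched in a few lines of Python. The archive below is a stand-in built on the fly so the snippet is self-contained; the dataset name is invented, and real archives come from the mordor repo:

```python
import io
import json
import tarfile
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# Mordor datasets are .tar.gz archives containing JSON events (one per line).
# Build a tiny stand-in archive so this sketch runs anywhere.
events = b'{"event_id": 11, "channel": "Microsoft-Windows-Sysmon/Operational"}\n'
archive = workdir / "empire_dataset.tar.gz"  # hypothetical dataset name
with tarfile.open(archive, "w:gz") as tar:
    info = tarfile.TarInfo(name="empire_dataset.json")
    info.size = len(events)
    tar.addfile(info, io.BytesIO(events))

# Decompress it the same way you would a real Mordor dataset
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(workdir)

# The extracted JSON file is what the notebooks read into a DataFrame
first_event = json.loads((workdir / "empire_dataset.json").read_text().splitlines()[0])
print(first_event["event_id"])
```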
For those familiar with Docker containers, all of the steps above can be put together in a Dockerfile and 💥 💥. However, even though it might be very easy for you to deploy a Docker container on your computer, not everyone has the skills to do it. Therefore, I started looking for ways to automate the deployment, but I could not find one that let me share my research without requiring some basic knowledge of Docker, Vagrant, Packer, AWS CloudFormation, or some other way to deploy something either locally or via a cloud provider. I was looking for a one-click approach. That was true until I read about the Binder project and its product BinderHub 😍
Enter the Binder Project
The Binder Project is an open community that makes it possible to create sharable, interactive, reproducible environments. The main technical product that the community creates is called BinderHub, and one deployment of a BinderHub exists at mybinder.org. This website is run by the Binder Project as a public service to make it easy for others to share their work.
What is BinderHub?
The primary goal of BinderHub is creating custom computing environments that can be used by many remote users. BinderHub enables an end user to easily specify a desired computing environment from a Git repo. BinderHub then serves the custom computing environment at a URL which users can access remotely.
How does BinderHub work?
According to BinderHub docs, it connects several services together to provide on-the-fly creation and registry of Docker images.
BinderHub ties together:
- JupyterHub to provide a scalable system for authenticating users and spawning single user Jupyter Notebook servers.
- Repo2Docker which generates a Docker image using a Git repository hosted online.
It utilizes the following services:
- A cloud provider such as Google Cloud, Microsoft Azure, Amazon EC2, and others
- Kubernetes to manage resources on the cloud
- Helm to configure and control Kubernetes
- Docker to use containers that standardize computing environments
- A BinderHub UI that users can access to specify the Git repos they want built into Docker images
- A Docker registry (such as gcr.io) that hosts container images
- JupyterHub to deploy temporary containers for users
Do I need to create my own BinderHub?
You can create your own BinderHub deployment and run code in the cloud if you want to, but the Binder team has already done that for you and runs a BinderHub server at mybinder.org as a public service (free!). If you browse to that site, you just need to provide the name or URL of the repository hosting your Binder-ready content.
You can host your Binder repository on GitHub, GitLab, etc., as shown below
A Binder Repository?
According to Binder docs, a Binder (also called a Binder-ready repository) is a code repository that contains at least two things:
- Code or content that you’d like people to run (e.g. Jupyter notebooks)
- Configuration files for your environment (e.g. a Dockerfile)
What are “Configuration files” ?
These are files used by Binder to build the environment needed to run your code. For example, if I want to build a Jupyter Notebook server, I have to tell Binder how to do it. For a list of all configuration files available to create an environment, see the Configuration Files page. In addition, there are several ways to do it, and you can see some examples of Binder repositories here.
I use Docker containers to build Jupyter Notebook servers, so I decided to create one to share my notebooks and research with others. I also have a whole project around Jupyter notebooks and Docker containers named Notebooks Forge, a project dedicated to building and providing notebook servers for offensive operators (purple post on this coming soon.. 😉)
A Binder Repository for the Threat Hunter Playbook Environment via Docker
If you want to use Docker for your own Binder repository, make sure you read the Binder docs and take care of all the requirements specified in the docs.
As I mentioned before, I build Jupyter Notebook servers via Dockerfiles from the Notebooks Forge project, and I keep the built Docker images in my own public Docker registry. You can simply download the images and use them right away.
For my Binder configuration file, I take the Docker image named jupyter-pyspark from my public Docker registry and use it as a base to take care of all the steps needed to integrate the Threat Hunter Playbook and Mordor projects, along with the specific requirements of the BinderHub deployment. You can read the contents of the Dockerfile here.
I create the Dockerfile and place it at the root of the Threat Hunter Playbook GitHub repository as shown below:
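The repository's actual Dockerfile is linked above. As a rough, hypothetical sketch only (the base image tag and paths below are assumptions, not the project's real file), a Binder-compatible Dockerfile tends to look like this, since Binder requires the container to run as a non-root user whose UID it passes in at build time:

```dockerfile
# Hedged sketch of a Binder-compatible Dockerfile; base image name,
# repo paths and dataset handling are assumptions for illustration.
FROM cyb3rward0g/jupyter-pyspark:latest

# Binder passes these build args and expects the container to honor them
ARG NB_USER=jovyan
ARG NB_UID=1000
USER root

# Copy the playbook notebooks into the home directory Binder will serve
COPY . /home/${NB_USER}

# Pull down the Mordor datasets and decompress every .tar.gz in place
RUN git clone https://github.com/hunters-forge/mordor /home/${NB_USER}/mordor \
    && find /home/${NB_USER}/mordor -name "*.tar.gz" -execdir tar -xzf {} \;

# Binder requires running as the non-root notebook user
RUN chown -R ${NB_UID} /home/${NB_USER}
USER ${NB_USER}
WORKDIR /home/${NB_USER}
```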
How does BinderHub build the Threat Hunter Playbook environment?
There are several steps that happen in the backend, but all I need to do is go to the https://mybinder.org/ site and fill out the required information as shown below. Notice that once I type the URL of my GitHub repo, it gives me the Binder link https://mybinder.org/v2/gh/hunters-forge/ThreatHunter-Playbook/master to share with others right away.
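The shareable link follows a predictable pattern: version 2 of the Binder API, the gh (GitHub) provider, then org/repo/branch. A small helper illustrates how it is composed:

```python
# Illustrates how a mybinder.org launch link is composed:
# /v2/  -> version 2 of the Binder API
# /gh/  -> the GitHub repo provider
# then the org, repository name, and Git ref (branch, tag, or commit)
def binder_link(org: str, repo: str, ref: str = "master") -> str:
    return f"https://mybinder.org/v2/gh/{org}/{repo}/{ref}"

link = binder_link("hunters-forge", "ThreatHunter-Playbook")
print(link)
```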
Also, when I click on the drop-down triangle at the bottom of the page, I get a Binder badge that I can use in the Threat Hunter Playbook README file. This can help automate the initial steps.
Finally, all I have to do is click on the orange Launch button and BinderHub starts building the Jupyter Notebook environment from the Dockerfile that I created at the root of the repo including all the notebooks from the Threat Hunter Playbook and pre-recorded datasets from the Mordor project.
With the Binder badge available, I now automate the steps for anyone accessing the repo. All you have to do is click on “launch binder” and 💥.
But what happens in the backend?
After clicking on the Binder badge, BinderHub resolves the link and the following happens in the backend:
BinderHub checks if the Docker image already exists in its Docker registry.
- If it does not exist (first time building it), BinderHub creates a Kubernetes build pod that uses Repo2Docker to turn the GitHub repository into a Jupyter-enabled Docker image. It takes the Dockerfile in the repo, builds the image, pushes it to BinderHub’s Docker registry, and saves the registry information for future reference.
- If the image exists, BinderHub checks whether it is up to date. If it is not, it uses Repo2Docker again to build a new image and update the one in its Docker registry.
If the Docker image exists and is up to date, BinderHub sends the Docker image registry information to JupyterHub.
- JupyterHub creates a Kubernetes pod for the Jupyter Notebook image.
- JupyterHub monitors the user’s pod for activity and destroys it after a short period of inactivity.
In the meantime, you will see the following screen:
Once all that happens, you will be presented with a Jupyter Notebook menu interface. There you will see one folder which contains markdown files and Jupyter notebooks from the Threat Hunter Playbook project.
If you want to access Windows detections as notebooks, you can go to the following path
For example, you can double-click on the “Remote Interactive Task Manager LSASS Dump” notebook, and you will be able to interactively run every single input cell. The first ones are just markdown text.
If you go back to the main notebook menu and click on the Running tab, you can see that the notebook is actually running and is not a static view.
You can run every single notebook cell by simply pressing [SHIFT] + [ENTER] on your keyboard. The ones in the image below do the following:
- Import python libraries
- Start a Spark session
- Read the contents of the specific Mordor file and return a DataFrame
- Expose the DataFrame as a Spark SQL temporary view to run SQL-like queries on top of it
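For readers without a Spark environment handy, here is a Spark-free sketch of what those cells accomplish, with sqlite3 standing in for the Spark session and temporary view (the two sample events below are invented for illustration):

```python
import json
import sqlite3

# The notebook reads Mordor JSON events (one per line) into a DataFrame and
# exposes it as a SQL view named mordor_file. This sketch mimics that flow.
raw_events = [
    '{"event_id": 1, "channel": "Microsoft-Windows-Sysmon/Operational", "Image": "C:\\\\Windows\\\\System32\\\\taskmgr.exe"}',
    '{"event_id": 11, "channel": "Microsoft-Windows-Sysmon/Operational", "Image": "C:\\\\Windows\\\\System32\\\\taskmgr.exe"}',
]
events = [json.loads(line) for line in raw_events]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mordor_file (event_id INTEGER, channel TEXT, Image TEXT)")
conn.executemany(
    "INSERT INTO mordor_file VALUES (:event_id, :channel, :Image)", events
)

# Equivalent of createOrReplaceTempView('mordor_file') followed by spark.sql(...)
rows = conn.execute(
    "SELECT event_id FROM mordor_file WHERE Image LIKE '%taskmgr.exe'"
).fetchall()
print(rows)
```

In the real notebooks, PySpark's `spark.read.json(...)` plus `createOrReplaceTempView("mordor_file")` play the role of the table creation above, and the Spark SQL queries shown later in this post run against that view.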
This notebook was created to contribute to CAR-2019-08-001, so I decided to validate the provided analytic against the Mordor dataset as shown below:
CAR Analytic Pseudocode
files = search File:Create
lsass_dump = filter files where (
file_name = "lsass*.dmp" and
image_path = "C:\Windows\*\taskmgr.exe")
Threat Hunter Playbook SQL Query
SELECT `@timestamp`, computer_name, Image, TargetFilename, ProcessGuid
FROM mordor_file
WHERE channel = "Microsoft-Windows-Sysmon/Operational"
AND event_id = 11
AND Image LIKE "%taskmgr.exe"
AND lower(TargetFilename) RLIKE ".*lsass.*\.dmp"
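As a quick sanity check of that RLIKE expression, you can exercise the same regular expression in Python. The file paths below are made up, and lowercasing mirrors the `lower(TargetFilename)` call in the query:

```python
import re

# Same pattern used in the RLIKE clause above; RLIKE applies a regular
# expression, so numbered dump names like "lsass (2).dmp" still match.
pattern = re.compile(r".*lsass.*\.dmp")

candidates = [  # hypothetical TargetFilename values
    "C:\\Users\\wardog\\AppData\\Local\\Temp\\lsass.DMP",
    "C:\\Users\\wardog\\AppData\\Local\\Temp\\lsass (2).DMP",
    "C:\\Users\\wardog\\notes.txt",
]
matches = [c for c in candidates if pattern.match(c.lower())]
print(len(matches))
```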
I then decided to add other potential detections and started joining Sysmon events to add more context to an initial analytic.
SELECT o.`@timestamp`, o.computer_name, o.Image, o.LogonId, o.ProcessGuid, a.SourceProcessGUID, o.CommandLine
FROM mordor_file o
INNER JOIN (
SELECT SourceProcessGUID
FROM mordor_file
WHERE channel = "Microsoft-Windows-Sysmon/Operational"
AND event_id = 10
AND lower(TargetImage) LIKE "%lsass.exe"
AND (lower(CallTrace) RLIKE ".*dbgcore\.dll.*" OR lower(CallTrace) RLIKE ".*dbghelp\.dll.*")
) a
ON o.ProcessGuid = a.SourceProcessGUID
WHERE o.channel = "Microsoft-Windows-Sysmon/Operational"
AND o.event_id = 1
I also like to show how you can join data sources (i.e. Sysmon and Windows Security events) and perform join statements on common fields with unique values. In this notebook, I show how to join the results of the first Sysmon join I performed in the image above with Windows Security event 4778 (A session was reconnected to a Window Station) on the LogonId values. The detection goes from local to remote interactive behavior via RDP.
That’s it! I can now share datasets and detection notebooks and allow others to access, run and validate each analytic all from the same place! 💜
What does it actually look like?
If you want to see the one-click process live in a browser before giving it a try, I put together a video showing how easy it is for anyone in the world to access detection notebooks and pre-recorded datasets, and to interactively run every single analytic via a web browser and an open computing environment, by leveraging the awesomeness of BinderHub.
UPDATE 12/18/19: The folder structure in the video is out of date. However, the concept is the same and I believe it is still valuable to see how it works.
I shared the same video as part of the ATT&CKcon 2.0 talk that Jose Luis Rodriguez and I put together, along with the slides of our presentation below:
Binder Usage Guidelines
Even though the Binder Project provides a public BinderHub service for anyone to use, there are certain guidelines one should follow. You can read more about them here:
- The Threat Hunter Playbook can be subject to a temporary ban at any time if it shows undesired behavior as defined by the Binder team. I see the project as a proof of concept and a way to demo and share detection notebooks resulting from public open research.
- The Binder team does not want a single repository to dominate all of the traffic to Binder, so they limit concurrent user sessions pointing to the same Binder link: the maximum number of simultaneous users for a given repo is 100.
If you have additional questions, you can start by taking a look at the “Frequently Asked Questions” section of the Binder docs. Very helpful!
Empowering The Community 🌎 🌍 🌏
I have not seen this technology used before for this type of use case in the infosec community, and I am very happy that the Threat Hunter Playbook is the first public project in our industry leveraging BinderHub to share detections with others around the world and allow anyone with a web browser to reproduce research via notebooks and pre-recorded datasets 🍻
- I am preparing some material for a few workshops and training classes using all the projects mentioned in this post so stay tuned! 😃 🍻 💜
- Create an official BinderHub for the Infosec community to share any research via notebooks (Hoping to get sponsor from the community)
- This year (2019) I was able to do live demos at the SANS Threat Hunting Summit and ATT&CKcon 2.0, and it was the first time, AFAIK, that anyone in the audience could interactively run and validate an analytic during a presentation without having infrastructure pre-built by me or by others watching it. I am planning on doing more of this with a bigger audience in 2020 and taking the “Demo Gods” challenge to the next level 😆
This is it for this post! I hope you enjoyed it! Any feedback is appreciated 😃 Also, if you want to continue the conversation about this technology and the other projects mentioned in this post, and stay up to date with future training events, feel free to join the Hunters Forge Slack group.
Automatic free slack invitation: https://launchpass.com/threathunting
Also, if you are in the DC metro area, feel free to join the NOVA Threat Hunters Forge meetup.