Jupyter Notebooks 📓 from SIGMA Rules 🛡⚔️ to Query Elasticsearch 🏹

Published in Open Threat Research · Jan 11, 2020

Happy new year everyone 🎊! I’m taking a few days off before getting back to work, and you know what that means 😆 Besides working out a little bit more and playing with the dogs, I have some free time to take care of a few things on my to-do list for open source projects or ideas 😆🍻 One of them was to find a way to integrate Jupyter Notebooks with the SIGMA project. It sounded very easy at first, but I had no idea how to create notebooks from code or how I was going to execute sigma rules on top of them.

In this post, I will show you how I translated every rule from the Sigma project to Elasticsearch query strings with the help of sigmac, created Jupyter notebooks for each rule with a python library named nbformat and finally added them to the HELK project to execute them against Mordor datasets.

Optional Reading

I highly recommend reading a few of my previous blog posts to get familiar with some of the concepts and projects I will be talking about in this one:

What are we talking about again?

Let’s take a look at this sigma rule: sysmon_wmi_event_subscription.yml

title: WMI Event Subscription
id: 0f06a3a5-6a09-413f-8743-e6cf35561297
status: experimental
description: Detects creation of WMI event subscription persistence method
references:
    - https://attack.mitre.org/techniques/T1084/
tags:
    - attack.t1084
    - attack.persistence
author: Tom Ueltschi (@c_APT_ure)
date: 2019/01/12
logsource:
    product: windows
    service: sysmon
detection:
    selector:
        EventID:
            - 19
            - 20
            - 21
    condition: selector
falsepositives:
    - exclude legitimate (vetted) use of WMI event subscription in your network
level: high

What can I do with it?

According to Sigma’s Sigma Converter (Sigmac) documentation, as of 01/09/2020, you can translate rules to the following query formats:

If you are wondering how to get that information, you can get it by running this sigmac script with the following flag:

sigmac -l

That means that if you are using any of those tools to query security events, you can easily get a sigma rule converted to their format and run it.

Then, why do I need Jupyter Notebooks? 🤔

That’s a great question! What if I told you that you could:

Connect to platforms such as Elasticsearch or Azure Sentinel via their APIs
Run the queries you translated with Sigmac on top of them
Get the results back and complement your analysis with additional Python libraries such as Pandas or PySpark. Maybe use other programming languages such as R or Scala for additional data analysis.
Document every step taken during your analysis and even add visualizations to tell the story in a more practical way
Save the input (queries) and output (query results) as part of your doc
Share and reproduce the analysis and get similar results

All from the same tool and for free! 🍻 Yeah, I don’t know about you, but I like to complement and extend the data analysis capabilities that some of those platforms already provide, and share the results with others in the community in a more practical and interactive way. I hope this got you a little bit more interested in data analysis with Jupyter notebooks 😉 👍

How do I use a Notebook to query sigma rules?

First, we need to identify platforms that allow query execution via their APIs and that have Python libraries we can use from Jupyter Notebooks to simplify the process. Some of the ones that I have used are:

Elasticsearch: Elasticsearch-DSL Python Library
Azure Sentinel: MSTICPY Python Library (posts soon 😉)
Splunk: Splunk-SDK

For the purposes of this post, I will be connecting to an Elasticsearch database using the elasticsearch-dsl python library from a notebook.

Pre-Requirements:

Elasticsearch Database
Jupyter Notebook (with elasticsearch-dsl python package installed)

Enter The HELK

I like to practice like I play ⚽️ and the HELK project has been very helpful for several use cases throughout my career. This one is not an exception 💙

Install HELK

You can simply run the following commands and you will get an ELK stack, Kafka broker and a Jupyter Notebook.

$ git clone https://github.com/Cyb3rWard0g/HELK
$ cd HELK/docker
$ sudo ./helk_install.sh -p <kibana password> -i <HELK ip address> -b 'helk-kibana-notebook-analysis' -l 'basic'

Access Jupyter Notebook Server

Run the following command to retrieve the Jupyter server token needed to access the currently running notebook server:

$ sudo docker exec -ti helk-jupyter jupyter notebook list
Currently running servers:
http://0.0.0.0:8888/jupyter/?token=fbb6b53c4ecf6179bd6e45907bbe464c4f26b5bbd2e8879e :: /opt/jupyter/notebooks

Copy the token (i.e. fbb6b53c4ecf6179bd6e45907bbe464c4f26b5bbd2e8879e) and use it in this URL to access your HELK’s Jupyter notebook server.

https://<HELK ip address>/jupyter/?token=<token>
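If you would rather not copy the token by hand, the step above can be sketched in Python. This is only an illustration: the helper names extract_token and helk_jupyter_url are made up for this example, the sample text is the output shown earlier, and 192.0.2.10 is a placeholder address, not a real HELK host.

```python
import re
from typing import Optional


def extract_token(listing: str) -> Optional[str]:
    """Return the first token found in `jupyter notebook list` output, or None."""
    match = re.search(r"[?&]token=([0-9a-f]+)", listing)
    return match.group(1) if match else None


def helk_jupyter_url(helk_ip: str, token: str) -> str:
    """Build the HTTPS URL used to reach the HELK Jupyter server."""
    return f"https://{helk_ip}/jupyter/?token={token}"


# Sample output copied from the command above
sample = (
    "Currently running servers:\n"
    "http://0.0.0.0:8888/jupyter/?token=fbb6b53c4ecf6179bd6e45907bbe464c4f26b5bbd2e8879e"
    " :: /opt/jupyter/notebooks\n"
)

token = extract_token(sample)
if token is not None:
    # 192.0.2.10 is a placeholder (documentation range); use your HELK IP instead
    print(helk_jupyter_url("192.0.2.10", token))
```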

You will be redirected to a similar view:

Create a new notebook

Click on new > Python3 as shown below:

You will get a blank notebook where you can run some python code.

Query Elasticsearch Database

Import Libraries

Import the following libraries and hit SHIFT+ENTER to run the cell

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import pandas as pd

Initialize the Elasticsearch client

Then, we need to initialize an Elasticsearch client using a specific Elasticsearch URL. The one you will need to use for HELK is http://helk-elasticsearch:9200. That is because the Elasticsearch container name is helk-elasticsearch as shown here. Next, you can pass the client to the Search object that we will use to represent the search request in a little bit.

es = Elasticsearch(['http://helk-elasticsearch:9200'])
searchContext = Search(using=es, index='logs-*', doc_type='doc')

Set the query search context

In addition, we will need to use the query method to pass an Elasticsearch query_string. For example, what if I want to query event_id 1 events? Or maybe use the query_string you get after translating a sigma rule via sigmac for the es-qs target environment 😉

s = searchContext.query('query_string', query='event_id:1')

Run Query & Explore Response

Finally, you can run the query and get the results back as a DataFrame

response = s.execute()
if response.success():
    df = pd.DataFrame((d.to_dict() for d in s.scan()))
df

And that’s one of the methods you can use to query an Elasticsearch database from a Jupyter notebook. I blogged about another method in here too 🍻.

What does it have to do with Sigma? 👀

Well, as I mentioned before, we could use the output of a sigma rule translated to an Elasticsearch query string and execute it against ES from notebooks. Let’s start by converting sysmon_wmi_event_subscription.yml

Download Sigma

git clone https://github.com/Neo23x0/sigma
cd sigma/tools

Sigma rule to Elasticsearch query string

If you want to convert sysmon_wmi_event_subscription.yml to an Elasticsearch query string, you need to run the following command:

./sigmac -t 'es-qs' -c config/helk.yml ../rules/windows/sysmon/sysmon_wmi_event_subscription.yml

-t: Output target format
-c: Configuration with field name and index mappings

and get the following results:

event_id:("19" OR "20" OR "21")

Finally, you can simply add that to the python code as shown below.
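With elasticsearch-dsl, this amounts to swapping the query argument in the earlier searchContext.query('query_string', query=...) cell. To make the shape of the request visible without a running cluster, here is a minimal sketch that builds the equivalent raw request body using only the standard library. The helper name query_string_body is my own for illustration and is not part of any library.

```python
import json


def query_string_body(query: str) -> dict:
    """Wrap a Lucene-style query string in an Elasticsearch query_string request body."""
    return {"query": {"query_string": {"query": query}}}


# Output of sigmac for sysmon_wmi_event_subscription.yml (es-qs target, HELK config)
sigma_query = 'event_id:("19" OR "20" OR "21")'

body = query_string_body(sigma_query)
# This dict is what ends up in the search request sent to Elasticsearch
print(json.dumps(body, indent=2))
```

The point is simply that the sigmac output goes into the query field untouched, so no manual rewriting is needed between the two tools.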

As you can see, I did not get any results, but that’s because I do not have events that match that in ES. It would be great to have data to test the query! 😉 Maybe this dataset? 😱 🙀 For now, I hope you understand how we can start converting sigma rules to Elasticsearch query strings and query an Elasticsearch database from Jupyter Notebooks.
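To make the conversion step feel less like a black box, here is a toy sketch of what happens to the selector of our rule. It is NOT how sigmac is implemented. It assumes the usual Sigma reading of a selection (a list of values is OR'ed, several fields are AND'ed) and a field mapping of EventID to event_id like the one reflected in the output above. The function name selection_to_query_string is hypothetical.

```python
from typing import Dict, List, Union

Value = Union[str, int]


def selection_to_query_string(
    selection: Dict[str, Union[Value, List[Value]]],
    field_map: Dict[str, str],
) -> str:
    """Toy translation of one Sigma selection into a query string (illustration only)."""
    clauses = []
    for field, values in selection.items():
        # Rename the Sigma field to the name used in the target index, if mapped
        name = field_map.get(field, field)
        if not isinstance(values, list):
            values = [values]
        quoted = [f'"{v}"' for v in values]
        if len(quoted) == 1:
            clauses.append(f"{name}:{quoted[0]}")
        else:
            # Values listed under the same field are alternatives
            clauses.append(f"{name}:({' OR '.join(quoted)})")
    # Different fields in the same selection must all match
    return " AND ".join(clauses)


selector = {"EventID": [19, 20, 21]}
print(selection_to_query_string(selector, {"EventID": "event_id"}))
```

Real rules use modifiers, wildcards and more complex conditions, which is exactly why I let sigmac do the work.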

Can I automate the creation of several rules ➡️ Elasticsearch query strings?

Oh yeah! You can do something like this in bash. If you are still in the sigma tools folder, run the following in your terminal.

for rule_category in ../rules/windows/* ;
  do for rule in "$rule_category"/* ;
    do ./sigmac -t 'es-qs' -c config/helk.yml "$rule";
  done;
done
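The same loop can be sketched in Python, which is handy if you want to keep the results in a dictionary for later steps. This is a minimal sketch with a few assumptions: sigmac is available in your PATH (rather than called as ./sigmac), rules use the .yml extension, and the helper names iter_rules, sigmac_command and convert_all are mine. Unlike the bash loop, it walks the rules folder recursively.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Dict, Iterator, List


def iter_rules(rules_root: Path) -> Iterator[Path]:
    """Yield every .yml file found under rules_root, recursively."""
    yield from sorted(rules_root.rglob("*.yml"))


def sigmac_command(rule: Path, config: Path, target: str = "es-qs") -> List[str]:
    """Build the sigmac command line for one rule (same flags as the bash loop)."""
    return ["sigmac", "-t", target, "-c", str(config), str(rule)]


def convert_all(rules_root: Path, config: Path) -> Dict[str, str]:
    """Run sigmac on every rule and collect the query strings of successful runs."""
    results = {}
    for rule in iter_rules(rules_root):
        proc = subprocess.run(
            sigmac_command(rule, config), capture_output=True, text=True
        )
        if proc.returncode == 0:
            results[rule.name] = proc.stdout.strip()
    return results


if __name__ == "__main__":
    # Assumes you are still in the sigma tools folder
    if shutil.which("sigmac") is None:
        print("sigmac not found in PATH")
    else:
        queries = convert_all(Path("../rules/windows"), Path("config/helk.yml"))
        for name, query in queries.items():
            print(name, "->", query)
```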

How do I create a Notebook for every sigma rule translated to an Elasticsearch query string?

Here is where things get a little bit more interesting. There are more than 300 rules in the project. There is NO WAY I will do the following 300+ times:

Create a notebook via notebook server UI
Copy paste the code from the initial notebook we used to query ES
Convert the sigma rule to Elasticsearch query string
Make sure I am setting the right ES index for the specific rule.
Copy the query string result and paste in the notebook code

I would need to automate the creation of notebooks via code, but how do I even create a notebook from code? 🤔

Enter nbformat Python Library

nbformat contains the reference implementation of the Jupyter Notebook format, and Python APIs for working with notebooks.

I used this library in this previous post, and I provided details on how it could be used to create several notebooks in a for loop with some python code. I highly recommend reading it to get familiar with the library.

You can install it via pip

pip install nbformat

I wrote the following python script to create a notebook following the format of the notebook we used earlier and the code to query Elasticsearch:
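As a rough idea of what such a script can look like, here is a minimal sketch, not necessarily identical to the one I used. It relies on nbformat.v4's new_notebook, new_markdown_cell and new_code_cell plus nbformat.write. The function name build_es_notebook and the cell headings are my own choices for this example, and the cell contents are simply the code we ran manually earlier.

```python
import nbformat
from nbformat.v4 import new_code_cell, new_markdown_cell, new_notebook


def build_es_notebook(es_url: str, index: str, query: str):
    """Return a notebook object that mirrors the manual Elasticsearch query notebook."""
    nb = new_notebook()
    nb["cells"] = [
        new_markdown_cell("# Query Elasticsearch"),
        new_markdown_cell("## Import Libraries"),
        new_code_cell(
            "from elasticsearch import Elasticsearch\n"
            "from elasticsearch_dsl import Search\n"
            "import pandas as pd"
        ),
        new_markdown_cell("## Initialize the Elasticsearch client"),
        new_code_cell(
            # !r embeds the values as quoted Python string literals
            f"es = Elasticsearch([{es_url!r}])\n"
            f"searchContext = Search(using=es, index={index!r}, doc_type='doc')"
        ),
        new_markdown_cell("## Set the query search context"),
        new_code_cell(f"s = searchContext.query('query_string', query={query!r})"),
        new_markdown_cell("## Run Query & Explore Response"),
        new_code_cell(
            "response = s.execute()\n"
            "if response.success():\n"
            "    df = pd.DataFrame((d.to_dict() for d in s.scan()))\n"
            "df"
        ),
    ]
    return nb


if __name__ == "__main__":
    notebook = build_es_notebook(
        "http://helk-elasticsearch:9200", "logs-*", "event_id:1"
    )
    # Serialize the notebook to an .ipynb file in the current folder
    nbformat.write(notebook, "es_notebook_nbformat.ipynb")
```

Keeping the notebook construction in a function makes it easy to call it in a loop later, once per rule.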

If you run the script, you will get a notebook named es_notebook_nbformat.ipynb in the same folder:

$ python3 es_notebooks.py

You can then import it into your notebook server as shown below

The result will be the following notebook 😱

Sigmac + nbformat = Sigma Notebooks 🔥

Next, I put together the following script to translate our initial sigma rule to an Elasticsearch query string, parse the yaml file to get some metadata, and finally create a Jupyter notebook I could use to query the HELK’s Elasticsearch, following the same layout as the notebook we used to query ES.

You will first need to install the sigmatools python package, which adds the sigmac script to your PATH. That is why you do not see any import sigmatools in the python code below: the script can be called directly, without setting a location for it.

pip install sigmatools

If you run that script and point it to the sigma rule sysmon_wmi_event_subscription.yml with the default HELK config provided in the sigma project already,

$ python3 sigmac_nbformat_notebook.py -r sigma/rules/windows/sysmon/sysmon_wmi_event_subscription.yml -c sigma/tools/config/helk.yml

you will get the following notebook 😱 💥 🌟

Pretty cool! But, we still need data to test it! 😢

Re-Play the WMI Subscriptions for persistence Mordor Dataset 😈 to test the Sigma Notebook

If you want to read more about the mordor project, you can do it here:

https://mordordatasets.com/introduction

The mordor dataset that we will be using today is documented here:

https://mordordatasets.com/notebooks/small/windows/03_persistence/SD-190518184306.html

Remember that you can explore every single mordor dataset via interactive notebooks from your web browser too (just in case you forgot!). 😉

Install kafkacat

If you are using a debian-based system, make sure you install the latest Kafkacat deb package.
I recommend at least Ubuntu 18.04. You can check its Kafkacat deb package version and compare it with the latest one in the Kafkacat GitHub repo.
You can also install it from source following the Quick Build instructions.

Download & Decompress Mordor Dataset

Download the mordor dataset and decompress it

curl https://raw.githubusercontent.com/hunters-forge/mordor/master/datasets/small/windows/persistence/empire_elevated_wmi.tar.gz -o empire_elevated_wmi.tar.gz
tar -xzvf empire_elevated_wmi.tar.gz

Send Data

We will send the data to the HELK’s Kafka broker, which is listening on port 9092. You might ask yourself: why Kafka and not Elasticsearch directly? If I send it to Elasticsearch directly, I won’t be able to apply the standardization provided by the HELK Logstash pipeline. I practice like I play ⚽️!

kafkacat -t winevent -b <HELK IP>:9092 -P -l empire_elevated_wmi_2019-05-18184306.json

Run Sigma Notebook

Wait a few seconds and then run your notebook. Remember you can run each cell by hitting SHIFT+ENTER on your keyboard while selecting a cell. As you can see below, the query matched the Empire Elevated WMI subscription mordor dataset. 🙏

Before moving to the last part of the blog post, I would like to emphasize the value of having a project like Mordor. It only took me a few seconds to download, decompress and send data to HELK. I did not have to stand up a lab environment, run the attack and then take a look at the data.

But, how do I create a Notebook for every sigma rule translated to an Elasticsearch query string? 😆

I almost forgot to show you that! Well, I grabbed all the scripts I have shown you in this post and created an official script that is now part of the HELK project to accomplish that. It is named sigma2esnotebooks 😉 and you can find it here and explore its parameters with the following command:

$ python3 sigma2esnotebooks -h

You can run it the following way:

$ python3 sigma2esnotebooks -r sigma/rules/ -c sigma/tools/config/helk.yml -e http://helk-elasticsearch:9200 -i 'logs-*' -o notebooks/sigma/

Since the script parses each sigma rule's metadata, it identifies and retrieves the specific product (e.g. windows) and service (e.g. sysmon) available in most of the rules and matches them to the indices specified in the HELK mappings config. The example below shows sysmon rules being processed and mapped to HELK’s logs-endpoint-winevent-sysmon-* index:
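The idea of matching a rule's logsource to an index can be sketched like this. It is a simplified illustration, not the actual logic of the script: the lookup table and the name resolve_index are hypothetical, the only entry comes from the sysmon example in this post, and logs-* is used as the fallback because it is the index passed with -i above.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup table keyed by (product, service).
# Only the sysmon entry is taken from this post; a real table would be
# derived from the HELK mappings config.
INDEX_BY_LOGSOURCE: Dict[Tuple[Optional[str], Optional[str]], str] = {
    ("windows", "sysmon"): "logs-endpoint-winevent-sysmon-*",
}


def resolve_index(logsource: dict, default: str = "logs-*") -> str:
    """Pick the index for a rule's logsource, falling back to the default index."""
    key = (logsource.get("product"), logsource.get("service"))
    return INDEX_BY_LOGSOURCE.get(key, default)


# logsource section of sysmon_wmi_event_subscription.yml
print(resolve_index({"product": "windows", "service": "sysmon"}))
# A rule without a known mapping falls back to the default index
print(resolve_index({"product": "linux"}))
```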

Would the notebooks replace Elastalert rules? 🤔

I wrote about elastalert and sigma here before. All I would say is that if you want to run notebooks automatically on a schedule (e.g. every hour), you can do it with libraries such as papermill from nteract together with cron jobs or airflow. However, that is a topic for another item on my to-do list 😉

Future work

Azure Sentinel: MSTICPY Python Library + notebooks + sigma
papermill from nteract and cron jobs or airflow for automation
notebooks + sigma + splunk maybe?

That’s it! I hope you enjoyed this post! I have not seen this implementation anywhere else, so I hope you can test it and provide feedback if possible! There are a few things that I will add to the script (e.g. multiple target environments), but for now it is a good first implementation of notebooks for the sigma project. Once again, if you want to contribute to it, all the notebooks created for each sigma rule after running the script are now part of the HELK project:

and the script is available also as part of the HELK project:

https://github.com/Cyb3rWard0g/HELK/blob/master/scripts/sigma2esnotebooks

Finally, if you would like to fund my contributions and become a sponsor of my open source work, you can read about it in the link below. GitHub will match your contribution! 😱

Sponsor @Cyb3rWard0g on GitHub Sponsors

Hello! my name is Roberto Rodriguez A.K.A @Cyb3rWard0g and I am honored to be part of the GitHub Sponsors program! One…

github.com

Thank you so much in advance 🙏 😊🍻 and enjoy the rest of your weekend!

References

https://medium.com/threat-hunters-forge/writing-an-interactive-book-over-the-threat-hunter-playbook-with-the-help-of-the-jupyter-book-3ff37a3123c7

https://nbformat.readthedocs.io/en/latest/api.html

https://mordordatasets.com/notebooks/small/windows/03_persistence/SD-190518184306.html

https://github.com/Neo23x0/sigma/blob/master/tools/config/helk.yml

https://github.com/Cyb3rWard0g/HELK