Deploying Detections at Scale — Part 0x01 use-case format and automated validation

Gijs Hollestelle
FalconForce
Mar 13, 2023

At FalconForce, we have built a large repository of over 350 detection queries. A question we get asked a lot is: “how do you manage and deploy such a collection at scale?”

Because we want to support the infosec community, we have decided to release our internally developed file format for storing these detections, as well as the automated tools that can be used to manage and validate a repository of detections. If you are a frequent reader of our blogs, it might not come as a surprise that we focus on the Microsoft Sentinel and Microsoft 365 Defender platforms.

TL;DR

We are releasing a number of things as part of this blog post:

  • The documentation and json-schema specification for our use-case format, which supports both Sentinel and Microsoft 365 Defender.
  • A number of previously released FalconFriday detection queries, converted to this format as examples.
  • The verify.py tooling that automatically validates use-case files, plus an example Azure DevOps CI pipeline.
  • The KQL query analyzer REST server, along with a public instance you can experiment with.

At a later stage we plan to release additional parts of our tooling that allow customization of use-cases for specific environments, management of allow-lists, and automated deployment.

If you are interested in fully automating your use-case deployment or would like to know more about possibilities of licensing our repository with over 350 readily available use-cases for Sentinel and Microsoft 365 Defender, feel free to get in touch or have a look at our data sheet!

Building a repository of use-cases

The first step of getting to a fully automated deployment method is to gather all use-cases in a version-controlled central location. These use-cases should be stored in a unified and machine-readable format.

An example of a format that could be used for this is the ‘analytic rule template’ format that Microsoft uses in the Azure Sentinel repository on GitHub.

This format is an excellent start, but we identified a number of shortcomings with it:

  • It is tailored to Sentinel and does not cover Microsoft 365 Defender.
  • Use-case IDs are random GUIDs.
  • It does not allow for full customization and dynamic allow-listing. Note this will be the topic of a future blog post.
  • There is no link between ATT&CK techniques and tactics.
  • It is missing a change-log that records when the use-case was modified and why.
  • Only limited meta-data is available; for example, reference URLs, expected false-positive rate, tags and response actions are not part of the template.

These shortcomings led us to build our own use-case format that allows creating use-cases for both Sentinel and Microsoft 365 Defender in a unified format.

We have released the documentation for this format, as well as a specification in json-schema form. We have also converted a number of previously released FalconFriday detection queries to this format to provide examples of what it looks like.

A nice feature of json-schema is that it can be used with an editor such as VS Code to provide inline validation and auto-completion of the template, for example via the Red Hat YAML extension.

Automated validation of schema and auto completion based on json-schema in VS Code.

Implementing automated validity testing

Once all detection content is available in a unified format, we can set up CI/CD pipelines that validate the correctness of the files and can perform automated deployment.

The automated deployment will be covered in a future blog post. In this post we will focus on the validity testing.

We perform validity testing in 5 stages:

  1. Validate that the usecase.yml file is correct according to the json-schema specification. This schema describes the formats of the various elements in the YAML file, as well as their type and whether they are mandatory or optional.
  2. Perform additional validations in Python that are hard to express in json-schema. An example of this is that the change-log entries are required to be sorted with the newest one on top. A minimal sketch of these first two stages follows this list.
  3. Perform optional validations against query best practices. These validations are optional in the sense that they generate a warning rather than an error, and they can be silenced. An example of such a validation is that the FileProfile function in Microsoft 365 Defender should always be used with a second argument specifying the maximum number of hashes.
  4. Validation of the actual query for syntax errors and validation against the Sentinel and Microsoft 365 Defender schemas. We have developed a custom KQL query analyzer REST server that can analyze the query and provide meta-data, such as which tables are referenced and whether there are syntax errors present.
  5. Validation of entity mapping and custom details against the query output. This validation uses the same KQL query analyzer REST server mentioned above to verify that the entity mappings and custom details that are specified are actually present in the output of the query.
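
To make this concrete, here is a minimal Python sketch of the first two stages. It is not our actual verify.py; the schema file name and the change_log/date field names are illustrative assumptions.

```python
# Minimal sketch of validation stages 1 and 2. The schema file name and the
# change_log/date field names are assumptions for illustration.
import json

import yaml  # pip install pyyaml
from jsonschema import validate  # pip install jsonschema

with open("usecase.yml") as f:
    usecase = yaml.safe_load(f)
with open("usecase.schema.json") as f:
    schema = json.load(f)

# Stage 1: structural validation against the json-schema specification.
# Raises jsonschema.ValidationError on any type, format or missing-field issue.
validate(instance=usecase, schema=schema)

# Stage 2: a check that is hard to express in json-schema: the change log
# must be sorted with the newest entry on top (assuming ISO-formatted dates).
dates = [entry["date"] for entry in usecase.get("change_log", [])]
if dates != sorted(dates, reverse=True):
    raise ValueError("change_log entries must be sorted newest-first")
```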

Example usage

Let’s look at an example use-case from one of our previous FalconFriday blogs: “AWS User Accessing Excessive Secrets”.

We have already converted this use-case into the new format and made it available in the FalconForge repository.

For demonstration purposes, we will pretend that we made a few small mistakes when creating the YAML file, so we can see the various validations in action. We have added the following errors:

  • We entered an invalid value for the severity: L.
  • We referenced ATT&CK technique T1210, which is a valid technique, but cannot be combined with tactic TA0009.
  • The change-log entries are not in the correct order; the latest change is not at the top.
  • In the entity_mapping section there is a reference to the userAccount column (with a lowercase u), which is incorrect, since the column is called UserAccount.

This use-case with the above-mentioned errors looks as follows in the usecase.yml template:

The first error is already highlighted by VS Code since it fails to comply with the json-schema.

We can now run the verify.py tool to check for any errors:

The first error is reported by the verify tool. Once we fix it, we can re-run the validation and we see the next error being reported:

Once we fix that, we get the next error:

The final error we introduced requires in-depth analysis of the query and the output columns it produces. For this, we need a running instance of the KQL query analyzer REST service and must provide its URL so that the verify.py script can use it:

Automatically running validations using a CI Pipeline

We have these validations configured to run automatically each time a pull request is created in the repository that stores the collection of use-cases.

We provide an example pipeline for Azure DevOps that can be used by anyone to perform automated CI testing for use-case repositories. The pipeline also runs a local copy of the KQL Query Analyzer REST service for performance reasons. This example can be adapted to run on other CI/CD platforms, such as GitLab CI/CD pipelines or GitHub Actions.

When implemented, this will perform a fully automated validation of all use-case files in the repository whenever a pull request is created.

Example output from a CI pipeline that verifies the syntax when a pull request is created.

In case errors occur, these can be seen in the pull request:

Query syntax validation

Implementing validations on the actual KQL queries is an important step in validating the use-case file. These validations are not trivial since they require the full parsing of the query, as well as knowledge of the data schemas that exist in Sentinel and Microsoft 365 Defender.

Luckily, Microsoft has released an excellent KQL parser library that is available as open source on GitHub. It provides all the functionality needed to parse a KQL query and analyze which tables it uses and which output columns it produces. However, it has no knowledge of the schemas used in Sentinel and Microsoft 365 Defender, nor does it support special functions such as FileProfile and _GetWatchlist, which are heavily used in our use-cases.

To overcome this, we’ve built a KQL query analyzer with additional functionality on top of Microsoft’s KQL parser library: it knows the table schemas of Sentinel and Microsoft 365 Defender, supports special functions such as FileProfile and _GetWatchlist, and reports which tables a query references, which output columns it produces and whether it contains syntax errors.

Microsoft’s KQL parser library is only available for .NET and TypeScript, while our existing tooling is written in Python. So, we decided to implement the KQL query analyzer in a separate C# project and expose it as a REST server that can be called from the existing Python code. The Python code uses the results obtained from the KQL query analyzer to perform checks, such as validating that all output columns specified in the entity mapping section actually exist in the query output.
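
As an illustration, a check like that could look roughly as follows. The endpoint URL, the request body and the output_columns response field are assumptions for this sketch; the KQLAnalyzer repository documents the actual API.

```python
# Sketch of an entity-mapping check driven by the analyzer's output. The
# endpoint, request shape and "output_columns" field are assumptions.
import requests  # pip install requests

def missing_entity_columns(query: str, entity_columns: list[str]) -> list[str]:
    """Return entity-mapping columns that the query does not output."""
    resp = requests.post(
        "http://localhost:8000/api/kqlanalyzer",  # assumed local instance
        json={"query": query},
        timeout=30,
    )
    resp.raise_for_status()
    output_columns = resp.json()["output_columns"]
    return [col for col in entity_columns if col not in output_columns]

missing = missing_entity_columns(
    "DeviceLogonEvents | project UserAccount = AccountName",
    ["UserAccount", "DeviceName"],
)
if missing:
    print(f"Entity mapping references columns not in the query output: {missing}")
```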

We are releasing this KQL analyzer REST server as part of this blog post. We are also hosting a public instance of this REST server running as an Azure Function for you to play around with at https://kql.falconforce.blue/api/kqlanalyzer. We will keep this available as long as it is not abused.

NOTE: We are not recording any query details from the requests.

Example usage of the KQL analyzer REST service

Let’s assume you have a Sentinel query that reads from a watchlist, like the hypothetical example in the sketch below.

This can be analyzed using the KQL analyzer by putting it in a JSON file and feeding it to the KQL analyzer REST service:
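
A rough Python equivalent of that request could look as follows; the example query is hypothetical and the shape of the request body is an assumption, with the endpoint being the public instance mentioned above.

```python
# Sketch: post a query to the KQL analyzer REST service. The query below is
# a hypothetical example that reads from the HoneyPotAccounts watchlist;
# the {"query": ...} body shape is an assumption.
import requests

query = """
AADServicePrincipalSignInLogs
| where ServicePrincipalName in ((_GetWatchlist('HoneyPotAccounts') | project SPN))
"""

resp = requests.post(
    "https://kql.falconforce.blue/api/kqlanalyzer",
    json={"query": query},
    timeout=30,
)
print(resp.json())  # expect a parsing error: SPN is not a known column
```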

A parsing error is returned, since the SPN column is not known to exist in the HoneyPotAccounts watchlist. We can provide the analyzer with information about the existence of this watchlist and the columns it contains by specifying this in the local_data section of the request.

For example, if the watchlist contains a single column called SPN of type string this can be specified as follows:
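
In the same sketch form, again assuming local_data maps a watchlist name to its column names and types:

```python
# Continuation of the previous sketch: the same request, now declaring the
# HoneyPotAccounts watchlist schema via the local_data section. The exact
# shape of local_data is an assumption; see the KQLAnalyzer documentation.
import requests

query = """
AADServicePrincipalSignInLogs
| where ServicePrincipalName in ((_GetWatchlist('HoneyPotAccounts') | project SPN))
"""

resp = requests.post(
    "https://kql.falconforce.blue/api/kqlanalyzer",
    json={
        "query": query,
        # Assumed shape: watchlist name -> {column name: column type}.
        "local_data": {"HoneyPotAccounts": {"SPN": "string"}},
    },
    timeout=30,
)
print(resp.json())  # should now validate and return query meta-data
```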

With this additional information, the query can be successfully validated and the information returned can be used to perform further analysis on the query.

Full documentation of all available functionality is available in the KQLAnalyzer repository on GitHub.

We hope that our tools are useful in helping organizations to better manage and deploy detection use-cases at scale!
