Archive Microsoft 365 Defender logs

Welcome back!

I recently wrote an article about Microsoft Sentinel Basic and Archive logs and the new custom log ingestion API with Data Collection Endpoints. That was the first part in a series of two.

  • Part 2 | Archiving Microsoft 365 Defender logs [📍 you are here]
    – Downsides and limitations of the integrated M365 data connector
    – Use Logic App to ingest 365 Defender data as custom logs into Basic table

Microsoft 365 Defender data connector

In the previous part I introduced the idea of archiving 365 Defender logs to Sentinel, why you would want to do this and how you can achieve it.

The shortest route might not always be the best one. The built-in connector might save you a few parsecs, but it will also hurt your wallet a lot more.

Preparations

To get you started quickly, I’ve prepared an ARM template deployment which will deploy all the necessary Azure resources. And it’s fully automatic! No manual deployments needed at all. Everything will be set up and configured so you can get started right away. Try it out in your demo/lab environment and take a closer look at how all the intricate details work together as a whole. If you’re satisfied, you can always re-deploy it into production, with or without any adjustments you see fit.

Manual deployments? Where we’re going, we don’t need any of those

Logic App walkthrough

Before we continue with a breakdown of the deployed solution, please make sure the streaming API within Microsoft 365 Defender is configured to stream all events to the newly created storage account.

Fill in the resource ID of the newly created storage account and don’t forget to enable all checkboxes for the Event Types.

The Logic App starts off by initializing a couple of variables:
  • The variables year, month and day represent the blob container (folder) structure for each event type and are based on the current date/time minus one day, to make sure the Logic App retrieves yesterday’s logs.
  • Both dceUri (Data Collection Endpoint ingestion URI) and dcrPrefix (Data Collection Rule prefix) are here to simplify the automated deployment so that the values can be dynamically updated.
  • Then there’s the customTableSuffix, which I’ve set to _archive_CL, but you might want to choose a different one for your tables.
  • The dataCollectionRules variable is an interesting one, because it contains all of the Data Collection Rule names that were deployed, as well as the related immutableIds for each respective DCR. Again, these are dynamically filled in during deployment, because the immutableIds are globally unique. And although it might look like an array, its datatype is still a string. The next step will parse the list so that we can select an immutableId from it later (sketched below).
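
To make this a bit more tangible, here’s a minimal Python sketch of what that string and the parsing step could look like. The field names (dcrName, immutableId) and all values are illustrative assumptions based on the description above, not taken from the actual deployment:

import json

# Illustrative only: the real value is injected during deployment and the
# exact field names may differ from this sketch
data_collection_rules = '''
[
  {"dcrName": "dcr-deviceinfo_archive_CL", "immutableId": "dcr-0123456789abcdef0123456789abcdef"},
  {"dcrName": "dcr-deviceevents_archive_CL", "immutableId": "dcr-fedcba9876543210fedcba9876543210"}
]
'''

# Although the variable looks like an array, its datatype is still a string,
# so it has to be parsed before an immutableId can be selected from it
rules = json.loads(data_collection_rules)
print(rules[0]["immutableId"])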

Data Collection Rules

  • The DCR determines where the data will be stored once you send logs to its immutableId, which in turn is part of the URI you’ll post data to. More on this later…
  • It is linked to both a table and a Data Collection Endpoint, as demonstrated by Microsoft in their tutorial. I learned the hard way that you can’t just create table after table and use the same DCR for all of them. There’s a limit of 10 tables (streams) within a DCR, so I figured I might as well create a separate DCR for every table.
  • It also contains the transformation query used to transform your logs on-the-fly. In our particular case the transformation query is quite simple and the same for all tables and DCRs. The log definitions are already fine, and for this purpose we also want to keep them intact and unaltered. The only thing that’s missing is a TimeGenerated column, which is mandatory. Data coming from 365 Defender does contain a Timestamp column, which I could’ve used, but I’ve opted to let TimeGenerated be the ingestion time in the workspace:
source
| extend TimeGenerated = now()

Tables

As of right now my lab environment only has 17 different tables in Microsoft 365 Defender, and therefore I also have an equivalent number of blob containers and Data Collection Rules.

Example of a PT1H.json blob containing more parameters than we’re interested in. TenantId is a reserved name and cannot be used, and we only want to keep the items under ‘properties’.
Example of a cleaned-up and proper sample.json to create the new table with.
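
To give you an idea of what that means, here’s a rough sketch of a single record as a Python dictionary. Apart from properties and the reserved TenantId, the field names and values are made-up assumptions for illustration only:

# Rough, illustrative shape of one record inside a PT1H.json blob
raw_record = {
    "time": "2022-11-01T09:00:00.0000000Z",
    "TenantId": "00000000-0000-0000-0000-000000000000",  # reserved name, must not end up in the custom table
    "category": "AdvancedHunting-DeviceInfo",
    "properties": {
        "Timestamp": "2022-11-01T08:59:42.0000000Z",
        "DeviceId": "0123456789abcdef0123456789abcdef01234567",
        "DeviceName": "workstation01",
    },
}

# Only the original 365 Defender columns under 'properties' go into the custom table
clean_record = raw_record["properties"]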

Retrieving blobs

So, now we can start connecting to the storage account and retrieve all blob container names from the root folder. Next, the Logic App will cycle through each of the blob containers.

  • Determine the correct dcrName to look up the immutableId for in the dataCollectionRules variable.
  • Retrieving that immutableId from the variable we parsed earlier is done with a JavaScript step; a rough equivalent of both steps is sketched below.
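
Purely to illustrate the logic (the Logic App itself uses the built-in blob connector and an inline JavaScript action), these two steps roughly boil down to the Python sketch below. The storage account URL, container naming and DCR naming convention are assumptions on my part:

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

credential = DefaultAzureCredential()  # resolves to the managed identity when running inside Azure
blob_service = BlobServiceClient(
    account_url="https://<storageaccount>.blob.core.windows.net",  # placeholder
    credential=credential,
)

# Parsed dataCollectionRules variable, as in the earlier sketch (values are made up)
rules = [
    {"dcrName": "dcr-deviceinfo_archive_CL", "immutableId": "dcr-fedcba9876543210fedcba9876543210"},
]

dcr_prefix = "dcr-"           # assumption: mirrors the dcrPrefix variable
table_suffix = "_archive_CL"  # the customTableSuffix from the Logic App

for container in blob_service.list_containers():
    # Streaming API containers are named per event type, e.g. "insights-logs-advancedhunting-deviceinfo"
    event_type = container.name.rsplit("-", 1)[-1]     # simplified mapping, e.g. "deviceinfo"
    dcr_name = dcr_prefix + event_type + table_suffix  # e.g. "dcr-deviceinfo_archive_CL"

    # The same lookup the JavaScript step performs against the parsed variable
    immutable_id = next(r["immutableId"] for r in rules if r["dcrName"] == dcr_name)
    print(container.name, "->", immutable_id)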

Composing a proper ‘body’

  • “Compose body…” will fix the JSON structure by concatenating in the required characters so that the result can be parsed.
  • “Parse string…” will make sure only the properties section is kept intact (removing the unwanted columns shown earlier). The output is a new object which can be processed item by item (data element by data element).

Splitting the object into separate items and processing them one by one is important to stay under the Data Collection Endpoint body limit of 1 MB.
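
Assuming the blob content is newline-delimited JSON records of the shape shown earlier, the net effect of “Compose body…”, “Parse string…” and the split into separate items roughly boils down to this Python sketch:

import json

# Assumed: a PT1H.json blob contains newline-delimited JSON, one record per line (sample values are made up)
blob_text = (
    '{"time": "2022-11-01T09:00:00Z", "TenantId": "0000", "properties": {"DeviceName": "workstation01"}}\r\n'
    '{"time": "2022-11-01T09:00:01Z", "TenantId": "0000", "properties": {"DeviceName": "workstation02"}}\r\n'
)

# Make the raw text parseable and keep only the properties section of every record
items = [json.loads(line)["properties"] for line in blob_text.splitlines() if line.strip()]

# Each item becomes its own request body, so every POST stays well below 1 MB
print(items)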

Sending logs to the Data Collection Endpoint

For every item/data element we’ll be POSTing the contents of properties as the Body of the web request (a stand-alone sketch of this request follows after the list):

  • The JavaScript result containing the correct immutableId for that specific table, based on the originating blob container
  • And of course the correct table name
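
If you were to make that same web request yourself instead of through the Logic App’s HTTP action, it would look roughly like the Python sketch below. The DCE URI, immutableId and table name are placeholders; the URI format, token scope and (preview) api-version come from Microsoft’s logs ingestion documentation:

import json
import requests
from azure.identity import DefaultAzureCredential

# Placeholders: use your own DCE ingestion URI, DCR immutableId and table name
dce_uri = "https://<dce-name>.<region>-1.ingest.monitor.azure.com"
immutable_id = "dcr-0123456789abcdef0123456789abcdef"
table_name = "DeviceInfo_archive_CL"

# Token for the ingestion endpoint; in the Logic App the managed identity takes care of this
credential = DefaultAzureCredential()
token = credential.get_token("https://monitor.azure.com//.default").token

url = (
    f"{dce_uri}/dataCollectionRules/{immutable_id}"
    f"/streams/Custom-{table_name}?api-version=2021-11-01-preview"
)

# Contents of 'properties' for a single item, wrapped in a JSON array
body = [{"Timestamp": "2022-11-01T08:59:42Z", "DeviceName": "workstation01"}]

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    data=json.dumps(body),
)
print(response.status_code)  # 204 means the request was accepted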

Authentication

To make sure the Logic App can authenticate to both the Storage Account and the Data Collection Endpoint, a system-assigned Managed Identity is used.

  • Storage Blob Data Contributor is required on the storage account
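
For reference (and to test the permissions outside of the Logic App), these are the two token audiences the managed identity ends up requesting. This is just an illustrative azure-identity sketch, not part of the deployed solution:

from azure.identity import ManagedIdentityCredential

# Only works when running on an Azure resource with a system-assigned managed identity
credential = ManagedIdentityCredential()

# Audience for reading blobs from the storage account (covered by Storage Blob Data Contributor)
storage_token = credential.get_token("https://storage.azure.com/.default")

# Audience for posting logs to the Data Collection Endpoint
monitor_token = credential.get_token("https://monitor.azure.com//.default")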

Some lessons learned

During the creation of this whole solution I went through some extensive trial-and-error, mostly because the new custom log ingestion approach is still in public preview and a lot is still undocumented or unclear due to the lack of proper feedback.

  • To “fix” the JSON structure of the blob contents I made use of concat() and replace() to replace carriage return and newline characters. When you enter this in the graphical designer view, the underlying code gets messed up: additional escape characters ( \ ) are added, which leads to undesired results. Go into code view and fix this by removing them:
Replace:
"value": "@{replace(items('...')['...'],'\\r\\n',' ')}"
To:
"value": "@{replace(items('...')['...'],'\r\n',' ')}"
  • The Data Collection Endpoint will not process a Body larger than 1 MB in size.
{
  "error": {
    "code": "ContentLengthLimitExceeded",
    "message": "Maximum allowed content length: 1048576 bytes (1 MB). Provided content length: 3101601 bytes."
  }
}
  • The same goes for a proper Body notation. Without the square brackets ([ ]), for example, the data won’t be processed, but you’ll still receive a status 204 from the API.
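
To make that last pitfall concrete, here’s a tiny sketch with a hypothetical record; only the second variant actually gets ingested:

import json

record = {"DeviceName": "workstation01"}  # hypothetical record

body_wrong = json.dumps(record)    # '{"DeviceName": "workstation01"}'   -> API still answers 204, but nothing is ingested
body_right = json.dumps([record])  # '[{"DeviceName": "workstation01"}]' -> 204 as well, and the data is actually processed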

Conclusion

We’ve reached the end of the line and part #2 of my first multi-part article!
