New ‘User Agent’ Parsing UDM Schema

Dec 28, 2022

Chronicle SIEM’s UDM schema was recently updated to support native HTTP User Agent extraction capabilities. In this post I’ll explore how to implement and make use of it.

The updated UDM event schema for User Agent strings

Note, the updates can be implemented in place of, or in addition to, the prior network.http.user_agent string. For reference, here is how the new UDM parsed User Agent looks once implemented:

network.http.parsed_user_agent.family = "APPLEWEBKIT"
network.http.parsed_user_agent.sub_family = "AppleWebKit"
network.http.parsed_user_agent.platform = "Windows"
network.http.parsed_user_agent.os = "Windows NT 6.1"
network.http.parsed_user_agent.browser = "Chrome"
network.http.parsed_user_agent.browser_version = "70.0.3538.110"
network.http.parsed_user_agent.browser_engine_version = "537.36"
network.http.parsed_user_agent.annotation.key = "Chrome"
network.http.parsed_user_agent.annotation.value = "Chrome/70.0.3538.110"
network.http.parsed_user_agent.annotation.key = "SafariType"
network.http.parsed_user_agent.annotation.value = "Safari/537.36"
network.http.parsed_user_agent.annotation.key = "OS_NAME"
network.http.parsed_user_agent.annotation.value = "Windows NT"
network.http.parsed_user_agent.annotation.key = "OS_VERSION"
network.http.parsed_user_agent.annotation.value = "6.1"

Implementation

In this section I’ll cover how to use a Parser Extension to add the new UA extraction per log source.

At present the new UDM User Agent schema does not appear to be implemented in the default parsers; however, I suspect it will be added as time goes on.

💡 Watch the Chronicle SIEM release notes for more info, and if you haven’t already set it up, configure your RSS feed reader to get updates automatically.

Find the original User Agent field in your log source

I’m going to experiment with adding the new User Agent (UA) extraction to my load balancer log source, GCP_LOADBALANCING.

The first step is to validate which field the UA is stored in within the original log, which is most easily achieved via:

UDM Search, e.g., network.http.user_agent != ""
Raw Log Search, e.g., (?:(?i)user_agent|(?i)useragent)
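
If you want to sanity-check which key variants a pattern like that would catch before running the search, here is a minimal Python sketch. It uses an equivalent pattern (an assumption on my part that the intent is to match "user_agent" or "useragent" regardless of case), and the sample log lines are made up for illustration.

```python
import re

# Python equivalent of the Raw Log Search pattern above (assumed intent:
# match "user_agent" or "useragent" regardless of case).
UA_KEY = re.compile(r"user_?agent", re.IGNORECASE)


def mentions_user_agent(raw_log: str) -> bool:
    """Return True if the raw log line appears to contain a User Agent key."""
    return UA_KEY.search(raw_log) is not None


# Made-up sample log lines, for illustration only.
samples = [
    '{"httpRequest": {"userAgent": "Mozilla/5.0"}}',  # camelCase key
    'user_agent="Mozilla/5.0" status=200',            # snake_case key
    '{"httpRequest": {"status": 200}}',               # no UA field
]

for line in samples:
    print(mentions_user_agent(line), line)
```

The first two samples match and the third does not, which is the behaviour you would want from the search.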

In the excerpt of the original log below, the field we’ll create a Parser Extension for is httpRequest.userAgent:

{
  "httpRequest": {
    "userAgent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
    ...
  },
...

🗈 You cannot create a Parser Extension directly from Settings at this time. You will need to search the log source in question, find an event, view the log via the Event Viewer tab, and then click Parsers.

Extension Method?

Should you use the Parser Extension GUI (Map data fields) or a grok (Write code snippet)?

I personally recommend taking the time to learn grok, aka the Write code snippet approach. Why? More often than not you will, at some stage, hit an extension requirement that goes beyond a simple field extraction, and the time saved by using the Map data fields approach is then undone by having to refactor it into a grok. From experience, having to refactor several field mappings into a grok just to add one field is annoying for whoever it lands upon. That said, Map data fields is a quick win when getting started, and useful for quickly implementing additional field mappings.

In this case a grok extension is required for multiple reasons, as we’ll see below.

Writing the Code Snippet Extension

Below is the grok extension that extends the GCP_LOADBALANCING default parser to implement the new UDM User Agent schema.

Firstly, #comments.

There isn’t a text field in the Parser Extension UI for recording changes or explaining why you’ve made them at this time; however, that’s no problem as we can use comments, which should record at a minimum the who, what, and when.

💡 I’ve written about comments best practices before, see here

As to the parser itself, as with writing any Chronicle parser, always initialize sentinel fields. Chronicle SIEM’s parsing engine doesn’t dynamically instantiate fields, which means you must declare them in advance if you want to perform reliable conditional checks. If you don’t do this, and the field you check doesn’t exist in a given log, and there’s no error handling in the parser, it will generate an error and you’ll have an un-parsed log.

    # initialize sentinel fields
    mutate {
        replace => {
            "httpRequest.userAgent" => ""
            "httpRequest.requestUrl" => ""
        }
    }
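
The parsing engine isn’t Python, but the reasoning behind sentinel fields can be sketched as a rough analogy: give every field you intend to test a known default, so the later comparison against "" is always safe. The helper name get_field and the sample logs below are hypothetical and only serve the illustration.

```python
import json


def get_field(raw: str, path: str, default: str = "") -> str:
    """Look up a dotted path in a JSON log, falling back to a default value."""
    node = json.loads(raw)
    for key in path.split("."):
        # A missing key returns the sentinel default instead of raising.
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node if isinstance(node, str) else default


# Made-up sample logs: one with a User Agent, one without.
with_ua = '{"httpRequest": {"userAgent": "Mozilla/5.0"}}'
without_ua = '{"httpRequest": {"status": 200}}'

for raw in (with_ua, without_ua):
    user_agent = get_field(raw, "httpRequest.userAgent")
    if user_agent != "":
        print("would parse UA:", user_agent)
    else:
        print("no UA present, skipping")
```

Without the default, the second log would fail on the lookup rather than simply skipping the UA logic, which is the same failure mode as an un-parsed log.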

Next up is extracting the log format and testing whether it succeeded. The on_error values are Boolean, hence the conditional check is a simple true or false test. The drop function is technically redundant as it’s not implemented in Parser Extensions, but as it’s a best practice I follow when writing a parser, I’m keeping it in, as much as anything to show the intended flow.

json {
  on_error => "not_json"
  source => "message"
  array_function => "split_columns"
}

if [not_json] {
  drop{ tag => "TAG_UNSUPPORTED" }
}

Next up, extract the UA. This is a case of converting the field storing the UA into the appropriate Chronicle format, after which it’s just a simple rename of the field into the event we’ll be outputting at the end of the parser.

if [httpRequest][userAgent] != "" {
  mutate {
    convert => {
      "httpRequest.userAgent" => "parseduseragent"
    }
  }  
            
  #Map the converted "user_agent" to the new UDM field "http.parsed_user_agent".
  mutate {
    rename => {
      "httpRequest.userAgent" => "event1.idm.read_only_udm.network.http.parsed_user_agent"
    }   
  }
}

As this is a User Agent parser extension, you may be wondering why there is a chunk of parser code dedicated to populating a hostname.

It’s due to the underlying parser having a validation error. What’s a validation error? To assign UDM metadata event types, such as NETWORK_CONNECTION, you have to populate certain required fields (see the UDM Usage guide for more detail on required and optional fields), and in this case target.hostname must be specified. Without it, the validation error will block you from submitting a parser extension, which is where using a grok, as mentioned earlier, becomes a benefit: we can address the blocking issue ourselves, i.e., assign a target.hostname.

🗈 This should be addressed in a future update where underlying parser errors will not block a parser extension.

if [httpRequest][requestUrl]!= "" {
  grok {
    match => {
      "httpRequest.requestUrl" => ["\/\/(?P<_hostname>.*?)\/"]
    }
    on_error => "_grok_hostname_failed"
  }
  if ![_grok_hostname_failed]  {
    mutate {
      replace => {
        "event1.idm.read_only_udm.target.hostname" => "%{_hostname}"
      }
    }
  }           
}
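
To see what that hostname pattern captures, here is a small Python sketch using the same regular expression. It assumes the grok engine treats the pattern the same way Python’s re module does, and the URLs are made-up examples.

```python
import re

# Same pattern as the grok match above.
HOSTNAME_PATTERN = re.compile(r"\/\/(?P<_hostname>.*?)\/")


def extract_hostname(request_url: str) -> str:
    """Return the text between '//' and the next '/', or '' if there is no match."""
    match = HOSTNAME_PATTERN.search(request_url)
    return match.group("_hostname") if match else ""


# Made-up request URLs, for illustration only.
for url in [
    "https://www.example.com/index.html",
    "http://example.com:8080/health",
    "https://example.com",
]:
    print(repr(url), "->", repr(extract_hostname(url)))
```

Two things stand out under these assumptions: a port is captured as part of the hostname, and a URL with no slash after the host does not match at all, in which case the extension would set _grok_hostname_failed and skip the target.hostname assignment.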

And for reference, the entire Code Snippet:

# GCP_LOADBALANCING
# owner: @thatsiemguy
# updated: 2022-12-23
# Custom parser extension that:
# 1) extracts User Agent
# 2) fixes base parser issue with UDM validation

filter { 

    mutate {
        replace => {
            "httpRequest.userAgent" => ""
            "httpRequest.requestUrl" => ""
        }
    }

    json {
        on_error => "not_json"
        source => "message"
        array_function => "split_columns"
    }

    if [not_json] {

        drop{
            tag => "TAG_UNSUPPORTED"
        }

    } else {

        if [httpRequest][requestUrl]!= "" {
            mutate {
                replace => {
                    "event1.idm.read_only_udm.target.url" => "%{httpRequest.requestUrl}"
                }
            }
            grok {
                match => {
                    "httpRequest.requestUrl" => ["\/\/(?P<_hostname>.*?)\/"]
                }
                on_error => "_grok_hostname_failed"
            }
            if ![_grok_hostname_failed]  {
                mutate {
                    replace => {
                        "event1.idm.read_only_udm.target.hostname" => "%{_hostname}"
                    }
                }
            }           
        }

        if [httpRequest][userAgent] != "" {
            mutate {
                convert => {
                    "httpRequest.userAgent" => "parseduseragent"
                }
            }  
            
            #Map the converted “user_agent” to the new UDM field “http.parsed_user_agent”.
            mutate {
                rename => {
                    "httpRequest.userAgent" => "event1.idm.read_only_udm.network.http.parsed_user_agent"
                }   
            }
        }

        mutate {
            merge => {
                "@output" => "event1"
            }
        }        

    }

}

How it looks in UDM

With that Parser Extension successfully applied, wait for some more data to come in, and re-run your Raw Log Search (RLS) or UDM Search.

Here’s an example UDM output:

network.http.parsed_user_agent.family = "APPLEWEBKIT"
network.http.parsed_user_agent.sub_family = "AppleWebKit"
network.http.parsed_user_agent.platform = "Windows"
network.http.parsed_user_agent.os = "Windows NT 6.1"
network.http.parsed_user_agent.browser = "Chrome"
network.http.parsed_user_agent.browser_version = "70.0.3538.110"
network.http.parsed_user_agent.browser_engine_version = "537.36"
network.http.parsed_user_agent.annotation.key = "Chrome"
network.http.parsed_user_agent.annotation.value = "Chrome/70.0.3538.110"
network.http.parsed_user_agent.annotation.key = "SafariType"
network.http.parsed_user_agent.annotation.value = "Safari/537.36"
network.http.parsed_user_agent.annotation.key = "OS_NAME"
network.http.parsed_user_agent.annotation.value = "Windows NT"
network.http.parsed_user_agent.annotation.key = "OS_VERSION"
network.http.parsed_user_agent.annotation.value = "6.1"

Example Chronicle Detection Engine YARA-L Rule

Here’s an example YARA-L rule using the updated User Agent UDM parsing to give you an idea of the sort of thing you can start to do.

It looks for any instance of the new UDM schema being used:

$ua_agent.network.http.parsed_user_agent.browser != ""

It then uses a risk_score to look for interesting or anomalous UA strings, such as really short UAs, unexpectedly long UAs, or empty elements within a UA.

$risk_score = max(
        // should be more than 5 chars
        if ($ua_agent.network.http.parsed_user_agent.browser_version = /^.{0,5}$/, 10) +
        // usually below 12 chars, so flag 15 or more
        if ($ua_agent.network.http.parsed_user_agent.browser_version = /^.{15,}/, 10) +
        // should always be populated
        if ($ua_agent.network.http.parsed_user_agent.browser_version = "", 20)
)

Note, at the time of writing there is no count (length) function for a string field in YARA-L, but you can use a regex like the ones above to emulate string length checks.
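
To get a feel for how those three checks add up, here is a minimal Python sketch that mirrors them. It assumes Python’s re behaves like the YARA-L regex engine for these simple patterns; the function name ua_version_risk and all the sample values other than 70.0.3538.110 are made up.

```python
import re


def ua_version_risk(browser_version: str) -> int:
    """Sum the three browser_version checks from the outcome snippet above."""
    score = 0
    if re.search(r"^.{0,5}$", browser_version):  # 5 characters or fewer
        score += 10
    if re.search(r"^.{15,}", browser_version):   # 15 characters or more
        score += 10
    if browser_version == "":                    # not populated at all
        score += 20
    return score


# Sample values: the first is from the UDM example, the rest are made up.
for version in ["70.0.3538.110", "1.0", "", "1.2.3.4.5.6.7.8.9"]:
    print(repr(version), ua_version_risk(version))
```

Note that an empty value also satisfies the "short" regex, so it scores 30 rather than 20. The three UA checks alone add up to at most 30, so the $risk_score > 50 condition in the final rule also depends on the severity scoring.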

And the final rule for reference:

rule user_agent_analysis {

  meta:
    author = "thatsiemguy@"
    owner = "infosec@"  
    description = "Detects User Agent based anomalies."
    response = "Use the Outcome variables to determine the Risk Score, and detect anomaly.  Optionally, evaluate if of interest to investigate further based upon Risk Score, target, and frequency."
    severity = "INFORMATIONAL"
    priority = "INFORMATIONAL"

  events:
    (
        $ua_agent.metadata.event_type = "NETWORK_CONNECTION" or
        $ua_agent.metadata.event_type = "NETWORK_HTTP" or
        $ua_agent.metadata.event_type = "USER_RESOURCE_ACCESS"
    )

    $ua_agent.principal.ip = $principal_ip

    $ua_agent.network.http.parsed_user_agent.browser != ""

  match:

    $principal_ip over 10m

  outcome:
    $risk_score = max(
        // should be more than 5 chars
        if ($ua_agent.network.http.parsed_user_agent.browser_version = /^.{0,5}$/, 10) +
        // usually below 12 chars, so flag 15 or more
        if ($ua_agent.network.http.parsed_user_agent.browser_version = /^.{15,}/, 10) +
        // should always be populated
        if ($ua_agent.network.http.parsed_user_agent.browser_version = "", 20) +        
        // base severity scoring
        if ($ua_agent.security_result.severity = "UNKNOWN_SEVERITY", 0) +
        if ($ua_agent.security_result.severity = "LOW", 25) +
        if ($ua_agent.security_result.severity = "MEDIUM", 50) +
        if ($ua_agent.security_result.severity = "HIGH", 75) +
        if ($ua_agent.security_result.severity = "CRITICAL", 100)
    )
    $severity = array_distinct($ua_agent.security_result.severity)
    $short_ua_browser_version = array_distinct(if($ua_agent.network.http.parsed_user_agent.browser_version = /^.{0,5}$/, "true"))
    $long_ua_browser_version = array_distinct(if($ua_agent.network.http.parsed_user_agent.browser_version = /^.{15,}/, "true"))
    $empty_ua_browser_version = array_distinct(if($ua_agent.network.http.parsed_user_agent.browser_version = "", "true"))

  condition:
    $ua_agent and $risk_score > 50

  options:
    allow_zero_values = true

}

Note, this isn’t a production-level detection rule, but hopefully it’s a helpful primer to get started.

Using UDM Search

A quick option for finding the new UDM UA schema results is running a UDM search as follows:

network.http.parsed_user_agent.browser != ""

Note, at the time of writing Chronicle Dashboards have not been updated to include the new schema, but it will no doubt be pushed out in the next Looker schema update cycle.

Taking this all forward

Next steps could potentially involve an action plan for implementation:

Identify all log sources you’d want to apply the new UA extraction to
Apply parser extensions as needed

Or…

Wait, as it’ll probably be added to the parsers natively, or perhaps log a support ticket to request it for your favourite log source

Summary

I’ve not delved into the depths of all the interesting Detection Engineering capabilities that can be derived from User Agent strings, but suffice to say it’s a powerful update to the UDM schema. It will make it easier either to port across detection logic from platforms that already extract more granular UA fields, or to build new detections.