Aggregate Queries in UDM Search

Chris Martin (@thatsiemguy)
16 min readJun 16, 2024

--

Google SecOps has introduced UDM Stats, a powerful new feature in preview that brings aggregate queries to UDM Search using YL2. This enhancement not only facilitates faster, more efficient investigations but also powers the new Preview Dashboards feature.

Aggregate searches in Google SecOps via UDM Stats

In this post I explore what you can do with the new UDM Stats feature, example queries, and other useful tips about the preview.

YL2

📜 If you want to skip a history section on YARA-L go to the next section

YL2 (pronounced why-ell-two) is the latest version of YARA-L, a powerful language for threat detection in Google Chronicle. Its origins trace back to YARA (Yet Another Ridiculous Acronym), a rule-based language used in VirusTotal for identifying and classifying files based on text or binary patterns. YARA served as the inspiration for YARA-L 1.0, which was adapted for analyzing log data in Chronicle BackStory’s Detection Engine.

In Q4 2020, Chronicle’s Detection Engine underwent a significant refactor, leading to the development of YARA-L 2.0, now known as YL2. This new version introduced substantial improvements in functionality and syntax.

Following this, Chronicle introduced Search v1 and embedded Looker Dashboards in Q4 2021. Search v1 was later refactored into UDM Search in Q2 2022, while embedded Looker Dashboards are being gradually replaced by native Dashboards.

YL2 converging in the Google SecOps platform

These developments initially resulted in three distinct features within Chronicle, each using different languages with some overlapping capabilities. However, by 2024, substantial progress has been made. UDM Search and Native Dashboards now too utilize YL2, streamlining Chronicle’s features and moving closer to a unified language and set of functions.

Though the convergence journey is still in progress, significant strides have been made. UDM Search and Native Dashboards now share largely interchangeable syntax. While the Detection Engine currently uses a different syntax, there is overlap in functionality across all three. The YL2 function library documents these shared functions.

Example of YL2 function that works between Detection Engine and UDM Search

Aggregate Searches in UDM Search

UDM Search now supports aggregate searches, also known as UDM Stats search. This powerful feature enables a “start broad, narrow down” workflow, allowing you to uncover hidden patterns and trends in your security data that might not be visible when looking at individual events.

Previously, achieving this type of analysis required either using embedded Looker Dashboards or exporting Chronicle UDM data to BigQuery for processing. This could be time-consuming and require specialized skills.

Let’s explore how aggregate searches can simplify this process. Say you want to analyze all activity related to the host “jumpbox-01” you could run a UDM Stats search as follows:

$hostname = group($e.principal.hostname,$e.target.hostname)
$hostname = "jumpbox-01"
outcome:
$count = count($e.metadata.id)

The results from the UDM Stats query appear as follows:

| count |
| ----- |
| 662 |
Using UDM Stats to return a Count of activities

If you wanted to get more granular detail broken down by date, you could run a UDM stats query as follows:

$hostname = group($e.principal.hostname,$e.target.hostname)
$hostname = "jumpbox-01"
$date = timestamp.get_date($e.metadata.event_timestamp.seconds)
match:
$date
outcome:
$count = count($e.metadata.id)
$event_list = array_distinct($e.metadata.event_type)
$application = array_distinct(strings.to_upper($e.target.application))
$is_interesting = sum(if($e.security_result.action = "BLOCK",1,0))
order:
$date asc

The results from the UDM Stats query appear as follows:

|       date | count | event_list                                                                                     | application                          | is_interesting |
| ---------- | ----- | ---------------------------------------------------------------------------------------------- | ------------------------------------ | -------------- |
| 2024-06-09 | 32 | PROCESS_LAUNCH | CRON | 0 |
| 2024-06-10 | 86 | PROCESS_LAUNCH | CRON | 0 |
| 2024-06-11 | 86 | PROCESS_LAUNCH | CRON | 0 |
| 2024-06-12 | 117 | PROCESS_LAUNCH, USER_LOGOUT, USER_LOGIN, NETWORK_CONNECTION, STATUS_UPDATE, USER_UNCATEGORIZED | SYSTEMD-LOGIND, SU, SUDO, CRON, SSHD | 2 |
| 2024-06-13 | 111 | USER_LOGIN, STATUS_UPDATE, USER_UNCATEGORIZED, NETWORK_CONNECTION, USER_LOGOUT, PROCESS_LAUNCH | CRON, SYSTEMD-LOGIND, SUDO, SSHD | 0 |
| 2024-06-14 | 86 | PROCESS_LAUNCH | CRON | 0 |
| 2024-06-15 | 86 | PROCESS_LAUNCH | CRON | 0 |
| 2024-06-16 | 58 | PROCESS_LAUNCH | CRON |

While default UDM Search displays each event on its own row, UDM Stats provides aggregated results, allowing you to view summarized data in a more concise format.

Let’s explore what this UDM Stats query does in more detail:

Hostname Grouping:

  • $hostname = group($e.principal.hostname,$e.target.hostname): This line creates a combined set of all hostnames involved in the events, regardless of whether they are the source (principal) or destination (target) of the interaction.
    - in a non-UDM stats search you can use the Grouped fields option, this is not available in UDM Stats search, hence the use of the group field.
  • $hostname = "jumpbox-01": This filters the results to only include events where "jumpbox-01" is either the UDM Principal or Target hostname.

Date Extraction:

  • $date = timestamp.get_date($e.metadata.event_timestamp.seconds): This extracts the date (year, month, day) from each UDM event timestamp

Event Matching:

  • match: $date: This specifies that the query should match and group by all events that have a date within the UDM Stats query timeframe.

Outcome Calculation:

  • $count = count($e.metadata.id): This counts the total number of events matching the criteria. Each unique UDM event is tagged with a unique metadata.id field.
  • $event_list = array_distinct($e.metadata.event_type): This creates a list of unique event types found in the matching events.
  • $application = array_distinct(strings.to_upper($e.target.application)): This creates a list of unique applications accessed or interacted with, converting all application names to uppercase for consistency.
  • $is_interesting = sum(if($e.security_result.action = "BLOCK",1,0)): This checks if any of the matching events have a security action of "BLOCK," indicating a potentially suspicious event. It sums these events, giving you a count of potentially interesting activities.

Ordering:

  • order: $date asc: This sorts the results by date in ascending order, presenting them chronologically.

Pivoting from UDM Stats to UDM Search

Each result row within a UDM Stats query now includes a “Run UDM Query” button, which allows you to easily pivot from the aggregated view in UDM Stats to a detailed timeline of individual events in UDM Search.

Clicking this button automatically generates a UDM Search query that includes the event filter you used in your UDM Stats query, along with any match variables used, for the specific row you clicked on.

$hostname = group( principal.hostname,target.hostname )
$hostname = "jumpbox-01"
$date = timestamp.get_date( metadata.event_timestamp.seconds )
AND timestamp.get_date( metadata.event_timestamp.seconds ) = "2024-06-12"
UDM Search results based upon a Pivot from UDM Stats search

This new pivoting capability significantly enhances the threat hunting workflow within Chronicle. Previously, moving from high-level aggregate data to specific event details required using different tools like BigQuery or Looker, often involving manual query adjustments and data format conversions. Now, you can seamlessly transition between these views within the Chronicle platform, accelerating your investigations and improving efficiency.

UDM Stats Query Structure

The structure of a UDM Stats query is as follows:

  • //filter: (optional)
    - The initial part of a UDM Stats query does not require a header, and any UDM key value pair of function above the match: or outcome: statement is implicitly a filter
  • match: (optional)
    - the match: section acts as a group by creating a unique row per matching placeholder variable
  • outcome:
    - the outcome: sections enables using aggregate YL2 functions, as well as apply conditional logic
  • order: (optional)
    - specify the order of results using a match variable in either ascending or descending order
  • limit: (optional)
    - configure how many rows should be returned in the UDM Stats results

Saved Searches

UDM Stats queries can be saved using the Search Manager, including support for placeholder variables, for use as part of incident response workflows.

Example UDM Stats saved search in Search Manager

Example UDM Stat Searches

Count UDM Events and Distinct Log Types by Date

  • Description: Counts the UDM events and distinct log type per day
//  daily event count and distinct log types
$date = timestamp.get_date(metadata.event_timestamp.seconds,"Etc/UTC")
match:
$date
outcome:
$event_count = count(metadata.id)
$distinct_log_type = count_distinct(metadata.log_type)
order:
$date asc
  • Example results:
Using UDM Stats to count UDM events and distinct log types by Day

Note, you can find a list of supported time zones for the second parameter of the get_date function here. If omitted the default value is GMT.

UDM consistently handles timestamps from all parsed log sources by storing them as either seconds or nanoseconds within the metadata.event_timestamp field. To effectively access and work with these timestamps, displaying them as human-readable dates or precise datetimes, you’ll need to employ UDM’s convenient timestamp functions.

Since I want to track the number of UDM events per day, I first extract the date (in YYYY-MM-DD format) from metadata.event_timestamp.seconds using the timestamp.get_date function and store it in the $date variable.

$date = timestamp.get_date(metadata.event_timestamp.seconds,"Etc/UTC")
match:
$date

To count unique UDM events, I apply the count aggregation to the metadata.id field, which is unique for each event. I also use count_distinct on the metadata.log_type field to determine the number of distinct log types contributing to each day's total.

outcome:
$event_count = count(metadata.id)
$distinct_log_type = count_distinct(metadata.log_type)

The full list of Aggregate functions in UDM Stats is as follows:

max(): outputs the maximum over all possible values. Only works with integer and float.

min(): outputs the minimum over all possible values. Only works with integer and float.

sum(): outputs the sum over all possible values. Only works with integer and float.

count_distinct(): collects all possible values, then outputs the distinct count of possible values.

count(): behaves like count_distinct(), but returns a non-distinct count of possible values.

array_distinct(): collects all possible distinct values, then outputs a list of these values. It will truncate the list of distinct values to 25 random elements. The deduplication to get a distinct list is applied first, then the truncation is applied.

array(): behaves like array_distinct(), but returns a non-distinct list of values. It also truncates the list of values to 25 random elements.

If our goal is to analyze UDM events across the entire dataset, rather than grouping by day, we can simplify the query. By removing the //filter: and match: sections, we effectively lift the date-based grouping. The outcome: section then becomes the focus, allowing us to compute overall statistics like the earliest and latest event timestamps ($first_seen, $last_seen), the total number of events ($event_count), and the count of unique log types ($distinct_log_type).

// Count UDM events and distinct log types
outcome:
$first_seen = timestamp.get_timestamp(min(metadata.event_timestamp.seconds))
$last_seen = timestamp.get_timestamp(max(metadata.event_timestamp.seconds))
$event_count = count(metadata.id)
$distinct_log_type = count_distinct(metadata.log_type)
  • Example results:
Use UDM Stats to return summary activity without using a match: statement.

Calculating EPS, and the By keyword

  • Description: Calculate the average events per second (EPS)
// Calculate average EPS per day per log source
$log_type = $e.metadata.log_type
$date = timestamp.get_date($e.metadata.event_timestamp.seconds)
match:
$date, $log_type
outcome:
$avg_eps = math.round(count($e.metadata.id) / 86400)
order:
$date, $log_type asc
  • Example results:
Calculating average EPS per day, per log source

However, you could use the by keyword within your match statement to automatically group over a time bucket, e.g., $log_type by day to group by day:

// Calculate average EPS per day per log source
$log_type = $e.metadata.log_type
match:
$log_type by day
outcome:
$avg_eps = math.round(count($e.metadata.id) / 86400)
order:
$log_type asc
  • Example results:
Using the by keyword to group by time buckets

Note, this example uses the math.round function to round up or round down a float into an int, e.g., 11.449467593 becomes 11.

The supported time buckets for the by keyword as as follows:

MINUTE or m

HOUR or h

DAY or d

WEEK or w

MONTH or mo

The by keyword is a useful feature when you want to perform more granular analysis, e.g., you could manually create hour and minute based time buckets using the timestamp.get_hour and timestamp.get_minute function as follows:

// Case XYZ
$event_type = $e.metadata.event_type
$log_type = $e.metadata.log_type
$hour_bucket = timestamp.get_hour($e.metadata.event_timestamp.seconds, "UTC")
$minute_bucket = timestamp.get_minute($e.metadata.event_timestamp.seconds, "UTC")
// further filtering logic here ...
match:
$hour_bucket, $minute_bucket, $event_type, $log_type
outcome:
$count = count($e.metadata.id)
$interesting = sum(if($e.target.process.file.security_result.threat_verdict = "MALICIOUS",1,0))
order:
$hour_bucket, $minute_bucket asc

Using the by keyword you can re-write the original query as follows:

//events
$event_type = $e.metadata.event_type
$log_type = $e.metadata.log_type
// further filtering logic here ...
match:
$event_type, $log_type by minute
outcome:
$count = count($e.metadata.id)
$interesting = sum(if($e.target.process.file.security_result.threat_verdict = "MALICIOUS",1,0))
  • Example results:
Example of using the By keyword for granular per minute grouping of events

As of June 16, 2024, the automatically generated TIME_BUCKET column in UDM Stats is not used as part of the criteria for a pivot search. For example, pivoting from a UDM Stats search with a date filter will produce the following UDM Search query:

$event_type = metadata.event_type
$log_type = strings.to_lower( metadata.log_type )
AND metadata.event_type = "PROCESS_LAUNCH"
AND strings.to_lower( metadata.log_type ) = "nix_system"

Notice that the UDM Search query only includes filters for the event type and log type, not the specific time bucket from the UDM Stats results. This means the search will cover the entire time range specified in the original UDM Stats query.

To perform a more granular drill-down, you can manually add filters for the TIME_BUCKET column to the UDM Search query after pivoting. However, an easier alternative is to customize the pivot search directly within UDM Stats. By including filters for specific hours and minutes in the match section, you can ensure these filters are included in the generated UDM Search query:

$event_type = metadata.event_type
$log_type = strings.to_lower( metadata.log_type )
$hour_bucket = timestamp.get_hour( metadata.event_timestamp.seconds,"UTC" )
$minute_bucket = timestamp.get_minute( metadata.event_timestamp.seconds,"UTC" )
AND timestamp.get_hour( metadata.event_timestamp.seconds,"UTC" ) = 0
AND timestamp.get_minute( metadata.event_timestamp.seconds,"UTC" ) = 12
AND metadata.event_type = "STATUS_UPDATE"
AND strings.to_lower( metadata.log_type ) = "nix_system"

Verifying Namespace Usage Across Log Types

Namespaces are primarily used within Google SecOps to manage entity assets with overlapping IP addresses. However, ensuring consistent namespace application during log ingestion can be crucial, as it can impacts role-based access control (RBAC).

The following UDM Stats query can help you audit namespace usage across different log types within your Chronicle instance. It’s a valuable technique for identifying misconfigurations and ensuring that your Namespaces are being applied consistently.

// Audit Namespaces against Log Types
$log_type = $e.metadata.log_type
$e.metadata.base_labels.namespaces != "" //exclude empty values
match:
$log_type
outcome:
$count = count($e.metadata.id)
$is_configured_namespace = max(
if(
$e.metadata.base_labels.namespaces = /TMO|COE/,
1, //True
0 //False
)
)
$namespaces = array_distinct($e.metadata.base_labels.namespaces)
order:
$is_configured_namespace asc
  1. Filter Empty Namespaces: The $e.metadata.base_labels.namespaces != "" filter excludes events with empty namespace values to focus on relevant data.
  2. Group by Log Type: The query groups events by their log type ($log_type).
  3. Count Events: For each log type, the query counts the number of associated events.
  4. Check for Mismatches: The $is_configured_namespace calculation checks if any event within a log type has a namespace that doesn't match the expected set (specified in the regular expression /TMO|COE/). You should replace this with the namespaces relevant to your environment.
  5. Distinct Namespaces: The $namespaces array collects all unique namespaces found within each log type.
  6. Order by Mismatches: The results are ordered by $is_configured_namespace in ascending order, prioritizing log types with mismatched namespaces.
Audit for misconfiguration using conditional logic

📄 I’ve written on the topic of auditing Namespaces before:

Using the Group function to query all top level UDM Objects

UDM structures event data using various top level objects that represent different aspects of an event, e.g., in network traffic analysis UDM might classify the source of a connection as the principal and the destination as the target.

However, there are scenarios where you might be unsure which top-level object applies or want to analyze both sides of an interaction simultaneously.

In UDM search you can use grouped fields, but these are not available at the time of writing in UDM Stats search.

This is where the group function comes in handy. It allows you to aggregate values across different top-level objects into a single placeholder variable, e.g., if you want to track all IP addresses involved in network communication, regardless of whether they are the source or destination, you can write a UDM Stats search like follows:

// Uses Group function to find asset
$private_ip = group(principal.ip, intermediary.ip, observer.ip, src.ip, target.ip)
(
net.ip_in_range_cidr($private_ip, "172.16.0.0/12")
or net.ip_in_range_cidr($private_ip, "10.0.0.0/8")
or net.ip_in_range_cidr($private_ip, "192.168.0.0/24")
)
match:
$private_ip
outcome:
$count = count(metadata.id)
$first_seen = timestamp.get_timestamp(min(metadata.event_timestamp.seconds))
$last_seen = timestamp.get_timestamp(max(metadata.event_timestamp.seconds))
order:
$count desc
limit:
1000
  • Example results
Using the group function to summarize values across multiple top level UDM objects

Understand normal, and find abnormal using Array Length

In security analysis, establishing a baseline of normal behavior is crucial for identifying anomalies that might indicate a threat. UDM schema’s use of arrays, which can capture unbounded responses like DNS lookups, provides a rich source of information for understanding this normal behavior and then detecting deviations from it

Exploring DNS Log Sources

  • Description: Analyze the DNS data generated within your systems to assess the quantity and diversity of DNS activity.
$e.metadata.event_type = "NETWORK_DNS"
$log_type = strings.to_upper($e.metadata.log_type)
match:
$log_type
outcome:
$questions_total = count($e.network.dns.questions.name)
$distinct_questions = count_distinct($e.network.dns.questions.name)
$question_class = count_distinct($e.network.dns.questions.class)
$question_type = count_distinct($e.network.dns.questions.type)
$answer_class = count_distinct($e.network.dns.answers.class)
$answer_type = count_distinct($e.network.dns.answers.type)
limit:
10
  • Example results:
Explore DNS log sources withing your environment

Explore Virus Total File Metadata Enrichment

As part of the Google SecOps Enterprise+ feature, VT File Metadata Enrichment is automatically added to all observed SHA256 Hashes.

  • Description: Analyze VT File Metadata Enrichment in your organization to find normal, or abnormal

$e.target.process.file.tags != ""
$target_process = $e.target.process.file.full_path
$signers_name = $e.target.process.file.signature_info.sigcheck.signers.name
match:
$target_process, $signers_name
outcome:
$pe_import_functions = max(arrays.length($e.target.process.file.pe_file.imports.functions))
$pe_import_libraries = max(arrays.length($e.target.process.file.pe_file.imports.library))
$file_tags = max(arrays.length($e.target.process.file.tags))
$file_names = max(arrays.length($e.target.process.file.names))
  • Example results:

Filtering on Timestamps in UDM Stats

  • Description: Example of filtering on Timestamps in UDM
    - Note, this does not override the datetime picker in the GUI, and rather will only filter results within the time range selected.
// Count UDM events per day
timestamp.get_date($e.metadata.event_timestamp.seconds) >= "2024-06-01"
timestamp.get_date($e.metadata.event_timestamp.seconds) <= "2024-06-10"
$date = timestamp.get_date($e.metadata.event_timestamp.seconds)
match:
$date
outcome:
$count = count($e.metadata.id)
order:
$date asc

Preview Notes

This section includes general observation and notes on using UDM Stats. It is also worth noting that as of July 2024 UDM Stats is in preview, and some of these items will likely be addressed prior to a GA release.

  • Aggregates within Aggregates

If you try to nest Aggregate functions within a UDM Stats search you will see a compilation error:

Aggregates within Aggregates are not supported

In some cases you can work around this, e.g.,

$eps = sum(count($e.metadata.id) / 86400)

can be re-written as follows:

$eps = count($e.metadata.id) / 86400

But in most cases there is no workaround, e.g.,

// Nested aggregate functions are not supported
$date = timestamp.get_date($e.metadata.event_timestamp.seconds)
match:
$date by minute
outcome:
$event_count = count($e.metadata.id)
$eps = math.round(count($e.metadata.id) / 60)
$min_eps = min($eps) // X aggregate on outcome
$max_eps = max(count($e.metadata.id) / 60)) // X aggregate on aggregates
order:
$date asc

This is an underlying SQL error when you try to perform a nested aggregation where one function depends on the results of another function.

At the time of writing as there is no CTE or sub-select functionality within UDM Search so you are going to need to solve for this externally, i.e., using BigQuery, or the Search API with custom code.

  • UDM Stats via API

UDM Stats queries are not supported via the UDM Search API endpoint, but will be in prior to GA.

  • Conditions?

If you are familiar with YARA-L in Detection Engine you may have noticed there are no examples of using a conditions: statement. That’s because at the time of writing it does not exist in UDM stats. This is a roadmap item for future inclusion in UDM stats.

Filtering out blank values

Blank value in UDM Stats can be shown as -- and you can filter these out with a is not empty statement:

$foo = bar and $foo!= "" // don't use != "--"
  • Compilation errors block the UDM Search window

The UDM Search window has a default height of a few lines. However, if compilation error warnings appear, they can span multiple lines and obscure the entry box.

Resizing the UDM Stats window

Fortunately, you can easily resize the UDM Search window. Hover your mouse over the line separating the search box from the results panel, and a drag handle will appear. Click and drag this handle up or down to adjust the window size. Alternatively, you can close the error message section to reveal the UDM Searcb window.

  • What’s the error again?

If you closed an error window and want to see it again you can rearrange the lines in your query (e.g., by moving line 6 to line 7). This can often trigger the error message to appear, helping you identify and fix the issue.

  • ERROR: Search has encountered an error and could not load data. Please try again, and contact support if this error continues.

Since UDM Stats is still in preview, you might encounter situations where invalid queries pass the initial validation but fail during execution. If you face this scenario you can provide feedback to Google as part of the preview. Your feedback will help improve the feature’s validation and error handling capabilities.

  • Reserved Keywords

From testing there are (undocumented?) reserved keywords that will error if used as a placeholder variable, which so far include hour, minute, events. There are probably more.

Summary

UDM Stats is an exciting addition to Google SecOps, offering a powerful way to gain deeper insights from your security data. I’m eager to delve further into its integration with Preview Dashboards in a future post. While UDM Stats is still in its early stages, I encourage you to explore the feature and provide feedback to Google.

--

--