IOC matching in Chronicle SIEM

Chris Martin (@thatsiemguy)
19 min read · Mar 15, 2023


In this post I explore different ways within Chronicle SIEM to perform Indicator of Compromise (IOC) matching, be that via Entity Graph, automated IOC Domain matching, Dashboards, BigQuery, APIs, or Reference Lists, and look at pros and cons of each approach.

What was originally meant to be a brief summary on IOC matching in Chronicle SIEM ended up being a rather in-depth analysis of the several different mechanisms available in the platform, so get a cup of coffee and here we go.

You try finding a picture for IOCs; it's really hard, but here's an angry blue monster

Overview

Firstly, to be on the same page: what is an IOC? I'm going to go with the Wikipedia definition:

Typical IoCs are virus signatures and IP addresses, MD5 hashes of malware files, or URLs or domain names of botnet command and control servers. After IoCs have been identified via a process of incident response and computer forensics, they can be used for early detection of future attack attempts using intrusion detection systems and antivirus software.

Given that definition, there are several ways to perform IOC matching in Chronicle SIEM today.

I cover each of these below, but here’s the summary table of the IOC types that can be matched via each method in Chronicle SIEM:

| Supported Types   | EG + RE | Automated | UDM Search | BigQuery | Search API (Views) | RE + Reference Lists |
|-------------------|---------|-----------|------------|----------|--------------------|----------------------|
| IPv4 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| IPv6 | ✓ | ⨉ | ✓ | ✓ | ⨉ | ✓ |
| Domain Name | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| URL | ✓ | ⨉ | ✓ | ✓ | ⨉ | ✓ |
| Hash: MD5 | ✓ | ⨉ | ✓ | ✓ | ✓ | ✓ |
| Hash: SHA1 | ✓ | ⨉ | ✓ | ✓ | ✓ | ✓ |
| Hash: SHA256 | ✓ | ⨉ | ✓ | ✓ | ✓ | ✓ |
| Hash: SSDeep | ✓ | ⨉ | ✓ | ✓ | ⨉ | ✓ |
| Hash: VHash | ✓ | ⨉ | ✓ | ✓ | ⨉ | ✓ |
| Email | ✓ | ⨉ | ✓ | ✓ | ⨉ | ✓ |
| CVE | ✓ | ⨉ | ✓ | ✓ | ⨉ | ✓ |
| General UDM Nouns | ✓ | ⨉ | ✓ | ✓ | ⨉ | ✓ |

This is mildly useful as a starting point for considering which IOC method(s) to use, depending on the artifacts you are trying to match against.

And a summary of the capabilities of each of the above methods:

| Feature                        | Entity Graph                                       | Automated                                    | UDM Search                                   | BigQuery     | Search API                                   | Rules Eng + Reference List                   |
|--------------------------------|----------------------------------------------------|----------------------------------------------|----------------------------------------------|--------------|----------------------------------------------|----------------------------------------------|
| Single or multiple dimensions? | Multiple | Single | Single (one indicator can combine dimensions via AND) | Multiple | Single | Single (multi-dimension with a hack) |
| Event lookback duration | 48 hours for Live Rule Detections; +/- 5 days from IOC ingestion for Batch Detections | 1 year, or as long as you have ingested data | 1 year, or as long as you have ingested data | 6 months | 1 year, or as long as you have ingested data | 48 hours for Live Rule Detections; 1 year, or as long as you have ingested data, for Retrohunts |
| Matching mechanism | YARA-L Rule + EG | Automatic | UDM Search | BigQuery SQL | UI Views | YARA-L + Reference List |
| Interface | UI or API | UI or API | UI or API | UI or API | UI or API | UI or API |

To clarify on a couple of points in the above table:

  • Single or multiple dimensions — e.g., can you match only IP, or can you match IP + Port
  • Event look-back duration — how far back can you match the specific IOC indicator

Let’s take a look at each method in more detail.

🐉 Here be dragons. This is my testing and while I’ve made every effort to test and re-test things, this is not official product guidance, but just observed results.

Entity Graph (aka Rules Engine)

I’ve written on the topic of Entity Graph a couple of times, see Using Entity Graph as a Multi-dimensional List and Aliasing in Chronicle SIEM, but one of the more versatile approaches for IOC matching in Chronicle SIEM is Entity Graph (herein to save typing, EG).

Via EG you can ingest various types of Entity types, which can all be joined against Event data via a YARA-L Detection rule.

Now, you may be more familiar with the commonly used ASSET or USER types within EG but, for IOC matching, EG provides several other types of interest, specifically:

  • Domain Names
  • File (Hashes)
  • IP Addresses
  • URLs
  • User (Email Addresses)
  • and pretty much anything that’s in the UDM Noun model.
| Enum Value         | Enum Number | Description                                                         |
|--------------------|-------------|---------------------------------------------------------------------|
| ASSET | 1 | An asset, such as workstation, laptop, phone, virtual machine, etc. |
| DOMAIN_NAME | 5 | A domain. |
| FILE | 4 | A file. |
| GROUP | 10001 | Group. |
| IP_ADDRESS | 3 | An external IP address. |
| MUTEX | 7 | A mutex. |
| RESOURCE | 2 | Resource. |
| UNKNOWN_ENTITYTYPE | 0 | An unknown entity type. |
| URL | 6 | A url. |
| USER | 10000 | User. |

If you’re using Chronicle SIEM’s inbuilt GCTI feeds this is already in the EG, and is of type GLOBAL_CONTEXT.

If you’re providing your own CTI feed for IOCs, i.e., using a 3rd party or in-house log source, that will be of type ENTITY_CONTEXT, aka user provided context:

| Enum Value              | Enum Number | Description                                                                                                         |
|-------------------------|-------------|---------------------------------------------------------------------------------------------------------------------|
| DERIVED_CONTEXT | 2 | Entities derived from customer data such as prevalence, artifact first/last seen, asset/user first seen stats, etc. |
| ENTITY_CONTEXT | 1 | Entities ingested from customers (e.g. AD_CONTEXT, DLP_CONTEXT) |
| GLOBAL_CONTEXT | 3 | Global contextual entities such as WHOIS, Safe Browsing, etc. |
| SOURCE_TYPE_UNSPECIFIED | 0 | Default source type |

🐛 While ENTITY_CONTEXT is in the Enum, do not ever label your data as ENTITY_CONTEXT as this results in it not being loaded into the Entity Graph.

Using EG SafeBrowsing, aka Global Context

Below is an example YARA-L Detection rule that uses Chronicle SIEM's inbuilt GCTI SafeBrowsing Entity Feed to match any Process Launch activity against a high-confidence-scored malicious binary.

Notice the source_type in the $g event variable, GLOBAL_CONTEXT. This type of EG feed is created and maintained by Chronicle SIEM.

rule google_safebrowsing_process_launch {

  meta:
    author = "thatsiemguy@"
    owner = "infosec@"
    description = "Detects Process Launch events against Critical or High severity Google's SafeBrowsing database. Optionally apply File Prevalence to detect uncommon or never seen before detections."
    response = "Not expected to alert. Evaluate the $risk_score to see if binary is unsigned or running from an unusual location. Investigate the Hash using VT Enhance widget, and escalate case in Chronicle SOAR."
    severity = "MEDIUM"
    priority = "HIGH"

  events:
    $e.metadata.event_type = "PROCESS_LAUNCH"
    $e.target.process.file.sha256 = $hash
    $e.principal.hostname = $host
    $e.principal.process.file.full_path = $file

    $g.graph.metadata.entity_type = "FILE"
    $g.graph.entity.file.sha256 = $hash
    $g.graph.metadata.product_name = "Google Safe Browsing"
    $g.graph.metadata.source_type = "GLOBAL_CONTEXT"

    $g.graph.metadata.threat.severity = "CRITICAL" or
    $g.graph.metadata.threat.severity = "HIGH"

  match:
    $hash, $host, $file over 10m

  outcome:
    $risk_score = max(
      if ($g.graph.metadata.threat.severity = "HIGH", 20) +
      if ($g.graph.metadata.threat.severity = "CRITICAL", 30) +
      // malicious software is higher risk than unwanted
      if ($g.graph.metadata.threat.category = "SOFTWARE_MALICIOUS", 10) +
      // raise risk score for unsigned images
      if ($e.target.resource.attribute.labels["Signed"] = "false", 20) +
      // raise risk score for uncommon paths
      if ($e.principal.process.file.full_path = /Downloads|Temp|Tmp/ nocase, 20)
    )

  condition:
    $e and $g
}

Running the above rule, and viewing the resultant EG output, we can see an example of a Safe Browsing EG Global Context entry:

An example GCTI Safe Browsing Entity Graph Global Context entry.

Other examples of Chronicle SIEM in-built Global EG context include:

  • GCTI Benign Binaries
  • Tor Exit Nodes
  • Virus Total

How do I provide my own IOCs to EG? (aka Entity Context)

The first approach here would be to use one of Chronicle SIEM’s out of the box IOC integrations. A non-exhaustive list of EG enabled IOC sources includes:

  • Anomali
  • Recorded Future
  • Threat Connect
  • Mandiant
  • STIX
  • Digital Shadows
  • Emerging Threats Pro
  • and more

However, given I don't have examples of these to hand, I'm going to use another approach: Chronicle SIEM's Ingestion API, to submit IOCs directly.

⚠️ For the purposes of testing I’m using the log type CATCH_ALL. Why does the UDM Entity API require a log type? Who knows 🤷 but it does, and for development & testing it’s a useful ingestion label to use; however, don’t make the mistake of leaving some production critical log source as Catch All!

An important thing to know about Entity Graph that I've not found documented (at the time of writing) is that an EG entry is only valid for +/- 5 days from ingestion. This is important for reasons we'll see later on, but the TL;DR is you must re-ingest context data at least every 5 days in order to match recent events.

I've seen folks assume the interval start and end time of an EG context record is the key time range criterion used for matching, and while this is also factored in (unlike automated IOC matching), it is superseded by the undocumented +/- 5 day rule mentioned above.

What does that mean in practicality? Two things:

  1. You must re-ingest context data at least once every five days in order for it to keep matching going forward; this applies to User and Asset context as well as IOCs
  2. EG is best suited for IOC matching within a +/- 5 day time period from ingestion of the entity record. Useful for ongoing campaigns, but not usable if you require historical IOC matching, i.e., an EG entry ingested today with an interval covering the last two years can't be used in a RetroHunt; it will not match.
    - Here's a visual example of this: an IOC record ingested into Entity Graph on the 23rd generates matches for log data from the prior 5 days and the following 5 days (which spills over into a 6th day), but beyond and before that no matches will be returned!
Entity Graph records are valid for +/- 5 days

⚠️ Update, July 2024: if you include a metadata.threat field in your Entity then the +/- 5 day expiration no longer applies! This in effect creates an undocumented Entity Graph sub-type of IOC under an existing Entity Graph type. See https://medium.com/@thatsiemguy/expiring-iocs-in-entity-graph-373010554091 for more info.
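The behaviour above can be captured in a small helper when planning your re-ingestion cadence. A minimal sketch in Python, where the 5-day window is the observed (undocumented) behaviour described above, not an official constant, and which applies to entities without a metadata.threat field per the update note:

```python
from datetime import datetime, timedelta, timezone

# Observed (not officially documented) validity window for an EG record
EG_WINDOW_DAYS = 5

def eg_record_matches(event_time: datetime, entity_ingest_time: datetime) -> bool:
    """Return True if an event falls inside the +/- 5 day window
    around the Entity Graph record's ingestion time."""
    return abs(event_time - entity_ingest_time) <= timedelta(days=EG_WINDOW_DAYS)

ingested = datetime(2023, 2, 23, tzinfo=timezone.utc)
print(eg_record_matches(datetime(2023, 2, 20, tzinfo=timezone.utc), ingested))  # True
print(eg_record_matches(datetime(2023, 3, 15, tzinfo=timezone.utc), ingested))  # False
```

In other words, if your scheduler runs the feed less often than every five days, events can silently stop matching even though the entity's interval still covers them.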

Let's examine some examples of specific IOC types you can ingest into EG for IOC matching.

IP Addresses

One of the most voluminous types of IOC: IP addresses. Here's an example IPv4 IOC represented in EG (not a malicious address, just for testing):

{
  "log_type": "CATCH_ALL",
  "entities": [{
    "metadata": {
      "collected_timestamp": "2023-02-27T22:37:01.653371Z",
      "interval": {
        "start_time": "2023-01-01T00:00:00Z",
        "end_time": "2024-01-01T00:00:00Z"
      },
      "entity_type": "IP_ADDRESS",
      "vendor_name": "ACME",
      "product_name": "TIP",
      "threat": {
        "category_details": "C2",
        "threat_id": "ceb6f24c-6f3e-4575-bc8f-9f66a164901a",
        "threat_feed_name": "ACME-IOC-IP-C2"
      }
    },
    "entity": {
      "ip": ["40.79.150.120"]
    }
  }]
}
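Hand-writing these payloads gets tedious at scale, so here's a minimal Python sketch that builds the same IP_ADDRESS entity structure. The vendor/product names are the same placeholder values as above, and actually submitting the payload to the Ingestion API is left out:

```python
import json
import uuid
from datetime import datetime, timezone

def build_ip_entity(ip: str, feed_name: str, start: str, end: str) -> dict:
    """Build a CATCH_ALL Entity Graph payload for a single IPv4 IOC,
    mirroring the JSON structure shown above."""
    return {
        "log_type": "CATCH_ALL",
        "entities": [{
            "metadata": {
                "collected_timestamp": datetime.now(timezone.utc).isoformat(),
                "interval": {"start_time": start, "end_time": end},
                "entity_type": "IP_ADDRESS",
                "vendor_name": "ACME",   # placeholder values, as above
                "product_name": "TIP",
                "threat": {
                    "category_details": "C2",
                    "threat_id": str(uuid.uuid4()),
                    "threat_feed_name": feed_name,
                },
            },
            "entity": {"ip": [ip]},
        }],
    }

payload = build_ip_entity("40.79.150.120", "ACME-IOC-IP-C2",
                          "2023-01-01T00:00:00Z", "2024-01-01T00:00:00Z")
print(json.dumps(payload, indent=2))
```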

Where EG gets really powerful is its multi-dimensional matching capabilities (like a CSV), i.e., match on an IP and Port together.

Note, the only difference in the example below from the one above is expanding the EG record to include both the port and the IP. This is useful as matching on an IP alone can often generate (even more) false positives, e.g., an IP match against NTP isn't the same as one against a high ephemeral port.

{
  "log_type": "CATCH_ALL",
  "entities": [{
    "metadata": {
      "collected_timestamp": "2023-02-27T22:04:34.267566Z",
      "interval": {
        "start_time": "2023-01-01T00:00:00Z",
        "end_time": "2024-01-01T00:00:00Z"
      },
      "entity_type": "IP_ADDRESS",
      "vendor_name": "TIP",
      "product_name": "ACME",
      "threat": {
        "category_details": "C2",
        "threat_id": "8cc21aaa-f360-4fb0-8529-b5d2f9605502",
        "threat_feed_name": "ACME-IOC-IP-PORT-C2"
      }
    },
    "entity": {
      "ip": ["40.79.150.121"],
      "port": "443"
    }
  }]
}

IPv6 is also supported, kind of; however, this requires consideration as to how your event data parses IPv6 data, e.g., is it shortened or not.

Chronicle SIEM does not automatically shorten or expand IPv6 so you need to evaluate this and make sure it matches your IOC data.

{
  "log_type": "CATCH_ALL",
  "entities": [{
    "metadata": {
      "collected_timestamp": "2023-02-27T21:56:01.937081Z",
      "interval": {
        "start_time": "2023-01-01T00:00:00Z",
        "end_time": "2024-01-01T00:00:00Z"
      },
      "entity_type": "IP_ADDRESS",
      "vendor_name": "TIP",
      "product_name": "ACME",
      "threat": {
        "category_details": "C2",
        "threat_id": "619dc295-82f4-4c7f-8d31-237e98d98a05",
        "threat_feed_name": "ACME-IOC-IPv6-C2"
      }
    },
    "entity": {
      "ip": ["0000:0000:0000:0000:0000:ffff:12dc:a722"]
    }
  }]
}

This EG context entry will successfully match un-shortened IPv6 event data but, from testing, it will not match the shortened version.
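One way to handle this is to normalize IPv6 IOCs on your side before ingestion, or ingest both forms. As a sketch, Python's stdlib ipaddress module gives you both representations:

```python
import ipaddress

def ipv6_forms(addr: str) -> tuple:
    """Return (exploded, compressed) representations of an IPv6 address,
    so an IOC can be ingested in whichever form your parser emits."""
    ip = ipaddress.IPv6Address(addr)
    return ip.exploded, ip.compressed

exploded, compressed = ipv6_forms("0000:0000:0000:0000:0000:ffff:12dc:a722")
print(exploded)    # 0000:0000:0000:0000:0000:ffff:12dc:a722
print(compressed)  # ::ffff:12dc:a722
```

Ingesting an entity per form (or normalizing your parsers to one canonical form) sidesteps the mismatch described above.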

Domain Names

Domain Name matches are straightforward, and a Domain Name entity record (which is of type Hostname in UDM terms) in EG can match against UDM objects and Nouns including <udm object>.hostname, network.dns.question.name, or network.dns.question.answer as parsed from log data into UDM's event model.

{
  "log_type": "CATCH_ALL",
  "entities": [{
    "metadata": {
      "collected_timestamp": "2023-02-27T20:17:45.328295Z",
      "interval": {
        "start_time": "2023-01-01T00:00:00Z",
        "end_time": "2024-01-01T00:00:00Z"
      },
      "entity_type": "DOMAIN_NAME",
      "vendor_name": "TIP",
      "product_name": "ACME",
      "threat": {
        "category_details": "C2",
        "threat_id": "fdb7585c-34a2-474d-acf5-cc847834ee3d",
        "threat_feed_name": "ACME-IOC-IP-DOMAIN"
      }
    },
    "entity": {
      "hostname": "www.example.com"
    }
  }]
}

🤷 Despite there being UDM paths of domain and dns_domain, neither of these is used for event data; rather, they are used for enrichment from other sources, e.g., WHOIS.

File (Hashes)

File entities are implemented as hashes, e.g., MD5, SHA1, or SHA256.

A consideration here is that you will need to evaluate the hash-generating log sources available in your environment against the CTI source used to populate EG.

Some EDR solutions generate an MD5, SHA1, and SHA256, which are all parsed into UDM; others are configurable (e.g., Sysmon); and others only log a single type of hash.

Here’s a neat example of an EG FILE record with all available hash types included:

{
  "log_type": "CATCH_ALL",
  "entities": [{
    "metadata": {
      "collected_timestamp": "2023-02-28T13:52:38.780150Z",
      "interval": {
        "start_time": "2023-01-01T00:00:00Z",
        "end_time": "2024-01-01T00:00:00Z"
      },
      "entity_type": "FILE",
      "vendor_name": "ACME",
      "product_name": "TIP",
      "threat": {
        "category_details": "C2",
        "threat_id": "2d15c843-f100-4858-997d-06dc7617e565",
        "threat_feed_name": "ACME-IOC-FILE-HASH"
      }
    },
    "entity": {
      "file": {
        "md5": "9459b00505c199a28911d72b64c90280",
        "sha1": "d14f619fb719ecaa8fe26706bd508423c09353df",
        "sha256": "fb1ea29937052e047467d73f9b42a77d96d3302c71290f6e0531b0c4489504b2",
        "vhash": "027066655d5d15541az28!z",
        "ssdeep": "98304:warN+M2Zd5cbNClw3dNyvE1pBM1sKI8ome5TPMf:w7pDlwXp1pBM1sKymkMf"
      }
    }
  }]
}

Note, to match against vhash or ssdeep you'd need to be using the Chronicle VirusTotal preview, which also requires you to be a VT Enterprise or VT Duet license holder.

URLs

URL IOCs enable you to match your EDR or Proxy log data against URLs; however, successful URL matching requires verifying your CTI feed against your event data, as there is high entropy in a URL. For example, if your event data looks like hxxps://www.example.com/login and your CTI IOC feed is in the format example.com/login, then your YARA-L detection won't match. There is no URL shortening or standardization at present, from testing.

The precursor steps for successful URL IOC matching are therefore:

  1. Identify your log sources that generate a URL, generally stored in target.url
  2. Identify your CTI sources that generate URL IOCs, and verify the format of the IOC

Then, the next step is to make sure they match, which can be achieved via Chronicle SIEM's Parser Management feature, e.g., regex extract the protocol, the domain from the URI, etc.
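As an illustrative sketch of that normalization step, done outside Chronicle (e.g., when preparing the IOC feed), this strips the scheme so event URLs and scheme-less IOC entries line up. The exact transform depends on your feed's format:

```python
from urllib.parse import urlparse

def normalize_url(url: str) -> str:
    """Strip the scheme (including defanged hxxp variants) so an event URL
    matches a scheme-less IOC feed entry such as 'example.com/login'."""
    url = url.replace("hxxp", "http", 1)  # re-fang defanged URLs
    parsed = urlparse(url if "://" in url else "//" + url, scheme="http")
    return parsed.netloc + parsed.path

print(normalize_url("hxxps://www.example.com/login"))  # www.example.com/login
print(normalize_url("www.example.com/login"))          # www.example.com/login
```

Whichever side you normalize (feed or parser), the key point is that both sides end up byte-identical before the YARA-L join.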

{
  "log_type": "CATCH_ALL",
  "entities": [{
    "metadata": {
      "collected_timestamp": "2023-02-27T21:11:32.479111Z",
      "interval": {
        "start_time": "2023-01-01T00:00:00Z",
        "end_time": "2024-01-01T00:00:00Z"
      },
      "entity_type": "URL",
      "vendor_name": "ACME",
      "product_name": "TIP",
      "threat": {
        "category_details": "C2",
        "threat_id": "b920f8bf-c4c4-464d-8749-0d3d4c9e0ba1",
        "threat_feed_name": "ACME-IOC-URL"
      }
    },
    "entity": {
      "url": "tinyurl.com/suppportteam65948604"
    }
  }]
}

An example URL IOC in Entity Graph. And a corresponding YARA-L rule snippet to match event data against it:

  events:
    $event.metadata.event_type = "NETWORK_HTTP"
    $event.metadata.vendor_name = "ACME"
    $event.metadata.product_name = "PROXY"
    $event.principal.ip = $principalIp
    $event.target.url = $url

    $ioc.graph.metadata.entity_type = "URL"
    $ioc.graph.entity.url = $url

  match:
    $principalIp over 1m

  outcome:
    $risk_score = max(0)

  condition:
    $event and $ioc

And given the two example URLs below, this will match (a), but not (b):

a) tinyurl.com/suppportteam65948604
b) hxxps://tinyurl.com/suppportteam65948604

This can be rather fragile, e.g., URL encoding in the logs or the addition of a port in the URL could prevent the detection from firing.

And more

You may be picking up a theme here: pretty much any UDM Noun can be used for an EG entry, and be used for IOC matching, e.g., Registry, CVE, Email Addresses, Mutex, and so on.

Automated Historical IOC Matching

The original method for matching IOCs in Chronicle is focused on i) Domains, and ii) IPs (v4).

Chronicle SIEM ingests an IOC and automatically checks it against indexed IPs and Domains for the last year, or however long you have indexed data; if there's a match it notifies you of the matching Asset(s), including the first and last time observed. Pretty neat and powerful.

I covered earlier the log sources that can be used for IOC matching, but Chronicle SIEM has integrations with pretty much all the major CTI providers, and you can bring in your own custom IOCs too.

If you need to create a custom IOC parser there’s a few options:

  1. Create a new CBN of type IOC (the older SDM for CBNs is not publicly documented afaict)
  2. Use the CSV_CUSTOM_IOC ingestion label with a CSV formatted as follows (also not an official integration):

# "category","value", "score", "severity"
suspicious_url,domain.com,86
mal_ip,1.2.3.4,17

  3. Use EG for type Domain or IP and it'll automatically use automated IOC matching (caveat: this may not be an official feature as I can't see it documented, but from testing it works)
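If you're generating option 2's CSV programmatically, keeping the layout consistent matters. A small sketch that mirrors the sample format above (category, value, score only, matching the sample rows):

```python
import csv
import io

def to_custom_ioc_csv(iocs: list) -> str:
    """Render (category, value, score) tuples in the CSV_CUSTOM_IOC
    layout shown above (the header line is a comment)."""
    buf = io.StringIO()
    buf.write('# "category","value", "score", "severity"\n')
    writer = csv.writer(buf, lineterminator="\n")
    for category, value, score in iocs:
        writer.writerow([category, value, score])
    return buf.getvalue()

print(to_custom_ioc_csv([("suspicious_url", "domain.com", 86),
                         ("mal_ip", "1.2.3.4", 17)]))
```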

Entity Graph Domain Name IOCs are automatically converted into historical IOC matches.

Earlier on I ingested an EG Domain Name IOC, and if I look at my IOC Domain Matches tab I notice a match. How did that happen?

It's because when you ingest an entity into EG of type Domain Name it gets automatically converted into an IOC Domain Match.

EG Domain Name records are automatically matched as IOCs

Pretty neat, and it gives you the benefit of two ways of matching, but with the consideration that you may now have two types of alert to manage (though, for example, Chronicle SOAR can potentially group the two related alerts automatically).

Viewing IP Matches via Looker Dashboards

The Chronicle UI does not show IP matches in the IOC Domain Matches tab, so if you wish to view IP matches (and Domain matches) in the UI, using a Looker Dashboard is an option.

The example Dashboard widget below shows all IP and Domain matches, including the number of matches and the age of the IOC, i.e., was it ingested before an alert or, more interestingly, after an alert (as automated IOC matching runs historically).

IOC alerts are written in batch to BigQuery and not streamed, so there is some latency between detection and showing up in a Dashboard.

Using Looker for IOC IP Matching

IOC matching via Chronicle Data Lake (BigQuery)

Given that IP address IOCs are not shown in the IOC Domain Matches tab (it does clearly say it's for Domains, after all), Chronicle Data Lake is a good choice for evaluating IOC matches too (and it is what the above Looker Dashboards use).

The below SQL statement enables you to query for one or more IP addresses and return any assets with a first and last observed date.

-- Add your IP IOC Matches here
DECLARE __IP_IOCS__ ARRAY <STRING>;
SET __IP_IOCS__ = ['40.79.150.120'];
------
SELECT
MIN(DATE(TIMESTAMP_SECONDS(CAST(day_bucket_seconds AS INT64)), 'UTC')) AS first_observed,
MAX(DATE(TIMESTAMP_SECONDS(CAST(day_bucket_seconds AS INT64)), 'UTC')) AS last_observed,
COUNT(4) AS hits,
ioc_value,
feed_log_type,
is_global,
CONCAT(COALESCE(asset.namespace, "untagged"),":",COALESCE(asset.hostname, asset.asset_ip_address, asset.mac, "-")) AS asset
FROM
`datalake.ioc_matches`
WHERE
ioc_value IN UNNEST(__IP_IOCS__)
AND ioc_type = "IOC_TYPE_IP"
GROUP BY
4,
5,
6,
7

and the (truncated) results:

| Row |  first_observed |  last_observed |  hits |  ioc_value    |  feed_log_type |  is_global |  asset                  |
|-----|-----------------|----------------|-------|---------------|----------------|------------|-------------------------|
| 1 | 2023-02-23 | 2023-03-05 | 64 | 40.79.150.120 | CATCH_ALL | false | untagged:192.168.12.16 |
| 2 | 2023-02-27 | 2023-03-05 | 29 | 40.79.150.120 | CATCH_ALL | false | untagged:192.168.12.129 |

Things to note:

  • the IOC table in Chronicle Data Lake is not partitioned and, from observation, does not prune old data, so you'll be able to match IOCs for as long as you have active data
  • the IOC table uses a funky date format, hence the extra hurdles to format it as a valid timestamp

Matching multiple IOC indicators

Taking the concept above forward, you can mimic a CSV in BigQuery SQL with a string array. This enables you to create a multi-value list, e.g., for IP + Port.

DECLARE
__IOC_MULTIPLE_ATTRIBUTES__ ARRAY <STRING>;
SET
__IOC_MULTIPLE_ATTRIBUTES__ = ['1.2.3.4,80',
'1.2.3.4,443',
'173.194.69.95,443'];
SELECT
MIN(TIMESTAMP_SECONDS(metadata.event_timestamp.seconds)) AS first_seen,
MAX(TIMESTAMP_SECONDS(metadata.event_timestamp.seconds)) AS last_seen,
target_ip,
target.port
FROM
`datalake.events`,
UNNEST(target.ip) target_ip
WHERE
DATE(hour_time_bucket) BETWEEN DATE_SUB(CURRENT_DATE, INTERVAL 60 DAY)
AND DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)
AND target_ip != ""
AND target.port > 0
AND CONCAT(target_ip,",",CAST(target.port AS STRING)) IN UNNEST(__IOC_MULTIPLE_ATTRIBUTES__)
GROUP BY 3,4

📝 This is a little fiddly, and arguably the above SQL statement could be made more resilient; however, what it does effectively do is enable you to find IOC matches for any duration within the last six months (as, remember, the Chronicle Data Lake only retains six months of data by default).
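The CONCAT-based key match in the SQL above can be prototyped locally to sanity-check your IOC formatting before scheduling it. A Python sketch of the same "ip,port" logic over made-up rows:

```python
# Mimic the BigQuery CONCAT(target_ip, ",", target_port) IN UNNEST(...) match
ioc_multiple_attributes = {"1.2.3.4,80", "1.2.3.4,443", "173.194.69.95,443"}

# Made-up event rows for illustration
events = [
    {"target_ip": "173.194.69.95", "target_port": 443},
    {"target_ip": "173.194.69.95", "target_port": 8443},
    {"target_ip": "1.2.3.4", "target_port": 80},
]

matches = [e for e in events
           if f'{e["target_ip"]},{e["target_port"]}' in ioc_multiple_attributes]
print(matches)  # the 443 and 80 rows match; 8443 does not
```

If your local join key misses here, the scheduled SQL will miss too, so this is a cheap way to catch formatting drift (whitespace, port-as-string, etc.) in the IOC list.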

What happens if I need to go back further than six months?

At present the best option I have seen is to run daily queries for the required data and persist the results into your own BigQuery instance, then re-target your scheduled IOC SQL statements against the in-house version of the dataset.

UDM Search

UDM Search supports Chronicle Reference Lists, so you can perform a manual search with a Reference List for a single dimension. This makes UDM Search viable for Hashes and Domains, but if you require a second dimension, e.g., port, then you can only perform a search for a single indicator:

target.ip = "173.194.69.95" AND target.port = 443

UDM Search does not support OR logic for multiple distinct groups, i.e., you can’t run a search as follows:

( target.ip = "173.194.69.95" AND target.port = 443 ) OR ( target.ip = "173.194.69.96" AND target.port = 8443 )

UDM Search does include an API, which opens up the opportunity for programmatic integrations; however, the default quota is 1 QPS, which means it's not really usable for IOC matching in practice.

Search API

The Chronicle Search API is one of the original API endpoints in the SIEM, and can be used effectively for IOC matching for specific artifacts, specifically:

  • Hashes
  • IPs
  • Domain Names

and from these artifacts return matching Assets, with an Asset being:

  • Hostname
  • IPs
  • MAC Address
  • Product ID (a unique identifier for an Asset)

The Search API is worthy of its own post, so I'm not going into depth here, but the gist is you can return IOCs against Assets, and with another API call return all events around that interval.


{'instance': 'THATSIEMGUY',
'namespace': 'internal_altostrat_com',
'hostname': 'server.internal.altostrat.com',
'hash256': '2b105fb153b1bcd619b95028612b3a93c60b953eef6837d3bb0099e4207aaf6b',
'first_seen_time': '2022-09-16T00:56:58.801Z',
'last_seen_time': '2023-02-02T20:55:54.500Z'},

If your IOC matching is just for the types Hash, IP, or Domain Name, and you're ok to build something using the API, then this is an extremely effective way of IOC matching. From tests, searching over 1 year across multiple instances completed in under 10 seconds.
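As an illustrative sketch of driving this programmatically, here's a helper that builds a Search API asset-listing request URL for a single artifact. Treat the endpoint path and parameter names as assumptions to verify against the current Search API documentation; only the URL construction itself is shown:

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the current Chronicle Search API docs
BACKSTORY = "https://backstory.googleapis.com/v1/artifact/listassets"

def list_assets_url(artifact_field: str, value: str,
                    start: str, end: str, page_size: int = 100) -> str:
    """Build a request URL for listing assets that interacted with an
    artifact, e.g. artifact_field='artifact.destination_ip_address'."""
    params = {artifact_field: value, "start_time": start,
              "end_time": end, "page_size": page_size}
    return f"{BACKSTORY}?{urlencode(params)}"

url = list_assets_url("artifact.destination_ip_address", "40.79.150.120",
                      "2022-03-15T00:00:00Z", "2023-03-15T00:00:00Z")
print(url)
```

The request itself would then be sent with your OAuth2 service-account credentials; that plumbing is omitted here.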

Rules Engines + Reference Lists

Last but not least, you can use Reference Lists with Detection Rules.

There are three types of lists available:

  • String
  • Regex
  • CIDR

I’m going to focus on String as it’s the most flexible and best suited to IOC matching. Useful things to know about Reference Lists:

  • Maximum list size: 6MB
  • Maximum length of any single list content line: 512 characters

And useful things to know about using a Reference List(s) in YARA-L:

  • You can use up to 7 in statements in a rule, and you can use multiple types of in statements in the same rule.
  • At most 2 out of the 7 in statements can use the special operators regex or cidr.
  • The nocase operator does not work with reference lists; putting nocase after a reference list call has no effect and is not recommended.

Let’s give it a go, here’s an example of a single-dimension match using a YARA-L rule and a Reference List:

rule ioc_ip4_matches {

  meta:
    author = "thatsiemguy@"
    owner = "infosec@"
    description = "IOC IPv4 matching via Reference Lists"
    response = "Uses the GCP SOAR automation playbook."
    severity = "INFORMATIONAL"
    priority = "INFORMATIONAL"

  events:
    $ioc.metadata.event_type = "NETWORK_CONNECTION"
    $ioc.metadata.ingestion_labels["label"] = "GCP_FIREWALL"
    $ioc.target.ip = $dip
    $ioc.target.port = $dport
    $ioc.security_result.action = "ALLOW"

    // checks if the IP is in the Reference List
    $dip in %string_demo_list

  outcome:
    $risk_score = 0

  condition:
    $ioc
}

And we have matches, pretty neat.

Now, what if you needed to match IP and Port? As Reference Lists aren't strongly typed, you can just paste a (consistently well formatted) CSV into a String Reference List, like below:
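Generating that CSV content programmatically helps keep it consistently formatted; a trivial sketch producing "ip,port" lines that line up with the strings.concat key built in the rule:

```python
def to_reference_list(pairs: list) -> str:
    """Render (ip, port) pairs as 'ip,port' lines for pasting into a
    String Reference List, matching a concatenated 'ip,port' event key."""
    return "\n".join(f"{ip},{port}" for ip, port in pairs)

print(to_reference_list([("1.2.3.4", 80), ("1.2.3.4", 443),
                         ("173.194.69.95", 443)]))
```

Any stray whitespace or inconsistent delimiter in the pasted list will silently break the match, which is exactly what generating it avoids.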

Now, to get a CSV in your YARA-L rule you can use the strings.concat function:

rule ioc_ip4_and_port_matches {

  meta:
    author = "thatsiemguy@"
    owner = "infosec@"
    description = "What's the point of this Detection Rule, e.g., Rule X detects suspicious command execution on frontend production service X."
    response = "What's the next step if this Detection fires? e.g., Verify the command, if not an approved new workload escalate to the owner."
    severity = "INFORMATIONAL"
    priority = "INFORMATIONAL"

  events:
    $ioc.metadata.event_type = "NETWORK_CONNECTION"
    $ioc.metadata.ingestion_labels["label"] = "GCP_FIREWALL"
    $ioc.target.ip = $dip
    $ioc.target.port = $dport
    $ioc.security_result.action = "ALLOW"
    $artifact = strings.concat($ioc.target.ip, strings.concat(",", $ioc.target.port))

    $artifact in %string_demo_list_1

  outcome:
    $risk_score = 0
    $debug = $artifact

  condition:
    $ioc
}

At the time of writing strings.concat only accepts two parameters, but you can nest functions within functions, like so:

$artifact = strings.concat($ioc.target.ip, strings.concat(",",$ioc.target.port ))

Is it a little brittle? Yes. But does it work? Also yes.

Finally, Reference Lists can't be deleted at this time, but you can update an existing list, including blanking out all content to start afresh. So when using a Reference List + YARA-L rule approach, consider that you'll have perpetual lists that you either overwrite entirely each time, or append to.

Summary

Given all that, what's the right way to do IOC matching in Chronicle SIEM? That depends. There's no right or wrong way, but rather finding the approach, or approaches, that best suit your requirements.

That said, my go-to is:

  • Using Entity Graph for current CTI, i.e., looking for active and ongoing campaigns, noting that for indicators such as IP or Domain it will automatically historically match. The huge benefits of using EG include i) it shows the context of the IOC, i.e., the EG context record, and ii) it supports multiple dimensions, aka IP + Port
  • Using Reference Lists + YARA-L when I need to perform a historical retro hunt
  • Using Dashboards and BigQuery as another mechanism when performing hunting or general exploration

And that’s all I have to say about IOC matching 👋
