M-21–31: Ye Shall Log, No Matter the Log
There were a lot of news stories this week¹² about federal agencies’ inability to achieve the cyber logging requirements in M-21–31. I also saw some discussions on Twitter that raised the normal questions about the U.S. government’s poor cybersecurity posture, etc. I think, in this case, we should probably be nicer to the agencies.
Quick background: M-21–31 was issued as a result of Executive Order 14028 (the ‘Cyber EO’). The EO directed DHS/DOC/OMB to issue logging requirements for the federal government within ~104 days of issuance.³ The memo creates 4 tiers of Logging Maturity with required compliance dates for each tier. More event types are required as the tiers increase. There are also additional non-logging requirements at each tier (agency SOC access, SOAR, etc.), but that is not my concern.
To see more details on the tiers and requirements, go to Analyzing the Tiers.
GAO Audit
The news stories are based on a newly released GAO audit, which actually focused on EO14028 outcomes as a whole. Some highlights from the logging parts of the audit:
“Three agencies had met tier 3. These agencies were the Department of Agriculture (Agriculture), the National Science Foundation (NSF), and the Small Business Administration (SBA)” (pg. 26)
“…officials stated their agencies were not expected to meet the tiers soon.” (pg. 26)
One agency official stated that its agency estimated that it would require more than 9 years or sufficient additional funding for contractors to account for the new workload needed to meet the event logging tiers. (pg. 27)
“agency officials cited the “all or nothing” nature of the requirements, meaning even if a majority of systems had reached the tier 1 requirements, if all systems had not reached tier 1, the agency overall would be at tier 0.” (pg. 26–27)
The GAO audit, and the OIG reports it references, do not evaluate the memo. They simply state agencies must comply and ask agencies why they haven’t. I think it’s important for organizations to be deeply self-critical, especially regulatory ones, because their work has a large impact. In a world of limited government resources, the resources spent on poorly crafted cybersecurity regulation have a high opportunity cost.
It is clear agencies are struggling with compliance. The agencies cite lack of funding, staffing, and technology as reasons for their failure. But are these reasons justifiable? Let’s try to understand.
Small note: In February 2023, CISA published a short guidance document on M-21–31 that provides a second layer of prioritization for the logs within the existing Tier system (in Tier 0 logs, prioritize XYZ first). However, the CISA guidance does not change the M-21–31 requirements. It states,
This guidance intends to complement and clarify the requirements within M-21–31 and does not supersede or conflict with the policy.
Analyzing the Tiers
What must be logged?
First, the exact logging requirements are not easy to understand. The memo separates the requirements by “Logging Category” and then by “Required Data”. Below is the very first entry.
Question for the reader: Which line here represents a discrete event type? Account Creation is straightforward, but what is Manage Credential Type? What is Track Usage of Credentials? Is that referring to a sign-in log? It’s really not clear what the event type is. This issue occurs throughout the memo.
In addition to the unclear phrasing, there are other problems.
Some ‘atomic’ event types are grouped:
OS: — Start-Up and Shutdown of the System (pg. 18)
and others are not:
- Account Creation
- Account Deletion
Some mention an action you should take, instead of an event:
Monitor, Alert and Respond to Anomalous Behaviors/Activities (pg. 14)
Some are possibly duplicates⁴:
Network Device Infrastructure (General Logging) — DNS Query/Response Logs (pg. 16)
and
Network Device Infrastructure — DNS…content of query (pg. 14)
Some are supersets of others (note the *):
System Log Folder: /Var/Log/* (pg. 23)
and
System Log: /Var/Log/System.Log (pg. 23)
Some are closer to inventory data and not events:
Device Data
• Device Name
• Device Manufacturer and Model
• Serial #
• Phone #
• IMEI, IMSI, OS Version, OS Build
• Firmware Version
Some are so broad they are almost immeasurable:
OS — System Events (pg. 20)
All of these inconsistencies make it impossible to calculate the true number of required event types (and therefore impossible to measure compliance consistently). But let’s try anyway. Below is my very rough calculation:
- Event Logging Tier 1 | at least 215 log types (~60%)
- Event Logging Tier 2 | at least 130 log types (~36%)
- Event Logging Tier 3 | at least 12 log types (~3%)
That’s at least 357 different events. There are probably more, because I did some generous groupings. Some are also double counted, because the memo separates by device type (e.g. Linux logon events and Windows logon events are separate).
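The tier shares can be re-derived from the counts with a few lines of Python. This is only a sketch of the arithmetic: the counts are my rough estimates from above, and the names `tier_counts` and `tier_share` are just illustrative.

```python
# Rough event-type counts per tier, taken from the estimate above.
tier_counts = {
    "Event Logging Tier 1": 215,
    "Event Logging Tier 2": 130,
    "Event Logging Tier 3": 12,
}

total = sum(tier_counts.values())


def tier_share(count: int, total: int) -> float:
    """Return a tier's share of all counted event types, in percent."""
    return 100 * count / total


shares = {tier: tier_share(n, total) for tier, n in tier_counts.items()}

for tier, pct in shares.items():
    print(f"{tier}: {tier_counts[tier]} log types ({pct:.1f}%)")
print(f"Total: {total} log types")
```

Because the counts themselves depend on how you group the memo's entries, the percentages are only as good as the grouping.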
Finally, there is also no discussion in the memo regarding which systems the logs must be collected from. Is it only production systems? Development? Every system in the agency? I doubt every agency is using the same criteria.
Retention
Log retention discussions usually focus on what to store and for how long, based on your threat model. These discussions are necessary because most SIEM/data lake costs scale with data volume (e.g. pay per GB of ingested data or pay per GB of data scanned in a query).
Unfortunately, M-21–31 is not a fan of this discussion.
Every event type, except two⁵, must be stored for:
- 12 months Active Storage
- 18 months Cold Storage
This includes extremely high volume events like:
- OS — Registry Access (pg. 21)
- OS — File and Object Access (pg. 20)
- Web Applications — HTTP Request and Response with Body of Data (pg. 33)
There is no consideration for the value, or ROI, of a log type based on its volume and provided insight.
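To get a feel for why volume matters under a flat retention rule, here is a minimal sketch of the steady-state storage a single log source would occupy. It assumes the 18 months of cold storage follow the 12 months of active storage, approximates the periods as 365 and 548 days, and ignores compression. The 10 GB/day input is a made-up example, not a measured number.

```python
# Assumption: 12 and 18 months are approximated as 365 and 548 days.
ACTIVE_DAYS = 365
COLD_DAYS = 548


def steady_state_storage_gb(daily_gb: float) -> dict:
    """Estimate how much data sits in each storage class once retention is full.

    Assumes a constant daily volume and no compression or filtering.
    """
    active = daily_gb * ACTIVE_DAYS
    cold = daily_gb * COLD_DAYS
    return {"active_gb": active, "cold_gb": cold, "total_gb": active + cold}


# Hypothetical log source producing 10 GB/day.
footprint = steady_state_storage_gb(10.0)
print(footprint)
```

The footprint grows linearly with daily volume, which is why a single retention period for both low-volume and high-volume event types gets expensive quickly.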
Filtering
The memo also does not mention filtering of any kind, another very common cost management approach in logging discussions. See SwiftOnSecurity’s Sysmon config for a popular attempt or Olaf Hartong’s presentation on how Microsoft Defender for Endpoint implements event filtering.
In the industry, the question is not “should we filter out events?”, it is “what should we filter out?” and “how should we balance visibility requirements with cost?” However, M-21–31 does not discuss this and therefore implies agencies must log every single event in these categories, which will dramatically increase the cost.
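As a toy illustration of the trade-off, the sketch below drops high-volume events from one known-noisy process and measures how much volume disappears. Everything in it is hypothetical: the event records, the field names, and the exclusion rule are made up for the example and are not a recommended filter (real tools such as Sysmon express rules in their own config formats).

```python
# Hypothetical sample of raw events; field names and values are illustrative only.
events = [
    {"type": "file_access", "process": "backup_agent.exe", "bytes": 900},
    {"type": "file_access", "process": "winword.exe", "bytes": 850},
    {"type": "registry_access", "process": "backup_agent.exe", "bytes": 700},
    {"type": "logon", "process": "lsass.exe", "bytes": 400},
    {"type": "file_access", "process": "backup_agent.exe", "bytes": 900},
]

# Hypothetical exclusion rule: drop high-volume event types from a noisy process.
NOISY_PROCESSES = {"backup_agent.exe"}
HIGH_VOLUME_TYPES = {"file_access", "registry_access"}


def keep(event: dict) -> bool:
    """Return False for events matched by the exclusion rule."""
    return not (
        event["process"] in NOISY_PROCESSES and event["type"] in HIGH_VOLUME_TYPES
    )


filtered = [e for e in events if keep(e)]
raw_bytes = sum(e["bytes"] for e in events)
kept_bytes = sum(e["bytes"] for e in filtered)
reduction = 1 - kept_bytes / raw_bytes

print(f"kept {len(filtered)} of {len(events)} events, {reduction:.0%} less volume")
```

The hard part is not the mechanism but deciding which rules are safe, since every exclusion is also a potential blind spot.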
Costs
When agencies say they have funding issues, I think they’re right. But let’s try to calculate some numbers. For simplicity, I’ll focus on just one category: Windows event logs.
To calculate an estimate, I’ll use a small agency like FRTIB, which manages the federal government’s 401(k) and has ~300 employees.
Scenario: Windows Events
To calculate the number of Windows devices:
I estimate ~1 employee = 1 device, so ~300 employee devices.
To calculate the size per Windows device:
I ran Procmon with no filters for 10 minutes on my laptop⁶. The result file in .PML format is ~500MB.
500MB/10min = 50MB/min. Then 50MB/min * 60min/hour * 8 hours/workday = 24,000MB, or 24GB/day/device.
24GB/day/device * 300 devices = 7,200GB/day (or 7.2TB)
That is a lot of data. Even if you can find some good compression savings, it is still a lot of data.
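The volume estimate above, written out as a short Python sketch so you can plug in your own numbers. The inputs are the ones from my Procmon run and device estimate; the variable names and the use of decimal units (1GB = 1,000MB) are my own choices.

```python
# Inputs from the rough estimate above.
SAMPLE_MB = 500          # size of the Procmon capture
SAMPLE_MINUTES = 10      # length of the capture
HOURS_PER_WORKDAY = 8    # devices assumed active only during a workday
DEVICES = 300            # ~1 device per employee
MB_PER_GB = 1000         # decimal units, matching the rounding in the text

mb_per_minute = SAMPLE_MB / SAMPLE_MINUTES
gb_per_device_per_day = mb_per_minute * 60 * HOURS_PER_WORKDAY / MB_PER_GB
gb_per_day = gb_per_device_per_day * DEVICES

print(f"{mb_per_minute:.0f} MB/min per device")
print(f"{gb_per_device_per_day:.0f} GB/day per device")
print(f"{gb_per_day:,.0f} GB/day across {DEVICES} devices")
```

Changing `HOURS_PER_WORKDAY` to 24 for always-on servers, or lowering `SAMPLE_MB` to model compression, shifts the total proportionally.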
If FRTIB ingested this into Microsoft Sentinel, it would cost at least ~$16,600/day or ~$6mil/year.⁷ FRTIB’s FY2023 budget was ~$400 million.
Edit 12.12.2023: The Sentinel cost only includes 90 days of Active storage and the memo requires 12 months. So you’re not even at the minimum retention yet!
That’s 1.5% of the Agency budget dedicated to Windows event logs.
And there are still the remaining ~350 event types!
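The cost arithmetic as a short Python sketch. The per-GB price is the commitment-tier figure from footnote 7 and the budget is the rough FY2023 number above; treat both as assumptions that will drift over time, and the variable names as illustrative.

```python
# Inputs from the estimate above and footnote 7.
GB_PER_DAY = 7_200               # estimated Windows event volume
PRICE_PER_GB_USD = 2.31          # assumed Sentinel price per GB ingested
DAYS_PER_YEAR = 365
AGENCY_BUDGET_USD = 400_000_000  # approximate FY2023 budget

daily_cost = GB_PER_DAY * PRICE_PER_GB_USD
annual_cost = daily_cost * DAYS_PER_YEAR
budget_share = annual_cost / AGENCY_BUDGET_USD

print(f"daily:  ${daily_cost:,.0f}")
print(f"annual: ${annual_cost:,.0f}")
print(f"share of budget: {budget_share:.1%}")
```

This covers ingestion only, so any retention beyond what the price includes would be an additional line item.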
Recommendations
If you were a federal CISO, could you truly attest that your agency is meeting the requirements?
Considering the unclear event log requirements, unclear system scope, inability to filter events, and resulting costs, the memo needs to change, especially if GAO or OMB expect agencies to report compliance.
- Reissue the memo — this seems necessary. It just has too many problems.
- Move the requirements to CISA’s authorities — the requirements seem too low-level to be managed by OMB.
- Move the requirements to an updatable format — this enables OMB (or CISA) to update the exact event type requirements at their discretion. This allows for new event types to be added and retention times to be adjusted if necessary.
- Standardize how event types are described — the current requirements are inconsistently structured, such as when they decide to include required fields (e.g. Source Port) or how they group events. Pick a lane. Preferably the more explicit one.
- Fix the presentation of the event type requirements — the current table is difficult to follow and creates confusion.
- Create ROI-informed retention times — compare the volume and usefulness of events and build ROI-aligned retention times. This is not an easy task, but the alternative is worse.
- Decide a position on filtering — the new memo should provide agencies the flexibility to filter events. Or don’t allow it and find a way to defend that position.
- Make the requirements more accessible and known outside of federal government — everyone cares about logging. If you’re doing the work to decide what is important to the government, why not tell the world about it?
Footnotes
[1] 20 federal agencies miss deadline for implementing cyber incident tracking requirements, watchdog says — Nextgov/FCW
[2] Only 3 agencies have hit deadline for cyber event logging standards, GAO finds | FedScoop
[3] Section 8, subsections (a)-(c)
[4] There are many more duplicates. Look at Linux event logs and PowerShell too.
[5] PCAP [72 hours], Cloud GCP logs [6 months + 18 months]
[6] You might wonder why and I direct you back to the requirement for File/Object/Registry access. 👀 Procmon also doesn’t capture every event required for Windows, so even this number represents a subset.
[7] East US region, 5TB commitment tier, 7,200GB/day * $2.31/GB * 365 days