Many compliance and best-practice frameworks discuss the benefits of collecting and retaining security audit logs. But what about the content itself: what makes an audit log “good” for writing detections or completing security investigations?
Let’s talk about what it should look like!
Standard Logging Expectations
There are two main types of logs — audit logs and system logs. Security engineers may need both types of logs; however, audit logs are the ones that we use most frequently.
Datadog provides good context on the difference.
The difference between audit logs and system logs (e.g., error logs, operational logs, etc.) is the information they contain, their purpose, and their immutability. Whereas system logs are designed to help developers troubleshoot errors, audit logs help organizations document a historical record of activity for compliance purposes and other business policy enforcement.
They also provide documentation on what information to include in each audit log:
- Event name as identified in the system
- Easy-to-understand description of the event
- Event timestamp
- Actor or service that created, edited, or deleted the event (user ID or API ID)
- Application, device, system, or object that was impacted (IP address, device ID, etc.)
- Source from where the actor or service originated (country, host name, IP address, device ID, etc.)
- Custom tags specified by the user, such as severity level of the event
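To make that list concrete, here is a minimal sketch of one audit event carrying those seven pieces of information. The field names (`event_name`, `actor`, `source`, and so on), the `new_event` helper, and the sample values are my own illustrative assumptions, not a schema taken from Datadog or any other vendor.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class AuditEvent:
    # Hypothetical field names, one per item in the list above.
    event_name: str    # event name as identified in the system
    description: str   # easy-to-understand description of the event
    timestamp: str     # event timestamp
    actor: str         # user ID or API ID that performed the action
    target: str        # application, device, system, or object impacted
    source: dict       # where the actor originated (IP, country, ...)
    tags: dict = field(default_factory=dict)  # custom tags such as severity


def new_event(event_name, description, actor, target, source, tags=None):
    """Build an event stamped with the current time in UTC (ISO 8601)."""
    return AuditEvent(
        event_name=event_name,
        description=description,
        timestamp=datetime.now(timezone.utc).isoformat(),
        actor=actor,
        target=target,
        source=source,
        tags=tags or {},
    )


event = new_event(
    event_name="user.role_changed",
    description="An admin changed a user's role",
    actor="user-123",
    target="user-456",
    source={"ip": "203.0.113.10", "country": "US"},
    tags={"severity": "medium"},
)
print(json.dumps(asdict(event), indent=2))
```

Emitting one flat, predictable JSON object per event like this is what makes the log easy to parse later.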
Starting with the Customer
Audit logs are used by a variety of engineering departments. Within security engineering, they may be used for incident response, detection writing, privacy engineering, and other subspecialties.
Here’s a short list of potential engineering customers to keep in mind:
- Detection Engineers
- Security Analysts / Incident Responders
- Privacy Engineers
- Software Engineers
- Site Reliability Engineers (SRE)
- IT Professionals
Security Engineer’s Log Requirements
In order to support efficient incident response and accurate detection writing, security engineers require specific data to be included in every audit log.
Here’s the TLDR if you’re a software engineer at a SaaS company: if you don’t include the fields described below, then you are making our jobs more difficult (and we might bug you for changes to improve your logs).
- Timestamp in UTC
- IP Address
- Event Outcome (Success / Failure)
- User Email / Username / User ID
- User Agent
- Browser Used (otherwise can be derived from User Agent)
- OS Used (otherwise can be derived from User Agent)
- Geographical Context — City, State, Country, Latitude & Longitude
- User Type (standard user, service account, admin, etc.)
- User’s Full Name
- Organization Name & Number from IP (ASN Lookup)
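As a rough illustration of how a security team might check incoming logs against these expectations, the sketch below reports which fields are missing from an event and whether its timestamp is in UTC. The key names and the choice of which fields to treat as required are assumptions for this example; adjust both to match your own log schema.

```python
from datetime import datetime, timedelta

# Hypothetical key names for some of the fields listed above.
REQUIRED_FIELDS = ["timestamp", "ip_address", "outcome", "user_id", "user_agent"]


def missing_fields(event, required=REQUIRED_FIELDS):
    """Return the required keys that are absent or empty in the event."""
    return [key for key in required if event.get(key) in (None, "")]


def is_utc(timestamp):
    """True if an ISO 8601 timestamp carries an explicit zero UTC offset."""
    try:
        # Older Python versions do not accept a trailing "Z" here.
        parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    except ValueError:
        return False
    # Naive timestamps have no offset (None), so they fail this check.
    return parsed.utcoffset() == timedelta(0)


event = {
    "timestamp": "2024-05-01T12:00:00Z",
    "ip_address": "198.51.100.7",
    "outcome": "success",
    "user_id": "",
}
print(missing_fields(event))       # ['user_id', 'user_agent']
print(is_utc(event["timestamp"]))  # True
```

A timestamp with no offset at all is treated as a failure here on purpose: if the log does not say which time zone it is in, the investigator has to guess.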
Creating Event Types
Defining event types is critical, but it is just as important that actions taken by a user or the system are actually captured and logged, for example when a query is run or an account is created.
Some companies include incredible detail, such as every time a page is visited by a user. While this isn’t necessary, depending on the service you provide, it may come in handy for us when investigating a potential security incident.
However, logs should always include the event action and the account, actor, or other object that is affected by that action. A good example of this is when a Google Drive document has a permission change from internal to external.
Google Drive log type change_document_visibility includes…
- Actor — Email address of the user who performed the action.
- Event — The logged event action.
- Prior visibility — Previous visibility of the document in case visibility is changed
- Target — User whose access is changed
- Title — Title of the document
- Visibility — Visibility of the Drive item associated with the activity
- Visibility change — Visibility of the Drive item before the activity
Based on these fields, it is clear to the security engineer what action happened, what was affected, and who completed the action.
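Because the action, the object, and the actor are all explicit, a detection for this event can be very short. The sketch below is a hypothetical example: the dictionary keys mirror the field labels listed above rather than the exact keys Google returns, and `"internal"` and `"external"` are placeholder visibility values, so treat it as a shape to adapt rather than a working Google Drive rule.

```python
def is_external_share(event):
    """Flag a visibility change from an internal state to an external one."""
    return (
        event.get("event") == "change_document_visibility"
        and event.get("prior_visibility") == "internal"
        and event.get("visibility") == "external"
    )


def describe(event):
    """Summarize who did what to which document, for an alert message."""
    return (
        f"{event['actor']} changed '{event['title']}' "
        f"from {event['prior_visibility']} to {event['visibility']}"
    )


# Sample event with made-up values.
sample = {
    "actor": "alice@example.com",
    "event": "change_document_visibility",
    "prior_visibility": "internal",
    "title": "Q3 roadmap",
    "visibility": "external",
}

if is_external_share(sample):
    print(describe(sample))
```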
A counterexample is the event type Server-Fetch used by 1Password. This event type is vague and has a different definition depending on which 1Password client is used by the user.
Not only are clear and consistent event types critical, but it is also necessary to have accessible documentation describing what each event type indicates. Without this information, security engineers are blocked from investigating their own security incidents and writing useful detections.
If you work at 1Password and would like help making your logs better, then let me know!
Setting Good Examples
What makes a log “good”?
- It is well structured and has verbose details, including all recommended fields mentioned earlier.
- It has consistent, easily understood event types.
- It has clear documentation that is accessible and easy to search.
What are some indicators of a “bad” log?
- Lack of information to attribute activity to a user or IP address.
- Poor formatting and structure that makes it difficult to access required information.
  - Ex: Unordered arrays inside of nested JSON
  - Ex: Random dicts inside of variable length arrays so the same event may have mixed lengths.
- Different formats depending on which internal team built each event type.
- Inconsistent event type definitions based on how a user is accessing the system.
- Inconsistent formats and naming conventions that differ depending on whether you pull the logs via API or view them in the UI.
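To show why the array-of-dicts pattern hurts, here is a small sketch of the cleanup code security engineers end up writing before they can query such logs. The `{"name": ..., "value": ...}` shape and the `flatten_parameters` helper are hypothetical, chosen only to illustrate the problem.

```python
def flatten_parameters(parameters):
    """Collapse a list of {"name": ..., "value": ...} dicts into one dict."""
    flat = {}
    for item in parameters:
        # Entries without a "value" key are kept with a value of None.
        flat[item["name"]] = item.get("value")
    return flat


# The same event type, emitted twice with different ordering and length.
first = [
    {"name": "doc_title", "value": "Q3 roadmap"},
    {"name": "visibility", "value": "external"},
]
second = [
    {"name": "visibility", "value": "external"},
    {"name": "owner", "value": "alice@example.com"},
    {"name": "doc_title", "value": "Q3 roadmap"},
]

# After flattening, the field is reachable by name regardless of position.
print(flatten_parameters(first)["visibility"])
print(flatten_parameters(second)["visibility"])
```

If the log had been emitted as a flat object with stable keys in the first place, none of this normalization would be needed.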
The ability of security engineers to use audit logs to protect their company depends heavily on the quality and content of those logs. Our ability to standardize data across systems, build enrichment, and find malicious behavior depends on good logging!
For anyone wanting to see some more examples of good logs, here are some of my favorite audit logs:
Thanks for tuning in! If you have any good documentation on building good audit logs, please send it my way!