THM Intro to Logs

Dehni · Dehni’s Notes · Nov 9, 2023

Intro to Logs Module Notes

Logs serve as invaluable records of past events. By preserving an archive of historical activities, we can bolster our security posture and protect our digital assets more effectively.

A comprehensive understanding of logs is crucial for identifying patterns and mitigating potential threats.

Log analysis tools and methods enable us to interpret past events and establish a reliable source of evidence, streamlining the processing and scrutiny of log data. This efficiency facilitates prompt detection and response to potential incidents or significant events.

In the digital world, every interaction with a computer system, from authentication attempts and authorisation grants to file accesses, network connections, and system errors, leaves a digital footprint in the form of logs.

Logs are a record of events within a system. These records provide a detailed account of what a system has been doing, capturing a wide range of events such as user logins, file accesses, system errors, network connections, and changes to data or system configurations.

While the specific details may differ based on the type of log, a log entry usually includes the following information:

  • A timestamp of when an event was logged
  • The name of the system or application that generated the log entry
  • The type of event that occurred
  • Additional details about the event, such as the user who initiated the event or the device’s IP address that generated the event

This information is typically stored in a log file, which contains aggregated entries of what occurred at any given time on a system.
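As an illustration, Python’s standard logging module produces entries that carry exactly these fields. The format string, logger name, and messages below are a minimal sketch, not a prescribed layout.

```python
import logging

# Configure a log file whose entries carry a timestamp, the source (logger name),
# the event type (severity level), and the event details (message).
logging.basicConfig(
    filename="app.log",
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

logger = logging.getLogger("payment-service")
logger.info("User alice authenticated from 203.0.113.42")
logger.warning("Failed login attempt for user bob from 198.51.100.7")
```

A resulting entry in app.log would look something like `2023-11-09 14:32:01,105 payment-service WARNING Failed login attempt for user bob from 198.51.100.7`.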

Logs can answer critical questions about an event, such as:

  • What happened?
  • When did it happen?
  • Where did it happen?
  • Who is responsible?
  • Were their actions successful?
  • What was the result of their action?

Log Types

Specific log types can offer a unique perspective on a system’s operation, performance, and security.

Below is a list of some of the most common log types:

  • Application Logs: Messages about specific applications, including status, errors, warnings, etc.
  • Audit Logs: Activities related to operational procedures crucial for regulatory compliance.
  • Security Logs: Security events such as logins, permissions changes, firewall activity, etc.
  • Server Logs: Various logs a server generates, including system, event, error, and access logs.
  • System Logs: Kernel activities, system errors, boot sequences, and hardware status.
  • Network Logs: Network traffic, connections, and other network-related events.
  • Database Logs: Activities within a database system, such as queries and updates.
  • Web Server Logs: Requests processed by a web server, including URLs, response codes, etc.

Log Formats

A log format defines the structure and organisation of data within a log file. It specifies how the data is encoded, how each entry is delimited, and what fields are included in each row. These formats can vary widely and may fall into three main categories: Semi-structured, Structured, and Unstructured.

Semi-structured Logs: These logs may contain structured and unstructured data, with predictable components accommodating free-form text.

Example:
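For instance, a Linux syslog message is semi-structured: the timestamp, host, and process fields are predictable, while the message itself is free text. The entry below is purely illustrative.

```
Nov  9 14:32:01 webserver sshd[2028]: Failed password for invalid user admin from 203.0.113.42 port 53712 ssh2
```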

Structured Logs: Following a strict and standardised format, these logs are conducive to parsing and analysis.

Example:
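A structured entry is typically JSON, CSV, or XML, with every field explicitly named and delimited; the JSON record below is illustrative.

```json
{"timestamp": "2023-11-09T14:32:01Z", "host": "webserver", "source": "sshd", "event": "authentication_failure", "user": "admin", "src_ip": "203.0.113.42"}
```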

Unstructured Logs: Comprising free-form text, these logs can be rich in context but may pose challenges in systematic parsing.

Example:
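An unstructured entry might be a free-form application message with no fixed fields, for example:

```
Connection to database lost after 3 retries. Attempting to reconnect... giving up. Check that the database service is running and reachable.
```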

Log Standards

A log standard is a set of guidelines or specifications that define how logs should be generated, transmitted, and stored. Log standards may specify the use of particular log formats, but they also cover other aspects of logging, such as what events should be logged, how logs should be transmitted securely, and how long logs should be retained.

Some widely referenced log standards and guidelines include the Syslog protocol (RFC 5424), the Common Event Format (CEF), and NIST Special Publication 800-92 on computer security log management.

Log Collection

Log collection is an essential component of log analysis, involving the aggregation of logs from diverse sources such as servers, network devices, software, and databases.

For logs to effectively represent a chronological sequence of events, it’s crucial to maintain the system’s time accuracy during logging. Utilising the Network Time Protocol (NTP) is a method to achieve this synchronisation and ensure the integrity of the timeline stored in the logs.

Step by step — Log Collection

  • Identify Sources: List all potential log sources, such as servers, databases, applications, and network devices.
  • Choose a Log Collector: Opt for a suitable log collector tool or software that aligns with your infrastructure.
  • Configure Collection Parameters: Ensure that time synchronisation is enabled through NTP to maintain accurate timelines, adjust settings to determine which events to log at what intervals, and prioritise based on importance.
  • Test Collection: Once configured, run a test to ensure logs are appropriately collected from all sources.
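As a minimal sketch of what a collector does, the Python script below listens for syslog messages forwarded over UDP port 514 and appends them to a local file, stamping each with the time it was received. It is illustrative only; in practice you would use a dedicated collector such as rsyslog, Filebeat, or a SIEM forwarder, and binding to port 514 normally requires elevated privileges.

```python
import socket
from datetime import datetime, timezone

# Many devices can forward syslog over UDP/514 to a central collector.
LISTEN_ADDR = ("0.0.0.0", 514)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(LISTEN_ADDR)

with open("collected.log", "a") as out:
    while True:
        data, (src_ip, _port) = sock.recvfrom(4096)
        # Record the UTC receive time and the sending host alongside the raw message.
        received_at = datetime.now(timezone.utc).isoformat()
        out.write(f"{received_at} {src_ip} {data.decode(errors='replace').rstrip()}\n")
        out.flush()
```

Sending a test message from another host (for example with `logger --server <collector-ip>` on Linux) confirms that entries from each source actually arrive, which covers the final step above.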

Log Management

Efficient log management ensures that every gathered log is stored securely, organised systematically, and ready for swift retrieval. A hybrid approach can provide a balanced solution: retain all log files for a defined period, then selectively trim what no longer needs to be kept.

Steps for Log Management

  • Storage: Decide on a secure storage solution, considering factors like retention period and accessibility.
  • Organisation: Classify logs based on their source, type, or other criteria for easier access later.
  • Backup: Regularly back up your logs to prevent data loss.
  • Review: Periodically review logs to ensure they are correctly stored and categorised.

Log Centralisation

Centralisation is pivotal for swift log access, in-depth analysis, and rapid incident response. A unified system allows for efficient log management with tools that offer real-time detection, automatic notifications, and seamless integration with incident management systems.

Centralising your logs can significantly streamline access and analysis.

Steps for Log Centralisation

  • Choose a Centralised System: Opt for a system that consolidates logs from all sources, such as the Elastic Stack or Splunk.
  • Integrate Sources: Connect all your log sources to this centralised system.
  • Set Up Monitoring: Utilise tools that provide real-time monitoring and alerts for specified events.
  • Integration with Incident Management: Ensure that your centralised system can integrate seamlessly with any incident management tools or protocols you have in place.

Log Storage

Logs can be stored in various locations, such as the local system that generates them, a centralised repository, or cloud-based storage.

The choice of storage location typically depends on multiple factors:

  • Security Requirements: Ensuring that logs are stored in compliance with organisational or regulatory security protocols.
  • Accessibility Needs: How quickly and by whom the logs need to be accessed can influence the choice of storage.
  • Storage Capacity: The volume of logs generated may require significant storage space, influencing the choice of storage solution.
  • Cost Considerations: The budget for log storage may dictate the choice between cloud-based or local solutions.
  • Compliance Regulations: Specific industry regulations governing log storage can affect the choice of storage.
  • Retention Policies: The required retention time and ease of retrieval can affect the decision-making process.
  • Disaster Recovery Plans: Ensuring the availability of logs even in system failure may require specific storage solutions.

Log Retention

It is vital to recognise that log storage is not infinite. Therefore, a reasonable balance between retaining logs for potential future needs and the storage cost is necessary. Understanding the concepts of Hot, Warm, and Cold storage can aid in this decision-making:

  • Hot Storage: Logs from the past 3–6 months that are most accessible. Query speed should be near real-time, depending on the complexity of the query.
  • Warm Storage: Logs from six months to 2 years, acting as a data lake, easily accessible but not as immediate as Hot storage.
  • Cold Storage: Archived or compressed logs from 2–5 years. These logs are not easily accessible and are usually used for retroactive analysis or scoping purposes.
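As a minimal sketch of age-based tiering, the helper below assigns a log file to hot, warm, or cold storage from its modification time, using the rough 6-month and 2-year cut-offs described above. The thresholds and the archive path are illustrative.

```python
from datetime import datetime, timedelta
from pathlib import Path

def storage_tier(log_file: Path, now: datetime | None = None) -> str:
    """Classify a log file as hot, warm, or cold storage based on its age."""
    now = now or datetime.now()
    age = now - datetime.fromtimestamp(log_file.stat().st_mtime)
    if age <= timedelta(days=180):   # roughly the last six months
        return "hot"
    if age <= timedelta(days=730):   # roughly six months to two years
        return "warm"
    return "cold"                    # older logs: archived or compressed storage

for path in Path("/var/log/archive").glob("*.log"):
    print(path.name, storage_tier(path))
```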

Log Deletion

Log deletion must be performed carefully to avoid removing logs that could still be of value. Crucial log files should always be backed up before deletion.

Logrotate

Logrotate is a tool that automates the rotation, compression, and removal of log files, ensuring they are handled systematically rather than left to grow unchecked.
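A representative configuration, typically dropped into /etc/logrotate.d/, might look like the following; the path and retention values are illustrative and should be adapted to your retention policy.

```
# Rotate the application's logs weekly and keep four compressed copies.
/var/log/myapp/*.log {
    weekly
    rotate 4
    compress
    delaycompress
    missingok
    notifempty
}
```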

Log Analysis Process

Log analysis involves Parsing, Normalisation, Sorting, Classification, Enrichment, Correlation, Visualisation, and Reporting. It can be carried out with a range of tools and techniques, from dedicated platforms like Splunk and the Elastic (ELK) Stack to ad-hoc methods using default command-line utilities and open-source tools.

Data Sources: Data Sources are the systems or applications configured to log system events or user activities. These are the origin of logs.

Parsing: Parsing is breaking down the log data into more manageable and understandable components. Since logs come in various formats depending on the source, it’s essential to parse these logs to extract valuable information.

Normalisation: Normalisation is standardising parsed data. It involves bringing the various log data into a standard format, making it easier to compare and analyse data from different sources. It is imperative in environments with multiple systems and applications, where each might generate logs in a different format.
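To make parsing and normalisation concrete, the sketch below uses a regular expression to break a syslog-style SSH failure line into components, then normalises the timestamp to ISO 8601 so entries from different sources can be compared. The pattern, field names, and assumed year are illustrative and match only this one message shape.

```python
import re
from datetime import datetime

RAW = "Nov  9 14:32:01 webserver sshd[2028]: Failed password for invalid user admin from 203.0.113.42 port 53712 ssh2"

# Parsing: extract the predictable components of the syslog line.
PATTERN = re.compile(
    r"(?P<month>\w{3})\s+(?P<day>\d+) (?P<time>[\d:]+) (?P<host>\S+) "
    r"(?P<process>\w+)\[(?P<pid>\d+)\]: (?P<message>.*)"
)
fields = PATTERN.match(RAW).groupdict()

# Normalisation: convert the timestamp to ISO 8601. Syslog omits the year,
# so one is assumed here purely for illustration.
timestamp = datetime.strptime(
    f"2023 {fields['month']} {fields['day']} {fields['time']}", "%Y %b %d %H:%M:%S"
).isoformat()

normalised = {
    "timestamp": timestamp,
    "host": fields["host"],
    "source": fields["process"],
    "details": fields["message"],
}
print(normalised)  # {'timestamp': '2023-11-09T14:32:01', 'host': 'webserver', ...}
```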

Sorting: Sorting is a vital aspect of log analysis, as it allows for efficient data retrieval and identification of patterns. Logs can be sorted by time, source, event type, severity, and any other parameter present in the data. Proper sorting is critical in identifying trends and anomalies that signal operational issues or security incidents.

Classification: Classification involves assigning categories to the logs based on their characteristics. By classifying log files, you can quickly filter and focus on those logs that matter most to your analysis. For instance, classification can be based on the severity level, event type, or source. Automated classification using machine learning can significantly enhance this process, helping to identify potential issues or threats that could be overlooked.

Enrichment: Log enrichment adds context to logs to make them more meaningful and easier to analyse. It could involve adding information like geographical data, user details, threat intelligence, or even data from other sources that can provide a complete picture of the event. Enrichment makes logs more valuable, enabling analysts to make better decisions and more accurately respond to incidents. Like classification, log enrichment can be automated using machine learning, reducing the time and effort required for log analysis.

Correlation: Correlation involves linking related records and identifying connections between log entries. This process helps detect patterns and trends, making understanding complex relationships between various log events easier. Correlation is critical in determining security threats or system performance issues that might remain unnoticed.
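As a minimal illustration of correlation, the snippet below groups failed-authentication events by source IP address and flags any address that exceeds a threshold, which is the basic logic behind spotting a brute-force attempt. The events and the threshold are illustrative.

```python
from collections import Counter

# Normalised events, e.g. the output of the parsing step above (illustrative data).
events = [
    {"timestamp": "2023-11-09T14:32:01", "event": "auth_failure", "src_ip": "203.0.113.42"},
    {"timestamp": "2023-11-09T14:32:03", "event": "auth_failure", "src_ip": "203.0.113.42"},
    {"timestamp": "2023-11-09T14:32:05", "event": "auth_failure", "src_ip": "198.51.100.7"},
    {"timestamp": "2023-11-09T14:32:06", "event": "auth_failure", "src_ip": "203.0.113.42"},
]

failures_per_ip = Counter(e["src_ip"] for e in events if e["event"] == "auth_failure")

THRESHOLD = 3  # illustrative: tune to the environment and time window
for ip, count in failures_per_ip.items():
    if count >= THRESHOLD:
        print(f"Possible brute force: {count} failed logins from {ip}")
```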

Visualisation: Visualisation represents log data in graphical formats like charts, graphs, or heat maps. Visually presenting data makes recognising patterns, trends, and anomalies easier. Visualisation tools provide an intuitive way to interpret large volumes of log data, making complex information more accessible and understandable.

Reporting: Reporting summarises log data into structured formats to provide insights, support decision-making, or meet compliance requirements. Effective reporting includes creating clear and concise log data summaries catering to stakeholders’ needs, such as management, security teams, or auditors. Regular reports can be vital in monitoring system health, security posture, and operational efficiency.

Log Analysis Tools

Security Information and Event Management (SIEM) tools such as Splunk or the Elastic Stack can be used for complex log analysis tasks.

However, in scenarios where immediate data analysis is needed, such as during incident response, Linux-based systems can employ default tools like cat, grep, sed, sort, uniq, and awk, along with sha256sum for hashing log files. Windows-based systems can utilise EZ-Tools and the default cmdlet Get-FileHash for similar purposes.
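For example, on a Linux host the default tools mentioned above can be chained to summarise failed SSH logins and to hash the evidence. The log path and the awk field position are illustrative; they depend on the distribution and its log format.

```bash
# Count failed SSH logins per source IP address, most frequent first.
grep "Failed password" /var/log/auth.log | awk '{print $(NF-3)}' | sort | uniq -c | sort -nr

# Record a hash of the log file to preserve its integrity as evidence.
sha256sum /var/log/auth.log
```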

Log Analysis Techniques

Log analysis techniques are methods or practices used to interpret and derive insights from log data. These techniques can range from simple to complex and are vital for identifying patterns, anomalies, and critical insights. Here are some common techniques:

Pattern Recognition: This involves identifying recurring sequences or trends in log data. It can detect regular system behaviour or identify unusual activities that may indicate a security threat.

Anomaly Detection: Anomaly detection focuses on identifying data points that deviate from the expected pattern. It is crucial to spot potential issues or malicious activities early on.
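A minimal statistical sketch of anomaly detection: compare each hour’s event count with the mean and standard deviation of the series and flag hours that fall far outside the norm. The counts and the two-standard-deviation threshold are illustrative.

```python
from statistics import mean, stdev

# Events logged per hour (illustrative data: the spike in hour 9 is the anomaly).
hourly_counts = [120, 118, 130, 125, 122, 119, 128, 124, 121, 900, 123, 126]

mu = mean(hourly_counts)
sigma = stdev(hourly_counts)

for hour, count in enumerate(hourly_counts):
    # Flag counts more than two standard deviations from the mean.
    if abs(count - mu) > 2 * sigma:
        print(f"Hour {hour}: {count} events deviates sharply from the baseline ({mu:.0f})")
```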

Correlation Analysis: Correlating different log entries helps understand the relationship between various events. It can reveal causation and dependencies between system components and is vital in root cause analysis.

Timeline Analysis: Analysing logs over time helps understand trends, seasonalities, and periodic behaviours. It can be essential for performance monitoring and forecasting system loads.

Machine Learning and AI: Leveraging machine learning models can automate and enhance various log analysis techniques, such as classification and enrichment. AI can provide predictive insights and help in automating responses to specific events.

Visualisation: Representing log data through graphs and charts allows for intuitive understanding and quick insights. Visualisation can make complex data more accessible and assist in identifying key patterns and relationships.

Statistical Analysis: Using statistical methods to analyse log data can provide quantitative insights and help make data-driven decisions. Regression analysis and hypothesis testing can infer relationships and validate assumptions.
