Cyber Threat Intelligence (CTI) Part 5 — CTI Lifecycle — Processing
To bring yourself up to date on what we’ve covered so far, I encourage you to check the previous parts of this series, where we started with an introduction to Cyber Threat Intelligence (CTI), moved on to the skill set a CTI analyst should have, and then discussed in detail the first two stages of the CTI Lifecycle: Planning & Direction, and Collection. Here is the link for each part:
Cyber Threat Intelligence Part 1 — Quick Introduction to Cyber Threat Intelligence
Cyber Threat Intelligence Part 2 — What are the skill set requirements for a Cyber Threat Analyst?
Cyber Threat Intelligence (CTI) Part 3 — CTI Lifecycle
Cyber Threat Intelligence (CTI) Part 4 — CTI Lifecycle — Collection
In this article, I’ll introduce and discuss the next stage of the CTI Lifecycle: Processing.
Once the data has been collected, it must be processed. This is the preparation stage where raw, noisy data is cleaned up and shaped so that it can be converted into meaningful intelligence.
The primary goal of the processing stage in the Threat Intelligence lifecycle is to turn raw data from various sources into meaningful intelligence. This is done by analyzing the collected data, filtering out irrelevant and poor-quality records, and combining what remains with useful information obtained from other sources, such as open-source intelligence.
Both structured and unstructured data are processed in this stage. Unstructured data refers to data that has no pre-defined data model, organizational structure, or format; it can take various forms such as text, images, videos, social media posts, or blogs, which makes it a significant challenge for cyber threat analysis.
Structured data, on the other hand, is organized and formatted in a specific way, such as in tables or fields. Structured data can be easily searched, queried, filtered, and analyzed using specific algorithms.
As you would guess, it’s much easier to analyze structured data using automated tools and algorithms compared to the additional, mostly manual, effort needed to analyze and interpret unstructured data.
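To make the contrast concrete, here is a small, hypothetical Python illustration: a structured record can be filtered with a simple field lookup, while the same fact buried in an analyst’s free-text note has to be parsed first (here with a naive regex) before it can be used.

```python
import re

# Hypothetical examples: one structured event and one unstructured analyst note
structured_event = {"src_ip": "203.0.113.9", "action": "login_failure"}
unstructured_note = "Analyst note: repeated failed logins observed from 203.0.113.9 overnight."

# Structured data: the value is directly addressable by field name
print(structured_event["src_ip"])

# Unstructured data: the indicator must first be extracted (naive IPv4 regex)
match = re.search(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", unstructured_note)
print(match.group(0) if match else "no IP found")
```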
The CTI processing stage involves:
Aggregation: The process of collecting and combining diverse raw cyber threat data from various sources to build an understanding of potential cyber threats.
Suppose an organisation needs to identify potential threats related to a particular crypto-malware strain. In this scenario, the organisation could use the aggregation process to collect raw data from multiple sources, such as external threat intelligence feeds, network logs, and internal threat reports from security analysts. Aggregation enables organisations to develop a comprehensive view of potential cyber risks and threats and gives them the ability to build appropriate defenses.
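As a rough illustration, here is a minimal aggregation sketch in Python. The source names, fields, and sample records are all hypothetical; the point is simply that raw entries from a threat feed, network logs, and an internal analyst report are tagged with their origin and merged into one working set.

```python
from datetime import datetime, timezone


def aggregate(*sources):
    """Combine raw records from multiple sources, tagging each with its origin."""
    combined = []
    for source_name, records in sources:
        for record in records:
            combined.append({
                "source": source_name,
                "collected_at": datetime.now(timezone.utc).isoformat(),
                "raw": record,
            })
    return combined


# Hypothetical raw inputs related to a crypto-malware strain
feed_records = [{"indicator": "198.51.100.7", "type": "ip", "tag": "cryptominer-c2"}]
network_logs = [{"src": "10.0.0.4", "dst": "198.51.100.7", "bytes": 48213}]
analyst_reports = [{"title": "Suspicious mining traffic on finance VLAN"}]

dataset = aggregate(
    ("external_feed", feed_records),
    ("network_logs", network_logs),
    ("internal_reports", analyst_reports),
)
print(f"Aggregated {len(dataset)} raw records from 3 sources")
```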
Normalisation: The process of converting and transforming collected raw data into a common format or structure so that it can be easily managed, indexed, and queried.
Here is an example of how normalisation can be used to standardize logging events:
Suppose an organisation operates servers on multiple operating systems, including Linux and Windows. Each system creates log files in a different format with distinct data types, making it difficult for the organisation to compare logs, understand them, or identify patterns across systems.
To resolve this challenge, normalisation comes into play by converting each log file into a standardised log format, such as the Common Event Format (CEF), which ensures that every event carries the same common fields, such as dst (the destination IP address).
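Here is a minimal sketch of that idea in Python. The sample log line, the Windows event fields, and the target schema are simplified and partly hypothetical; a real normaliser would map many more fields onto CEF, but the principle is the same: different inputs, one common set of fields.

```python
import re


def normalise_linux_sshd(line):
    """Map a Linux sshd syslog line onto a common event schema."""
    host = line.split()[3]  # hostname field in the syslog header
    m = re.search(r"Failed password for (\S+) from (\S+)", line)
    return {
        "vendor": "Linux",
        "event_name": "Failed Authentication",
        "user": m.group(1),
        "src": m.group(2),
        "dst": host,
    }


def normalise_windows_failed_logon(event):
    """Map a (simplified) Windows failed-logon event onto the same schema."""
    return {
        "vendor": "Windows",
        "event_name": "Failed Authentication",
        "user": event["TargetUserName"],
        "src": event["IpAddress"],
        "dst": event["Computer"],
    }


linux_line = "Oct 12 10:01:32 web01 sshd[220]: Failed password for alice from 203.0.113.9"
windows_event = {"TargetUserName": "alice", "IpAddress": "203.0.113.9", "Computer": "dc01"}

for normalised in (normalise_linux_sshd(linux_line), normalise_windows_failed_logon(windows_event)):
    print(normalised)
```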
Deduplication: The process of identifying and removing duplicate entries from the collected cyber threat raw data.
Assume an organisation collects and analyses data logs from multiple sources to monitor user activity. The system collects and stores logs that record every user event, including user access, file uploads, file manipulations, and downloads. Logs from different sources may contain the same information more than once, for example due to configuration issues, producing duplicate entries that add nothing to a cyber security investigation.
Deduplication significantly reduces the size and number of records, enabling the system to optimise storage capacity and making the data easier to manage.
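A minimal deduplication sketch might look like the following; the field names are hypothetical. Two records are treated as duplicates when the fields that matter for an investigation are identical, regardless of which source delivered them.

```python
import hashlib
import json


def dedupe(records, key_fields=("user", "action", "file", "timestamp")):
    """Keep only the first occurrence of each logically identical record."""
    seen = set()
    unique = []
    for record in records:
        key_material = json.dumps({f: record.get(f) for f in key_fields}, sort_keys=True)
        fingerprint = hashlib.sha256(key_material.encode()).hexdigest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


logs = [
    {"user": "bob", "action": "upload", "file": "report.xlsx", "timestamp": "2024-03-01T09:00:00Z", "source": "proxy"},
    {"user": "bob", "action": "upload", "file": "report.xlsx", "timestamp": "2024-03-01T09:00:00Z", "source": "endpoint"},
    {"user": "bob", "action": "download", "file": "payroll.csv", "timestamp": "2024-03-01T09:05:00Z", "source": "proxy"},
]

deduped = dedupe(logs)
print(f"{len(logs)} records in, {len(deduped)} records out")  # 3 in, 2 out
```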
Enrichment: The process of adding contextual metadata to the collected cyber threat data to enhance the value of the dataset for analysis.
Imagine that a healthcare organisation is concerned about the potential impact of malware on medical devices such as wireless infusion pumps. Through the use of open-source intelligence (OSINT), the organisation discovers that various medical devices it has in use contain security vulnerabilities that threat actors have actively exploited.
To address this potential cyber threat and improve their security posture, the organisation can enrich the raw data by adding organisational and device-specific metadata and context. This could include information on the sensitivity of the medical devices, applicable medical regulations and compliance requirements, and other organisation-specific processes, such as clinical workflows.
The enriched data can then be leveraged to identify relevant potential vulnerabilities and mitigate the inherent risks within their medical devices. Such enrichment techniques help organisations gain situational awareness, identify potential threats, and make faster, well-informed decisions.
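As a rough sketch, enrichment can be as simple as joining raw findings against an internal asset inventory. Everything below (the inventory, the device model name, the placeholder CVE identifier) is hypothetical and only illustrates attaching organisation-specific context to OSINT data.

```python
# Hypothetical internal asset inventory keyed by device model
ASSET_CONTEXT = {
    "infusion-pump-x200": {
        "criticality": "high",                # patient-safety impact
        "regulated": True,                    # subject to medical-device regulation
        "clinical_workflow": "bedside medication delivery",
        "deployed_units": 42,
    }
}


def enrich(raw_finding, asset_context):
    """Attach organisational context to a raw OSINT vulnerability finding."""
    context = asset_context.get(raw_finding["device_model"], {})
    return {**raw_finding, "org_context": context}


osint_finding = {
    "device_model": "infusion-pump-x200",
    "vulnerability": "CVE-XXXX-XXXX",         # placeholder identifier
    "exploited_in_the_wild": True,
}

print(enrich(osint_finding, ASSET_CONTEXT))
```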
The processing activities above are used to improve the signal-to-noise ratio.
The signal-to-noise ratio compares the amount of useful and relevant information (signal) to the amount of irrelevant, unnecessary, or erroneous information (noise) in a dataset.
It is an essential measure when working with large and complex data sets.
A high signal-to-noise ratio suggests high-quality data with fewer false positives and less redundancy, and it results in a consistent, well-structured dataset, leading to better data analysis and contextualisation of the information gathered.
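As a simple, hypothetical measure, the ratio can be computed directly once records are labelled as relevant or not; how that relevance flag gets assigned (analyst triage, automated scoring) is outside the scope of this small sketch.

```python
def signal_to_noise(records):
    """Return the ratio of relevant records (signal) to irrelevant ones (noise)."""
    signal = sum(1 for r in records if r.get("relevant"))
    noise = len(records) - signal
    return signal / noise if noise else float("inf")


# Hypothetical processed dataset: 80 relevant records, 20 irrelevant ones
dataset = [{"relevant": True}] * 80 + [{"relevant": False}] * 20
print(f"Signal-to-noise ratio: {signal_to_noise(dataset):.1f}")  # 4.0
```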
Lastly, because these processing activities are time-consuming, they are typically automated or semi-automated.
In the next part, I will talk about the fourth stage of the CTI Lifecycle — Analysis.
Thanks for reading and as always, all feedback is welcome.
Lastly, if you enjoy any of my blogs, it would be great if you could please follow me as a reward for the algorithm :)