Which CIS Controls Matter: An Empirical Analysis of Cyber Insurance Risk Assessment
The CIS Critical Security Controls are a prioritized set of cybersecurity best practices. The recently announced Cyber Risk Dataset is a historical breach dataset in which each breach is manually annotated with the CIS controls it affected. This allows us to compare three perspectives: which CIS controls insurers look at, which controls the breach data implicates, and how security experts prioritize controls in the CIS standard. We find a stark contrast between the priorities set by the CIS controls and those insurers focus on, which may be related to the availability of data.
The CIS standard is prioritized by security experts
CIS Critical Security Controls is a framework developed by the Center for Internet Security (CIS) and represents a prioritized set of best practices designed to prevent the most pervasive and dangerous cyber attacks today.
We use the CIS standard because, unlike more comprehensive standards such as NIST or ISO/IEC 2700x, the CIS Top 20 framework was designed to give organizations a concise, prioritized set of controls. It was developed by security experts from different industries to help organizations reduce risk and establish a baseline cybersecurity program.
Annotating historical CIS controls from breach data
For risk quantification, we need to derive weights for the importance of individual controls. In an ideal world, we would like to get information about the state of the security controls at the time of the breach. However, this information is not widely available since companies are not compelled to disclose details about their security posture at the time of the breach.
Our security team developed a scenario-based methodology for each CIS control and trained human annotators to manually infer the affected CIS controls for each breach. From a first batch of the 2,750 largest and most notable breaches, we were able to attribute at least one CIS control to approximately 1,000 cases.
Suppose a phishing attack exploiting human error resulted in credential theft, e.g. a user entered their credentials on a phishing site. The attacker then used the credentials to log in to a remote machine and discovered that point-of-sale (POS) payment terminals were on the same network. The attacker used this access to steal credit card information from the terminals.
In this scenario, we can reason that the company's employees were not sufficiently trained (CIS 17: Implement a Security Awareness and Training Program). The attacker's ability to log in remotely with stolen credentials implies that multi-factor authentication was missing (CIS 16: Account Monitoring and Control). Finally, the lack of separation between sensitive applications and the rest of the local network implies that CIS 14: Controlled Access Based on the Need to Know was either missing or implemented incorrectly.
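The annotation itself is done manually by trained people, but the scenario reasoning above can be made concrete as a lookup from observed breach indicators to the CIS controls whose failure they imply. The sketch below is purely illustrative: the indicator names and the `RULES` table are hypothetical and are not the team's actual methodology or schema.

```python
# Hypothetical scenario rules: each observed breach indicator points to the
# CIS v7 control whose absence or failure it implies. Indicator names are
# illustrative placeholders, not the dataset's real annotation schema.
RULES = {
    "phishing_credential_theft": (17, "Implement a Security Awareness and Training Program"),
    "remote_login_with_stolen_credentials": (16, "Account Monitoring and Control"),
    "pos_on_flat_network": (14, "Controlled Access Based on the Need to Know"),
}


def annotate(indicators):
    """Return the sorted CIS control numbers implied by a breach's indicators.

    Indicators without a matching rule are ignored rather than guessed.
    """
    return sorted({RULES[i][0] for i in indicators if i in RULES})


# The POS breach scenario described above, expressed as indicators.
scenario = [
    "phishing_credential_theft",
    "remote_login_with_stolen_credentials",
    "pos_on_flat_network",
]
print(annotate(scenario))  # [14, 16, 17]
```

A real annotator weighs context that a fixed table cannot capture, which is why the dataset relies on human judgment rather than rules like these.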
Identifying the most important CIS controls
Our dataset is heavily skewed towards consumer data breaches, because companies are compelled to disclose them. With this in mind, we can still perform a simple aggregation and count how many times each CIS control occurs in the annotated dataset. The result is shown in Figure 1 below.
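The aggregation is a per-control count over annotated breaches. A minimal sketch, assuming each record carries a list of attributed control numbers (the record layout and the four sample breaches are hypothetical, not taken from the dataset):

```python
from collections import Counter

# Hypothetical annotated records: each breach lists the CIS controls that
# annotators attributed to it; an empty list means none could be attributed.
breaches = [
    {"id": "b1", "controls": [14, 16, 17]},
    {"id": "b2", "controls": [13]},
    {"id": "b3", "controls": [13, 14]},
    {"id": "b4", "controls": []},
]


def control_frequencies(records):
    """Count how many breaches each CIS control was attributed to."""
    counts = Counter()
    for record in records:
        # set() ensures a control listed twice on one breach is counted once
        counts.update(set(record["controls"]))
    return counts


counts = control_frequencies(breaches)
annotated = sum(1 for r in breaches if r["controls"])
print(counts.most_common(3))
print(f"{annotated}/{len(breaches)} breaches have at least one control")
```

Counting breaches per control (rather than raw mentions) keeps a single heavily annotated incident from inflating one control's score. On the real data, the second line would report roughly 1,000 of 2,750.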
The 3 most frequently affected controls from our analysis are:
CIS 14 (Controlled Access Based on the Need to Know): This covers all cases where the network was not properly segmented by application and data sensitivity, e.g. a retailer's point-of-sale (POS) devices sitting on the same network as regular employee endpoints. It also includes cases where shared folders lacked proper access controls and unauthorized people could reach sensitive data such as IP, PII, PHI, or PFI. Finally, it covers scenarios such as unencrypted hard drives lost in transit by third parties, and stolen unencrypted laptops and disk drives.
CIS 13 (Data Protection): This control covers all scenarios in which data was stolen from undocumented or misplaced storage locations (laptops, network drives, third-party cloud providers, etc.), data backups, legacy databases, and applications. It also includes cases where unencrypted (cleartext) data was exfiltrated without detection.
CIS 17 (Implement a Security Awareness and Training Program): This covers all phishing cases and, more generally, cases where the attacker asked an employee to take some action, such as making a wire transfer or sending a tax form or other sensitive information. Any unintentional disclosure of sensitive data to the attacker is included as well.
There is a stark contrast between the chart above and the priorities set by the CIS controls. That doesn't mean the CIS priorities are invalid; rather, it shows how inferences from limited, biased public data can differ from the consensus of security professionals. The limitations of breach data are one reason security risk research is so prone to bias, and why existing data needs to be combined with expert knowledge to build representative models.
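One simple way to combine biased breach frequencies with expert knowledge when deriving control weights is a convex blend of the two. The sketch below is an assumption of ours, not the method used for the dataset: it treats the CIS ordering as a linearly decreasing prior (control 1 weighted highest) and mixes it with normalized breach counts using a hypothetical parameter `alpha`. The sample counts are illustrative only.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1 (assumes a positive total)."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def expert_prior(n_controls=20):
    """Hypothetical prior: weight decreases linearly with the CIS control number,
    so control 1 (highest CIS priority) gets the most weight."""
    return normalize({c: n_controls + 1 - c for c in range(1, n_controls + 1)})


def blend(empirical_counts, alpha=0.5, n_controls=20):
    """Mix empirical breach frequencies with the expert prior.

    alpha=1 trusts the (biased) breach data alone; alpha=0 trusts the prior alone.
    """
    empirical = normalize(
        {c: empirical_counts.get(c, 0) for c in range(1, n_controls + 1)}
    )
    prior = expert_prior(n_controls)
    return {c: alpha * empirical[c] + (1 - alpha) * prior[c] for c in prior}


# Illustrative counts only, not figures from the dataset.
weights = blend({13: 40, 14: 50, 17: 30}, alpha=0.5)
top = sorted(weights, key=weights.get, reverse=True)[:5]
print(top)
```

Because both inputs are normalized, the blended weights also sum to 1 for any `alpha` between 0 and 1. How much to trust each source is a judgment call; a real model would justify both the prior's shape and `alpha`.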
Do insurers focus on the right controls during the risk assessment and underwriting?
Let's compare these results with a 2017 analysis of insurance questionnaires by Woods et al. The study maps questions from common insurance application forms to individual CIS controls. Figure 2 below, credited to the authors, shows the total number of sub-controls addressed for each CIS control.
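To make the questionnaire metric concrete: if each proposal-form question is mapped to one or more CIS sub-controls, the per-control score counts the distinct sub-controls touched. A minimal sketch under that assumption; the questions, the mapping, and the sub-control IDs below are hypothetical placeholders, not taken from the cited study or checked against the CIS catalog.

```python
from collections import defaultdict

# Hypothetical mapping from proposal-form questions to CIS sub-control IDs
# in "control.subcontrol" form. IDs are placeholders for illustration.
question_map = {
    "Is anti-virus running on all workstations?": ["8.1"],
    "Are personal firewalls enabled on endpoints?": ["8.1", "9.4"],
    "Are backups taken regularly and tested?": ["10.1", "10.3"],
    "Is sensitive data encrypted at rest?": ["13.6", "14.8"],
}


def subcontrols_per_control(mapping):
    """Count distinct sub-controls addressed for each top-level CIS control."""
    addressed = defaultdict(set)
    for subcontrols in mapping.values():
        for sub in subcontrols:
            control = int(sub.split(".")[0])
            addressed[control].add(sub)  # a set ignores repeats across questions
    return {control: len(subs) for control, subs in sorted(addressed.items())}


print(subcontrols_per_control(question_map))  # {8: 1, 9: 1, 10: 2, 13: 1, 14: 1}
```

Counting distinct sub-controls means that two questions about the same sub-control (here, both touching "8.1") do not double the score; this reflects our reading of "sub-controls addressed", not a confirmed detail of the study.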
There are clear overlaps between the two charts, most notably the high occurrence of controls related to consumer data breaches (CIS 13: Data Protection and CIS 14: Controlled Access Based on the Need to Know), as well as CIS 16: Account Monitoring and Control and CIS 17: Implement a Security Awareness and Training Program.
On the other hand, there are two noticeable differences between the charts: the relatively high scores of CIS 8: Malware Defenses and CIS 10: Data Recovery Capabilities in insurance applications. The authors explain that CIS 8 scores high because one of the most common questions insurers ask is whether anti-virus software and personal firewalls are running on all workstations of the organization being assessed (considered an easy win). CIS 10: Data Recovery Capabilities, by contrast, is mostly relevant for business interruption losses and data availability scenarios such as ransomware, which insurers want to minimize. This control is tricky for us, because the information available usually describes which controls were affected during the attack, not whether the company was able to recover its data during incident response.
The authors conclude that there is a worrying discrepancy between security experts' opinions and the focus of insurers in the underwriting process, and that one explanation may be that insurers focus on these controls because they mitigate the risks insurers are liable for. Our analysis above suggests another explanation: perhaps insurers are biased toward modeling breaches for which public data is accessible, rather than toward modeling the overall quality of the security programs their insureds implement. If the latter is true, this would be an interesting example of data limitations working against global financial resilience to cyber attacks.
Note: The cited article uses CIS controls version 6.0, while the annotation methodology uses version 7.0. You should keep this in mind when comparing the two charts.
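At the top level, aligning the two charts mostly comes down to renumbering. Based on our reading of the v6 and v7 control lists, v7 reordered only three of the basic controls (v6 3 Secure Configurations became v7 5, v6 4 Continuous Vulnerability Assessment became v7 3, and v6 5 Controlled Use of Administrative Privileges became v7 4), while other numbers, including 8, 10, 13, 14, 16, and 17 discussed above, stayed the same. Some names and sub-controls also changed, which this sketch does not handle; treat the mapping as an assumption and verify it against CIS's own documentation before relying on it.

```python
# Top-level renumbering from CIS Controls v6 to v7 (our reading of the two
# control lists). Only three basic controls moved; all other numbers are
# unchanged. Sub-control changes between versions are NOT handled here.
V6_TO_V7 = {3: 5, 4: 3, 5: 4}


def v6_to_v7(control: int) -> int:
    """Map a v6 top-level control number to its v7 number."""
    return V6_TO_V7.get(control, control)


def align_counts(v6_counts: dict[int, int]) -> dict[int, int]:
    """Re-key per-control counts from v6 numbering to v7 numbering."""
    return {v6_to_v7(c): n for c, n in v6_counts.items()}


# Illustrative counts only; controls 8 and 10 keep their numbers.
print(align_counts({4: 2, 5: 3, 8: 7, 10: 4}))  # {3: 2, 4: 3, 8: 7, 10: 4}
```

Because the mapping is a permutation, re-keying never merges two v6 controls into one v7 slot, so per-control counts carry over unchanged.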
Feedback on the dataset
This is the most complete breach dataset we have found, but it is far from exhaustive. We want your feedback: how can the dataset be extended or enriched to make it more representative and useful? Are there other data sources we can aggregate to add other breach types or to annotate other security-related metadata?
If you have any feedback, feel free to contact us at firstname.lastname@example.org
Daniel Woods, Ioannis Agrafiotis, Jason R. C. Nurse, and Sadie Creese, Mapping the coverage of security controls in cyber insurance proposal forms, Journal of Internet Services and Applications 8 (2017), no. 1, 8.