A CISO View: How To Communicate Security Alert Coverage And Quality

Opinionated Security
CISO & Cyber Leaders
9 min read · Dec 24, 2020

Presenting complex programs such as 24x7 monitoring is often a multi-slide exercise that does little to help executives or the Board understand your level of maturity or progress. You can’t just dump a bunch of log data in front of executives and expect them to understand what it means. There has to be a better way.

NIST provides guidance around a set of tactical reports, so that might be a starting point for many organizations…

Trend analyses can include, for example, examining recent threat information regarding the types of threat events that have occurred within the organization or across the federal government, success rates of certain types of cyber attacks, emerging vulnerabilities in information technologies, evolving social engineering techniques, results from multiple security control assessments, the effectiveness of configuration settings, and findings from Inspectors General or auditors.

But I’m not sure how I can demonstrate the maturity of the program from the NIST supplemental guidance. I’ve tried. This might be heresy to say, but I think that even I fell asleep after reading the resulting slides. How could my executive team find such tactical data compelling?

Let’s face it. 24x7 logging of security events represents a large investment of both time and money for most organizations. Like any other large investment, a CISO needs to be able to show value to the executive team in a simple way.

  • How well is the SIEM investment performing?
  • What is the level of coverage?
  • Are we getting better?

This means that, as a CISO, I have to be very attentive to my log metrics: not just the tactical metrics of alerts related to incidents, but also the program-level metrics that show how mature our 24x7 monitoring program is. I also need to be able to communicate these metrics in a way that the executive team and Board can easily digest and understand.

Most of all, I want a set of executive-level metrics about my 24x7 monitoring program that reassures the executive team and me that we are producing high quality alerts in the right areas of the business, and that we are improving over time, without having to manually review hundreds of alerts.

The tools themselves don’t seem to be very helpful in the ways that, as a CISO, I need them to be. There are lots of very tactical metrics related to the logs out of the box, but not much about the quality of alerts. Those tactical metrics can be summarized in a quantitative way, but a set of quality, high-fidelity alerts is really the desired outcome of a 24x7 monitoring investment. There simply isn’t much out on the internet that discusses the qualitative aspects of log coverage or the quality of alerts, so the metrics around that are ours to figure out.

From conversations with other CISOs, this is a common gap, so providing a solution for bridging it is the objective of this blog post.

A HIGH LEVEL VIEW OF HOW SECURITY ALERTS ARE CREATED

Describing the log-to-alert process at a level of abstraction that rises above the detail of any specific logging product can be a good place to start. It can also help us refine a taxonomy that accurately describes the process in a measurable way.

Moving from log event to alert is a fairly simple three-step process:

  1. Log sources are ingested
  2. The event data from ingested log sources are indexed
  3. Alerts are (or can be) written based on the indexed data

That’s it. At this level of abstraction, any non-technical executive should be able to understand the process with only the smallest explanation. I like simple. Your execs probably do too.
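
For readers who want to make the abstraction concrete (for your team, not for the exec slide), here is a minimal sketch of the three steps in Python. Every name in it is hypothetical; it is not modeled on any particular SIEM’s API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class LogSource:
    name: str
    raw_events: List[dict] = field(default_factory=list)      # step 1: ingested
    indexed_events: List[dict] = field(default_factory=list)  # step 2: indexed

@dataclass
class AlertRule:
    name: str
    predicate: Callable[[dict], bool]  # step 3: a question the indexed data answers

def normalize(event: dict) -> dict:
    # Stand-in for whatever field extraction/indexing your tool actually performs.
    return {k.lower(): v for k, v in event.items()}

def run_pipeline(source: LogSource, rules: List[AlertRule]) -> List[Tuple[str, dict]]:
    """Ingest -> index -> alert: the three steps execs need to understand."""
    source.indexed_events = [normalize(e) for e in source.raw_events]
    return [(rule.name, event)
            for rule in rules
            for event in source.indexed_events
            if rule.predicate(event)]
```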

These categories will be useful later as we work to understand the maturity of our 24x7 monitoring program. We’ve also successfully removed the product-specific detail, so even if we change monitoring products in the future, we can keep using the same language and measurements.

One side note: in real life, especially with legacy systems, successfully increasing the coverage of ingested logs often requires engineering expertise, while writing alerts can be more of an analyst function. You should be aware of this difference in skillsets. Some organizations, such as ours, have both in a single individual. Others might need different roles to do this well.

EASY YET MEANINGFUL QUANTITATIVE MEASURES

So, where to start? We decided that, for our program, there were four key quantitative measures for determining and communicating the answers to “how are we doing?” and “are we improving?” for our executives.

I’ve summarized the four quantitative measures as follows:

  • The number of security events monitored every 24 hours: This measurement is the top-line quantity of events being monitored and makes for an easy comparison of whether our coverage of individual security events is growing over time.
  • The amount of log data being stored: This metric ties directly to both growth and cost. If we had x TB of log data stored last year and it’s y TB now, we can quantify what that means in terms of overall program cost. The amount of data can also give a high-level view into the incremental cost of adding a given log source.
  • The total number of alerts: This metric helps us understand and measure whether we are improving our top-line coverage of alerts.
  • Alerts that were successful indicators of incidents: This is where your 24x7 monitoring program earns its keep.
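
All four are cheap to compute if your SIEM can export alert and incident records. Here is a hedged sketch; the field names are my assumptions, so map them to whatever your tooling actually emits.

```python
from datetime import datetime, timedelta, timezone

def quantitative_measures(event_timestamps, alerts, stored_bytes):
    """Compute the four top-line numbers from exported SIEM records.

    event_timestamps: iterable of timezone-aware datetimes, one per event
    alerts: list of dicts, each with a (hypothetical) "triggered_incident" bool
    stored_bytes: total size of stored log data, in bytes
    """
    cutoff = datetime.now(timezone.utc) - timedelta(hours=24)
    return {
        # 1. Security events monitored in the last 24 hours
        "events_24h": sum(1 for ts in event_timestamps if ts >= cutoff),
        # 2. Log data stored, in terabytes (ties directly to growth and cost)
        "stored_tb": round(stored_bytes / 1e12, 2),
        # 3. Total number of alerts (top-line alert coverage)
        "total_alerts": len(alerts),
        # 4. Alerts that were successful indicators of incidents
        "incident_indicators": sum(1 for a in alerts if a.get("triggered_incident")),
    }
```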

While these are great metrics, three of the four indicate nothing about the quality of alerts or the areas covered by those alerts. We might have a lot of alerts in a lower-priority coverage area that provide little real value to the program, while major gaps remain in key coverage areas. Discerning that information from these metrics might require multiple slides.

I want something that is simple, compelling, and preferably fits on one slide.

MEASURING THE EFFECTIVENESS OF COVERAGE

So, let’s start to build a single view of the quality of our 24x7 monitoring program. We previously defined three steps that log events have to pass through to become alerts.

Let’s also quickly outline a basic maturity level for log sources. It’s just “Kentucky windage”: sufficient for classifying log sources, not intended to be exact numbers.

Combining the three steps with the above maturity model, we might end up with a basic view that looks something like this.
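
Since this view is easier to grasp with an example, here is a hypothetical illustration in Python. The log sources and ratings below are invented for demonstration and are not our actual matrix.

```python
# Hypothetical log sources and R/Y/G ratings, invented for illustration only.
coverage = {
    # log source:       (ingested, indexed, alerts written)
    "Firewall":          ("G", "G", "Y"),
    "Active Directory":  ("G", "Y", "R"),
    "VPN":               ("Y", "R", "R"),
    "HR application":    ("R", "R", "R"),
}

def print_matrix(matrix):
    # Render the coverage matrix as a simple fixed-width table.
    print(f"{'Log source':<20}{'Ingested':<10}{'Indexed':<10}Alerts")
    for source, (ingested, indexed, alerts) in matrix.items():
        print(f"{source:<20}{ingested:<10}{indexed:<10}{alerts}")

print_matrix(coverage)
```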

One or two quarters of effort later, the metric might look something like this.

Clearly, the progress is measurable in a way that provides value — 3 reds to green, 1 red to yellow, and 6 yellows to green. The matrix also indicates where future work is needed and can be programmed into a work plan.
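
Counting those transitions is easy to automate if you keep each quarter’s matrix in a shape like the sketch above. A small example, again just a sketch:

```python
from collections import Counter

def rag_transitions(before: dict, after: dict) -> Counter:
    """Count (old, new) rating pairs across every cell of two quarterly matrices."""
    moves = Counter()
    for source, old_row in before.items():
        new_row = after.get(source, old_row)  # unchanged if the source wasn't re-rated
        for old, new in zip(old_row, new_row):
            if old != new:
                moves[(old, new)] += 1
    return moves

# With two quarters of matrices, this might yield something like
# Counter({("Y", "G"): 6, ("R", "G"): 3, ("R", "Y"): 1}).
```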

This matrix alone may be a good starting point for communicating monitoring coverage and any coverage gaps to executives. That said, we are still missing a way to measure the quality of the alerts that are being generated.

UPPING MONITORING MATURITY TO INCLUDE QUALITY OF ALERTS

In order to increase the monitoring maturity of our organization, I’ve had to make the following three assumptions:

Assumption #1 — There is no comprehensive, centralized reference that shows all of the security alerts that can and should be generated from any given log source. While there are certainly best practices scattered around for the most popular log sources, I have yet to find one that is comprehensive across many enterprise applications. This would be a great resource to have if anyone knows of one.

Assumption #2 — Our logging analyst, as incredibly smart as she is, isn’t an expert in all of the log sources that are being collected and therefore can’t possibly always know what alerts are important to write for any given log source. We are going to need to leverage key technical expertise outside of the cyber team but most likely still internal to the company.

Assumption #3 — Since I believe that alerts should be viewed as answers to known key questions, we may have security questions about a given log source that we would like answered but that the indexing doesn’t allow us to write an alert for. This means that we may have gaps in our coverage for any given log source, through no fault of our own, because our tool’s indexing doesn’t provide the right information upon which to write the desired alert.

Since we don’t have a reference and our analyst can’t know everything, the next step is to build a plan to internally crowdsource knowledge about related sets of log sources. We will do this by getting the experts on a given set of technology together and asking them what security alerts would be important to them. The key here is not to be limited by what is technically possible, but to focus on the questions that the log source(s) should answer.

With the experts’ collective feedback on a log source (or related set of log sources), we can then identify three buckets of alert quality for any given log source:

  • Alerts that have already been created
  • Alerts that can be created but still need to be created
  • Alerts that can’t be created because of indexing limitations and are gaps

Building on these categories, we can determine what green, yellow, and red might mean for an additional column called “Alert Review”.

This modifies our reporting matrix slightly, as follows:
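
One way to roll the three buckets up into the new column’s status is sketched below. The logic is my assumption for illustration, not an exact definition; adjust it to your own program.

```python
def alert_review_status(created: int, needed: int, gaps: int,
                        reviewed: bool) -> str:
    """Roll the three review buckets up into a single R/Y/G value.

    `created` is informational (how many alerts already exist). `gaps`
    (alerts blocked by indexing limitations) are tracked and reported
    separately; here they don't block green, since they're outside the
    analyst's control.
    """
    if not reviewed:
        return "R"  # experts haven't reviewed this log source yet
    if needed > 0:
        return "Y"  # reviewed, but some writable alerts remain unwritten
    return "G"      # reviewed, and every writable alert has been created
```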

With this new column, we have a way to assure executives which areas have had an in-depth review by experts and all work items related to that review have been completed. We’ve also accomplished this in a way that keeps the discussion at an executive level.

Sweet!

For a one-slide executive view, you can simply add your 3 quantitative metrics and trim your matrix list of log sources to just the ones that are important to executives. I have around 20–25 on mine. I’ll leave it to you to determine which are key for you, since you know your executives best.

You can also include any detailed information about the review process, log source issues, or other data in the slide appendix area if the executives want to know more.

As you continue to mature your coverage, you might begin to communicate the source of most of your incident indicators. This would be particularly valuable if the data shows that they are coming from log sources that were added as a result of increased coverage and had few (if any) historical indicators. Demonstrating return on investment is always a great way to gain future support.
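
If you track when each log source was onboarded, that share is a few lines to compute. A sketch, with hypothetical field names:

```python
from datetime import date

def new_source_indicator_share(indicators, onboarded: dict,
                               program_start: date) -> float:
    """Share of incident indicators that came from newly added log sources.

    indicators: list of dicts, each with a (hypothetical) "source" key
    onboarded: log source name -> date the source was onboarded
    """
    if not indicators:
        return 0.0
    from_new = sum(
        1 for i in indicators
        if onboarded.get(i["source"], date.min) >= program_start
    )
    return from_new / len(indicators)
```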

There are lots of ways to think about this data, how to build on it, and how to generously share the information with executives.

Complexity made simple. That’s our job as CISOs.

Do it well!

For more insights into how cyber leaders can best enable the business and build rock solid cyber programs, please follow me on Twitter at @opinionatedsec1

You can also find more of my previous content at the “CISO & Cyber Leaders” publication on Medium: https://medium.com/ciso-cyber-leaders


Tony Grey * CISO for an insurance company * grew team from 3 to 22 * led large software teams at Microsoft * blogs about cyber leadership & program development