The power of the “X is a data problem” hypothesis
Author: Jerry Heinz
Jerry Heinz, Chief Technology Officer at ActZero, is a leader in operational excellence, organizational turnaround, and product innovation. He has over 20 years’ experience in product design and engineering. He drives R&D efforts to evolve ActZero into the industry’s leading Managed Detection and Response (MDR) service provider.
How many times have you heard: “The ROI of your data science department is off the charts!”?
Highly-tuned, highly-trusted, and highly-productive research teams with the freedom to explore and be curious are rare. As we have learned, though, they are not unattainable.
On the surface, ActZero is a managed detection and response service provider that touts the lowest false-alarm rate in the industry and delivers on this promise through its Hyperscale SOC (Security Operations Center). However, at our core, we are a data-first company firmly rooted in proving the hypothesis that “Security is a data problem.”
Data-first approaches are not new. Over the last twenty years, we’ve watched so-called “big data” disrupt industry after industry. Advances in parallel processing, distributed computing, and ML/DL algorithms have enabled step-function increases in the complexity of the problems we can compute. Practically, these advancements let us (the machine, really) reach better, more complicated conclusions faster than ever.
So everyone wanted in … Data scientists became some of the highest-paid technical employees at any company; and every company wanted them. Once they were on-staff, though, they were often unintentionally inhibited by old ways of thinking. Not every company set out to build a self-driving car, to generate realistic opponents in a blockbuster video game, or to spot cancer before a human could. Many scientists joined companies and were embedded in operational teams and directed: “Determine our data science strategy.”
For cybersecurity firms, the holy grail of data science was to eliminate the vast number of false positives their SOCs received each day. Some security practitioners learned data science techniques while others embedded data science within the SOC, yet neither approach achieved sustainable results.
Putting data tools in the hands of subject matter experts (SMEs) invariably allows their cognitive biases to limit their direction, whereas giving a generalist researcher a problem and a data set lets them study the story the data tells them.
When you build a data-first organization the focus can yield materially different results. It all starts with the simple hypothesis: “X is a data problem,” where X is your biggest problem to solve.
The cognitive bias trap
Before we see how starting with the hypothesis that “X is a data problem” can open new possibilities for your business, let’s look at why SMEs with data tools struggle to avoid their own cognitive biases.
Cognitive biases are the result of our brains’ attempts to simplify and speed up decision-making; we rely upon the sum of our experiences lived, lessons learned, knowledge collected, and habits formed to inform every judgment and decision we make. These biases generally fall into two types: memory-related (“I didn’t like the outcome, so I remember the events leading up to it in an unfavorable light”) and attention-related (“I didn’t want the outcome, so I focus more on data that supports my point of view”).
We have all encountered situations where our beliefs have led to snap judgments. Left unchecked, the more expertise we develop in a particular subject, the more entrenched our own beliefs become. For example, if you have had poor results with version after version of a product, yet the latest version gets rave reviews, are you likely to be persuaded to buy that new version?
Organizing to minimize bias
No one can escape cognitive biases. However, when building a data-first organization, there are some best practices that reduce their impact.
We start by adopting the hypothesis that “X is a data problem” from the top down. As humble students, we regularly test this hypothesis; any event under our purview is now questioned: “Can we prove the event exists through data? Show us.”
To be successful, we open ourselves to the possibility that “X is not a data problem” if we can disprove our hypothesis. Doing so provides our SMEs with a safe environment to fail and a constructive way to confront their biases. After all, if an SME can prove that X is not a data problem, isn’t that a win for our business? Still, because we use top-down support, we purposefully introduce a slight bias toward the hypothesis and thereby incentivize the organization to find supporting data first.
Next, we place our data team alongside the SMEs, and not underneath. This partnership structure ensures that the data team is not unduly influenced into considering or discounting specific solutions; past attempts and failures (and their accompanying cognitive biases) are only input to the data team, not direction.
The SMEs continue to do their jobs, to the best of their ability, with the tools they have available; this team can request justification for any change they are asked to make — and I do mean any change.
The data team learns from the challenges the SMEs face and delivers new tools and solutions; this team can request whatever data it thinks it might need — and I do mean any data.
We clearly define this interface: while both teams jointly own the problem, the data team is student and solution owner, and the SME is teacher and operator. Teams can disagree within these constraints but may be asked to gamble on, and commit to, the solution.
Lastly, we codify this relationship in goals (confession: I like OKRs). For a shared objective of “Improve SME work quality/speed through data” sample data team key-results could be:
- Spend X learning sessions with SMEs
- Introduce N improvements to SME tools
- Improve an SME’s average signal-to-noise ratio from X to Y
- Reduce an SME’s event handling time from X to Y
- Achieve an X out of 10 user feedback score from SMEs
For SMEs, sample key-results could be:
- Conduct X mentor sessions with data team members
- Provide data-proof substantiating each of X event-types
- Conduct root-cause-analysis for each unsubstantiated major event
- Adopt N new tool improvements within X time
- Provide user feedback on Y tool improvements
Note that the key-results are complementary to one another: both teams must partner to achieve their goals. Therefore, we choose our teams wisely. SMEs who cannot explain their job and its challenges, who are not team players, or who are not open to change will cause significant friction. Data teams who are not quick studies, or who lack empathy, will produce ineffective solutions that struggle with adoption and erode trust.
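Of the key-results above, the signal-to-noise and handling-time targets are the most directly measurable. A minimal sketch of how they might be computed, assuming alerts are labeled true or false positive after triage (the `Alert` type and field names here are hypothetical illustrations, not ActZero’s actual schema):

```python
from dataclasses import dataclass


@dataclass
class Alert:
    is_true_positive: bool   # label assigned by the SME after triage (hypothetical field)
    handling_seconds: float  # time the SME spent working the alert (hypothetical field)


def signal_to_noise(alerts: list[Alert]) -> float:
    """Ratio of true-positive alerts (signal) to false positives (noise)."""
    signal = sum(a.is_true_positive for a in alerts)
    noise = len(alerts) - signal
    return signal / noise if noise else float("inf")


def avg_handling_seconds(alerts: list[Alert]) -> float:
    """Average time an SME spends per alert."""
    return sum(a.handling_seconds for a in alerts) / len(alerts)


# Illustrative quarter: 30 real threats among 120 alerts, ~90s each to handle.
alerts = [Alert(is_true_positive=(i < 30), handling_seconds=90.0) for i in range(120)]
ratio = signal_to_noise(alerts)       # 30 signal / 90 noise ≈ 0.33
avg = avg_handling_seconds(alerts)    # 90.0 seconds
```

Tracking these two numbers per SME, quarter over quarter, is one straightforward way to ground the “from X to Y” key-results in data rather than anecdote.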
ActZero: A practical example
At ActZero, our data team is made up of the three classic roles: data scientists, ML engineers, and data engineers. The data scientists model behavior and improve signal-to-noise. ML engineers bring these models to production and scale them widely. Data engineers feed the models (and the team) with correctly formatted data at low latency. Individually, these data scientists, ML engineers, and data engineers are masters of their craft, but they are not security experts.
Our SMEs bridge two disciplines: first, our SOC employees are charged with protecting customers, implementing tools, and verifying efficacy. Second, our security researchers are charged with providing data about new or emergent threats, particularly when lacking real-world data.
Security experts, like most of us, are not used to having someone actively observing their work, and they aren’t accustomed to answering many questions about their intuitions. Our runbooks described what we do; our data team, however, needed to understand the why.
To reduce friction (and there was some!), we aligned our OKRs and created structured interactions. We broke everything down, step by step: How do we know we should perform threat hunts? How do we know this script is malicious and that one is not? Why do we block these outbound connections and not those?
By observing and interviewing senior security experts as they used their intuition to find threats, the data team could confidently model that expertise. The results of the models astounded our security teams: here, for the first time, after so many broken promises from tool vendors, was software that could produce better results than the security team could on their own. And this was only the first iteration.
Highly-tuned, highly-trusted, and highly-productive data teams with the freedom to explore and be curious are rare. Rallying around a shared hypothesis of “X is a data problem” is a straightforward way to align your organization and achieve the desired return on your data science investment. The focus that this hypothesis creates, and how you organize around it, can dispel cognitive bias and produce innovation where no one thought it possible.
The difficulty of hiring data scientists, and other technically minded employees, is just one reason this dynamic is rare. To hear more about our “Great Retention” prediction, check out our white paper 2022 Cybersecurity Predictions. Or, to learn more about our process, download our white paper, The ‘Hyperscale SOC’ and the Minds Behind It.