Classifying types of “Security Work”

Applying the types of work from The Phoenix Project to security engineering

Ryan McGeehan
Starting Up Security
Dec 9, 2019


The types of work model demonstrated in The Phoenix Project is useful in surfacing and inspecting the work patterns of a security organization. Toil and other frustrating areas of organizational friction are teased out well with this classification method.

Interpreting these types of work as Security Work puts a security organization under scrutiny as if it were a manufacturing workflow. Bottlenecks and imbalances can be discussed wherever toil is generated. I didn’t expect to draw this much usefulness from a novel, but it works great, and people sometimes already know these types of work from the book.

This model has a narrow purpose. I have found it helps inspect common criticisms from an engineering organization, as opposed to scrutinizing its mission or effectiveness against risks.

These are the kinds of complaints it targets:

  • “We are always chasing shiny objects.”
  • “We are always firefighting.”
  • “We are never acknowledged for the day-to-day efforts.”
  • “We are under-resourced.”
  • “We are always brought in last minute.”

This model doesn’t assess the organization based on the risks it thinks it mitigates. But that’s OK. We have other methods for that.

The following are the four categories of Security Work, mapped to their counterparts in The Phoenix Project: Business Projects, Security Engineering, Security Operations, and Incidents & Unplanned work. Below are examples of each type of work and some commentary on how it shows up in an organization, with the original names from The Phoenix Project included. The overall workflow expects that an organization plans for (and completes) very predictable work in the top three categories to avoid Incidents & Unplanned work downstream. The next goalpost is to move Business Projects and Security Operations work toward Security Engineering, eliminating the business’s dependencies on security and the toil of operational work.

This is a qualitative modeling approach to work categorization and has some rough edges as a result. All models are wrong.

Business Projects (💰)

(Phoenix Project also calls this Business Projects)

Business projects can be interpreted a few ways. Strictly, the category includes support of projects that generate revenue; more liberally, it identifies how the business is changing in pursuit of success in ways that require collaboration from supporting organizations like a security team.

This includes: A new product, a new partnership, a new office, a new organization. The security organization wants to help out and support it by mitigating associated risks.

Of course, we hope that the business understands the value of eliminating risks as early as possible to avoid distractions downstream. The ideal business invites or expects this sort of involvement from the security team during the earliest stages possible.

This can often look like an internal security consulting function in its purest form, like design or architecture reviews, but that’s not a strict guideline. To inspect this, we have to analyze how the security organization is involved with the business through policy and procedure, or understand the softer cultural and social ways this gets accomplished. Here’s some guidance.

Reasons security is included in the business:

  • An outside business executive has sponsored the team, a security executive (CISO) is involved with the business, or both.
  • The security engineering organization itself is charismatic and helpful and included at grassroots, individual-contributor levels, or has “champions” or other reverse-advocacy approaches.
  • The business is transparent: potential work projects are discoverable by the security team through open discussions, tasks, repositories, pull requests, or internal RFCs.
  • The security org’s judgement is trusted in identifying and suggesting mitigations for risks, and tasks are mutually completed by the business and the security team.

Reasons security is avoided: The security org demands too much, advocates for absurd risks, is unfriendly or confrontational, is viewed as expensive or unreasonable on timelines, has no executive support, or is not transparent enough to allow others to jump in on its roadmap.

Security Operations (🔁)

(Phoenix Project calls this “Changes”)

Security Operations includes all of the perpetual security work that needs to be constantly online for it to be meaningful. It involves the processes and procedures that enforce our belief that certain risks are constantly attended to. It is a commitment. As you add to operations work, the underlying risk needs to be thoughtfully mitigated or eliminated before the operational work is abandoned.

An example: An engineer completes a vulnerability assessment of your network. They find and fix a few critical vulnerabilities. How long until you believe it should be run again? Daily, weekly, monthly? Continuously? That decision relates to operations. The commitment to an ongoing process is operations, even if it only involves managing the work created from automated workflows or pruning those workflows themselves. Simply keeping automation running is an operation in itself.
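
As a minimal sketch of that commitment (the `vuln-scan` command, its flags, and the weekly cadence are placeholders, not a specific tool), the “operations” part is just the loop that keeps the scan running on the cadence you chose and records that it actually ran:

```python
# Minimal sketch of the operational commitment, not a real scanner:
# `vuln-scan` is a placeholder for whatever assessment you've committed to
# repeating. The "operations" part is keeping it running on the chosen
# cadence and recording that it actually ran.
import subprocess
import time
from datetime import datetime, timezone

SCAN_INTERVAL_SECONDS = 7 * 24 * 60 * 60  # the cadence decision: weekly, in this sketch

while True:
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(["vuln-scan", "--target", "internal-network"], capture_output=True)
    # Recording that the scan ran (and how it exited) is part of the commitment.
    print(f"{started} scan finished with exit code {result.returncode}")
    time.sleep(SCAN_INTERVAL_SECONDS)
```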

You can categorize something like security awareness under Security Operations. Awareness is not a one-time effort. Some conventional definitions of SecOps are very SOC (Security Operations Center) oriented; not with this model. An operational commitment includes all ongoing, consistent, repeatable efforts. Security Engineering might make Awareness more efficient (by building tooling, like phishing simulations), and the Business might lean on an Awareness effort to mitigate the risks of new things it takes on. However, the ongoing Awareness effort is still a continuous program with an operational commitment.

Indicators of good security operations:

  • A program management resource, or at least disciplined employees tracking ongoing tasks.
  • Simple, meaningful metrics.
  • On-call rotations that also contribute to the security engineering efforts to minimize operations.
  • Shared responsibility between security engineering and operations.

Reasons why security operations fails:

  • Chasing and gaming metrics.
  • Feedback black-holes for operational toil from security engineering work.
  • Lack of organizational respect for operational work.
  • Hostility towards other organizations that bounce from project to project and produce downstream detection engineering churn or undocumented workflows.
  • Us (eng) and Them (ops) attitudes.

Security Engineering (🛠️)

(Phoenix Project calls this “Internal IT Projects”)

These are investments in the security organization itself. These projects are less often instigated by specific business fluctuations, but that is also not a strict rule. Toil must be eliminated here.

Security Engineering builds new capabilities, higher leverage tooling, more powerful risk discovery options, and faster response times. This is the heart of the building, buying, contracting, and hiring in the security organization to support the business and anticipate its needs.

Investments in this space allow a team to discover and mitigate more risk with greater efficiency. This includes the development of talent. It seeks to deeply understand a future risk and eliminate or mitigate it going forward with the least operational cost and the least distraction or restriction of the business.

Security Engineering includes foundational and wide-reaching strategic projects. Examples include centralized logging, identity, asset management of devices, and policy enforcement across systems and network infrastructure, built in anticipation of future business needs without any specific business endeavor in mind.

Some qualities of good security engineering:

  • A business can proceed as normal with the least amount of intervention from the security organization.
  • The least amount of toil is created downstream in security operations.
  • The maximum amount of risk is mitigated for the longest amount of time.
  • Project work keeps security engineering happy.

Bad security engineering increases operational churn: it can very quickly create operational toil.

  • Builds “detection for others” and throws alerts over the fence.
  • Builds and abandons tooling.
  • Is interested in engineering new things but not in improving existing debts (see detection engineering).
  • Results in shadow IT and policy subversion by the company en masse to avoid the security org.

Bad security engineering only builds for itself:

  • Engineers aren’t building things in anticipation of business needs or risks.
  • They do “security research” for their conference boondoggles rather than development.
  • They over-value building tools for a glamorous business unit and dismiss the rest.
  • Lots of vendors, but not a lot of integration with them.
  • The org has trouble describing the business direction or roadmap.

Bad security engineering lacks incident empathy: I don’t blame anyone (leadership or engineers alike) for not having had firsthand experience in embarrassing and complex incidents. It’s sort of the goal to avoid them! However, it becomes increasingly obvious when an organization lacks this voice of experience altogether. It usually shows when security engineering efforts follow convenience and interest instead of risks or studied patterns. Without getting into a risk-based discussion or model… this creates an outsized amount of Unplanned work when an incident takes place. Suddenly, an incident response firm has to build basic response capability on the fly, and we’re left to wonder what kind of decisions we’ve been making all along. A lack of Security Engineering collaboration with legal, comms, sales, and other horizontal organizations will dump more suffering into Incidents & Unplanned.

Incidents & Unplanned (🚒)

(Phoenix Project calls this “Unplanned Work or Recovery Work”)

Toil finds its way here, and lives here. All surprises, last-minute tasks, incidents, and failures of imagination and planning live here.

  • Fires
  • Late nights
  • Weekend work
  • Postmortem findings
  • Burn-out

Successful organizations expect this state and plan to avoid it. Security engineering should dedicate some effort to incident response planning, tabletop exercises, or a dedicated response team. How much depends on how much resourcing they want to trade against future toil and how much knowledge they want to capture through postmortem rigor.

Every organization should see things fall into this state. Or, this is where an organization has its genesis. That is healthy and a product of a company taking risks. It’s often easier to prioritize and advocate for work that is falling into this category. It’s harder to get far ahead of potential unplanned work as it means you are making a prediction about future toil.

Work at failing organizations permanently resides in this state. An overload of unplanned work leaves a few individuals holding all of the institutional knowledge, and it recurs in many forms. If it is crisis work, an engineering organization in a panic will repeatedly tap the same people for response. This results in increasingly small silos. There’s no breathing room to share knowledge, write documentation, train peers, or plan ahead. After the fire drill or interruption, they’re back to work, and the same talent gets reused for each crisis.

This creates a bit of a vicious cycle, which I get. It is very hard to classify and organize work when you’re buried in that same work.

It’s near impossible to plan ahead or advocate for additional resources while struggling in this state. A clear story must be crafted to demonstrate the volume of unplanned work and where it comes from, so that solutions can be chosen: budget, headcount, or the breathing room gained through organizational collaboration to avoid unplanned work with effort from the other three categories.

Stale risks are indicative of this type of work. Security organizations often have vulns or other risks that are never mitigated. These risks are stuck in a systems-complex quagmire of politics, finger-pointing, legacy dependencies, employee turnover, and other forms of can-kicking. This typically comes from an accumulation of many organizational debts and isn’t a simplistic Medium essay away from being fixed. It does help to have as much language as possible to describe where the work of a security team broke down as it moves toward mitigation. Why can’t we work with the business? Why won’t operational approaches succeed? Why can’t we identify the underlying problem?

Applying the model

The model’s job is to surface narratives about how much of each kind of security work the team is doing. The model’s goal is to help develop intuition on where organizational friction comes from.

Mitigation stories can be told when this model is considered:

Our new product was launched with several SSRF bugs that we have to fix. (Unplanned 🚒).

We need to start working with product leads to prevent SSRF from making it into future launches. (Business 💰).

We could also detect SSRF vulns with static analysis as commits land. Then, patch. (Operations 🔁; a toy version of such a check is sketched after these examples).

SSRF won’t matter so much if we mitigate the risk more comprehensively. (Security Engineering 🛠).
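
To make the Operations 🔁 story above concrete, here is a deliberately naive sketch (my illustration, not the author’s tooling): a pre-merge check that flags requests.get/requests.post calls whose URL isn’t a hard-coded literal, a crude proxy for a potential SSRF sink. A real pipeline would use a proper static analysis tool with a reviewed ruleset, but the operational commitment of running it on every commit and triaging what it flags looks the same.

```python
# Naive illustration only: flag requests.get()/post() calls whose URL argument
# is not a string literal, a crude stand-in for "attacker-influenced URL"
# (a potential SSRF sink). Run against changed files in CI; a non-zero exit fails the commit.
import ast
import pathlib
import sys


def find_dynamic_request_urls(path):
    """Return line numbers of requests.get/post calls with non-literal URLs."""
    tree = ast.parse(pathlib.Path(path).read_text(encoding="utf-8"), filename=path)
    flagged = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "requests"
            and func.attr in {"get", "post"}
        ):
            url_arg = node.args[0] if node.args else None
            if url_arg is not None and not isinstance(url_arg, ast.Constant):
                flagged.append(node.lineno)
    return flagged


if __name__ == "__main__":
    failed = False
    for filename in sys.argv[1:]:  # e.g., the Python files touched by the commit
        for lineno in find_dynamic_request_urls(filename):
            print(f"{filename}:{lineno}: possible SSRF sink (dynamic URL in requests call)")
            failed = True
    sys.exit(1 if failed else 0)
```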

None of these work areas are inherently bad. They will all exist with some volume of work associated. Instead, we look for imbalances: a work area is outsized, a work area shows behaviors that increase other forms of work, or a work area is either not considered or not an option at all. The interactions between these sources of work are more important.

Also, I don’t think this model directly translates into forming teams around it. You wouldn’t make four teams. Don’t do that. :)

Conclusion

That’s it. Four work areas. Carve up your organization’s work conceptually and think about it. Does security work steadily move towards being eliminated through Security Engineering projects?

Lastly, it’s just a model. Like all models, it quickly fails. It’s up to you to decide if it’s useful! Have fun!

Ryan McGeehan writes about security at scrty.io

Appendix: Work Examples

Some examples of how to carve up typical security engineering projects…

Secrets Scanning

Having a future sense of what engineering’s development infrastructure will look like gives a lead on supporting their development workflows. The concern of “did an engineer commit secrets to a plaintext repo?” is treated as an ongoing issue. As sources of operational churn that make this workflow painful to repeat are discovered, security engineering improves automation or builds better mitigations to reduce them going forward.

Business: Engineering wants to develop products in a new programming language in an on-prem version control system, so we are making our secrets scanner compatible with that language and repo before it stores production code. (Reducing 🚒 with 💰)
Security Engineering: We are improving our secrets scanner to run daily and report into Slack, so nobody is running scans manually going forward and the on-call can monitor alerts instead; a rough sketch follows these examples. (Reducing 🔁 with 🛠)
Operations: The on-call will respond to any secrets discovered in our source code. (Reducing 🚒 with 🔁)
Incidents & Unplanned: An engineer put an infrastructure credential into source code and we worked late last weekend to rotate it. (Yuck. 🚒)
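
A minimal sketch of the Security Engineering item above. The secrets-scan CLI, its JSON output, the repo path, and the Slack webhook URL are all placeholders I’ve assumed, not tools named in the post:

```python
# Minimal sketch: run a (hypothetical) secrets scanner on a schedule and
# report findings to Slack, so nobody runs scans by hand. Assumes a
# `secrets-scan --json <path>` CLI and a Slack incoming webhook; both are
# placeholders.
import json
import subprocess

import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder
REPO_PATH = "/srv/git/monorepo"  # placeholder


def scan_for_secrets(repo_path):
    """Run the scanner and return its findings as a list of dicts."""
    result = subprocess.run(
        ["secrets-scan", "--json", repo_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def report_to_slack(findings):
    """Post a short summary to Slack; the on-call monitors the channel."""
    if not findings:
        return
    lines = [f"- {f.get('file')}:{f.get('line')} ({f.get('rule')})" for f in findings]
    text = f"Secrets scan found {len(findings)} potential secret(s):\n" + "\n".join(lines)
    requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10)


if __name__ == "__main__":
    # Invoked daily by cron/CI; the schedule itself is the Operations 🔁 commitment.
    report_to_slack(scan_for_secrets(REPO_PATH))
```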

Suspicious Email Escalation

A small company can handle a few escalations as one-off work. A heads-up on growth and hiring can give some lead time to plan for more regular escalations and avoid a pile of distractions as the company grows. Investment in automating the pipeline can avoid toil for the on-call.

Business: The company is going to expand to at least 3x the employees as we hire sales and support roles. We are building an internal security@ address to meet the scale of suspicious email reports. (Reducing 🚒 with 💰 and 🔁)
Security Engineering: We are automating a malware sandbox to better evaluate suspicious attachments and reduce on-call analysis toil; a rough sketch follows these examples. (Reducing 🔁 with 🛠)
Operations: We maintain an on-call rotation to respond to tickets sent to the security@ address. (Reducing 🚒 with 🔁)
Incidents & Unplanned: An employee received a suspicious email, put the attachment into a large Slack channel, and we were notified a couple of hours later and opened an incident. (Yuck. 🚒)
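
A hedged sketch of the Security Engineering item above. The sandbox submission endpoint, the attachments directory, and the verdict format are hypothetical; the point is only that the on-call reads a verdict instead of detonating attachments by hand:

```python
# Sketch only: hash each reported attachment and submit it to a (hypothetical)
# sandbox HTTP API, then print a one-line verdict for the on-call. The
# endpoint, directory, and response shape are placeholders, not a real
# product's API.
import hashlib
import pathlib

import requests

SANDBOX_SUBMIT_URL = "https://sandbox.internal.example/api/v1/submit"  # placeholder
REPORTS_DIR = pathlib.Path("/var/security/suspicious-attachments")  # placeholder


def sha256_of(path):
    return hashlib.sha256(path.read_bytes()).hexdigest()


def submit_to_sandbox(path):
    """Upload the attachment and return the sandbox's JSON verdict."""
    with path.open("rb") as handle:
        response = requests.post(
            SANDBOX_SUBMIT_URL,
            files={"sample": (path.name, handle)},
            timeout=60,
        )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    for attachment in REPORTS_DIR.glob("*"):
        verdict = submit_to_sandbox(attachment)
        # The on-call reads a one-line verdict instead of doing manual analysis.
        print(f"{attachment.name} sha256={sha256_of(attachment)} verdict={verdict.get('verdict')}")
```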

Encrypted Laptops

Most workplaces reach a point in time when they decide that laptops and data at rest need to be managed and encrypted. It’s never a one-time project. There are always one-off reasons why some laptops end up not being encrypted, whether it’s endpoint management failures or unmanaged hosts. Managing those reasons on an ongoing basis helps build confidence that a theft won’t leave you exposed.

Business: IT is now managing laptops for employees and wants guidance on meeting the compliance requirements needed for government sales. (Reducing 🚒 with 💰)
Security Engineering: We are doing a bake-off of endpoint management and EDR software to better automate policy coverage. (Reducing 🚒 with 🛠)
Operations: We are alerted when a high-risk laptop is unencrypted, or when more than 1% of laptops are; a sketch of this check follows. (Reducing 🚒 with 🔁)
Incidents & Unplanned: An SRE who insisted on disabling IT’s endpoint management, because they didn’t like it being installed, has now lost their laptop at a bar. (Yuck. 🚒)
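
And a minimal sketch of the Operations item above, assuming a daily inventory export from endpoint management (the CSV fields and the alert() hook are made up for illustration):

```python
# Sketch: evaluate an endpoint-inventory export against the alerting rule
# from the Operations item: alert if any high-risk laptop is unencrypted, or
# if more than 1% of the fleet is. The inventory format and alert() call are
# illustrative placeholders.
import csv

UNENCRYPTED_FLEET_THRESHOLD = 0.01  # alert above 1% of laptops unencrypted


def alert(message):
    """Placeholder for paging/Slack/ticketing."""
    print(f"ALERT: {message}")


def check_inventory(csv_path):
    with open(csv_path, newline="") as handle:
        laptops = list(csv.DictReader(handle))

    unencrypted = [row for row in laptops if row["encrypted"].lower() != "true"]
    high_risk_unencrypted = [row for row in unencrypted if row["risk"] == "high"]

    for laptop in high_risk_unencrypted:
        alert(f"high-risk laptop unencrypted: {laptop['hostname']}")

    if laptops and len(unencrypted) / len(laptops) > UNENCRYPTED_FLEET_THRESHOLD:
        alert(f"{len(unencrypted)}/{len(laptops)} laptops unencrypted (>1% of fleet)")


if __name__ == "__main__":
    check_inventory("laptop_inventory.csv")  # e.g., a daily export from endpoint management
```

However the check is implemented, keeping it running and acting on what it flags is the Operations 🔁 commitment.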
