Developing a Risk Management Strategy

Published in

Hybrid Cloud How-tos

4 min read · Feb 6, 2023

The horror. The groans. Risk management. Anyone in IT can tell you this is the stuff nightmares are made of.

Risk touches every aspect of the cyber world, from the application’s code all the way down to the end user. Therefore, risk management is part of everyone’s day-to-day workflow, regardless of where they sit in the organization. In the IBM CIO Hybrid Cloud Platform (CHCP) organization, our goal was to make everyone in our organization aware of risk management and involved in the process.

Risk management is more than just saying “yup, that’s an issue”; it’s identifying the issue and doing our part to fix it. To improve, we didn’t want to just continue our old ways. This new look at risk management required hard lessons and even harder looks at what we do.

This meant peeling back the layers of what we could see and also uncovering what we couldn’t see. And we needed to do this across all levels of the risk management process — from how we identify an issue to how we document it in our risk management tools.

Identifying issues

Identifying an issue is easy, or so I thought. It turns out that’s not always the case. Previously, CHCP used a specific guideline to decide whether a new issue needed to be documented. This guideline also allowed grace periods for some known vulnerabilities, like weak SSH ciphers or even missing operating system patches. A grace period could last up to three weeks, depending on the time of year. For example, a critical Log4j vulnerability was discovered in December 2021, during the holiday season. If we had operated under the standard guideline, thousands of devices would have been left open and vulnerable for a month or longer.

We fixed our identification issue with two improvements: a better assessment process and more resources from the risk management team.

First, we changed our Risk Assessment Workbook (RAW) to better document our understanding of the impacted environment. Second, we now have more lines of communication with our business information security officer’s (BISO) risk management team. This team gives us guidance and numerous resources to help us decide when we need to document a risk or issue, as well as a place to go when we have more questions.

Solving issues

After identification comes the hardest part of risk management: the solution. I don’t mean that it’s scary, but it takes more effort than any other part of risk management. Take what you might think is a relatively simple, easy-to-fix issue: weak SSH ciphers, meaning some of the ciphers a server offers don’t protect the connection as strongly as needed. Fixing it means just adding (or removing) a couple of lines in the SSH configuration, restarting the service, and bam! It’s fixed.
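To make the weak-cipher example a little more concrete, here is a minimal Python sketch of how you might spot the problem in an sshd_config file. It is not the tooling we use at CHCP. The function name and the list of "weak" ciphers are assumptions for illustration only, because which ciphers count as weak is a policy decision for your own security team.

```python
# Illustrative assumption: which ciphers count as "weak" is a policy decision.
ASSUMED_WEAK_CIPHERS = {"3des-cbc", "aes128-cbc", "aes192-cbc", "aes256-cbc", "arcfour"}


def find_weak_ciphers(sshd_config_text):
    """Return assumed-weak ciphers explicitly listed on Ciphers lines, in order."""
    weak = []
    for raw_line in sshd_config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[0].lower() != "ciphers":
            continue
        value = parts[1].strip()
        # A leading +, - or ^ modifies the default set instead of replacing it;
        # this sketch only inspects plain, explicit lists.
        if value[0] in "+-^":
            continue
        for cipher in value.split(","):
            cipher = cipher.strip()
            if cipher in ASSUMED_WEAK_CIPHERS and cipher not in weak:
                weak.append(cipher)
    return weak


sample = """
# sample configuration, made up for this example
Port 22
Ciphers aes256-gcm@openssh.com,aes128-cbc,3des-cbc
"""
print(find_weak_ciphers(sample))  # ['aes128-cbc', '3des-cbc']
```

A check like this only tells you that a fix is needed on one server. It says nothing about how many servers are affected or who will roll out the change.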

So great, we have the fix, but that isn’t the solution. The solution includes the fix, as well as outlining your scope, finding your blockers, making the plan, and executing the plan. This is where we struggled the most. Identification has its challenges (I’m looking at you, grace period), but the solution includes so many variables that you need to account for, consider, and weigh against the business and its needs.

To formulate the solution, you need to answer questions including:

1) How widespread is the issue?

2) Are external-facing (internet-facing and thus higher risk) servers exposed?

3) Do we have to fix it manually? Is there existing automation we can use?

4) Do we have the people on staff to fix it manually?
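The four questions above can be sketched as a small scoping helper. This is a hypothetical illustration, not our actual process: the Host fields, the scope_issue function, and the hosts_per_person estimate are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical inventory record; the field names are assumptions.
    name: str
    affected: bool
    external_facing: bool
    automation_available: bool


def scope_issue(hosts, staff_available, hosts_per_person=10):
    """Summarize answers to the four scoping questions for one issue."""
    affected = [h for h in hosts if h.affected]
    external = [h for h in affected if h.external_facing]
    manual = [h for h in affected if not h.automation_available]
    people_needed = -(-len(manual) // hosts_per_person)  # ceiling division
    return {
        "affected": len(affected),                        # 1) how widespread?
        "external_facing": [h.name for h in external],    # 2) exposed to the internet?
        "manual_fixes": len(manual),                      # 3) no automation available
        "enough_staff": people_needed <= staff_available, # 4) people to fix manually?
    }


inventory = [
    Host("web-1", affected=True, external_facing=True, automation_available=False),
    Host("db-1", affected=True, external_facing=False, automation_available=True),
    Host("build-1", affected=False, external_facing=False, automation_available=True),
]
summary = scope_issue(inventory, staff_available=1)
print(summary)
```

The value of writing the answers down in one place is that everyone works from the same picture of the scope, whatever form that record takes.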

While these questions were always part of the original process, some people overlooked them and others found them paralyzing. So, I wondered (and you might be wondering too): if we already did this analysis, what did we need to change? It came down to the fact that the analysis wasn’t done consistently across CHCP, and we didn’t have a single source to provide information or support.

What about acceptable risk?

We’ve addressed the gaps in our risk management by reviewing our identification and solution processes, but what should we do when a group says they need to accept a risk for some reason?

Our teams asked us to take a deeper look at this issue, because who wants to just accept a risk without understanding all the “dun-its”: who, what, where, why, and when? Previously, these answers could be missing information or require us to do a bit of inferring to understand the full scope of the issue. That’s no longer the case since we put a new risk acceptance process in place, which gives us a better look at risks and even lets us eliminate risks that would previously have been considered acceptable.

Lessons learned

So, in the end, what lessons did IBM CHCP learn?

1) We improved our ability to identify issues by providing a clearer definition and expectations.

2) We pushed accountability to everyone, regardless of where they work, to improve awareness.

3) We changed how we accepted risks and put more emphasis on the solution and less on ownership. (Don’t get me wrong — there’s still ownership, but ownership doesn’t translate to solutions.)

4) We eliminated our loose rule about grace periods, which could leave us vulnerable, and developed a methodical rule to learn why any grace period is needed and how to minimize it.

5) We still accept that we can’t fix everything. But instead of assuming we can’t do anything about it, we put our heads together to learn how to reduce and mitigate risk as much as possible.

Regardless of the platform, the environment, or whether you’re working on premises or in the cloud, having risk can be frustrating and embarrassing — but we’ve learned that it doesn’t have to be. Risk management is a teacher for everyone, and at IBM CHCP, we are very good students.

Lei Perez is a Risk and Security Management leader for CIO Hybrid Cloud Platforms at IBM. The above article is personal and does not necessarily represent IBM’s positions, strategies, or opinions.
