How to Perform Systemic Threat Modelling — An Example

Mehran Koushkebaghi
Nationwide Technology
Apr 7, 2021

Introduction

Enterprise applications are distributed socio-technical systems. To secure these complex systems, we need to understand them, and that is impossible without comprehending the environment in which they operate. A purely analytical approach, which studies the parts in isolation, falls short here; systems thinking gives us the necessary toolbox.

A systemic approach to threat modelling enables us to manage such systems’ complexity and design appropriate controls to avoid, mitigate, or transfer the risk. Matin Mavaddat has introduced a methodology inspired by systems thinking, and this article builds on it.

Over the following few sections, I’ll clarify the approach and methodology by sharing an example of applying it to a real-world platform. I’ll provide a background, define the scope, and show how using system modelling tools will lead to a deeper understanding of the system.

Background

The diagram below illustrates a simplified high-level view of a payment platform. It contains multiple microservices and has many SaaS integrations. In the rest of this document, I will discuss how to apply a systemic methodology to perform threat modelling on this system.

High-level view — abstraction level 1

In the above diagram, SaaS-a offers a notification mechanism to inform us of events in the system. An event can be the arrival of an inbound payment, the successful submission of an outbound payment, or any other event observable via its API. SaaS-a publishes those notifications to the queue.

Threat Modelling

In the subsequent sections, I’ll explain how to use the systems thinking constructs discussed in the previous article — system boundary, abstraction level, perspective, etc. — to perform a threat modelling exercise.

We employ the system component model, sequence diagram and data flow diagram to identify the components, communication channels and the user’s interaction with the system. I’ll show how the use of those modelling tools empowers us to deepen our comprehension of the system and identify the threats.

Define the system boundary

The interaction of microservice-3 with SaaS-a that triggers a notification is the scope of this threat model. As described in the methodology, we use the definition of the user story to determine the system boundary.

To determine the scope more precisely, we need to define the user story, the components of the system, and the actors. It is important to remember that the user story is the basis for determining the scope, and unique user stories that involve the same components should be studied separately.

System boundary

User Story: We want to enable SaaS-a to send notifications to a queue for an outbound payment initiated by microservice-3 in our payment platform.

We will use the concept of abstraction levels discussed in the previous article to determine the system’s components. As a reminder, an abstraction level is a way of hiding the details of a subsystem, and we use abstraction levels to manage the complexity of the system. The highest level of abstraction is the entire system, which contains the fewest details; as we move towards lower abstraction levels, more components appear in the model.

Below is a diagram showing an abstraction level that is one level lower than the high-level view presented at the beginning of this article. I’ll use this diagram to define the system’s components.

System component diagram — abstraction level 2

Components: These are the main components of the system:

  1. microservice-3: The application that initiates an outbound payment request.
  2. SaaS-a API: The API that receives the HTTP request from microservice-3.
  3. SaaS-a engine: The engine that processes the incoming request and notifies the notification service.
  4. SaaS-a notification service: The service that publishes notification messages to the queue.
  5. Queue: The queue that receives the notification from the SaaS-a notification service.
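The two abstraction levels described above can be sketched as a small tree, where each component's children are the parts that become visible one level lower. This is only an illustrative model: the `Component` class and the `components_at_level` helper are hypothetical names introduced here, and the tree contains only the parts of the platform that are in scope for this user story.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """A node in the system model; children appear one abstraction level lower."""
    name: str
    children: List["Component"] = field(default_factory=list)


def components_at_level(root: Component, level: int) -> List[str]:
    """Return the names visible at an abstraction level (0 = the entire system)."""
    frontier = [root]
    for _ in range(level):
        next_frontier = []
        for component in frontier:
            # A component without a finer-grained model stays visible as-is.
            next_frontier.extend(component.children or [component])
        frontier = next_frontier
    return [component.name for component in frontier]


# Hypothetical model containing only the in-scope parts of the platform.
platform = Component("Payment platform", [
    Component("microservice-3"),
    Component("SaaS-a", [
        Component("SaaS-a API"),
        Component("SaaS-a engine"),
        Component("SaaS-a notification service"),
    ]),
    Component("Queue"),
])

print(components_at_level(platform, 1))  # high-level view
print(components_at_level(platform, 2))  # system component diagram
```

Moving from level 1 to level 2 expands SaaS-a into its three internal services, while microservice-3 and the queue stay as they are. That expanded set is exactly the component list above.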

As described in the previous article, the users should be considered part of a socio-technical system. Defining the actors will help us identify the users interacting with the system through their input and is valuable for determining the threats. These are the main actors in the system.

Actors:

  1. Notification service: The application that can assume the designated external role and publish messages to the queue.
  2. Notification queue: The queue in our AWS account that receives the notifications.
  3. Developer: A developer in our AWS account who has access to the AWS resources and can configure them.
  4. Platform engineer: A platform engineer who has admin access to our AWS account.

Interaction between the system’s components

The sequence diagram below depicts how the notification service publishes a message to the queue following a microservice-3 request, assuming the environment is trustworthy. The diagram shows the interaction of the components defined in the previous section at the intended abstraction level.

The following section will use this diagram and the system component diagram developed earlier to create a data flow diagram that renders an extra layer of details on the components and their communications.

Sequence diagram

The flow of data in the system

After defining the components, their communication channels, and the users, we should understand how data flows through our system. The data flow diagram helps us understand both the flow of the data and its structure.

The diagram below carries more detail than the system component model. It illustrates the structure of a message published to the AWS SQS queue and shows the relevant policies for the queue and the AWS KMS Customer Master Key (CMK).

  • The numbered arrows show the order of the flow.
  • Data flow and control flow are colour coded.
  • It demonstrates the structure of the message being published by the notification service.

Data flow diagram
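The article does not reproduce the key policy from the diagram, so the following is only a sketch of what the CMK side might look like. It assumes the queue uses server-side encryption with a customer managed KMS key and that the external role is the only producer. With SQS server-side encryption, a producer needs `kms:GenerateDataKey` and `kms:Decrypt` on the key. The function name, the `Sid`, and the role ARN are hypothetical.

```python
import json


def producer_key_statement(producer_role_arn: str) -> dict:
    """Build a KMS key policy statement for a role that sends to an encrypted queue."""
    return {
        "Sid": "AllowProducerToUseKey",  # hypothetical statement id
        "Effect": "Allow",
        "Principal": {"AWS": producer_role_arn},
        # Permissions a producer needs on the key of an SSE-KMS queue.
        "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
        # In a key policy, "*" refers to the key the policy is attached to.
        "Resource": "*",
    }


# Hypothetical account number and role name, for illustration only.
statement = producer_key_statement(
    "arn:aws:iam::111111111111:role/example-external-role"
)
print(json.dumps(statement, indent=2))
```

Keeping the statement limited to one principal and two actions mirrors the intent of the queue policy: the external role can encrypt messages for the queue and nothing more.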

Identifying vulnerabilities

As discussed previously, we can perform threat modelling from different perspectives, and selecting multiple perspectives improves the threat model’s comprehensiveness. For this exercise, we chose a data-centric approach; therefore, I’ll use STRIDE to identify the vulnerabilities.

Each system component is reviewed, and a list of threats, each mapped to STRIDE, is presented here. It is worth remembering that the communication channel between SaaS-a and the queue is also treated as a separate component.
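The per-component review can be sketched as a simple record structure, with a check for components that have not been reviewed yet. The two threat entries below are hypothetical examples written for this sketch; they are not rows from the article's threat list. The `Threat` class and the `unreviewed` helper are also illustrative names.

```python
from dataclasses import dataclass
from enum import Enum


class Stride(Enum):
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information disclosure"
    DENIAL_OF_SERVICE = "Denial of service"
    ELEVATION_OF_PRIVILEGE = "Elevation of privilege"


@dataclass(frozen=True)
class Threat:
    component: str
    description: str
    category: Stride


# The communication channel is listed as a component in its own right.
COMPONENTS = [
    "microservice-3",
    "SaaS-a API",
    "SaaS-a engine",
    "SaaS-a notification service",
    "Queue",
    "SaaS-a to queue channel",
]

# Hypothetical entries for illustration only.
threats = [
    Threat("Queue",
           "A principal other than the designated external role publishes messages",
           Stride.SPOOFING),
    Threat("SaaS-a to queue channel",
           "A notification is modified in transit",
           Stride.TAMPERING),
]


def unreviewed(components, recorded_threats):
    """Return the components that have no recorded threat yet."""
    covered = {threat.component for threat in recorded_threats}
    return [name for name in components if name not in covered]


print(unreviewed(COMPONENTS, threats))
```

A component with no entries is not necessarily safe; the check only flags where the review has not happened yet.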

After identifying the threats, we need to design controls to mitigate any risk that is not initially acceptable, and then accept the residual risk. It is crucial not to stop here, but to go on and produce the code-level details for implementing the controls.

List of threats

*The list of threats is not comprehensive and only demonstrates how to apply the methodology to design effective controls.

The threat model should start at the highest level of abstraction (the high-level architecture view of the platform’s services) and must be extended to the lowest level of abstraction (application code or infrastructure code). Only then can we be confident that the controls are effective and that the resulting systems are secure and resilient. For example, the control for the vulnerability identified in row 5 should be implemented as below.

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "Allow SaaS-a to send messages to the notification queue",
    "Effect": "Allow",
    "Principal": {
      "AWS": [
        "arn:aws:iam::<SAAS_AWS_ACC_NUM>:role/<EXTERNAL_ROLE>"
      ]
    },
    "Action": "sqs:SendMessage",
    "Resource": "arn:aws:sqs:<SQS_AWS_REGION>:<AWS_ACC_NUM>:<QUEUE_NAME>"
  }]
}
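Taking the control down to code level also means it can be checked automatically. The sketch below is one possible check, not part of the original methodology: it flags a queue policy that allows a wildcard principal or any action other than `sqs:SendMessage`. The function names, account numbers, region, role, and queue name are all hypothetical.

```python
def as_list(value):
    """Policy fields may hold a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def policy_findings(policy, allowed_actions=("sqs:SendMessage",)):
    """Return a list of least-privilege problems found in Allow statements."""
    allowed = {action.lower() for action in allowed_actions}
    findings = []
    for statement in as_list(policy["Statement"]):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if principal == "*":
            principals = ["*"]
        elif isinstance(principal, dict):
            principals = as_list(principal.get("AWS", []))
        else:
            principals = []
        if "*" in principals:
            findings.append("wildcard principal")
        for action in as_list(statement.get("Action", [])):
            # Action names are compared case-insensitively.
            if action.lower() not in allowed:
                findings.append("unexpected action: " + action)
    return findings


# Hypothetical values standing in for the placeholders in the policy.
QUEUE_ARN = "arn:aws:sqs:eu-west-2:222222222222:example-notification-queue"

tight_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ["arn:aws:iam::111111111111:role/example-external-role"]},
        "Action": "sqs:SendMessage",
        "Resource": QUEUE_ARN,
    }],
}

loose_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "sqs:*",
        "Resource": QUEUE_ARN,
    }],
}

print(policy_findings(tight_policy))
print(policy_findings(loose_policy))
```

A check like this could run against the infrastructure code in a pipeline, so a later change that widens the policy is caught before deployment.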

Assumptions

The distributed nature of enterprise applications doesn’t allow us to include all the components of a single user story in a threat model because:

  • It undermines the model’s simplicity.
  • The details of all of the elements and their interaction are not accessible to us.

Developing a list of assumptions helps us define the scope of the threat model. The model should be updated if any unavailable data becomes accessible or any of the premises is no longer valid. The assumed items should either be modelled separately or fall outside our responsibility. Here are some examples for this model:

  1. We assume that platform engineers (with admin access to the prod environment) are trustworthy, and platform-level monitoring mechanisms should exist to detect misbehaviour.
  2. The activation of the notification for outbound payments is not in the scope of this threat model. We assume that the subscription has been enabled for this particular event in advance.
  3. The notification service signs the messages before sending them to the queue. The details of the message signing, key exchange, and key storage on the SaaS-a side are outside this model’s scope.
  4. The authentication of microservice-3 to SaaS-a API will be investigated in a separate model.

Related threat models

We defined the scope of our threat model based on the user story. I started with a high-level view of the system and used abstraction layers to dive deeper into it. It is also beneficial to go in the opposite direction to understand how the different user stories connect and how this system relates to other systems. The power of the systemic approach lies in letting us move between abstraction levels in both directions to understand the system better.

The user journey for activating the subscription to receive the notifications and message signing for authentication to SaaS-a are two examples of related threat models. Adding a list of related threat models improves the document’s quality and helps a reviewer learn the system.

Conclusion

In this article, I went through an example and demonstrated how a systemic approach to threat modelling could be applied to a complex system. We took advantage of systems thinking tools to define the model’s scope and employed abstraction levels and perspectives to understand the system.

Applying this method to the system’s user stories before developing them allows our development team to make more informed decisions based on the threat model, and enables us to build more secure systems at a lower cost.
